Introduction

Vulvovaginal symptoms, collectively termed vulvovaginitis, are one of the most common reasons for gynecological consultation1. Most women will experience at least one episode of vaginal infection during their lifetime, characterized by discharge, itch, burning, or odor2. Vulvovaginitis is caused by various infectious and non-infectious conditions that may present as an acute, short-term complaint, or as a chronic disorder. Vaginitis, particularly in its recurrent or chronic forms, impairs women’s health-related quality of life3 and has a major negative impact on self-esteem and sexual relationships4. Mixed vaginitis, where at least two types of vaginitis are present, is reported in 4.4% to 35% of evaluated patients5.

Diagnosis of the specific cause of vaginitis based on clinical presentation alone is limited by the similarity of symptoms and signs, such as itching, erythema, malodor, and vaginal discharge. Therefore, additional procedures or diagnostic tests are required to obtain an accurate diagnosis. Clinical guidelines suggest that pH and wet mount microscopy of fresh vaginal samples constitute the in-clinic standard practice for the diagnosis of vaginitis2,6,7,8. However, these two measures are used infrequently in practice. Hillier et al. reported that among women with vulvovaginitis symptoms, microscopy was conducted in only 17.4% of 281 visits9. An earlier study showed that microscopic assessment of vaginal discharge was not performed in 37% of 150 clinic visits10, and 42% of 50 different physicians did not perform microscopy as part of the evaluation of vaginitis10. Additionally, in >90% of office visits, pH measurement of vaginal discharge was not performed10. A review of 149,934 American patients’ files found that over 60% did not include procedure codes for any form of vaginitis diagnostic testing11. High rates of antifungal and antibacterial use were found, suggesting common empiric treatment, and likely over-prescription of antifungal and antibacterial medications11. A study from the Netherlands found that only 16% (61/380) of general practitioners (GPs) reported “always” or “often” using microscopy to diagnose vulvovaginal candidiasis, while only 7.9% (30/380) reported “always” or “often” using culture for the same purpose12. Other reports found high levels of misdiagnosis of bacterial vaginosis (BV) and vulvovaginal candidiasis (VVC), regardless of utilization of microscopy, suggesting inadequate use of microscopy as a possible explanation13.
Underutilization of these simple in-clinic tests often leads to inadequate treatment9,10,14, with up to 47% of patients receiving one or more inappropriate prescriptions9, as well as treatment without adequate evaluation in 54% of visits, implying appropriate treatment in fewer than half of the cases10.

An insufficient in-clinic evaluation may result from the absence of suitable equipment (i.e., microscope), lack of proper training in the preparation and interpretation of a wet mount, time constraints, and lack of awareness that these measures allow improved detection of the causes of vaginitis1. As an alternative to microscope usage, providers can send vaginal samples for laboratory evaluation. Nevertheless, laboratory tests are aimed at detecting only infectious etiologies and cannot detect noninfectious conditions such as vaginal atrophy, desquamative inflammatory vaginitis (DIV, also termed aerobic vaginitis, AV), and cytolytic vaginosis. In addition, these methods are not available in many clinical settings, are expensive, time-consuming, and do not provide timely results, as many of these tests require a process that lasts from several hours15 to several days, thus not allowing a point-of-care diagnosis.

Laboratory methods include cultures, a multiplex polymerase chain reaction (PCR) panel, and nucleic acid amplification testing (NAAT)6. Some rapid point-of-care tests are available16,17; however, they cover only a subset of the related conditions, including BV and trichomoniasis. Each test can diagnose only a single cause, so multiple tests must be performed, which increases turnaround time and cost. Collectively, these findings suggest that current clinical practice is sub-optimal and indicate the need for a diagnostic modality that is efficient, cost-effective, allows for a wide range of diagnoses, and is usable during the clinic visit18,19.

The investigational test evaluated in this study is the GYNI™ rapid point-of-care system (GynTools, Israel), an automated in-vitro diagnostic system, intended to aid in the diagnosis of vaginitis in symptomatic women, by qualitatively detecting the following vaginitis conditions or pathogens at the point-of-care: (1) BV, (2) Candida albicans vaginitis (CA), (3) Candida non-albicans vaginitis (NAC), (4) T. vaginalis, (5) vaginal atrophy (also referred to as atrophic vaginitis or genitourinary syndrome of menopause, GSM), (6) aerobic vaginitis/desquamative inflammatory vaginitis (AV/DIV), and (7) cytolytic vaginosis (CV).

The objective of this study was to evaluate the diagnostic performance of the investigational test in qualitatively detecting different vaginal disorders, by comparison to a composite reference standard (CRS) of wet mount microscopy performed by a specialist in vulvovaginal disorders (the first author, ALS) and laboratory tests of CHROMagar™ candida culture and sexually transmitted infection (STI) multiplex PCR for detection of trichomoniasis.

Results

Performance of the investigational test

A total of 226 women with vaginitis symptoms were recruited between December 2020 and October 2022. The majority were Caucasians, representing the Israeli population.

Thirty-four cases were excluded under the exclusion criterion of an uninterpretable sample. The performance of the investigational test for the detection of each condition, based on the 192 included cases, is shown in Table 1 and Fig. 1.

Table 1 Diagnostic Performance of GYNI™ test compared to a CRS of culture, PCR, and wet mount microscopy.
Fig. 1: The investigational test diagnostic performance chart.

T. vaginalis and Candida non-albicans are excluded due to a low number of cases.

Performance of the specialist

Compared with the CHROMagar Candida albicans culture results, the specialist’s diagnosis had an overall accuracy of 93.1% (189/203), a sensitivity of 86.7% (72/83), a specificity of 97.5% (117/120), and a κ of 0.855 (0.782–0.927), as shown in Table 2.

Table 2 Diagnostic performance of physician diagnosis compared to lab culture for Candidiasis.

Condition distribution

The number of cases of each condition identified by the gold standard method (either the expert or the laboratory results), including coinfections, was as follows: Candida albicans vaginitis, 69; BV, 33; vaginal atrophy, 26; AV/DIV, 25; cytolytic vaginosis, 14; and other conditions, including T. vaginalis and Candida non-albicans, 28. In 17 cases, no definite diagnosis was made by the CRS.

The percent of cases by condition is presented in Fig. 2.

Fig. 2: The percent of cases by condition.

Every diagnosed condition was counted separately, giving 195 diagnosed conditions in 192 patients, including 22 cases of coinfection and 17 cases with no established diagnosis.

Coinfections

We found 22 cases (11.5%) in which the CRS indicated a coinfection, i.e., the presence of more than one condition20. Twelve cases involved a combination of BV and Candida albicans vaginitis. The other combinations were Candida albicans and Candida non-albicans (3); BV and DIV (2); Candida albicans and DIV (2); cytolytic vaginosis and Candida non-albicans (1); atrophy, Candida albicans, and Candida non-albicans (1); and BV and T. vaginalis (1).

Symptoms distribution

The distribution of patient-reported vaginal symptoms is presented in Fig. 3.

Fig. 3: The distribution of patient-reported vaginal symptoms.

The bars depict the number of patients reporting each symptom.

Discussion

This study aimed to evaluate the performance of the investigational test in diagnosing seven vaginitis conditions, from a single swab, in a clinical setting.

The results for two of the seven conditions (Candida non-albicans and T. vaginalis) were not reported due to an insufficient number of cases. For the remaining five conditions, sensitivity and specificity ranged from 84.1% and 65.9%, respectively, for Candida albicans to 90.9% and 83.6% for BV. These results are superior to those reported for in-clinic testing (Amsel’s test, potassium hydroxide preparation, and wet mount) performed during routine clinic visits21.

It should be noted that samples of more than 10% of recruited patients were excluded: samples obtained from the first 30 patients were uninterpretable due to a camera stabilization problem that was identified and resolved by a scanner software update. One patient used a topical external ointment, a temporary server error occurred in one case, and reading was not available in one case due to a scanner horizontal movement malfunction.

Studies have reported that at least 30% of women presenting with vaginal complaints do not receive a diagnosis after a comprehensive workup1. Furthermore, the presence of coinfections makes accurate diagnosis of vaginitis even more challenging. It was previously shown that clinician diagnosis of BV based on microscopy was less accurate when T. vaginalis and/or Candida spp. were also detected in the sample by the reference methods22. In addition, studies comparing the diagnostic accuracy of providers with laboratory results showed a high proportion of disagreement between the practitioners’ diagnoses and the laboratory diagnoses10,14,23. Nyirjesy et al. surveyed 333 physicians to measure awareness of vaginitis clinical guidelines and the use of in-office diagnostic tools24. The study found limited awareness of recommended diagnostic practice guidelines, and limited access to all three point-of-care tools (microscope, pH test strips, and KOH solution)24.

Detecting the etiology of both infectious and non-infectious vaginal disorders provides an unprecedented diagnostic spectrum. To our knowledge, our results are the first ever reported for any automated tool to aid in diagnosing either vaginal atrophy or cytolytic vaginosis. In these cases, the diagnosis is based on cytologic characteristics and pH level. Correct diagnosis of vaginal atrophy is of high significance, since according to the 2020 position statement of the North American Menopause Society, GSM affects approximately 27% to 84% of postmenopausal women and can significantly impair health, sexual function, and quality of life, but only a minority of them seek help or are offered assistance by their physicians25.

Unlike laboratory methods such as Gram stain and culture, which may be highly dependent on sampling, transport conditions, and technical proficiency, and may have prolonged turnaround times26, the evaluated investigational test provides fast in-clinic results from a single specimen within 5 minutes of scan initiation. This is done by automating the entire recommended practice (consideration of reported symptoms, pH measurement, slide preparation, and slide scanning) and by harnessing deep-learning computer-vision image classification to support the provider’s diagnosis.

In various primary care settings, multiple reasons preclude an accurate diagnosis of vaginitis during the clinic visit, including the lack of an experienced microscopist. In addition, in some countries the first-line diagnosis of vulvovaginitis is made by a gynecologist, whereas in others (for example, the United Kingdom and the Netherlands) it is made by a GP. The presented results suggest that an in-office automated tool that relies on a combination of inputs (automated microscopy, automated pH, and the patient’s reported symptoms) can improve patient evaluation and treatment, regardless of caregivers’ training and skills. This may reduce empirical treatment and may also address the subjective intra- and inter-observer variability among caregivers who do perform microscopy.

One major limitation of this study is the relatively small sample size, especially for the less prevalent disorders. Another limitation is that the deep-learning software model that provides the computer-vision classification was trained on images that were classified and tagged by a single physician, whose diagnosis also acted as one of the reference methods in this study. In addition, evaluation of true performance for BV is constrained by the known limitations of the Amsel criteria, which demonstrated only 81.0% positive percent agreement and 86.0% negative percent agreement compared with positive (7–10) and negative (0–3) Nugent scores27.

Given the above-described limitations of the current point-of-care diagnosis of vaginitis, the investigational test presents a potential for a marked improvement in the first-line evaluation of vaginitis and will hopefully guide a more appropriate treatment while decreasing healthcare costs and improving patients’ quality of life.

Methods

Outline

This prospective cross-sectional study evaluated women reporting vaginal symptoms, seen at a single designated clinic for vulvovaginal disorders at the Clalit Health Organization, Jerusalem, Israel. All patients were examined, diagnosed, and treated by the same provider (the first author, ALS). Inclusion criteria were (i) vaginal complaints: discharge, malodor, itch, burning, pain, or dryness, and (ii) age 18 years or older. Exclusion criteria were (i) inability to provide informed consent and (ii) an uninterpretable sample (e.g., patients who used vaginal creams or lubricants before the visit, recent or current bleeding, or insufficient sampling material).

During the gynecological exam, vaginal discharge samples were taken for pH measurement, wet-mount microscopy (conducted immediately in the clinic using an Olympus CX31 microscope), vaginal cultures (for bacteria and CHROMagar candida), and trichomonas PCR, per standard of care in the clinic. An additional sample was taken for the investigational test diagnosis using a swab with a soft cytobrush head (Fig. 4), pulled through a dedicated cartridge, and scanned in the GYNI™ investigational test table-top scanner in the clinic.

Patients were diagnosed by the physician using wet-mount microscopy and were treated according to her recommendation. The investigational test diagnosis was stored in the cloud, and the physician was blinded to it to prevent bias. Each patient was identified using a numerical code, and an investigational test number was generated by the application. Laboratory results were recorded only by the patient’s code number, without any identifying details such as name or personal identification number. The comparators used as the gold standard for each assessed condition are shown in Table 3. The results of the three methods (the physician’s wet-mount diagnosis, the laboratory findings, and the investigational test) were summarized and compared by the chief investigator (the last author, ABC).

Table 3 Composite reference standard (CRS) comparators.

The sample size for this study was calculated to estimate the overall accuracy (the percent agreement on the diagonal between the investigational test and the reference diagnosis) with a prespecified level of precision. Precision is measured by the half-width of the 95% confidence interval around the proportion of interest. Based on Hajian-Tilaki28, we calculated that an accuracy level of at least 90% can be estimated with a confidence interval half-width of 5% from a minimum sample size of 139 patients. Recruitment continued to a larger sample size to obtain reasonable representation of each of the included conditions.
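The figure of 139 patients can be reproduced with the standard normal-approximation formula for estimating a single proportion, n = z²·p·(1 − p)/d². The sketch below assumes that this is the formula applied (the text does not state it explicitly); the function name and the use of z ≈ 1.96 for a two-sided 95% interval are illustrative choices.

```python
import math


def sample_size_for_proportion(p, half_width, z=1.959964):
    """Smallest n for which the normal-approximation CI around an
    expected proportion p has at most the requested half-width."""
    return math.ceil(z ** 2 * p * (1 - p) / half_width ** 2)


# Expected accuracy of 90%, 95% CI half-width of 5 percentage points
n_required = sample_size_for_proportion(p=0.90, half_width=0.05)
print(n_required)  # 139
```

Because p(1 − p) decreases as p moves from 0.90 toward 1, any true accuracy above 90% needs no more than this number of patients for the same half-width.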

The results of the investigational test were compared individually to the CRS comparator of the specialist wet mount results, the candida cultures, and the STI PCR panel for detection of T. vaginalis. The CRS was defined as positive if there was a positive result by either wet mount or culture/PCR. Samples were classified as negative if all comparators were negative. The comparison included the calculation of overall accuracy, sensitivity, specificity, positive predictive value, and negative predictive value according to standard equations. These measures are presented with two-sided 95% Wilson score confidence intervals. Cohen’s kappa coefficient (κ) is presented as a measure of inter-test agreement, with 95% confidence interval. All analyses were performed using Excel (Microsoft Corp. Redmond, Wash.), and SPSS (IBM Corp. IBM SPSS Statistics for Windows. Armonk, NY).
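As an illustration of the measures listed above, the following sketch computes them from a 2×2 table in plain Python rather than in Excel or SPSS. The function names are hypothetical. The example counts are those implied by the fractions reported for the specialist versus culture comparison (72/83 and 117/120), and the confidence interval for κ is not computed here.

```python
import math


def crs_positive(wet_mount_positive, lab_positive):
    """Composite reference: positive if either comparator is positive."""
    return wet_mount_positive or lab_positive


def wilson_interval(successes, total, z=1.959964):
    """Two-sided Wilson score interval for a binomial proportion."""
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def diagnostic_metrics(tp, fp, fn, tn):
    """Accuracy, sensitivity, specificity, PPV, NPV and Cohen's kappa."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Agreement expected by chance, from the marginal totals
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return {
        "accuracy": observed,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "kappa": (observed - expected) / (1 - expected),
    }


# Counts implied by the specialist-versus-culture fractions (72/83, 117/120)
m = diagnostic_metrics(tp=72, fp=3, fn=11, tn=117)
sens_low, sens_high = wilson_interval(72, 83)
print(round(m["accuracy"] * 100, 1), round(m["kappa"], 3))  # 93.1 0.855
```

The Wilson interval is used instead of the simpler Wald interval because it stays within [0, 1] and behaves better for proportions close to 0 or 1, which is common for specificity estimates.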

The investigational test

The GYNI™ system is aimed at providing non-expert health-care providers with the means to obtain a wide diagnostic range, with a fast and inexpensive analysis of various vulvovaginal conditions from a single swab. This is done by fully automating (a) consideration of the patient’s reported symptoms, (b) saline wet mount microscopy, (c) KOH microscopy, and (d) pH measurement, similar to the evaluation performed in vaginitis-specializing clinics29. This point-of-care test does not replace adequate history taking or physical examination. It does offer rapid, readily available automated saline and 10% KOH microscopy, which are critical diagnostic steps, without the need for caregiver training or prior skills. It also contains software logic that excludes conditions based on pH or the patient’s reported symptoms, according to current clinical knowledge. The test uses machine learning computer vision in the form of a deep convolutional neural network (CNN) model that performs a multi-label classification of seven major vaginitis conditions. The model was trained on 13,500 microscopy images, which were collected and classified by a specialist in a dedicated clinical trial (ClinicalTrials.gov, NCT03585049). Unlike previously described proofs of concept that apply deep neural networks to the classification of smaller subsets of vaginitis conditions in manually Gram-stained and selected microscopy images30,31, this test does not use slide staining, thereby minimizing the required operator labor and shortening the time until results are available.

The investigational test system comprises the following components (Fig. 4):

Fig. 4: The GYNI™ investigational test system components.

Actual hardware components of the disposable discharge collection tool and the tabletop scanner.

A vaginal discharge collection tool

A disposable plastic assembly consisting of (1a) a swab/brush head connected to a plastic rod, used to collect the discharge sample from the vagina; (1b) a “floating” transparent optical tray; (1c) a transparent cover; and (1d) diluent containers.

A compact tabletop scanner

The scanner (2) includes a high-resolution color camera, illumination LEDs, magnifying lenses, and linear motion systems, both vertical and horizontal. The operator places the disposable cartridge in a cradle connected to the horizontal motion system. During insertion of the tray into the cradle, the cartridge’s internal diluents are automatically sprayed on the optic tray and the tray is lifted to create optical coupling. Upon activation, the cradle begins its linear motion between the illumination sources and the camera, dozens of microscopic pictures are acquired, and the pH level is determined by a color change of pH indicator paper located in the cartridge. The collected information is then transferred to the cloud.

Web-based user interface

A test operation website provides an interactive mechanism for data entry and test control, available via any web-connected device, such as a laptop or a smartphone.

Cloud software platform

Test processing includes a deep learning convolutional network model for multi-label image classification, analysis of pH paper images, and color calibration images for pH calculation. The processing software also cross-matches the patient’s reported symptoms and pH level with the results of the computer vision classification of the microscopy images.
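The combination of multi-label classification and cross-matching described above can be pictured with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the GYNI™ software: the label order, the 0.5 decision threshold, the logit values, and the single example rule (suppressing BV when pH is 4.5 or lower, mirroring the pH > 4.5 Amsel criterion).

```python
import math

# Label order is an assumption for this sketch
CONDITIONS = ["BV", "CA", "NAC", "TV", "atrophy", "AV/DIV", "CV"]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def suggest_diagnoses(logits, ph, symptoms, rules, threshold=0.5):
    """Score each condition independently (multi-label), then drop any
    condition whose pH/symptom rule is not satisfied."""
    suggested = []
    for name, logit in zip(CONDITIONS, logits):
        if sigmoid(logit) < threshold:
            continue  # classifier does not flag this condition
        rule = rules.get(name)
        if rule is not None and not rule(ph, symptoms):
            continue  # flagged by the classifier but excluded by the rule
        suggested.append(name)
    return suggested


# Hypothetical rule table: BV is only kept when pH exceeds 4.5
EXAMPLE_RULES = {"BV": lambda ph, symptoms: ph > 4.5}

example_logits = [2.0, 1.5, -3.0, -4.0, -2.0, -1.0, -3.5]
result = suggest_diagnoses(example_logits, ph=4.0,
                           symptoms={"itch", "discharge"}, rules=EXAMPLE_RULES)
print(result)  # ['CA']
```

Independent sigmoid outputs, rather than a single softmax, allow more than one condition to be reported for the same sample, which is what makes coinfections detectable in a multi-label setup.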

The results provided by the investigational test include (a) suggested diagnoses (one or more detected conditions), (b) pH level, and (c) a heatmap annotation of a selected input microscopy image, obtained via guided backpropagation32. The heatmap annotation is intended to provide explainability33 by visualizing the salient areas in the input image that have the strongest effect on the model output. Examples of such heatmap results are shown in Fig. 5. The test results are displayed online and are available to the operator for download.

Fig. 5: Heatmap examples.

Examples of microscopy heatmap images included with the investigational test results.
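The idea behind guided backpropagation can be shown on a toy network. The sketch below uses a single hidden ReLU layer with hand-picked weights, not the deep CNN used by the test; all names and values are hypothetical. During the backward pass, a signal is passed through a ReLU unit only if the unit was active in the forward pass and the incoming gradient is positive.

```python
def guided_backprop_saliency(x, w1, b1, w2):
    """Guided-backpropagation saliency for y = w2 . relu(w1 @ x + b1)."""
    # Forward pass: pre-activation of every hidden unit
    pre = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(w1, b1)]
    # Backward pass at the ReLU: the gradient of y with respect to hidden
    # unit j is w2[j]; keep it only if the unit fired AND the gradient is > 0
    guided = [g if (p > 0 and g > 0) else 0.0 for p, g in zip(pre, w2)]
    # Propagate the filtered signal back to the input features
    return [sum(w1[j][i] * guided[j] for j in range(len(w1)))
            for i in range(len(x))]


x = [1.0, 2.0]
w1 = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
b1 = [0.0, 0.0, 0.0]
w2 = [1.0, -1.0, 1.0]

saliency = guided_backprop_saliency(x, w1, b1, w2)
print(saliency)  # [1.0, 0.0]
```

In this example the ordinary gradient would be [1.0, -1.0]; guided backpropagation drops the negative contribution of the second input, so the resulting map highlights only the features that support the output. For an image model, the same per-pixel values are rendered as a heatmap.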

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.