Development of an Automatic Interpretation Algorithm for Uroflowmetry Results: Application of Artificial Intelligence
Abstract
Purpose
To develop an automatic interpretation system for uroflowmetry (UFM) results using machine learning (ML), a form of artificial intelligence (AI).
Methods
A total of 1,574 prospectively collected UFM results (1,031 males, 543 females) with a voided volume >150 mL were labelled as normal, borderline, or abnormal by 3 urologists. If the 3 experts disagreed, the majority decision was accepted. Abnormality was defined as a condition in which a urologist judged from the UFM results that further evaluation was required and that the patient should visit a urology clinic. To develop the optimal automatic interpretation system, we applied 4 ML algorithms and 2 deep learning (DL) algorithms. ML models were trained with all UFM parameters, whereas DL models were trained to digitally analyze 2-dimensional images of UFM curves.
Results
The ML-based interpretation algorithm achieved a maximum accuracy of 88.9% in males and 90.8% in females when using 6 parameters: voided volume, maximum flow rate, time to maximum flow rate, average flow rate, flow time, and voiding time. In females, the DL models showed a dramatic improvement in accuracy over the ML models, with the convolutional neural network model reaching 95.4%. The DL models also showed outstanding clinical discrimination in both genders, with an area under the curve of up to 0.957 in males and 0.974 in females.
Conclusions
We developed an automatic interpretation algorithm for UFM results by training AI models using 6 key parameters and the shape of the curve; this algorithm agreed closely with the decisions of urology specialists.
INTRODUCTION
Uroflowmetry (UFM) is a simple test that measures urine flow as volume per unit time [1]. The main advantage of UFM is that it is non-invasive and relatively inexpensive [2,3]. It is therefore considered an indispensable, first-line screening method for most patients with suspected lower urinary tract dysfunction [2]. Most commercially available office uroflowmeters are based on weight transducers, which measure the voided volume (VV) and calculate the flow rate from the change in that volume over time [4]. These flowmeters provide both a graphical presentation of the uroflow and a range of electronically read parameters [4].
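The following minimal Python sketch makes the weight-transducer principle concrete. It derives the flow rate from cumulative volume samples by finite differences, then computes the ICS parameters used later in this study (VV, Qmax, TQmax, Qavg, FT, VT). The sampling interval, the 0.5 mL/s "measurable flow" threshold, and all function names are illustrative assumptions. This is not the algorithm of any commercial flowmeter or of our application.

```python
from dataclasses import dataclass


@dataclass
class UFMParameters:
    vv: float     # voided volume (mL)
    qmax: float   # maximum flow rate (mL/s)
    tqmax: float  # time from onset of flow to Qmax (s)
    qavg: float   # average flow rate = VV / FT (mL/s)
    ft: float     # flow time: time during which measurable flow occurs (s)
    vt: float     # voiding time: first to last flow, including interruptions (s)


def flow_from_volume(volume_ml, dt):
    """Finite-difference flow rate (mL/s) from cumulative volume sampled every dt s."""
    # Negative differences (e.g. sensor noise) are clipped to zero flow.
    return [max(0.0, (b - a) / dt) for a, b in zip(volume_ml, volume_ml[1:])]


def ufm_parameters(flow, dt, threshold=0.5):
    """Derive ICS-style parameters from a uniformly sampled flow curve.

    `threshold` (mL/s) is an assumed cutoff for 'measurable flow'; no
    smoothing or artefact rejection is applied to Qmax in this sketch.
    """
    active = [i for i, q in enumerate(flow) if q > threshold]
    if not active:
        raise ValueError("no measurable flow in the recording")
    first, last = active[0], active[-1]
    vv = sum(flow) * dt                 # rectangle-rule integral of Q(t)
    ft = len(active) * dt               # counts only samples with measurable flow
    vt = (last - first + 1) * dt        # spans interruptions between flow episodes
    i_max = max(range(len(flow)), key=lambda i: flow[i])
    return UFMParameters(vv=vv, qmax=flow[i_max], tqmax=(i_max - first) * dt,
                         qavg=vv / ft, ft=ft, vt=vt)
```

In an intermittent curve such as `[0, 2, 10, 20, 10, 0, 0, 5, 0]` sampled at 1 s, FT (5 s) is shorter than VT (7 s) because the 2-s interruption counts toward voiding time only.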
To obtain representative results in UFM, adequate privacy should be provided, and patients should be asked to void when they feel a “normal” desire to do so [2]. However, UFM is usually carried out on an outpatient basis, in designated procedure areas without basic privacy, and often involves having the person urinate into the uroflowmeter at a predetermined time [5]. This process is unnatural and requires “on-demand” voiding, often with either low or very high bladder filling, which compromises the results [5]. It has therefore been recommended that UFM be repeated, which is time-consuming and costly for both patients and health care providers [6].
Sound-based UFM is a new approach to recording urinary flow patterns and measuring urinary flow parameters non-invasively by analysing the sound generated by the urine stream striking the water surface in the toilet bowl. We developed a novel mobile acoustic UFM that uses the microphone built into a smartphone. In a previous study, researchers developed a device based on a similar concept, called sonouroflow [7]. At the technical level, sonouroflow processed each voiding session as a whole and analyzed only the sound pressure level. In contrast, our acoustic UFM method analyzes each session in detail by dividing it into hundreds of segments. In addition to the sound pressure level, our method estimates variables by applying various signal processing methods in the time and frequency domains. Clinical trials confirmed that our device was non-inferior in performance to conventional UFM. We released the acoustic UFM program as an application through the App Store and Google Play.
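The segment-wise analysis described above can be illustrated with a generic sketch that does not reproduce our app's proprietary engine. The recording is split into overlapping frames. For each frame, one time-domain feature (RMS level in dB) and one frequency-domain feature (spectral centroid) are computed. The frame length, hop size, choice of features, and function names are all assumptions made for illustration. A naive DFT is used so the code stays dependency-free.

```python
import cmath
import math


def frames(samples, frame_len, hop):
    """Split an audio signal into overlapping fixed-length frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def level_db(frame, ref=1.0):
    """Time-domain feature: RMS level in dB relative to `ref`."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return 20.0 * math.log10(max(rms, 1e-12) / ref)  # floor avoids log(0)


def spectral_centroid(frame, sample_rate):
    """Frequency-domain feature: magnitude-weighted mean frequency (Hz).

    Uses a naive O(n^2) DFT for clarity; a real implementation would use an FFT.
    """
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        s = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(s))
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total


def frame_features(samples, sample_rate, frame_len=1024, hop=512):
    """Per-frame (level_db, spectral_centroid) pairs for one voiding session."""
    return [(level_db(f), spectral_centroid(f, sample_rate))
            for f in frames(samples, frame_len, hop)]
```

A downstream model, which is not shown, would map such per-frame features to an instantaneous flow-rate estimate. Analysing frames rather than the whole session is what allows the flow curve to be reconstructed over time.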
Patients can use our acoustic UFM application for free with only a smartphone. Using this program, they can perform UFM tests in any location (at home, at work, on vacation, or anywhere else) at any time just as comfortably as they would void in daily life. However, even if users obtain accurate and representative measurements, it is difficult to judge whether the results are normal or whether they reflect abnormal conditions that require urological management [8]. To identify abnormal results and recommend that those users visit a urologic clinic, we aimed to develop an automatic interpretation system for UFM results by applying machine learning (ML) and deep learning (DL), 2 subsets of artificial intelligence (AI).
MATERIALS AND METHODS
Patients
With the approval of the Seoul National University Bundang Hospital Institutional Review Board (IRB No.: B-1912-583-001), 3,000 patients over 20 years of age who were scheduled to undergo UFM in the outpatient urology clinic based on clinical judgement were prospectively included in this study from January to December 2019. Before inclusion, patients voluntarily agreed to participate in this study and provided written informed consent. This study was conducted in accordance with the ethical principles stated in the Declaration of Helsinki. All patients with incomplete UFM results were excluded, as were patients with a voided volume of less than 150 mL.
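The two exclusion criteria can be sketched as a record filter. The field names are hypothetical. Treating a result as "incomplete" when any of the six ICS parameters is missing is an assumption, since the text does not define incompleteness. Records with a VV of exactly 150 mL are kept, following the "less than 150 mL" wording.

```python
# Hypothetical field names for the six ICS parameters of one UFM record.
REQUIRED = ("VV", "Qmax", "TQmax", "Qavg", "FT", "VT")


def is_eligible(record, min_vv_ml=150.0):
    """Apply the two stated exclusion criteria to one UFM record (a dict).

    'Incomplete' is assumed to mean that a required parameter is missing.
    """
    if any(record.get(name) is None for name in REQUIRED):
        return False                     # incomplete result -> excluded
    return record["VV"] >= min_vv_ml     # VV < 150 mL -> excluded
```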
Using a web-based reading tool, 3 urologists independently read the UFM measurements: a senior urologist with more than 10 years of clinical experience, a urologist with more than 5 years of clinical experience, and a junior urologist with less than 5 years of experience. Each reader classified each result as normal, borderline, or abnormal by visually inspecting the pattern of the flow curve and evaluating the relevant quantitative UFM parameters defined by the International Continence Society (ICS): voiding time (VT), flow time (FT), time to maximum flow rate (TQmax), maximum flow rate (Qmax), average flow rate (Qavg), and VV.
Abnormality was defined as a condition in which a urologist judged from the UFM results that further evaluation was required and that the patient should visit a urologic clinic.
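The abstract states that the majority decision was accepted when the 3 readers disagreed. The following minimal sketch shows that rule, together with the unanimity rate reported in the Results. With 3 readers and 3 classes, all readers can give different labels. The text does not say how such cases were resolved, so this sketch returns `None` for them. That handling is an assumption.

```python
from collections import Counter


def consensus(ratings):
    """Majority label among the readers, or None when no label has a strict majority."""
    label, count = Counter(ratings).most_common(1)[0]
    return label if count > len(ratings) / 2 else None


def unanimity_rate(cases):
    """Fraction of cases in which every reader gave the same label."""
    return sum(len(set(r)) == 1 for r in cases) / len(cases)
```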
ML algorithms
To develop the optimal automatic interpretation system, we applied 4 ML algorithms (logistic regression, decision tree, support vector machine, and random forest algorithms) and 2 DL algorithms (a convolutional neural network [CNN] and a recurrent neural network [RNN]).
ML models were trained with all UFM parameters. DL models were trained to digitally analyze 2-dimensional (2D) images of the UFM curve; for DL modelling, the 2D image of the UFM curve was converted into a time series of the instantaneous flow rate at each time “t.”
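The image-to-time-series conversion can be sketched as follows, under stated assumptions. The curve image is a binary pixel grid with known axis calibration: a flow value for the top row and a time span for the full width. The topmost "ink" pixel in each column gives the flow rate at that time. The series is then linearly resampled to a fixed length, as CNN and RNN inputs typically require. The calibration arguments, the column-scanning rule, and the fixed-length resampling are illustrative choices and are not confirmed details of our pipeline.

```python
def curve_from_image(img, q_top, t_span):
    """Convert a rendered UFM curve into (t, Q) samples, one per pixel column.

    img:    list of pixel rows, img[0] is the top row; a truthy pixel is curve ink.
    q_top:  flow rate (mL/s) represented by the top row; the bottom row is 0 mL/s.
    t_span: time (s) represented by the full image width.
    Columns without ink are read as zero flow.
    """
    h, w = len(img), len(img[0])
    samples = []
    for c in range(w):
        ink_rows = [r for r in range(h) if img[r][c]]
        # The topmost ink pixel in a column gives the curve height at that time.
        q = (h - 1 - min(ink_rows)) / (h - 1) * q_top if ink_rows else 0.0
        samples.append((c / (w - 1) * t_span, q))
    return samples


def resample(values, n):
    """Linearly interpolate a sequence to a fixed length n (n >= 2) for model input."""
    if n < 2:
        raise ValueError("n must be at least 2")
    m = len(values)
    if m == 1:
        return [float(values[0])] * n
    out = []
    for i in range(n):
        x = i * (m - 1) / (n - 1)      # position in the original index space
        j = min(int(x), m - 2)          # left neighbour, kept inside the array
        frac = x - j
        out.append(values[j] * (1 - frac) + values[j + 1] * frac)
    return out
```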
For supervised learning, a randomly selected 80% of the UFM results for each gender were used as the training set, and the algorithms were developed to classify results into 3 groups: normal, borderline, and abnormal. The developed algorithms were then validated on a test set consisting of the remaining 20% of results to evaluate the consistency and discrimination of the model.
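The per-gender 80/20 split can be sketched with the standard library. Records are grouped by gender and shuffled within each group with a fixed seed. Then the first 80% of each group (rounded down) forms the training set. The seed, the rounding rule, and the record layout are assumptions for illustration.

```python
import random
from collections import defaultdict


def split_by_group(records, group_key, train_frac=0.8, seed=0):
    """Randomly split records into train/test sets separately within each group.

    Returns two dicts mapping group value -> list of records. The seed and the
    floor rounding of the training-set size are illustrative choices.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec)
    train, test = {}, {}
    for g, items in groups.items():
        items = list(items)
        rng.shuffle(items)
        k = int(train_frac * len(items))   # floor of 80% of the group
        train[g], test[g] = items[:k], items[k:]
    return train, test
```

Splitting within each gender keeps the gender-specific models from being trained or tested on a skewed sample. With 1,031 male records, floor rounding gives 824 training and 207 test cases.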
Statistics
The evaluation variables are presented as descriptive statistics. The interobserver consistency of the investigators’ readings was assessed in terms of intraclass correlation by calculating Cronbach alpha. Additionally, to determine the extent of agreement between the investigators (interobserver agreement), Cohen kappa values were calculated [9]. A scatter plot matrix was used to visualize the relationships between pairs of variables in a grid format, with each scatter plot showing the relationship between 2 variables. In addition, a kernel density estimation curve was drawn for each variable, and each group was displayed in a different colour to provide additional information.
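The two agreement statistics can be written out in plain Python. Cohen kappa compares observed agreement between 2 readers with the agreement expected by chance from their label frequencies. It is computed for each pair of readers. Cronbach alpha treats each reader as an "item". Coding the labels as ordinal scores (normal=0, borderline=1, abnormal=2) is an assumption required for alpha, and the function names are illustrative.

```python
from collections import Counter
from itertools import combinations
from statistics import pvariance

# Assumed ordinal coding of the three labels.
CODE = {"normal": 0, "borderline": 1, "abnormal": 2}


def cohen_kappa(a, b):
    """Cohen kappa for two raters' labels on the same cases."""
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    p_exp = sum(ca[c] * cb[c] for c in set(ca) | set(cb)) / (n * n)
    return (p_obs - p_exp) / (1 - p_exp)  # undefined if p_exp == 1


def pairwise_kappas(ratings_by_rater):
    """Kappa for every pair of raters, keyed by rater index pair."""
    return {(i, j): cohen_kappa(ratings_by_rater[i], ratings_by_rater[j])
            for i, j in combinations(range(len(ratings_by_rater)), 2)}


def cronbach_alpha(ratings_by_rater):
    """Cronbach alpha, treating each rater as an 'item' scored with CODE."""
    scores = [[CODE[r] for r in rater] for rater in ratings_by_rater]
    k = len(scores)
    item_var = sum(pvariance(s) for s in scores)
    totals = [sum(case) for case in zip(*scores)]
    return k / (k - 1) * (1 - item_var / pvariance(totals))
```

With 3 readers there are 3 reader pairs. This is why kappa is naturally reported as a range, while alpha summarizes all readers in one number.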
Agreement between the clinical decision and the interpretation by the ML algorithm was quantified as accuracy, defined as the percentage of correct interpretations out of the total number of results. The area under the receiver operating characteristic curve was used as a summary measure of the model’s discrimination performance [10].
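Both performance measures can be sketched briefly. Accuracy is the fraction of matching labels. The ROC AUC is computed through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counting one half. The text does not state how the AUC was computed for 3 classes. A one-vs-rest scheme, such as "abnormal" versus the rest scored by the predicted probability of "abnormal", is an assumption.

```python
def accuracy(y_true, y_pred):
    """Fraction of results whose predicted label matches the reference label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(is_positive, scores):
    """Binary ROC AUC via its rank interpretation: the probability that a random
    positive case scores higher than a random negative case (ties count 1/2)."""
    pos = [s for y, s in zip(is_positive, scores) if y]
    neg = [s for y, s in zip(is_positive, scores) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

For a one-vs-rest "abnormal" AUC, pass `[y == "abnormal" for y in y_true]` as `is_positive` and the model's abnormal-class probabilities as `scores`.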
RESULTS
A total of 3,741 UFM cases were screened, and 1,269 tests were excluded according to the exclusion criteria. After excluding 894 cases with a VV of less than 150 mL, we ultimately analyzed 1,574 cases (1,031 male and 543 female cases).
The mean ages of the male and female patients were 66.5±10.5 years and 63.6±12.1 years, respectively. The UFM results of male cases were labelled normal in 521 cases (50.5%) and abnormal in 232 cases (22.4%), with unanimous decisions in 51.4% of cases. For female cases, 420 (77.3%) were normal and 60 (11.0%) were abnormal, with a 70.5% unanimity rate.
The internal consistency of the UFM readings was high (Cronbach alpha 0.88 [0.87–0.89] in males and 0.85 [0.83–0.86] in females). Fair to moderate interobserver agreement was reached regarding the normality of the UFM curve, with kappa values of 0.43–0.55 in male cases and 0.39–0.56 in female cases.
Regarding the correlations between continuous variables observed in the scatter plot matrix, VV and Qavg were positively correlated, whereas Qmax was negatively correlated with VT and FT (Fig. 1). In addition, the scatter plots coloured by group showed clearly different distributions for the normal and abnormal groups.
For ML, 824 male and 338 female cases (80% of the data) were used as the training set, and 207 male and 82 female cases (the remaining 20% of the data) were used as the test set. Logistic regression with a single feature achieved 57.0%–83.8% accuracy, and Qmax was the single variable with the highest accuracy. With 2 features, accuracy was 71.1%–85.2%, and the best 2-feature model used Qmax and VV. When the number of features was increased one by one from 4 (VV, Qmax, TQmax, and Qavg) to 7 (VV, Qmax, TQmax, Qavg, VT, FT, and DT), accuracy plateaued, increasing only from 86.5% to 88.9%. The interpretation algorithm showed the best accuracy with a combination of 6 parameters: VV, Qmax, TQmax, Qavg, VT, and FT (Fig. 2).
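The nested feature evaluation described above can be sketched as a loop that scores the first k features for k = 1 to 7. Two assumptions apply. First, the ordering of features 3 and 4 is assumed, because the text fixes only Qmax as the best single feature, Qmax and VV as the best pair, the 4-feature set, and VT, FT, and DT as later additions. Second, so that the example runs without external libraries, a standardized nearest-centroid classifier stands in for the logistic regression used in the study. The scorer can be swapped for any model.

```python
import math
from statistics import mean, pstdev

# Assumed order in which features are added (see lead-in).
FEATURE_ORDER = ["Qmax", "VV", "TQmax", "Qavg", "VT", "FT", "DT"]


def nearest_centroid_accuracy(train, test, features, label="label"):
    """Stand-in classifier: standardize features with training statistics, then
    assign each test case to the class with the nearest mean (centroid)."""
    mu = {f: mean(r[f] for r in train) for f in features}
    sd = {f: pstdev([r[f] for r in train]) or 1.0 for f in features}

    def z(r):
        return [(r[f] - mu[f]) / sd[f] for f in features]

    classes = sorted({r[label] for r in train})
    centroids = {c: [mean(col) for col in zip(*(z(r) for r in train if r[label] == c))]
                 for c in classes}
    hits = sum(min(classes, key=lambda c: math.dist(z(r), centroids[c])) == r[label]
               for r in test)
    return hits / len(test)


def nested_feature_scores(train, test, order=FEATURE_ORDER,
                          scorer=nearest_centroid_accuracy):
    """Test-set accuracy using the first k features, for k = 1 .. len(order)."""
    return {k: scorer(train, test, order[:k]) for k in range(1, len(order) + 1)}
```

Plotting the returned accuracies against k shows where adding parameters stops helping, which is the plateau reported above.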
In male cases, the interpretation accuracy of the ML models was 87.4%–88.9%, with the random forest model showing the highest accuracy. The DL models, which were trained using the shape of the UFM curve, showed slightly higher accuracy: 91.8% for the CNN model and 90.8% for the RNN model. In female cases, the interpretation accuracy of the ML models was 87.2%–90.8%, with the random forest and logistic regression models tied for the highest accuracy. Interestingly, in female cases the DL models showed a dramatic improvement in accuracy over the ML models, with the CNN and RNN models achieving 95.4% and 94.5% accuracy, respectively. Clinical discrimination was outstanding in both genders, with a maximum area under the curve of 0.957 in males and 0.974 in females (Fig. 3).
DISCUSSION
We developed an AI algorithm that automatically interprets the results of UFM using 6 key parameters and the shape of the curve; we confirmed that the results generated by this algorithm were in very close agreement with the decisions of urology specialists. There was no significant difference in consistency according to the ML method in male cases, but in female cases, the accuracy increased dramatically when DL models were added to recognize the shape of the UFM curve. To the best of our knowledge, this is the first study to develop an automatic UFM reading algorithm using AI models.
For female cases, there was a dramatic increase in accuracy when DL models were added to recognize the 2D images of the UFM curve. However, when male cases were analyzed using DL models, there was no significant change in accuracy. One plausible explanation is that the 6 numerical parameters extracted from the UFM curve are sufficient to reflect the patient’s condition in male cases; on the other hand, in female patients, the parameters of UFM may be insufficient, and the shape of the curve must be referenced when reading UFM results. Few studies on the diagnostic application of UFM are available in women, and there is no clarity regarding reference values, their variations, and which factors influence these values [11]. Particular caution is recommended when interpreting the UFM results of female patients [12]. Further research may be needed to determine whether there are better numerical parameters and/or any unknown features that can be used for the interpretation of female UFM results.
Several studies have examined the limitations of clinic-based UFM [13,14]. It is difficult to provide a space with adequate privacy in which patients can relax, and patients are asked to void without a normal desire to do so; as a result, clinic measurements are not representative of daily voiding patterns. It is also not feasible to repeat measurements in the clinic due to time constraints [15]. A potential solution to this “bashful bladder” phenomenon under forced conditions is for patients to measure their own urinary flow at home [15]. Several home UFM techniques have been introduced, including timing methods, funnel devices, and electronic devices [16-18]. However, these techniques are not a complete alternative either, because electronic devices are costly and values calculated manually by patients may be inaccurate. For these reasons, we believe that the best option to date is our sound-based UFM system that works with a smartphone.
Our mobile app-based acoustic UFM is an easy-to-use, non-invasive method for estimating a patient’s urinary flow parameters simply by recording sounds with a smartphone during voiding. A novel acoustic AI engine is applied to suppress sound artifacts, compensate for environmental acoustic characteristics, and improve prediction accuracy. This acoustic UFM system was built from 35,000 sessions of voiding data from 4,700 people in various real acoustic environments. It has already been validated clinically and is listed by the U.S. Food and Drug Administration as a uroflowmeter and medical device data system. Our mobile acoustic UFM system can be used to check and monitor the rate and volume of urinary flow in daily, natural settings. It can also track longitudinal trends and includes an automatic voiding diary for daily use. This smartphone application might address the shortcomings of current voiding diaries, such as incomplete records with missing values and low compliance [19]. In this study, we developed an ML-based automatic reading algorithm for use with mobile acoustic UFM.
A disadvantage common to all home UFM techniques, including the sound-based UFM we developed, is the lack of postvoid residual (PVR) measurement. Although the ICS recommends reporting PVR with UFM results, the key parameters of UFM are Qmax, VV, and flow pattern [2,20], and the parameter most relevant to bladder outlet obstruction is Qmax [21]. PVR is not of low clinical significance; rather, it may be considered an independent parameter that is not part of UFM itself, hence the term “UFM with PVR.” Including PVR could greatly increase the potential clinical impact, but UFM alone is sufficient as a screening test. An abnormal result from home UFM would presumably lead to a subsequent clinic visit, where PVR could be measured on its own after normal voiding [15].
In this study, the interobserver reliability of UFM readings by the 3 researchers was relatively low. Other studies have also reported variability in interpretation [22]. In particular, agreement on the diagnosis of bladder outlet obstruction is very poor (kappa=0.20) [23]. Urologists’ level of experience may be the most important factor in these differences [24]. Since there are no absolute values defining normal limits, the interpretation of UFM results is necessarily subjective and empirical [25,26]. For this reason, we believe that the only way to read UFM results automatically is to use the opinions of clinical experts as a reference and attempt to replicate them. Although the degree of consensus among the clinical experts was not high, agreement between the AI readings and the majority expert readings was excellent, reaching up to 95%. Urologists label UFM results mainly by reviewing the shape of UFM curves, but the numerical values of the UFM parameters capture the information in the shape quite well.
In conclusion, in this study we developed an AI system that applies 4 ML and 2 DL algorithms to automatically interpret UFM results using 6 key parameters and the shape of the curve. We confirmed that agreement between the automated readings and the judgement of urology specialists was very high. In females, the accuracy of the readings increased dramatically when DL models were added to recognize the shape of the UFM curve. Further research may be needed to determine whether currently unrecognized features of the UFM curve shape could improve the interpretation of UFM results.
Notes
Fund/Grant Support
This study was supported by a grant (No. 14-2019-018) from the SNUBH Research Fund, by a research fund from the Korean Continence Society (2019), and by the Korea Medical Device Development Fund grant funded by the Korea government (the Ministry of Science and ICT, the Ministry of Trade, Industry and Energy, the Ministry of Health & Welfare, the Ministry of Food and Drug Safety [project number: 1711138269, KMDF_PR_20200901_0141]).
Research Ethics
This study was approved by the Institutional Review Board (IRB) of Seoul National University Bundang Hospital (IRB No. B-1912-583-001).
Conflict of Interest
MSC, a member of the Editorial Board of International Neurourology Journal, is the first author of this article. However, he played no role whatsoever in the editorial evaluation of this article or the decision to publish it. No potential conflict of interest relevant to this article was reported.
AUTHOR CONTRIBUTION STATEMENT
• Conceptualization: SL
• Data curation: MSC, HYR, SL
• Formal analysis: MSC, HYR, SL
• Funding acquisition: SL
• Methodology: MSC, HYR
• Project administration: SL
• Visualization: MSC, HYR, SL
• Writing-original draft: MSC
• Writing-review & editing: MSC, SL