Background Hypertension is a major modifiable risk factor for cardiovascular diseases. Early identification of high-risk individuals using predictive models can facilitate targeted interventions. This study aims to develop and validate a machine learning-based risk prediction model for hypertension using complex, nationally representative survey data and to deploy it as an accessible web application.
Methods We used data from the WHO STEPS Survey Nepal 2019, which included 5,593 participants. The survey design featured clustering, stratification, and sampling weights, all of which were incorporated into the analysis. Fourteen initial predictors were considered. We combined SMOTENC and KMeansSMOTE to address class imbalance and optimized the prediction threshold for each model. Six machine learning algorithms (Logistic Regression, Naive Bayes, Random Forest, LightGBM, XGBoost, and SVM) were trained and evaluated on AUC ROC, precision-recall performance, calibration, and clinical utility (Decision Curve Analysis). Model interpretability was assessed using SHAP values. The best model was deployed as an interactive web application using Streamlit.
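The per-model threshold optimization mentioned above can be illustrated with a minimal sketch. The abstract does not state which criterion was optimized, so maximizing the F1-score over a fixed grid is an assumption here, and the function names `f1_at_threshold` and `best_f1_threshold` are hypothetical. The sketch is also unweighted, whereas the study incorporated sampling weights.

```python
import numpy as np

def f1_at_threshold(y_true, y_prob, threshold):
    """F1-score when predicting positive for probabilities >= threshold."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob, dtype=float)
    y_pred = y_prob >= threshold
    tp = np.sum(y_pred & (y_true == 1))
    fp = np.sum(y_pred & (y_true == 0))
    fn = np.sum(~y_pred & (y_true == 1))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom > 0 else 0.0

def best_f1_threshold(y_true, y_prob, grid=None):
    """Return (threshold, F1) for the grid point with the highest F1.

    Assumption: F1 is the tuning criterion; the source does not say so.
    Ties are resolved in favor of the lowest threshold.
    """
    if grid is None:
        grid = np.linspace(0.05, 0.95, 19)  # 0.05, 0.10, ..., 0.95
    scores = [f1_at_threshold(y_true, y_prob, t) for t in grid]
    best = int(np.argmax(scores))
    return float(grid[best]), float(scores[best])

# Toy data for illustration only (not from the survey).
threshold, score = best_f1_threshold([0, 0, 1, 1], [0.12, 0.42, 0.37, 0.83])
```

In practice the threshold would be chosen on validation data and then applied unchanged to the held-out test set, so that the reported performance is not optimistically biased.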
Results The prevalence of hypertension in the weighted sample was 26.1%. After feature selection using SHAP analysis, seven key predictors were retained: age, smoking, waist-hip ratio risk, heavy alcohol use, physical activity, fasting blood sugar, and total cholesterol. Logistic Regression demonstrated the best overall performance (AUC ROC: 0.718, F1-Score: 0.552) and was well-calibrated. It also offered the highest net benefit across a wide range of clinical thresholds. The model was successfully deployed as a publicly available web application (https://htnrisknepal.streamlit.app/).
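For readers unfamiliar with Decision Curve Analysis, net benefit at a threshold probability p_t is conventionally defined as TP/n − (FP/n) · p_t/(1 − p_t). The following is a minimal, unweighted sketch of that standard definition, not the authors' implementation; the function names are hypothetical and the study's sampling weights are ignored for simplicity.

```python
import numpy as np

def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating everyone with predicted risk >= threshold."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob, dtype=float)
    n = y_true.size
    treat = y_prob >= threshold
    tp = np.sum(treat & (y_true == 1))  # true positives
    fp = np.sum(treat & (y_true == 0))  # false positives
    # False positives are penalized by the odds of the threshold.
    return float(tp / n - fp / n * threshold / (1.0 - threshold))

def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat every individual regardless of risk."""
    prevalence = float(np.mean(np.asarray(y_true) == 1))
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)

# Toy data for illustration only (not from the survey).
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.6, 0.3, 0.1]
nb_model = net_benefit(y_true, y_prob, 0.2)
nb_all = net_benefit_treat_all(y_true, 0.2)
```

A decision curve plots these quantities over a range of thresholds; a model is considered clinically useful where its net benefit exceeds both the treat-all strategy and the treat-none strategy (net benefit of zero).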
Conclusions We developed a robust, interpretable, and clinically useful hypertension risk prediction model. The deployment of the model as an open-access web application bridges the gap between research and practical implementation, enabling its use by healthcare workers and for public health screening initiatives. Our methodology provides a reliable framework for building and deploying predictive models from public health survey data.
Competing Interest Statement The authors have declared no competing interest.
Funding Statement This study did not receive any funding.
Author Declarations I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
This study used publicly available, de-identified data from the WHO STEPS Survey 2019 (Nepal). According to the WHO data access policy and our institutional guidelines, this secondary analysis of anonymized data did not require additional ethics approval.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes