Explainable machine learning for revisiting reported Irritable Bowel Syndrome correlates in a student cohort

Abstract

Irritable Bowel Syndrome (IBS) affects a substantial proportion of university students, yet its correlates remain incompletely characterised in South Asian populations. We reanalysed a publicly available dataset of 550 Bangladeshi students from Hasan et al. [1]. A data audit identified implausible records, including males reporting menstrual symptoms, and reduced the analytic sample to 506 observations. Using Explainable Boosting Machines (EBMs), which capture non-linear effects and pairwise interactions without sacrificing interpretability, we found that psychological distress, elevated BMI and academic dissatisfaction were the strongest predictors of IBS (mean AUC = 0.852 across 100 stratified train-test splits). Critically, several findings diverged from the original logistic regression analysis: physical activity showed a non-linear risk pattern that appeared only at high intensity; the association with gender was substantially weaker once metabolic and psychological factors were also accounted for; and malnourishment did not have as strong an impact as in the original study. These divergences likely arise because the machine-learning model captures non-linear effects and interactions that were not represented in the original regression specification. Our findings underscore the value of reanalysing existing datasets with methods suited to capturing complexity and highlight data quality verification as a necessary step in secondary analysis.
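The evaluation protocol summarised in the abstract (a mean AUC over repeated stratified train-test splits) can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code: the 20% test fraction, the seed, and the `fit` callback are assumptions, and `fit` stands in for whatever model is trained on each split (in the study, an EBM).

```python
import random
from statistics import mean

def auc(labels, scores):
    """Rank-based AUC: chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_split(labels, test_frac, rng):
    """Return (train_idx, test_idx), sampling each class separately to preserve the class ratio."""
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(len(idx) * test_frac)
        test += idx[:cut]
        train += idx[cut:]
    return train, test

def mean_auc(X, y, fit, n_splits=100, test_frac=0.2, seed=0):
    """Average test AUC over repeated stratified splits.

    `fit(X_train, y_train)` is a hypothetical callback that must return a
    function mapping one feature row to a score; test_frac=0.2 is assumed.
    """
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_splits):
        tr, te = stratified_split(y, test_frac, rng)
        predict = fit([X[i] for i in tr], [y[i] for i in tr])
        aucs.append(auc([y[i] for i in te], [predict(X[i]) for i in te]))
    return mean(aucs)
```

Averaging over many splits reduces the dependence of the reported AUC on any single partition of a small sample, which is presumably why a repeated design was chosen for 506 observations.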

Author summary

We reanalysed a dataset on Irritable Bowel Syndrome (IBS) among university students in Dhaka, Bangladesh. Before modelling, we audited the dataset, removed implausible records, and reconstructed the IBS classification from the Rome III questionnaire. We then applied an interpretable machine-learning model capable of modelling non-linear effects and interactions between variables. Psychological distress (particularly anxiety and stress), body mass index, and dissatisfaction with academic major showed the strongest associations with IBS. The model also identified several interaction effects involving BMI. Our results differ in several respects from the original regression analysis, suggesting that modelling assumptions and data validation can influence the interpretation of IBS correlates. This study shows how explainable machine-learning models can complement conventional statistical analyses and how data validation can affect results in secondary analyses.
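The audit step can be illustrated with a small plausibility filter. This is a sketch under stated assumptions: the field names `gender` and `menstrual_symptoms` are hypothetical, and only the one rule named in the abstract (males reporting menstrual symptoms) is encoded; any further audit criteria are not specified here and are not reproduced.

```python
def audit(records):
    """Split records into (kept, dropped) using a single plausibility rule.

    Each record is a dict; the keys 'gender' and 'menstrual_symptoms' are
    hypothetical names chosen for this illustration.
    """
    kept, dropped = [], []
    for rec in records:
        # Implausible combination: a male respondent reporting menstrual symptoms.
        implausible = rec["gender"] == "male" and bool(rec["menstrual_symptoms"])
        (dropped if implausible else kept).append(rec)
    return kept, dropped

# Toy usage with made-up records (not study data).
sample = [
    {"id": 1, "gender": "male", "menstrual_symptoms": True},
    {"id": 2, "gender": "female", "menstrual_symptoms": True},
    {"id": 3, "gender": "male", "menstrual_symptoms": False},
]
kept, dropped = audit(sample)
```

Returning the dropped records alongside the kept ones makes it easy to report how many observations each rule removed.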

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

The authors received no specific funding for this work. No grants, commercial funding, or financial support from public, private, or non-profit funding agencies were received for the conduct of this study. The authors declare that the research was carried out independently and that the funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

This study uses previously published and publicly available data and does not involve new human participants or identifiable personal information. Therefore, institutional ethics approval was not required.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data Availability

The data used in this study originate from the publicly reported dataset described by Hasan et al. [1]. All data necessary to reproduce the analyses in this study, together with the code and detailed methodological steps used for data processing and modelling, will be made publicly available in an open repository upon acceptance of the manuscript.
