Background The enterotype concept proposed that gut microbiomes cluster into discrete types, but subsequent critiques demonstrated that such clustering depends on methodological choices, that the number of clusters is not fixed, and that faecal samples cannot capture spatial heterogeneity along the gastrointestinal tract. The stomach remains particularly understudied, and no systematic classification exists for gastric microbial community types.
Methods We assembled a multi-cohort dataset of 566 gastric mucosal samples spanning healthy controls to gastric cancer, with both Helicobacter pylori (HP)-negative and HP-positive individuals. Critically, we applied the key methodological lessons of the enterotype debate: we used a variational autoencoder (VAE) for dimensionality reduction to learn a continuous latent representation without forcing discrete structure, determined the optimal number of clusters using the Silhouette index (an absolute validation measure) across K=2 to K=10 rather than arbitrarily selecting a cluster number, and performed transparent evaluation of multiple clustering solutions. This VAE-plus-silhouette workflow directly addresses the critiques leveled against the original enterotype analysis.
Results Four gastotypes were identified, with K=4 achieving the highest mean silhouette score, indicating good cluster cohesion and separation. Two gastotypes (Variovorax-type and Trabulsiella-type) were significantly enriched in HP-positive samples, while two gastotypes (Bacteroides-type and Streptococcus-type) were significantly enriched in HP-negative samples. Random Forest and Gradient Boosting achieved excellent baseline performance for predicting HP infection (AUC = 0.990 and 0.993).
Conclusions The VAE-plus-silhouette workflow provides a robust, data-driven approach for identifying gastotypes without forcing discrete structure or arbitrarily fixing cluster numbers. Using this framework, we identified four gastotypes with significantly different HP infection rates. Variovorax-type and Trabulsiella-type showed strong HP-positive enrichment, while Bacteroides-type and Streptococcus-type showed strong HP-negative enrichment. These findings demonstrate that methodological advances from the enterotype controversy can be successfully transferred to the stomach, offering a reproducible taxonomy for stratifying HP infection status with potential clinical utility.
Competing Interest StatementThe authors have declared no competing interest.
Funding StatementThis study did not receive any funding
Author DeclarationsI confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Ethical approval is not applicable since the study only analyzed the data from public domain.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data AvailabilityAll data reanalyzed in this study are publicly available, with access details provided in Table S1. The computational codes are also publicly accessible and can be found in the respective method papers cited in the Materials and Methods section.
Comments (0)