UnMuted: Defining SARS-CoV-2 lineages according to temporally consistent mutation clusters in wastewater samples

Epidemics (Elsevier), Volume 54, March 2026, 100876

Highlights

Wastewater samples collected over time allow lineage definitions to be estimated.

Mutations with similar temporal patterns form clusters, which can share mutations.

Estimated lineage definitions from wastewater match those from clinical samples.

Abstract

SARS-CoV-2 lineages are defined by their placement in a phylogenetic tree, but they are approximated by a list of mutations derived from sequences collected through clinical sampling. Lineage abundances in wastewater are generally estimated under the assumption that the frequency of each mutation is approximately equal to the sum of the abundances of the lineages that carry it. By leveraging numerous samples collected over time, I am able to estimate both the temporal trends in lineage abundance and the definitions of those lineages. This is accomplished by assuming that groups of mutations that appear together over time can be used to define lineages.
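The abundance assumption above can be written as f ≈ D a, where D is a binary mutations-by-lineages definition matrix, a is the vector of lineage abundances in one sample, and f is the vector of observed mutation frequencies. The sketch below assumes that the definitions are already known. It estimates a by non-negative least squares, solved with projected gradient descent. The function names, the toy matrix, and the choice of solver are illustrative assumptions, not the method used in this paper.

```python
import numpy as np


def mutation_frequencies(definitions, abundances):
    """Forward model: each mutation's frequency is the sum of the abundances
    of the lineages whose definition contains it (f = D @ a)."""
    return np.asarray(definitions, dtype=float) @ np.asarray(abundances, dtype=float)


def estimate_abundances(definitions, freqs, n_iter=5000):
    """Non-negative least squares, min ||D a - f||^2 subject to a >= 0,
    solved by projected gradient descent (an illustrative solver choice)."""
    D = np.asarray(definitions, dtype=float)
    f = np.asarray(freqs, dtype=float)
    # Step size 1/L, where L is the squared largest singular value of D
    # (the Lipschitz constant of the gradient D^T (D a - f)).
    step = 1.0 / np.linalg.norm(D, ord=2) ** 2
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - f)
        a = np.maximum(a - step * grad, 0.0)  # project back onto a >= 0
    return a


# Hypothetical toy definitions: rows are mutations, columns are lineages A, B, C.
# Mutation 0 is shared by A and B; mutation 2 is shared by B and C.
D = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 1],
])
a_true = np.array([0.5, 0.3, 0.2])
f = mutation_frequencies(D, a_true)
a_hat = estimate_abundances(D, f)
print(np.round(a_hat, 3))
```

Shared mutations (rows 0 and 2) turn this into a mixture problem rather than a simple lookup. With noisy, real wastewater frequencies the recovered abundances are only approximate.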

Three main models are considered: one that does not impose a temporal structure, one that includes an explicit temporal component but allows for missing lineages, and one with an explicit temporal component that attempts to estimate all lineages. The temporal trend of the estimated lineage definitions approximately corresponds to the trend of lineage definitions determined from clinical samples, even though the models use no information from clinical samples.
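The idea that mutations appearing together across samples define lineages can be illustrated with a generic non-negative matrix factorization. This sketch swaps in a textbook technique (Lee-Seung multiplicative updates) and is not any of the three models described above. Like the first model, it ignores the temporal ordering of samples, and it has no explicit temporal component. Rows of F are mutations and columns are samples. W plays the role of lineage definitions and H plays the role of lineage abundances over time. The synthetic data, the function names, and the 0.5 threshold are all hypothetical.

```python
import numpy as np


def factorize(F, n_lineages, n_iter=2000, seed=0, eps=1e-12):
    """Generic NMF: find non-negative W (mutations x lineages) and
    H (lineages x samples) minimising ||F - W H||_F^2 using Lee-Seung
    multiplicative updates. Sample order (time) is ignored entirely."""
    F = np.asarray(F, dtype=float)
    rng = np.random.default_rng(seed)
    W = rng.uniform(0.1, 1.0, size=(F.shape[0], n_lineages))
    H = rng.uniform(0.1, 1.0, size=(n_lineages, F.shape[1]))
    errors = [np.linalg.norm(F - W @ H)]
    for _ in range(n_iter):
        H *= (W.T @ F) / (W.T @ W @ H + eps)
        W *= (F @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(F - W @ H))
    return W, H, errors


def definitions_from_loadings(W, threshold=0.5):
    """Hypothetical post-processing: rescale each lineage column so its largest
    loading is 1, then include a mutation if its scaled loading >= threshold."""
    scaled = W / (W.max(axis=0, keepdims=True) + 1e-12)
    return (scaled >= threshold).astype(int)


# Synthetic data: 4 mutations, 3 lineages, 6 samples (made up for illustration).
D_true = np.array([[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]])
A_true = np.array([
    [0.8, 0.6, 0.4, 0.2, 0.1, 0.0],  # lineage A declines
    [0.2, 0.3, 0.4, 0.4, 0.3, 0.2],  # lineage B peaks mid-series
    [0.0, 0.1, 0.2, 0.4, 0.6, 0.8],  # lineage C emerges
])
F = D_true @ A_true  # mutation frequencies over time

W, H, errors = factorize(F, n_lineages=3)
print(definitions_from_loadings(W))
print(f"reconstruction error: {errors[0]:.3f} -> {errors[-1]:.2e}")
```

Because W H = (W S)(S^-1 H) for any positive diagonal matrix S, only the relative pattern within each column of W is identifiable. For this reason the sketch normalises each column before thresholding. Recovery of the true definitions is not guaranteed, since multiplicative updates can settle in local minima.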

Keywords

Unsupervised machine learning

Clustering

Wastewater-based epidemiology

Variants of concern

© 2025 The Author. Published by Elsevier B.V.
