Cis-regulatory elements (CREs), such as promoters and enhancers, are short DNA regions with central roles in controlling gene expression. Typical CREs are a few hundred base pairs long and exert their regulatory function by binding diverse proteins, including transcription factors (TFs) and nucleosomes. The combinatorial binding of these proteins to CREs constitutes the ‘cis-regulatory code’, which remains enigmatic and is one of the central questions in modern biology. To help solve this code, tools are needed that accurately quantify TF and nucleosome binding genome-wide and across cell types or states. DNA footprinting is an elegant method for this purpose: it detects bound proteins through their protection of DNA from chemical or enzymatic reactions. However, most footprinting methods can be applied only to bulk samples rather than single cells, which limits insight into systems with complex cell-type (or cell-state) dynamics.
Next, we developed a deep learning model named seq2PRINT that is trained on the footprints generated by PRINT and then predicts footprints from local DNA sequence (Fig. 1). By extracting the learned sequence features, we found that seq2PRINT can distinguish between bound and unbound TF binding sites, which we evaluated using ChIP–seq (chromatin immunoprecipitation with sequencing) data as the gold standard for binding. Surprisingly, this prediction accuracy holds even for TFs that leave no visible footprint in PRINT, because the model infers their effects on neighbouring footprints. Thus, seq2PRINT serves as a powerful tool to map TF binding across the genome.
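The comparison against ChIP–seq described above can be illustrated with a minimal sketch. It assumes each candidate TF binding site has a model-derived score and a binary ChIP–seq label (1 = bound, 0 = unbound), and it summarises how well the scores separate the two classes using the area under the ROC curve. The metric, the function name `auroc`, and the example values are all assumptions made for illustration; the text does not state which metric was used, and this is not seq2PRINT's actual evaluation code.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    scores: hypothetical per-site binding scores (higher = more likely bound).
    labels: 1 if the site overlaps a ChIP-seq peak (bound), else 0.
    Tied scores receive their average rank.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one bound and one unbound site")

    # Rank all sites by score (ascending), averaging ranks within ties.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U counts (bound, unbound) pairs in which the bound site scores higher.
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Illustrative (made-up) values for six candidate sites of one TF.
site_scores = [0.92, 0.15, 0.78, 0.40, 0.66, 0.08]
chip_bound = [1, 0, 1, 1, 0, 0]
print(f"AUROC = {auroc(site_scores, chip_bound):.3f}")
```

A value near 1 would indicate that bound sites consistently score higher than unbound ones, whereas 0.5 corresponds to chance-level separation.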