Researchers from the 91, Aalto University and the University of Oulu have developed a new computational method for exploring DNA sequence patterns. The method, called KMAP, enables intuitive visualisation of short DNA sequences and helps reveal how regulatory elements behave in different biological contexts. The study was recently published in the prestigious journal Genome Research.

Figure 1: KMAP visualisation of SELEX-seq data for the transcription factor MAFK. Each dot represents a k-mer. Clusters of k-mers at the centre correspond to DNA motifs, while the surrounding dots represent random k-mers. Each central cluster represents a distinct motif, with red dots highlighting k-mers from the major motif.

KMAP projects short DNA sequences of a fixed length k, known as k-mers, into two-dimensional space, making it easier to identify and interpret biologically significant sequence patterns, also called DNA motifs (Figure 1). In a re-analysis of Ewing sarcoma data, the researchers used KMAP to analyse genomic regions involved in gene regulation. They found that the transcription factors BACH1, OTX2 and KCNH2/ERG1 were suppressed by the oncogene ETV6 and became active at promoter and enhancer regions once ETV6 was degraded (Figure 2). Notably, the study also identified an uncharacterised DNA motif, CCCAGGCTGGAGTGC, which frequently co-localised with BACH1 and OTX2 within a short window in enhancer regions. This spatial clustering suggests a potential new regulatory element relevant to cancer biology.
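
For readers unfamiliar with the term, the short Python sketch below illustrates what a k-mer is: it counts every overlapping substring of length k in a handful of reads and groups the k-mers by how many mismatches (Hamming distance) separate them from the most frequent one. It is not KMAP's algorithm or API. The function names and the three reads are made up for illustration only.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every overlapping substring of length k across the sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Made-up reads for illustration; real input would come from sequencing data.
reads = ["ACGTGCTGAC", "TTGCTGACGA", "GGTGCTGACT"]
counts = count_kmers(reads, k=6)

# Use the most frequent k-mer as a crude stand-in for a motif consensus.
consensus, _ = counts.most_common(1)[0]

# Group k-mers by their Hamming distance to that consensus.
by_distance = {}
for kmer in counts:
    by_distance.setdefault(hamming(kmer, consensus), []).append(kmer)

for distance in sorted(by_distance):
    print(distance, sorted(by_distance[distance]))
```

K-mers that recur across reads and sit within a few mismatches of one another are the kind of signal that shows up as a cluster in a two-dimensional view such as Figure 1, while unrelated k-mers are scattered around it.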

Figure 2: Schematic illustration of enhancer regulation in Ewing sarcoma versus the healthy state. In Ewing sarcoma (top), the transcriptional repressor ETV6 binds competitively to the binding sites of the transcription factor FLI1 and closes the enhancer regions, contributing to disease progression. Upon ETV6 degradation (bottom), the enhancer becomes accessible in the presence of FLI1 alone, allowing other transcription factors (BACH1, OTX2, KCNH2 and a potential unknown TF) to bind. These factors often co-localise within a window of about 70 base pairs near the motif CCCAGGCTGGAGTGC and may function jointly in regulating gene expression.

KMAP was also used to analyse the outcomes of a genome editing experiment in which the widely used CRISPR-Cas9 technique was applied to a specific location in the human genome called the AAVS1 locus. After editing, cells naturally repair the broken DNA in different ways. By visualising thousands of DNA sequences from this process, KMAP revealed four common patterns of how the DNA was repaired, each associated with a distinct repair pathway used by the cell. Understanding these patterns can help researchers design more precise gene-editing strategies and predict the types of edits that are most likely to occur.
“KMAP offers a more intuitive way to investigate motifs in DNA sequence data,” says the study’s lead author, Dr Lu Cheng from the 91. “By visualising the distribution of short DNA sequences, we can better interpret regulatory patterns and understand how they change in different biological conditions.”
“KMAP is a versatile tool that can be applied to many types of sequencing data,” says Professor Gonghong Wei from the University of Oulu. “In cancer research, it can help identify regulatory elements from ChIP-seq data, and it also holds promise for studying RNA-binding proteins and their binding preferences. Its ability to reveal structure in complex sequence data makes it broadly useful across molecular biology.”
This collaborative work demonstrates how computational biology can uncover hidden layers of gene regulation and support future research in cancer and genome engineering.
Read the paper:
Explore the software: