DNA sequencing has revolutionized biomedicine, but the massive public sequence databases, such as the Sequence Read Archive (SRA) and the European Nucleotide Archive (ENA), which now total 100 petabytes, have until now been difficult to search efficiently.
“MetaGraph”: The “Google for DNA”
Computer scientists at ETH Zurich have solved this problem with a tool called “MetaGraph”. For the first time, researchers can run full-text searches across the raw data of all stored DNA and RNA sequences, much as they would with an internet search engine.
Instead of laboriously downloading entire datasets, researchers can now enter a sequence and find out within seconds or minutes where it has already appeared. Two things make this possible: MetaGraph links the raw data with its metadata, and it compresses the data by a factor of 300 without any loss of information.
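The idea of linking raw sequence data to metadata can be illustrated with a toy example. The sketch below is not MetaGraph's code or API: the function names (`kmers`, `build_index`, `search`), the sample labels, and the sequences are all hypothetical, and the plain Python dictionary is an uncompressed stand-in for the far more compact index a real tool would need. It only shows the principle of cutting sequences into short overlapping substrings (k-mers), recording which sample each one came from, and answering a query by looking up its k-mers.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every overlapping substring of length k in seq."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(samples, k):
    """Map each k-mer to the set of sample labels (the 'metadata') it occurs in."""
    index = defaultdict(set)
    for label, seq in samples.items():
        for kmer in kmers(seq, k):
            index[kmer].add(label)
    return index


def search(index, query, k, min_fraction=1.0):
    """Return {label: fraction of the query's k-mers found in that sample},
    keeping only samples that reach min_fraction."""
    query_kmers = list(kmers(query, k))
    if not query_kmers:
        # The query is shorter than k, so there is nothing to look up.
        return {}
    hits = defaultdict(int)
    for kmer in query_kmers:
        for label in index.get(kmer, ()):
            hits[label] += 1
    total = len(query_kmers)
    return {label: n / total for label, n in hits.items() if n / total >= min_fraction}


# Hypothetical toy data: two "samples" with made-up sequences.
samples = {
    "sample_A": "ACGTACGTGA",
    "sample_B": "TTGACGTCCA",
}
index = build_index(samples, k=4)

exact = search(index, "ACGTGA", k=4)                       # all query k-mers must match
partial = search(index, "ACGTGA", k=4, min_fraction=0.3)   # tolerate partial matches
print(exact)
print(partial)
```

With this toy data, the exact search returns only `sample_A`, while lowering `min_fraction` also reports `sample_B`, which shares one of the three query k-mers. The query never touches the original sequences, only the index, which is why a search of this kind needs no dataset download.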
Benefits and Applications
- Efficiency and cost: The tool is not only precise and efficient, but also relatively inexpensive.
- Accelerating research: MetaGraph can speed up genetic research, for example by identifying resistance genes or beneficial viruses (bacteriophages) in the fight against antibiotic resistance.
- Scalability: The approach scales well: the larger the volume of data already indexed, the less additional computational effort each new dataset requires.
“MetaGraph” is available as open source, is constantly being improved, and may even be used by private individuals in the future.
Source and full article:
Swiss Federal Institute of Technology Zurich (ETH Zurich) (10/2025)
