Digital dark matter clouding AI in genome analysis

June 5, 2023


Gradient correction. Uncorrected saliency map sequence logo (top row), gradient angles at each position (second row), and corrected saliency map (third row) for a patch from representative test sequences. (a, b) CNN-deep-relu trained to make binary predictions on (a) synthetic data and (b) ChIP-seq data for ATF2 protein in GM12878. The ground truth sequence logo is shown for CNN-deep-exp for synthetic data. (b) An ensemble average saliency map is shown in place of the ground truth (bottom row). (c–e) A similar plot is made for (c) a DeepSTARR model trained to predict enhancer activity from STARR-seq data, (d) a Basset model trained to make binary predictions of chromatin accessibility sites from DNase-seq data, and (e) a CNN model trained to predict base-resolution read-coverage values from ATAC-seq data in the PC-3 cell line. (c–e) A colored box and a corresponding sequence logo of a known motif from JASPAR (with a corresponding ID) are shown for comparison. Credit: Genome Biology (2023). DOI: 10.1186/s13059-023-02956-3

Artificial intelligence has entered our daily lives. First, it was ChatGPT. Now, it’s AI-generated pizza and beer ads. While we can’t trust AI to be perfect, it turns out that sometimes we can’t trust ourselves with AI either.

Cold Spring Harbor Laboratory (CSHL) Assistant Professor Peter Koo has found that scientists using popular computational tools to interpret AI predictions are picking up too much “noise,” or extra information, when analyzing DNA. And he has found a way to fix the problem. Now, with just a couple of new lines of code, scientists can get more reliable explanations from powerful AIs known as deep neural networks. This means they can keep chasing down genuine DNA features. Those features might just signal the next breakthrough in health and medicine. But scientists won’t see the signals if they’re drowned out by too much noise.

So what causes the troublesome noise? It is a mysterious and invisible source, like digital “dark matter.” Physicists and astronomers believe that most of the universe is filled with dark matter, a material that exerts gravitational effects but that no one has yet seen. Similarly, Koo and his team found that the data on which the AI is trained lacks critical information, leading to significant blind spots. Even worse, those blind spots get factored in when interpreting the AI’s predictions of DNA function. The study is published in the journal Genome Biology.

Koo says, “The deep neural network is embedding this random behavior because it learns a function everywhere. But DNA is only in a small subspace of that. And it introduces a lot of noise. And so we show that this problem actually introduces a lot of noise across a wide variety of important AI models.”

Digital dark matter is a result of scientists borrowing computational techniques from computer vision AI. DNA data, unlike images, is confined to a combination of four nucleotide letters: A, C, G, T. But image data in the form of pixels can be long and continuous. In other words, we’re feeding the AI an input it doesn’t know how to handle properly.
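To make the contrast with pixel data concrete, here is a minimal sketch of how a DNA sequence is commonly fed to a neural network as a one-hot matrix. The article does not show any code; the function name `one_hot` and the A, C, G, T column order are assumptions chosen for illustration. Every valid row contains a single 1, so real DNA inputs occupy only a small, constrained part of the space the network's function is defined on.

```python
# Hypothetical helper for illustration; the name and column order are assumptions.
NUCLEOTIDES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows (A, C, G, T).

    Characters outside A, C, G, T produce an all-zero row in this sketch.
    """
    return [[1.0 if base == n else 0.0 for n in NUCLEOTIDES]
            for base in seq.upper()]

encoded = one_hot("GATTACA")

# Each position holds exactly one 1.0, so every row sums to 1.
row_sums = [sum(row) for row in encoded]
print(row_sums)
```

A pixel, by contrast, can take any intensity value independently of its neighbors, which is why techniques built for images make no allowance for this constraint.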

By applying Koo’s computational correction, scientists can interpret the AI’s DNA analyses more accurately.
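The article does not spell out the correction itself. The sketch below assumes it amounts to subtracting, at each sequence position, the mean of the gradient across the four nucleotide channels, which removes the part of the gradient that points away from valid DNA inputs. The function name `correct_gradients` and the toy numbers are hypothetical and are not taken from the study's code.

```python
# A minimal sketch under the assumption stated above; not the authors' implementation.
def correct_gradients(grads):
    """Subtract the per-position mean from each row of gradients.

    grads: list of [dA, dC, dG, dT] rows, one row per sequence position.
    Returns a new list of rows; each corrected row sums to (about) zero.
    """
    corrected = []
    for row in grads:
        mean = sum(row) / len(row)
        corrected.append([g - mean for g in row])
    return corrected

# Toy saliency values for two positions (made up for illustration).
raw = [
    [0.9, 0.5, 0.5, 0.5],   # a shared offset of 0.6 inflates every channel
    [1.0, -1.0, 0.0, 0.0],  # already centered, so it is left unchanged
]
fixed = correct_gradients(raw)
```

In a real workflow the same operation would be applied to the gradient array returned by a deep learning framework, typically as a single vectorized line.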

Koo says, “We end up seeing sites become much sharper and cleaner, and there’s less spurious noise in other regions. One-off nucleotides that are considered very important suddenly disappear.”

Koo believes that noise disturbance affects more than AI-powered DNA analyzers. He thinks it’s a widespread affliction among computational processes involving similar types of data. Remember, dark matter is everywhere. Thankfully, Koo’s new tool can help lead scientists out of the darkness and into the light.

More information:
Antonio Majdandzic et al, Correcting gradient-based interpretations of deep neural networks for genomics, Genome Biology (2023). DOI: 10.1186/s13059-023-02956-3

Journal information:
Genome Biology
