The Simplex Things In Life: Utilizing Artificial Intelligence Models to Better Understand Autism

Autism Spectrum Disorder, or ASD, is nothing if not unique.

The way ASD manifests itself in people is unique; although it most often presents as some form of variable impairment in social interaction and communication, each individual has behaviors and habits that are as unique to them as snowflakes are to one another.

ASD has also proven itself to be a uniquely challenging disorder to study. In the past decade, de novo (new) mutations have been identified as key contributors to causality of ASD. However, the majority of these identified de novo mutations are located in protein-coding genes, which comprise only 1–2% of the entire human genome.

Up to this point, most research has focused on identifying mutations in the roughly 20,000 known protein-coding genes, which would seem like a promising approach. Genes are the blueprints for creating proteins, which control and perform crucial tasks in our bodies, such as fighting off infections, carrying chemical messages between our organs, tissues, and cells, and regulating our blood sugar levels. It seems like basic math: Genes + Mutations = Mutated Proteins. Mutated Proteins = Disrupted Protein Function.

However, it has been observed that all the known genes that are ASD-associated can explain only a minor fraction of new autism cases, and it is estimated that known de novo mutations in the protein-coding region contribute to not more than 30% of cases for individuals who have no family history of autism (better known as simplex ASD). This provides evidence to suggest mutations contributing to autism must additionally occur elsewhere in the genome.

The other 98% of the human genome, the noncoding region, was long thought of as “junk DNA” because it appeared to have no real purpose or function. Research has only just begun to lift the shroud surrounding this genetic mystery. Unfortunately, the sheer vastness of that 98% of the 3.2 billion base pairs that make up the human genome makes detecting noncoding mutations an exceptionally difficult task, made even more challenging by the fact that a single individual can carry dozens of mutations, many of which are entirely unique to that individual.
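For a sense of scale, the arithmetic behind those round numbers can be sketched in a few lines of Python. The 3.2 billion and 98% figures come from the paragraph above; the variable names are purely illustrative.

```python
# Rough size of the noncoding genome, using the round figures quoted above.
GENOME_BASE_PAIRS = 3_200_000_000   # ~3.2 billion base pairs
NONCODING_PERCENT = 98              # ~98% of the genome is noncoding

# Integer arithmetic keeps the result exact for these round inputs.
noncoding_bp = GENOME_BASE_PAIRS * NONCODING_PERCENT // 100
coding_bp = GENOME_BASE_PAIRS - noncoding_bp

print(f"noncoding: ~{noncoding_bp / 1e9:.2f} billion base pairs")
print(f"coding:    ~{coding_bp / 1e6:.0f} million base pairs")
```

In other words, the search space is on the order of three billion positions, compared with tens of millions in the coding region.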

Due to these challenges, the traditional study approach, which involves identifying mutations that are common across affected populations, is impractical in this scenario. Until recently, scouring the noncoding region of the genome for mutations that are functional (and working out which of those functional mutations contribute to the phenotype of the disorder) would have meant a painstaking, never-ending slog through hundreds of millions of lab experiments, which sounds about as unfathomable as it does unsavory. Thus, uncovering which noncoding mutations might cause ASD is no small feat.

Keep It Simplex

In a recent study published in Nature Genetics, a research team at Princeton University set out, new method in hand, to tackle this massive undertaking. The team applied an artificial intelligence (AI) technique called deep learning to a collection of data known as the Simons Simplex Collection (SSC).

This data set comprised whole-genome sequencing data for 1,790 ‘quartet’ families: families of four in which one child has autism, while both parents and a single sibling are unaffected by the disorder. None of these quartets had a prior family history of autism, which implicates non-inherited mutations as the cause of the affected child’s autism in each family. This, combined with the built-in controls of the matched, unaffected siblings, made this particular set of data ideal for this large-scale study.
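The idea of a de novo mutation in a quartet can be sketched in code: it is a variant seen in a child but in neither parent. The snippet below is a minimal illustration with made-up variants, not the study's pipeline; real de novo calling also has to deal with sequencing errors and genotype quality, which this sketch ignores. The function name and the tuple layout are assumptions chosen for readability.

```python
from typing import Set, Tuple

# A variant is written as (chromosome, position, ref_allele, alt_allele).
Variant = Tuple[str, int, str, str]


def de_novo_variants(child: Set[Variant],
                     mother: Set[Variant],
                     father: Set[Variant]) -> Set[Variant]:
    """Return variants present in the child but in neither parent."""
    return child - mother - father


# Made-up toy data for one hypothetical quartet (not from the SSC).
mother = {("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T")}
father = {("chr1", 1000, "A", "G"), ("chr3", 42, "G", "A")}
proband = {("chr1", 1000, "A", "G"), ("chr3", 42, "G", "A"),
           ("chr7", 777, "T", "C")}
sibling = {("chr2", 500, "C", "T"), ("chr9", 90, "A", "T")}

proband_de_novo = de_novo_variants(proband, mother, father)
sibling_de_novo = de_novo_variants(sibling, mother, father)

print("proband:", proband_de_novo)
print("sibling:", sibling_de_novo)
```

Both children carry a de novo variant in this toy example, which is the point of the sibling control: the question is not whether de novo mutations exist, but whether those in the affected child are predicted to be more disruptive.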

In deep-learning models, an algorithm performs consecutive layers of analysis in order to detect and learn patterns. The deep-learning framework for this experiment was designed to identify biologically relevant sections of DNA and RNA that could predict the functional, as well as pathogenic, impact of de novo mutations in the genomes available from the SSC. Deep-learning models depend on having large amounts of data to learn from and, luckily, there is no danger of a data shortage in this particular case.

For the DNA level of deep learning, the team trained the model on cell-type-specific profiles from the ENCODE and Roadmap Epigenomics projects, covering 2,002 transcriptional regulatory features such as histone modifications and transcription factor binding. For the RNA level, the model was trained on the biochemical profiles of 232 post-transcriptional regulatory features, including RNA-binding protein (RBP) profiles, which were gathered from cross-linking immunoprecipitation (CLIP) data.

Using these models, the algorithm made its way through 7,097 whole genomes from the SSC, predicted the functional impact of every possible mutation in the context of the 1,000 base pairs around it, and provided a biochemical interpretation of that impact. To link the biochemical outcomes to phenotypic impact, the team trained a regularized linear model on a set of mutations already implicated in human disease from the Human Gene Mutation Database (HGMD), together with a collection of rare variants detected in healthy individuals in the 1000 Genomes populations.
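Scoring a mutation "in the context of the 1,000 base pairs around it" can be pictured as scoring the same window twice, once with the reference base and once with the mutated base, and taking the difference. The sketch below shows only that mechanic. The scoring function is a deliberately crude stand-in (it counts a made-up motif) and is an assumption for illustration; the study used trained deep-learning models, which are not reproduced here.

```python
WINDOW = 1000  # context length in base pairs, as described in the text


def toy_regulatory_score(seq: str) -> float:
    """Stand-in for a trained model: occurrences of a made-up motif per 1,000 bp."""
    motif = "GATAA"  # hypothetical motif, chosen only for illustration
    count = sum(
        1
        for i in range(len(seq) - len(motif) + 1)
        if seq[i:i + len(motif)] == motif
    )
    return count * 1000 / len(seq)


def variant_effect(chrom_seq: str, pos: int, alt: str) -> float:
    """Score of the mutated window minus score of the reference window.

    `pos` is a 0-based position; the window is centred on it.
    """
    start = pos - WINDOW // 2
    end = start + WINDOW
    if start < 0 or end > len(chrom_seq):
        raise ValueError("variant too close to the sequence edge for a full window")
    ref_window = chrom_seq[start:end]
    offset = pos - start
    alt_window = ref_window[:offset] + alt + ref_window[offset + 1:]
    return toy_regulatory_score(alt_window) - toy_regulatory_score(ref_window)


# Toy 2,000-bp sequence: a plain background with one motif starting at position 1000.
seq = "C" * 1000 + "GATAA" + "C" * 995

# Changing the T inside the motif destroys it, so the toy score drops.
effect = variant_effect(seq, 1002, "C")
print(effect)
```

A real model would return a whole vector of such differences, one per regulatory feature, rather than a single number.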

The linear model was then able to generate a predicted disease impact score (DIS) for every autism-related mutation individually, based on what the model learned about transcriptional and post-transcriptional regulatory effects. The DIS estimates the likelihood that any given mutation contributes to a disorder or disease (in this case, autism) and serves to help prioritize and rank mutations based on this likelihood.
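To make the idea of a regularized linear model more concrete, here is a minimal sketch that trains an L2-regularized logistic regression on a handful of invented feature vectors and uses its output as a stand-in impact score. The article only says "regularized linear model", so the choice of logistic regression, the penalty strength, the learning rate, and all of the numbers are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_l2_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                      l2: float = 0.1, lr: float = 0.5,
                      epochs: int = 500) -> Tuple[List[float], float]:
    """Batch gradient descent on the L2-penalized logistic loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # The penalty term l2 * w shrinks the weights; the bias is not penalized.
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def impact_score(w: Sequence[float], b: float, features: Sequence[float]) -> float:
    """Stand-in impact score: predicted probability of the disease class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, features)) + b)


# Invented features: predicted effect sizes on two hypothetical regulatory features.
disease_like = [[0.9, 0.8], [0.7, 0.9], [0.8, 0.6]]   # label 1
benign_like = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3]]    # label 0
X = disease_like + benign_like
y = [1, 1, 1, 0, 0, 0]

w, b = train_l2_logistic(X, y)
score_high = impact_score(w, b, [0.85, 0.85])
score_low = impact_score(w, b, [0.10, 0.10])
print(score_high > score_low)
```

The useful property is the ranking: mutations whose predicted regulatory effects look more like known disease mutations receive higher scores than those that look like benign variation.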

Through analysis with these models, the research team was able to determine that there was a significantly higher functional impact of noncoding de novo mutations in the children with autism, when compared to their unaffected siblings.
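One simple way to ask whether one group's scores are higher than another's is a one-sided permutation test, sketched below. The article does not say which statistical test the researchers used, so this is a swapped-in technique for illustration, and the scores are invented rather than taken from the study.

```python
import random
from typing import Sequence


def one_sided_permutation_p(group_a: Sequence[float], group_b: Sequence[float],
                            n_perm: int = 10_000, seed: int = 0) -> float:
    """Estimate P(mean(a) - mean(b) >= observed) under random relabeling."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    k = len(group_a)
    observed = sum(group_a) / k - sum(group_b) / len(group_b)
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = sum(pooled[:k]) / k - sum(pooled[k:]) / (len(pooled) - k)
        # Small tolerance guards against floating-point rounding on ties.
        if diff >= observed - 1e-12:
            hits += 1
    # The +1 correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_perm + 1)


# Invented impact scores for illustration only.
proband_scores = [0.62, 0.71, 0.55, 0.80, 0.66, 0.74]
sibling_scores = [0.41, 0.38, 0.52, 0.47, 0.35, 0.50]

p_value = one_sided_permutation_p(proband_scores, sibling_scores)
print(f"one-sided p-value: {p_value:.4f}")
```

A small p-value here would mean that a difference this large rarely arises when the proband and sibling labels are shuffled at random.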

The Gene-Tissue Connection

It is widely recognized that one of autism’s key attributes is altered development in the brain. However, a comprehensive gene-tissue association had never been established for the de novo noncoding variants. To address this, the research team methodically tested the variant effects of autism-specific mutations for the 53 cell types and tissues defined in the Genotype-Tissue Expression (GTEx) project.

The results of this testing revealed that the noncoding mutations affect gene expression in the brain, including the expression of some genes that have already been linked to autism, particularly those involved in synaptic transmission and chromatin regulation.

The observation that some of the noncoding variants affected similar genes and functions as previously identified coding variants suggests that coding and noncoding mutations affect overlapping pathways and processes. This, in turn, implies a convergence in the genetic landscape and highlights the possibility of combining coding and noncoding mutations to pinpoint specific ASD-associated genes.

Validation Station

The deep-learning analysis revealed prime candidates: noncoding, disease-associated mutations with the potential to influence ASD through gene expression regulation. To functionally validate the predicted high-impact mutations and provide additional evidence of potential causality, the researchers used a cell-based luciferase reporter assay system to study allele-specific effects in the lab.

The team inserted some of the predicted high-impact mutations into Promega’s pGL4.23 Firefly Luciferase Vector, which was then transfected into human neuroblastoma BE(2)-C cells, along with our NanoLuc® Luciferase Genetic Reporter. Following transfection, the team utilized Promega’s Nano-Glo® Dual-Luciferase® Assay System to detect luminescence, and thus the changes in gene expression. The resulting changes affirmed that the predictions made by the deep-learning model translated to quantitative allele-specific effects on gene expression.
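The arithmetic behind a dual-reporter readout like this can be sketched as follows: each well's experimental signal is divided by its control signal, and the normalized activity of the mutated allele is compared with that of the reference allele. The sketch assumes firefly is the experimental reporter and NanoLuc the co-transfected control, as the paragraph above implies; the luminescence values and function names are invented for illustration.

```python
from statistics import mean
from typing import Sequence, Tuple

# Each allele is measured as (firefly readings, NanoLuc readings), one pair per well.
Wells = Tuple[Sequence[float], Sequence[float]]


def normalized_activity(firefly: Sequence[float], nanoluc: Sequence[float]) -> float:
    """Mean of the per-well firefly / NanoLuc ratios across replicate wells."""
    if len(firefly) != len(nanoluc):
        raise ValueError("each well needs both a firefly and a NanoLuc reading")
    return mean(f / n for f, n in zip(firefly, nanoluc))


def allele_fold_change(ref_wells: Wells, alt_wells: Wells) -> float:
    """Normalized activity of the mutated allele relative to the reference allele."""
    return normalized_activity(*alt_wells) / normalized_activity(*ref_wells)


# Invented luminescence readings (arbitrary units), three replicate wells each.
ref_wells = ([1000.0, 1200.0, 1100.0], [500.0, 600.0, 550.0])
alt_wells = ([1500.0, 1800.0, 1650.0], [500.0, 600.0, 550.0])

fold_change = allele_fold_change(ref_wells, alt_wells)
print(f"alt / ref normalized activity: {fold_change:.2f}")
```

Dividing by the control reporter corrects for well-to-well differences in transfection efficiency and cell number, so that a fold change different from 1 reflects the allele rather than the handling.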

Great Expectations

The successful application of AI modeling to the world of genomics reveals the previously unknown significance of mutations in the noncoding region and hints at the roles that both transcriptional and post-transcriptional mechanisms might play, not only in the causality of ASD, but in other complex diseases as well, such as heart disease and cancer.

Although the study did not ultimately reveal the precise causes of autism, it did the next best thing by providing thousands of possible ASD contributors. This will help narrow the focus of study for future researchers, which in and of itself feels like a huge step forward in cracking the autism code.

Natalie Larsen

Natalie is a former Science Writer at Promega. She earned her B.S. in Microbiology from the University of Wisconsin-Madison, and her Associate's Degree in Science from Cottey College. In her spare time, she can be found playing volleyball, making music, chipping away at her never-ending stack of craft projects, and volunteering with animals.



Promega Connections

Thoughts, tech tips and news about science
