Asian genome unravelled

Whole genome data unveils startling facts on pathogenic mutations in Indian ethnic groups


Asian populations have a particularly high number of novel genetic variations that increase their susceptibility to various genetic diseases including cancer, reveals one of the most extensive studies on Asian DNA ever carried out.

The report from the pilot phase of the GenomeAsia 100K Project includes a whole-genome sequencing reference dataset from 1,739 individuals of 219 population groups across Asia. It includes data from 598 individuals representing 55 ethnic groups that span the major language groups from the Indian subcontinent.

The GenomeAsia 100K Project is a non-profit consortium that aims to sequence and analyse the genomes of 100,000 Asian individuals.

The practice of endogamy, or marriage within a particular social or caste group, over roughly the last 1,000 years has led to a higher burden of population-specific recessive disorders in India, the authors observed. The study was carried out jointly by US and Asian scientists from Nanyang Technological University, Singapore, the National Institute of Biomedical Genomics (NIBMG), Kalyani, India, and the University of California, USA.

While certain single genes are known to cause diseases of monogenic origin, the development and progression of many diseases, such as cancer, are believed to be influenced by a host of gene variants. Researchers are now trying to work out how these variants shape disease pathways and progression. Population-specific genomic data would be of immense value to these efforts, because the incidence and frequency of several diseases show a clear bias by population group.

Currently available genomic data sources are largely Eurocentric. The underrepresentation of non-Europeans in human genetic studies has limited the diversity of genomic datasets, leaving out a large proportion of the world’s population, the study points out.

Because of this, available genetic data often has little or no medical relevance to Asians. 

This deficit can be addressed through population-specific datasets and genome-wide association studies, say the authors of the pilot study.

The study also represents a significant step towards bridging the gap in our understanding of the underexplored genome variations found in ethnically diverse groups in India.

“We have a great opportunity to apply genomics in India to understand, manage and treat diseases,” says Sam Santhosh, one of the authors of the paper and CEO of MedGenome, a genomics-driven research and diagnostics company headquartered in San Francisco and Bengaluru. MedGenome is the founding partner of the GenomeAsia 100K consortium.

Genomic analysis of India’s unique population groups and disease cohorts will lead to the identification of genetic mutations and drug targets not just for the country but for the whole world, he adds. 

This is the first time such data will be publicly available. Experts can leverage this resource to make cutting-edge genomics accessible to Indians and to improve healthcare outcomes.

Enriched pathogenic variants

The GenomeAsia study has catalogued over 63 million DNA variants. Examining disease-associated coding sequences, the study found that 23% of protein-altering variants in Asian genomes had not been detected in other publicly available data sources. Although most of these novel variants were rare, the absolute number of novel variants with a minor allele frequency (MAF, the frequency of the less common allele at a given position) of at least 0.1% across the pan-Asian dataset was large, representing a total of 194 out of 585 sequences.
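To make the MAF threshold concrete, here is a minimal sketch of how a variant's minor allele frequency might be computed from allele counts and how novel, non-rare variants could be filtered at the 0.1% cutoff. The `Variant` record, its fields and the three toy variants are hypothetical; it assumes biallelic sites and diploid individuals (1,739 people give 3,478 alleles), and it is not the study's actual pipeline.

```python
from dataclasses import dataclass

@dataclass
class Variant:
    id: str                 # hypothetical identifier
    alt_allele_count: int   # copies of the alternative allele observed
    total_alleles: int      # 2 x number of genotyped diploid individuals
    in_public_db: bool      # True if already reported in another public resource

def minor_allele_frequency(v: Variant) -> float:
    """Frequency of the less common allele at a biallelic site."""
    alt_freq = v.alt_allele_count / v.total_alleles
    return min(alt_freq, 1.0 - alt_freq)

def novel_nonrare(variants: list[Variant], threshold: float = 0.001) -> list[str]:
    """IDs of variants absent from other databases with MAF >= threshold (0.1%)."""
    return [v.id for v in variants
            if not v.in_public_db and minor_allele_frequency(v) >= threshold]

# Toy data: 1,739 diploid individuals -> 3,478 alleles
variants = [
    Variant("var_a", 1, 3478, False),    # MAF ~0.03%: novel but rare
    Variant("var_b", 12, 3478, False),   # MAF ~0.35%: novel and above the cutoff
    Variant("var_c", 400, 3478, True),   # common, but already reported
]
print(novel_nonrare(variants))  # ['var_b']
```

Taking the minimum of the two allele frequencies means the cutoff behaves the same whichever allele happens to be labelled "alternative".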

In addition, the researchers discovered 144,329 novel variants with an MAF greater than 1% in at least one of the populations grouped by geography: South Asia, Southeast Asia, Northeast Asia or Oceania.
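Grouping samples by region matters because a variant can look rare in a pooled dataset while being common in one region. The sketch below illustrates this with invented allele counts for a single hypothetical variant; the numbers are not from the study, and the regional labels are used only as examples.

```python
# Hypothetical allele counts for one variant: region -> (alt alleles, total alleles)
region_counts = {
    "South Asia": (30, 1200),
    "Southeast Asia": (0, 800),
    "Northeast Asia": (0, 1000),
    "Oceania": (0, 478),
}

def allele_frequency(alt: int, total: int) -> float:
    return alt / total

def pooled_frequency(counts: dict[str, tuple[int, int]]) -> float:
    """Frequency when all regions are merged into one pan-Asian sample."""
    alt = sum(a for a, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return allele_frequency(alt, total)

def regions_above(counts: dict[str, tuple[int, int]], threshold: float = 0.01) -> list[str]:
    """Regions in which the variant's frequency exceeds the threshold (here 1%)."""
    return [r for r, (a, t) in counts.items() if allele_frequency(a, t) > threshold]

print(f"pooled: {pooled_frequency(region_counts):.4f}")  # pooled: 0.0086
print(regions_above(region_counts))                       # ['South Asia']
```

Here the variant falls below 1% when all 3,478 alleles are pooled, yet reaches 2.5% in one region, which is why region-level analysis can surface variants that a pan-Asian summary would classify as rare.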

These regions contain many diverse groups, which warrants additional studies to characterise these genetic patterns and their disease-relevance.

Genetic variations play a crucial role in the majority of rare diseases.

“The data from this study, besides helping understand the population groups, has already proved to be a great resource for disease gene discovery in an ongoing analysis of over 1,500 familial inherited disorder cases,” said Dr Sekar Seshagiri, one of the authors of the study and President, SciGenom Research Foundation, India.  

In rare-disease genetics, for instance, the study analysed 152 exomes from individuals participating in the Indian Maturity-Onset Diabetes of the Young (MODY) project. MODY, a monogenic form of diabetes, is one of several metabolic diseases that are becoming increasingly common. Fourteen different forms of MODY are now known, each with its own clinical characteristics. Researchers have also reported additional MODY-relevant genes in South India, apart from the most frequently mutated ones (FM, August 2018).

Among disease-causing mutations, the analysis identified 732 pathogenic and likely pathogenic variants in 514 genes: 686 single nucleotide polymorphisms (SNPs) and 46 insertions or deletions (indels).

SNPs result from the substitution of a single nucleotide at a specific position in the genome. Such changes can underlie the differences that make an individual susceptible to genetic diseases such as sickle cell anaemia, cystic fibrosis and beta thalassaemia.
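The distinction between SNPs and indels comes down to how the reference and alternative alleles compare. A minimal sketch, assuming alleles are written in the common VCF style where insertions and deletions share one leading "anchor" base; the function name is hypothetical.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Classify a biallelic variant from its reference and alternative alleles,
    written VCF-style (insertions and deletions share one anchor base)."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"        # single-nucleotide substitution
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    return "MNV"            # multi-nucleotide substitution of equal length

print(classify_variant("A", "T"))    # SNP (like the sickle-cell change HBB c.20A>T)
print(classify_variant("G", "GA"))   # insertion
print(classify_variant("CTT", "C"))  # deletion
```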

When the researchers compared and revalidated these variants, they found that several high-frequency, pathogenic disease-associated variants were highly enriched in Asian populations. For example, an HBB variant (chr11:5248155, c.92+5G>C) associated with beta thalassaemia was detected almost exclusively in South Asians and, at a lower frequency, among Southeast Asians.

In addition, a number of variants that had previously been reported as pathogenic occurred at high frequencies.

Unique cancer mutations

As for novel variants that increase cancer risk, the study lists 13 unique variants in 6 genes, found in 17 samples. These include frameshift, stop-gain and essential splice-site mutations in BRCA2, BRCA1, ATM, BLM, NBN and PMS2.

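A frameshift mutation occurs when an insertion or deletion of bases in a gene's DNA changes the reading frame, the grouping of bases into three-letter codons. Because each codon encodes one amino acid, an indel whose length is not a multiple of three shifts every downstream codon, resulting in a completely different translation of the genetic code. Frameshift mutations are known to be a factor in several cancers.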
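The reading-frame logic can be shown with a toy sequence. This minimal sketch (hypothetical function names, invented 12-base sequence) splits a coding sequence into codons and checks whether an indel's net length change is a multiple of three.

```python
def codons(seq: str) -> list[str]:
    """Split a coding sequence into consecutive triplets (the reading frame)."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]

def is_frameshift(ref: str, alt: str) -> bool:
    """An indel shifts the frame when its net length change isn't a multiple of 3."""
    return (len(alt) - len(ref)) % 3 != 0

original = "ATGGCCTTTAAA"
one_base_deletion = original[:4] + original[5:]  # delete the 'C' at index 4
print(codons(original))           # ['ATG', 'GCC', 'TTT', 'AAA']
print(codons(one_base_deletion))  # ['ATG', 'GCT', 'TTA']
```

Every codon after the one-base deletion changes, whereas deleting three bases (for example `is_frameshift("CTTT", "C")`) would remove a single codon and leave the rest of the frame intact.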

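A stop-gain (nonsense) mutation introduces a premature stop codon, producing an abnormally shortened protein; such truncated proteins can promote tumorigenesis and other genetic conditions. Similarly, mutations at splice sites, the boundaries between coding exons and non-coding introns that guide how a gene's RNA is cut and rejoined, can be detrimental.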
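A stop-gain mutation can be spotted by scanning the reading frame for the first stop codon (TAA, TAG or TGA). The sketch below uses an invented six-codon gene in which a single C>T substitution turns a glutamine codon into a stop; names and sequence are hypothetical.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}

def first_stop_codon(cds: str) -> Optional[int]:
    """Index (counted in codons from the start) of the first in-frame stop
    codon in a coding sequence, or None if there is none."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None

reference = "ATGCAGTGGAAACCCTAA"              # Met-Gln-Trp-Lys-Pro-stop (toy gene)
mutant = reference[:3] + "T" + reference[4:]  # C>T turns CAG (Gln) into TAG (stop)
print(first_stop_codon(reference))  # 5
print(first_stop_codon(mutant))     # 1
```

In the mutant, translation would stop after one amino acid instead of five, which is the kind of truncation the article describes.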

Mutations in the tumour-suppressor genes BRCA1 and BRCA2 are strongly associated with breast and ovarian cancers.

The ATM gene encodes a protein kinase that is activated by DNA damage. Besides causing ataxia-telangiectasia, ATM gene variants are linked to breast cancer, melanoma and other cancers. Mutations in the BLM gene cause Bloom syndrome, a cancer predisposition disorder. Mutations in NBN, the gene that encodes nibrin, raise the risk of breast and prostate cancers. PMS2, which encodes a DNA mismatch repair protein, is implicated in colon cancer.

In a separate study on gall bladder cancer, the authors identified the same splice-site PMS2 mutation (chr7:6043690C>G) in a Korean patient whose gall bladder cancer exhibited microsatellite instability, according to the report, which was published recently in Nature.

Predicting drug response

In a finding that can have far-reaching implications for current clinical practice, the study revealed unique patterns of genetic variations that affect drug responses specific to a population. 

Drugs work differently for different people. It is often difficult to predict who will benefit from a medication, who will not respond at all, and who will experience adverse drug reactions.

With the advent of pharmacogenomics, researchers can now study how inherited differences in genes affect the body’s response to medications.

Several genes that determine drug response have already been catalogued.

The probability of adverse drug reactions in a given population can be predicted from the aggregate allele frequencies of known variants associated with responses to the drugs in question.
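As a rough illustration of how allele frequencies translate into population-level risk, here is a minimal sketch that assumes Hardy-Weinberg equilibrium, that the risk alleles are alternative alleles of the same gene (so their frequencies can be summed), and that carrying one copy is enough to confer risk. These are simplifying assumptions for illustration, not the study's published model; the frequencies are invented.

```python
def carrier_probability(allele_freqs: list[float]) -> float:
    """Probability that a person carries at least one risk allele, assuming
    Hardy-Weinberg equilibrium and that the risk alleles are alternative
    alleles of the same gene (so their frequencies can be summed)."""
    q = sum(allele_freqs)             # aggregate risk-allele frequency
    if not 0.0 <= q <= 1.0:
        raise ValueError("aggregate allele frequency must lie in [0, 1]")
    return 1.0 - (1.0 - q) ** 2       # 1 - P(no risk allele on either copy)

# Hypothetical frequencies for two risk alleles in one population
print(round(carrier_probability([0.05, 0.03]), 4))  # 0.1536
```

Even a modest aggregate allele frequency of 8% implies that roughly 15% of people carry at least one risk copy, which is why allele-frequency differences between populations matter so much for prescribing.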

The GenomeAsia 100K pilot study identified carbamazepine, clopidogrel, peginterferon and warfarin as the drugs showing the largest variation in predicted adverse drug response between populations, with values for individual population groups ranging from 0 to 100.

The widely used drugs carbamazepine, clopidogrel and warfarin are notorious for causing life-threatening adverse reactions in certain individuals.

Carbamazepine is frequently associated with Stevens-Johnson syndrome/toxic epidermal necrolysis (SJS/TEN), a severe skin reaction. Extensive damage to the skin and mucous membranes, including the lining of the mouth and the airways, can lead to a dangerous loss of fluids and allow infections to develop. About 10 percent of people with SJS die from the disease, while the condition is fatal in up to 50 percent of those with TEN.

Variants in the human leukocyte antigen B (HLA-B) gene are most strongly associated with SJS/TEN. The HLA-B*15:02 allele was found at an increased frequency in Austronesian-speaking populations from Southeast Asia compared with other groups. Its frequency, for example, was 63% in the Mentawai of West Sumatra and 46.6% in the Nilas of South Sumatra.

Roughly 400 million people belong to Austronesian groups at increased risk of carbamazepine sensitivity, including the vast majority of people from Indonesia, Malaysia and the Philippines.

Such inter-population differences have potential implications for drug testing and treatment.

Among Asian countries, Thailand has implemented genetic testing before carbamazepine is administered, to decide whether the medicine is safe for a specific patient. Reports show that the frequency of SJS/TEN in the country has since dropped sharply.

People with clopidogrel resistance receive very limited benefit from this anti-platelet treatment: clopidogrel is a prodrug, and they metabolise little or none of it into its active form, leaving them at risk of abnormal blood clot formation.
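Pre-prescription screening of the kind Thailand adopted for carbamazepine can be reduced, in its simplest form, to a genotype check. The sketch below is hypothetical: it assumes the patient's two HLA-B alleles have already been typed, and real clinical guidelines weigh further factors, such as whether the patient has previously taken the drug without a reaction.

```python
RISK_ALLELE = "HLA-B*15:02"

def carbamazepine_screen(hla_b_genotype: tuple[str, str]) -> str:
    """Hypothetical pre-prescription check. A single copy of HLA-B*15:02 is
    enough to raise SJS/TEN risk, so the rule checks both alleles."""
    if RISK_ALLELE in hla_b_genotype:
        return "avoid carbamazepine: HLA-B*15:02 carrier"
    return "no HLA-B*15:02 detected"

print(carbamazepine_screen(("HLA-B*15:02", "HLA-B*40:01")))
print(carbamazepine_screen(("HLA-B*40:01", "HLA-B*46:01")))
```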

Warfarin sensitivity makes individuals less tolerant to the anti-coagulant. These people require lower doses of warfarin as the medication remains active in their body longer than usual.

Alpha interferons, including peginterferon alfa-2a, carry a US FDA-mandated boxed warning stating that the treatment may cause or aggravate fatal or life-threatening neuropsychiatric, autoimmune, ischemic and infectious disorders.

Loss-of-function alleles

The founder effect occurs when a new population is established by a very small number of individuals from a larger population, which can cause a large loss of genetic variation. Because the founders carry only a sample of the parent population's alleles, the new population, whether it migrated or was otherwise separated from the parent population, does not carry the same allele frequencies as the parent population.
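A small simulation shows why founder populations drift away from the parent population's allele frequencies. This sketch assumes random sampling of founders from a large parent population and a hypothetical allele at 2% frequency; it models only the founding event, not later generations of drift.

```python
import random

def founder_allele_frequency(parent_freq: float, n_founders: int,
                             rng: random.Random) -> float:
    """Allele frequency among n diploid founders drawn at random from a
    large parent population in which the allele has frequency parent_freq."""
    n_alleles = 2 * n_founders
    carried = sum(rng.random() < parent_freq for _ in range(n_alleles))
    return carried / n_alleles

def simulate(parent_freq: float, n_founders: int, runs: int,
             seed: int = 0) -> list[float]:
    """Repeat the founding event many times to see how much frequencies scatter."""
    rng = random.Random(seed)
    return [founder_allele_frequency(parent_freq, n_founders, rng)
            for _ in range(runs)]

for n in (10, 5000):
    freqs = simulate(parent_freq=0.02, n_founders=n, runs=200)
    lost = sum(f == 0 for f in freqs) / len(freqs)
    print(f"{n} founders: allele lost in {lost:.0%} of runs, "
          f"frequency range {min(freqs):.3f}-{max(freqs):.3f}")
```

With only ten founders the allele is lost in most runs but, when it survives, often ends up at several times its original frequency; with thousands of founders the frequency stays close to 2%.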

In populations with strong founder effects, some loss-of-function variants that are rare elsewhere reach much higher frequencies. This greatly increases the statistical power of association studies and provides unique advantages when trying to identify genes associated with both rare and common diseases.

Using this approach, the study found that a number of Asian groups with large urban populations have high identical-by-descent (IBD) scores, which reflect how much DNA individuals share through inheritance from recent common ancestors. Samples from the outpatient department of a Chennai hospital, for example, had an IBD score 1.5 times that of a Finnish group, which had an IBD score of 1465.
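The article does not spell out how the IBD score is calculated, so the following is only a simplified, hypothetical version: the average, over all pairs of samples, of the total length (in centimorgans) of shared IBD segments. The 3-20 cM window is an assumption, intended to drop very short segments that are hard to detect reliably and very long ones that mostly reflect close relatives. Segment lists are invented.

```python
from itertools import combinations

def ibd_score(segments: dict[tuple[str, str], list[float]], samples: list[str],
              min_cm: float = 3.0, max_cm: float = 20.0) -> float:
    """Simplified IBD score: mean over all sample pairs of the total length (cM)
    of shared IBD segments whose length lies within [min_cm, max_cm].
    Pair keys are alphabetically sorted tuples, e.g. ("A", "B")."""
    pairs = list(combinations(sorted(samples), 2))
    total = 0.0
    for pair in pairs:
        total += sum(length for length in segments.get(pair, [])
                     if min_cm <= length <= max_cm)
    return total / len(pairs)

example_segments = {
    ("A", "B"): [4.0, 25.0],   # the 25 cM segment falls outside the window
    ("A", "C"): [3.5],
}
print(ibd_score(example_segments, ["A", "B", "C"]))  # 2.5
```

Comparing groups then amounts to a ratio of scores, such as one group's score divided by a reference group's score.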

Population bottlenecks produce strong founder effects and increased rates of recessive diseases.
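The link between bottlenecks, endogamy and recessive disease can be made quantitative with Wright's formula for the frequency of homozygotes under inbreeding: q² + F·q(1 − q), where q is the allele frequency and F the inbreeding coefficient. Treating population-level endogamy as an average F is a simplifying assumption, and the allele frequency below is hypothetical.

```python
def recessive_incidence(q: float, f: float = 0.0) -> float:
    """Expected fraction of individuals homozygous for a recessive allele with
    frequency q, given an inbreeding coefficient f (Wright's formula).
    f = 0 reduces to the Hardy-Weinberg value q**2."""
    return q * q + f * q * (1.0 - q)

q = 0.01  # hypothetical disease-allele frequency
random_mating = recessive_incidence(q)
endogamous = recessive_incidence(q, f=0.0625)  # f = 1/16, as for first-cousin offspring
print(f"{random_mating:.6f} vs {endogamous:.6f} ({endogamous / random_mating:.1f}x)")
# 0.000100 vs 0.000719 (7.2x)
```

For a rare allele the F·q term dominates q², so even moderate inbreeding multiplies the incidence of a recessive disorder several-fold.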

Mutations introduce random changes into genes, so those that alter a protein are far more likely to disrupt its function than to improve it. Loss-of-function variants are frequently associated with severe clinical phenotypes. People who carry two loss-of-function copies of a gene (homozygotes) offer an opportunity to assess the phenotypic effects of losing that specific gene, and can provide important clues for treating disease.

Protein-truncating variants (PTVs), such as stop-gain, frameshift and essential splice-site variants, are genetic variants predicted to shorten the coding sequence of genes. Examining high-confidence PTVs, the researchers found 17,566 such variants, with at least one PTV in approximately 43% of all protein-coding genes in the dataset.
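A figure such as "at least one PTV in about 43% of genes" comes from counting genes that carry any truncating annotation. This minimal sketch assumes variants have already been annotated with Sequence Ontology consequence terms (as tools like Ensembl VEP produce) and omits the high-confidence filtering step; the gene names and annotations are hypothetical.

```python
PTV_CONSEQUENCES = {"stop_gained", "frameshift_variant",
                    "splice_donor_variant", "splice_acceptor_variant"}

def ptv_gene_fraction(annotations: list[tuple[str, str]],
                      all_genes: set[str]) -> float:
    """Fraction of genes in all_genes carrying at least one PTV."""
    genes_hit = {gene for gene, consequence in annotations
                 if consequence in PTV_CONSEQUENCES}
    return len(genes_hit & all_genes) / len(all_genes)

annotations = [
    ("GENE1", "stop_gained"),
    ("GENE1", "missense_variant"),
    ("GENE2", "frameshift_variant"),
    ("GENE3", "synonymous_variant"),
]
all_genes = {"GENE1", "GENE2", "GENE3", "GENE4", "GENE5"}
print(ptv_gene_fraction(annotations, all_genes))  # 0.4
```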

The researchers also detected 121 homozygous PTVs that had not previously been reported. These include an allele of the ABCA7 gene, Q2010*, in which the codon for glutamine (Q) at position 2010 is replaced by a stop codon (*), found only in the Aeta population. Heterozygosity for loss-of-function alleles of ABCA7 has been shown to increase susceptibility to Alzheimer’s disease in European populations.
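Shorthand such as Q2010* packs three pieces of information: the original amino acid, its position in the protein, and what replaces it, with * meaning a stop codon. A minimal parser for this one-letter format, written as a hypothetical helper that handles only simple substitutions and not the full HGVS nomenclature, might look like this:

```python
import re
from typing import NamedTuple

class ProteinChange(NamedTuple):
    ref_aa: str    # one-letter code of the original amino acid
    position: int  # residue number in the protein
    alt_aa: str    # one-letter code of the new amino acid, or '*' for stop

_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z*])$")

def parse_protein_change(text: str) -> ProteinChange:
    m = _PATTERN.match(text)
    if m is None:
        raise ValueError(f"unrecognised protein change: {text!r}")
    return ProteinChange(m.group(1), int(m.group(2)), m.group(3))

def is_stop_gain(change: ProteinChange) -> bool:
    return change.alt_aa == "*" and change.ref_aa != "*"

change = parse_protein_change("Q2010*")
print(change)                # ProteinChange(ref_aa='Q', position=2010, alt_aa='*')
print(is_stop_gain(change))  # True
```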
