Zip Code or Genetic Code: Teasing Out Effects of Genes and Environment on Health

Summary: A new study of twins in the US sheds light on both the environmental and genetic risk factors for numerous diseases.

Source: Harvard.

When it comes to disease and health, which is more powerful—ZIP code or genetic code?

The degree to which nature and nurture affect disease and health remains one of the eternal—and still unanswerable—questions in medicine.

Now a team of investigators from Harvard Medical School and the University of Queensland in Australia has tackled this question in a decidedly novel way.

In what the researchers describe as a coup for big data and a scientific first, the team has used a massive insurance database of nearly 45 million people in the United States, including thousands of twin pairs, to determine the effects of genes and environment in 560 common conditions. The diseases analyzed span 23 categories, ranging from cardiovascular illness and neuromuscular diseases to skeletal conditions.

The work, published Jan. 14 in Nature Genetics, is thought to provide the largest assessment of U.S. twins to date, the researchers said. It is also the first one to go beyond the traditional one-disease-at-a-time approach and analyze hundreds of the most common conditions among more than 56,000 twin pairs.

To date, most twin or familial studies of genes and environment have looked at a single disease or one environmental factor at a time.

Many diseases are neither purely genetic nor purely environmental but rather the result of a complex interplay between the two. Unlike classic inherited conditions—those caused strictly by mutations in a gene or a set of genes—environmentally fueled conditions are the sole result of factors external to an individual’s biology.

Most diseases do not fall neatly in either category but have elements of both. Disentangling how genes and environment contribute to multiple diseases in the same population has been astoundingly difficult, the researchers said. The new study aims to solve this challenge by developing a new large-scale analytical approach.

“The nurture-versus-nature question is very much at the heart of our study. We foresee the value of this type of large-scale analysis will be in shining light on the relative contribution of genes versus shared environment in a multitude of diseases,” said senior study author Chirag Patel, assistant professor of biomedical informatics in the Blavatnik Institute at Harvard Medical School.

The new method, the team said, underscores the value of large-scale analyses in informing national research efforts such as the National Institutes of Health’s All of Us program, part of the Precision Medicine Initiative, which aims to tease out biologic, genetic, social and environmental factors in disease and health as a way to inform individualized therapies. The findings of the new study can help direct research efforts by clarifying the relative influence of genetic versus environmental factors for a range of diseases.

“Our findings can provide signposts that inform subsequent research efforts and help scientists narrowly focus their pursuits,” said study first author Chirag Lakhani, a post-doctoral research fellow in biomedical informatics in the Blavatnik Institute at Harvard Medical School. “For example, if our study of twins shows that there is very little heritability effect in a certain family of eye disorders, then future research should pursue alternative explanations.”

Using a database of nearly 45 million patient records, including more than 56,000 twin pairs and more than 724,000 sibling pairs, the investigators estimated the influence of genes and environment by comparing fraternal twin pairs, who share about half of their DNA, with identical twins, whose DNA is 100 percent the same.

Same-sex twins can be either identical or fraternal, while opposite-sex twins are always non-identical or fraternal, but the researchers did not know which same-sex pairs were identical.

To circumvent this hurdle, they developed a novel statistical method that inferred the probability that a pair of twins is fraternal (non-identical) or identical. In doing so, the researchers were able to separate purely genetic from non-genetic contributions.
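
The article does not spell out the statistical method, so the following is only a minimal Python sketch of the general idea, built on two classical textbook approximations rather than the authors' actual model. It assumes Weinberg's differential rule (fraternal pairs are equally likely to be same-sex or opposite-sex) to estimate the share of identical pairs among same-sex twins, and Falconer's formulas to turn twin correlations into heritability and shared-environment estimates. The function names and all example numbers are hypothetical.

```python
def mz_fraction_same_sex(n_same_sex: int, n_opposite_sex: int) -> float:
    """Estimate the share of identical (MZ) pairs among same-sex twin pairs.

    Assumption (Weinberg's rule): fraternal pairs are same-sex half the time,
    so roughly n_opposite_sex of the same-sex pairs are fraternal.
    """
    if n_same_sex <= 0:
        raise ValueError("n_same_sex must be positive")
    frac = (n_same_sex - n_opposite_sex) / n_same_sex
    return min(max(frac, 0.0), 1.0)  # clamp to a valid probability


def falconer_from_mixture(r_same_sex: float, r_opposite_sex: float, p_mz: float):
    """Return (h2, c2) from twin correlations when zygosity is unknown.

    Assumptions: the opposite-sex correlation stands in for the fraternal
    correlation r_dz, and the same-sex correlation is the mixture
    r_same_sex = p_mz * r_mz + (1 - p_mz) * r_dz.
    """
    if not 0.0 < p_mz <= 1.0:
        raise ValueError("p_mz must be in (0, 1]")
    r_dz = r_opposite_sex
    r_mz = (r_same_sex - (1.0 - p_mz) * r_dz) / p_mz  # solve the mixture for r_mz
    h2 = 2.0 * (r_mz - r_dz)   # Falconer: heritability
    c2 = 2.0 * r_dz - r_mz     # Falconer: shared environment
    return h2, c2


if __name__ == "__main__":
    # Made-up illustration values, not data from the study.
    p = mz_fraction_same_sex(n_same_sex=600, n_opposite_sex=300)
    h2, c2 = falconer_from_mixture(r_same_sex=0.45, r_opposite_sex=0.35, p_mz=p)
    print(f"p_mz={p:.2f}, h2={h2:.2f}, c2={c2:.2f}")
```

The sketch treats zygosity as a single population-level proportion; a method that assigns a probability to each individual pair, as the article describes, would be more refined than this.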

All patients had been part of the insurance database for at least three years, providing the researchers with more than just a snapshot in time. However, the newly published study, which involved young twin pairs from newborns to 24 years of age, was not designed to follow disease development over time. This meant the researchers were unable to assess genetic and environmental influences on certain diseases that tend to develop in middle and older age, such as cardiovascular disease and neurodegenerative conditions.

The analysis included variables such as clinical diagnosis, imaging test results, blood chemistry tests such as red and white blood cell counts, cholesterol levels and many others, as well as environmental factors such as air pollution levels, climate conditions and socioeconomic status, which were extrapolated from the patients’ zip codes.

About 40 percent of the diseases in the study (225 of 560) had a genetic component, while about 25 percent (138 of 560) were driven at least in part by factors stemming from a shared living environment, such as conditions arising from sharing the same household, social influences and the like.

Cognitive disorders demonstrated the greatest degree of heritability—four out of five diseases showing a genetic component—while connective tissue diseases had the lowest degree of genetic influence.

Of all disease categories, eye disorders carried the highest degree of environmental influence with 27 out of 42 diseases showing such effect. They were followed by respiratory diseases, with 34 out of 48 conditions showing an effect stemming from sharing the same household.

The disease categories with the lowest environmental influence were reproductive illnesses, with three of 18 conditions showing such an effect, and cognitive conditions, with two out of five showing such influence.

Overall, socioeconomic status, climate conditions and air quality of each twin pair’s zip code had a far weaker effect on disease than genes and shared environment—a composite measure of external, nongenetic influences including family and lifestyle, household and neighborhood.

In total, 145 of 560 diseases were modestly influenced by socioeconomic status derived from zip code. Thirty-six diseases were influenced, at least in part, by air quality, and 117 were affected by changes in temperature. The condition most potently linked to socioeconomic status was morbid obesity. While obesity undoubtedly has a genetic component, the researchers said, the findings raise an important question about the influence of environment on genetic predispositions.
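
To put the reported counts on a common scale, the short Python sketch below converts each one into a share of the 560 conditions studied. The counts are taken from the article; the dictionary keys and the helper name are illustrative only. The counts add up to more than 560, so the groups necessarily overlap and the shares should not be summed.

```python
TOTAL_CONDITIONS = 560

# Counts as reported in the article; labels are shorthand chosen for this sketch.
reported_counts = {
    "genetic component": 225,
    "shared environment": 138,
    "socioeconomic status": 145,
    "temperature": 117,
    "air quality": 36,
}


def share_percent(count: int, total: int = TOTAL_CONDITIONS) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


shares = {name: share_percent(count) for name, count in reported_counts.items()}

if __name__ == "__main__":
    for name, pct in shares.items():
        print(f"{name}: {pct}% of {TOTAL_CONDITIONS} conditions")
```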

Image caption: In the largest study of U.S. twins to date, researchers have used insurance records to tease out the effects of genes and environment in hundreds of diseases. Image adapted from the Harvard news release.

“This finding opens up a whole slew of questions, including whether and how a change in socioeconomic status and lifestyle might compare against genetic predisposition to obesity,” Patel said.

Lead poisoning was, not surprisingly, entirely driven by shared environment. Conditions such as flu and Lyme disease were, again unsurprisingly, affected by differences in climate.

When researchers looked at classes of diseases by monthly health care spending, they found that both genes and environment significantly contributed to cost of care with the two being nearly equal drivers of spending. Nearly 60 percent of monthly health spending could be predicted by analyzing genetic and environmental factors.

Large-scale analysis like this study can help forecast long-term spending for various conditions and inform resource allocation and policy decisions, the researchers said.

About this neuroscience research article

Co-investigators included Braden Tierney and Arjun Manrai of Harvard Medical School, and Jian Yang and Peter Visscher of the University of Queensland, Australia.
Data sets for the study were provided by Aetna insurance company. Aetna had no funding role in the study.

Funding: The research was supported by the Australian National Health and Medical Research Council (grants 1078037 and 1113400), National Science Foundation (grant 1636870) and Sylvia and Charles Viertel Charitable Foundation.

Original Research: Abstract for “Repurposing large health insurance claims data to estimate genetic and environmental contributions in 560 phenotypes” by Chirag M. Lakhani, Braden T. Tierney, Arjun K. Manrai, Jian Yang, Peter M. Visscher & Chirag J. Patel in Nature Genetics. Published January 14 2019.



Repurposing large health insurance claims data to estimate genetic and environmental contributions in 560 phenotypes

We analysed a large health insurance dataset to assess the genetic and environmental contributions of 560 disease-related phenotypes in 56,396 twin pairs and 724,513 sibling pairs out of 44,859,462 individuals that live in the United States. We estimated the contribution of environmental risk factors (socioeconomic status (SES), air pollution and climate) in each phenotype. Mean heritability (h² = 0.311) and shared environmental variance (c² = 0.088) were higher than variance attributed to specific environmental factors such as zip-code-level SES (var_SES = 0.002), daily air quality (var_AQI = 0.0004), and average temperature (var_temp = 0.001) overall, as well as for individual phenotypes. We found significant heritability and shared environment for a number of comorbidities (h² = 0.433, c² = 0.241) and average monthly cost (h² = 0.290, c² = 0.302). All results are available using our Claims Analysis of Twin Correlation and Heritability (CaTCH) web application.
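
The heritability and shared-environment estimates quoted in the abstract can be read against the classical ACE twin model, in which trait variance is split into additive genetic, shared environmental, and unique environmental plus error shares that sum to 1. The abstract reports only the first two, so the remainder computed in this Python sketch is an inference under that assumed model, and the function and label names are illustrative.

```python
def ace_breakdown(h2: float, c2: float) -> dict:
    """Split variance into explained (h2 + c2) and residual (e2) shares.

    Assumption: the standard ACE decomposition h2 + c2 + e2 = 1 applies.
    """
    explained = h2 + c2
    if not 0.0 <= explained <= 1.0:
        raise ValueError("h2 + c2 must lie between 0 and 1")
    return {"explained": round(explained, 3), "e2": round(1.0 - explained, 3)}


# (h2, c2) pairs as quoted in the abstract.
abstract_estimates = {
    "mean across phenotypes": (0.311, 0.088),
    "number of comorbidities": (0.433, 0.241),
    "average monthly cost": (0.290, 0.302),
}

breakdowns = {name: ace_breakdown(h2, c2) for name, (h2, c2) in abstract_estimates.items()}

if __name__ == "__main__":
    for name, parts in breakdowns.items():
        print(f"{name}: explained={parts['explained']}, e2={parts['e2']}")
```

For average monthly cost, the genetic and shared-environment shares add up to 0.592, which is consistent with the article's statement that nearly 60 percent of monthly health spending could be predicted from these factors.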
