The Twin Research Debate in American Criminology

“There is a danger of concealing assumptions which have no factual basis behind an impressive façade of flawless algebra.”

                                   — Lancelot Hogben, 1933[1]

The debate on the validity of twin research has recently resurfaced in the field of American criminology, and has major implications for other areas of behavioral research as well. Criminologists Callie Burt and Ronald Simons, in a 2014 critique of twin, adoption, and other “heritability” studies in their field, challenged the assumptions underlying studies of reared-together twins (the “classical twin method”). They also challenged the behavioral genetic position that observed behavior is the result of the additive influences of genes, the “shared environment,” and the “unshared environment.”[2] They concluded that the field should abandon heritability studies because they are “methodologically flawed,” and because they rest on “an oversimplified and incorrect model of gene function” based on the “biologically unsound” practice of “partitioning genetic versus environmental influences on variance in phenotypes.”[3] Burt and Simons’ original article was followed by two lengthy responses from a group of leading biosocial criminologist twin researchers, including J. C. Barnes, John Paul Wright, Brian Boutwell, Kevin Beaver, and their colleagues (hereafter, Barnes and colleagues).[4] Burt and Simons responded to these and other critics in a subsequent article.[5] Our purpose here is to focus on the twin method and its key assumptions.[6]

Twin studies of criminality and “antisocial behavior” (ASB) are not new, and go back to the 1930s and earlier, when biological and genetic theories of crime flourished in the United States, Germany, Scandinavia, and elsewhere.[7] Since that time, the main technique used by supporters of genetic theories of human development and human behavioral differences has been twin research, which has been put forward as a scientifically validated research method that provides an ideal “natural experiment” for assessing the relative importance of heredity and environment. In almost all cases these studies are based on reared-together twin pairs, while in an extremely small yet influential handful of studies, twin pairs were said to have been reared apart in different families. These “reared-apart” (separated) twin studies, however, are plagued by numerous invalidating problems and biases, including the fact that most pairs were not truly “reared-apart,” and the inflation of twin correlations by cohort effects and other non-genetic influences.[8]

At the same time, apart from a few possible minor exceptions, decades of searches have failed to produce confirmed gene findings at the molecular level for differences in personality, socially disapproved behavior such as criminality, the normal range of IQ (cognitive ability), as well as the major psychiatric disorders.[9] Indeed, Barnes and colleagues could not name any confirmed gene discoveries for behaviors studied in criminology, or for that matter any other type of behavior. As sociologist Aaron Panofsky put it in his 2014 book Misbehaving Science, “Molecular genetics has been a major disappointment, if not an outright failure, in behavior genetics.”[10]

Rather than arrive at the reasonable conclusion that no such genes exist, however, most genetic researchers interpret these negative results as evidence of a “missing heritability problem,” enabling genomic research to continue as a major focus of research attention and funding.[11]

The Classical Twin Method

In the context of the stunning failure to discover genes, behavioral genetics and its adherents in various fields have fallen back on emphasizing “classical twin method” comparisons between MZ (monozygotic, identical) and same-sex DZ (dizygotic, fraternal) twin pairs reared together in the same family home. MZ pairs are said to share 100% of their segregating genes, whereas (like ordinary siblings) same-sex DZ pairs are said to share only 50% on average.[12] If MZ pairs resemble each other more (correlate higher) than DZ pairs for the behavior, behavioral disorder, or medical condition in question, twin researchers conclude that it has an underlying genetic component.

Genetic interpretations of twin method results, however, are based on twin researchers’ much-criticized MZ-DZ “equal environment assumption” (EEA). According to this assumption, MZ and DZ pairs grow up experiencing roughly equal environments, and the only factor distinguishing them is their differing degree of genetic relationship to each other (100% versus 50%). Twin researchers’ acceptance of the validity of the EEA allows them to argue that genetic factors explain the usual finding that MZ pairs behave more similarly (or correlate higher on psychological tests) than do same-sex DZ pairs. Twin correlations are then factored into more complicated “biometrical model fitting” (structural equation modeling) procedures, which produce numerical estimates of heritability, and of the “shared” and “unshared” environments.
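The arithmetic behind such estimates can be illustrated with a minimal sketch. It uses the simpler Falconer-style formulas rather than the structural equation modeling described above, and it takes for granted the very assumptions under dispute (the EEA and purely additive influences). The function name and the two input correlations are hypothetical, chosen only for illustration, and are not drawn from any study.

```python
def falconer_estimates(r_mz: float, r_dz: float) -> dict:
    """Falconer-style decomposition of twin correlations.

    Assumes the EEA holds and that influences are purely additive.
    Both are assumptions of the method, not established facts.
    """
    h2 = 2 * (r_mz - r_dz)  # "heritability": twice the MZ-DZ difference
    c2 = 2 * r_dz - r_mz    # "shared environment"
    e2 = 1 - r_mz           # "unshared environment" (plus measurement error)
    return {"h2": h2, "c2": c2, "e2": e2}


# Hypothetical correlations, for illustration only.
est = falconer_estimates(r_mz=0.60, r_dz=0.40)
```

With these illustrative inputs the three components come out at about 0.40, 0.20, and 0.40. The sketch makes the dependence on the EEA visible: the entire MZ-DZ correlational difference is assigned to genes by construction, so any part of that difference produced by MZs’ more similar environments is counted as “heritability.”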

During the first 40 years of the twin method (roughly 1924 to 1964), twin researchers in psychology and psychiatry defined the EEA, without qualification, as the assumption that MZ and same-sex DZ environments are equal. By the early 1960s, most twin researchers had come to agree with the critics that MZ pairs experience more similar environments, that they are treated more alike, and that they are socialized to be more alike than DZ pairs. As behavioral geneticists John Fuller and William Thompson put it in their field-defining 1960 work Behavior Genetics, “MZ pairs are treated more alike, and may even be confused, by parents and associates. MZ co-twins model their behavior upon each other to a greater extent than DZ co-twins.” Seven years later J.C. DeFries, another leading early behavioral geneticist, wrote that the validity of the EEA is “questionable.”[13] (Barnes and colleagues also recognized that MZ pairs experience more similar environments than DZ pairs, and that MZs are emotionally closer to each other, more often belong to the same peer networks, attend classes together more often, and are dressed more similarly than are DZ pairs.[14])

As with most types of human behavior, MZ twin pairs correlate significantly higher for criminal and antisocial behavior than do same-sex DZ pairs.[15] The key question, which has always been the main area of contention between twin researchers and their critics, is why MZ pairs show greater behavioral resemblance than DZ pairs. Twin researchers answer that MZs’ greater genetic resemblance is the cause; most critics answer that the cause is MZs’ more similar environment. We and others have published detailed critical analyses of twin research during the past 20 years, and have called the validity of the EEA and the twin method into question.[16]

Critics point to the sometimes subjective and political aspects of defining criminal and antisocial behavior, in addition to the fact that, even though most people break the law at various times in their lives, poor people and people of color are more likely to be arrested, charged, and convicted of a crime.[17] Criminality is a social construct that depends on context. “Killing,” wrote Hubbard and Wald, “can be heroism or murder,” and “taking someone’s property can be confiscation or theft.”[18] The American Psychiatric Association’s Diagnostic and Statistical Manual (DSM) behavioral criteria for diagnosing antisocial personality disorder are vague and subjective. And yet, the ability to define and validate “criminal and antisocial phenotypes” is a basic requirement of genetic research in criminology.

In light of the overwhelming evidence that MZ pairs experience much more similar environments than experienced by DZ pairs, twin researchers of the 1960s and 1970s were faced with two options: (1) abandon the twin method, including all previous results, because the EEA is false, or (2) redefine the EEA in an attempt to keep the twin method alive. They chose Option 2.

Redefining the Equal Environment Assumption

Argument A

The first way that twin researchers redefined the EEA was through what could be called Argument A, which holds that MZ pairs “create” or “elicit” their more similar environments because they are more similar genetically.[19] As a leading twin researcher put it, writing in support of the EEA’s validity, “Although similarity in environment might make MZ twins more similar, it is equally plausible that by behaving alike, MZ twins create for themselves more similar environments.”[20] This, however, is a circular argument because the conclusion that genetic factors explain the greater behavioral resemblance of MZ versus DZ twin pairs is now based on a premise stating the very same thing. Twin researchers invoking Argument A, therefore, refer to the genetic premise in support of the genetic conclusion, and then refer back to the genetic conclusion in support of the genetic premise, in a continuously circular loop of faulty reasoning.[21] There are additional problems with Argument A, and it is clear that it fails to support the EEA and the twin method.[22]

Argument B

Supporters of the Argument B position also recognize that MZ environments are more similar than those experienced by same-sex DZs, but hold that critics must show that MZ and DZ environments differ in aspects that are relevant to the behavioral characteristic (trait) in question. For example, in 1993 a leading group of psychiatric genetic twin researchers wrote, “The traditional twin method…[is] predicated on the equal-environment assumption (EEA)—that monozygotic (MZ) and dizygotic (DZ) twins are equally correlated for their exposure to environmental influences that are of etiologic relevance to the trait under study.”[23] This is twin researchers’ “trait-relevant” definition of the EEA. It allows them to argue that the twin method is valid even if, as a prominent twin researcher put it, MZ and DZ pairs “experience quite different environments.”[24] The mid-1960s developers of Argument B simply inserted the term “trait-relevant” in front of the word “environments” in the 40-year-old original definition of the EEA, and then placed the burden of proof on their critics to show that MZs share more similar trait-relevant environments than shared by DZs.[25]

Leading twin researchers and their critics therefore have agreed for over half a century that MZ environments are more similar than same-sex DZ environments, and both also recognize that environmental factors play a role in explaining behavioral differences in the population. Until twin researchers are able to identify specific and exclusive “trait-relevant” factors that contribute to the cause of the behavioral characteristic they are studying, these two widely recognized facts combine to invalidate any genetic inferences based on Argument B.

The Impact of Identity Confusion and Attachment between Twins

In a series of “EEA-test” studies spanning several decades designed to test the validity of the equal environment assumption, twin researchers have measured aspects of twins’ environmental similarity, such as whether twins shared the same bedroom growing up, had common friends, were dressed alike, and so forth. Although they usually find that MZ pairs grow up experiencing more similar environments than experienced by same-sex DZ pairs, these EEA-test researchers usually conclude in favor of the EEA and the twin method on the basis of Argument A, Argument B, or both.[26]

At the same time, twin researchers usually fail to acknowledge or adequately assess other important aspects of the twin relationship experienced by MZ pairs to a far greater degree than same-sex DZ pairs. The evidence clearly shows that reared-together MZ pairs experience much higher levels of identity confusion, attachment, and emotional closeness than experienced by reared-together DZ pairs, which will (presumably) lead to greater behavioral resemblance among the former.[27] For example, a schizophrenia twin researcher performed a “global evaluation of twin-closeness” based on 117 pairs, and found that 65% of the MZ pairs had an “extremely strong level of closeness,” which was true for only 17% of the same-sex DZ pairs. Fully 90% of the MZ pairs had experienced “identity confusion in childhood,” which was experienced by only 10% of the DZ pairs.[28]

In a 1976 Norwegian twin study of criminal behavior, Dalgard and Kringlen found that 42 of the 49 MZ pairs (86%) had an “extremely strong” or “strong” level of “emotional closeness” (interdependence), which was true for only 32 of the 89 DZ pairs (36%).[29] They found that MZ pairs more often “operate together as a unit, and accordingly carry out criminal acts together,” which they attributed to the likelihood that “similar external milieu and mutual identification lead to similarities in personality, including the shared criminal tendencies.” They concluded that the EEA is “an assumption which today cannot be accepted.”[30] These findings provide additional evidence against the validity of the EEA.

Implications for Criminology and Other Areas of Behavioral Research

Because the evidence weighs heavily against the validity of the EEA—whether in its original, Argument A, or Argument B form—some critics have argued that the greater behavioral resemblance of MZ versus same-sex DZ twin pairs can be completely explained by non-genetic (environmental, developmental, and random) influences. They conclude that genetic interpretations of all past, present, and future MZ–DZ twin method comparisons in the social and behavioral sciences should be rejected outright, and that the best explanation for present-day “positive” twin study genetic findings in combination with negative results from genome-wide association (GWA) and other types of molecular genetic studies is therefore not that the “heritability is missing,” but that genetic interpretations of MZ-DZ comparisons are wrong.[31] Other critics refrain from reaching such definitive conclusions, but argue that the greater environmental similarity of MZ pairs greatly inflates heritability estimates based on MZ-DZ comparisons, and that the genetic contribution therefore is overstated.
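The logic of the first position can be illustrated with a small simulation. This is a sketch under stated assumptions, not evidence about real twins. It generates a trait with no genetic component at all, gives MZ pairs a larger environmental overlap than DZ pairs, and then applies the Falconer-style formula (twice the MZ-DZ correlational difference). The overlap values (0.6 and 0.4), the sample size, and all function names are hypothetical choices made for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate_pairs(n_pairs, env_overlap, rng):
    """Simulate twin pairs whose trait is purely environmental.

    Each twin's score mixes a pair-level environment with an individual
    one, weighted so the expected co-twin correlation equals env_overlap.
    There is no genetic term anywhere in the model.
    """
    a = env_overlap ** 0.5
    b = (1 - env_overlap) ** 0.5
    first, second = [], []
    for _ in range(n_pairs):
        shared = rng.gauss(0, 1)
        first.append(a * shared + b * rng.gauss(0, 1))
        second.append(a * shared + b * rng.gauss(0, 1))
    return first, second


rng = random.Random(0)  # fixed seed so the run is repeatable
r_mz = pearson(*simulate_pairs(20000, 0.6, rng))  # assumed MZ overlap
r_dz = pearson(*simulate_pairs(20000, 0.4, rng))  # assumed DZ overlap
h2_apparent = 2 * (r_mz - r_dz)  # Falconer-style "heritability"
```

Under these assumptions the apparent heritability comes out near 0.4 even though genes play no role in the simulated trait. The sketch shows only that an MZ-DZ difference is logically compatible with a purely environmental explanation. How much of any real MZ-DZ difference is environmental is the empirical question in dispute.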

Barnes and Colleagues’ Defense of the Twin Method

Although Barnes and colleagues did not invoke Argument A, their conceptualization of the EEA contained elements of Argument B.[32] They raised the following ten major points related to the EEA and Burt and Simons’ discussion of it: (1) although the twin method is indeed based on false assumptions, such as the EEA and the “no assortative mating assumption,”[33] the impact of these assumptions is that they “cancel each other out” in favor of genetic interpretations of MZ-DZ comparisons; (2) the twin method debate was “settled” decades ago in favor of its validity, and contemporary critics mainly rehash previous arguments that subsequently were shown to have no merit; (3) twin researchers performing EEA-test studies have upheld the validity of the assumption; (4) the EEA debate must be decided mathematically, and computer simulations provide “mathematical proof” and “demonstrate unequivocally” that genetic interpretations of twin method data are valid, and prove that “violations of the EEA” have only a minor impact on heritability estimates; (5) the results of “reared-apart” twin studies, and adoption studies, produce heritability estimates similar to studies using the twin method; (6) the results of newly developed “cutting edge” molecular genetic studies, such as “Genome-wide complex trait analysis” (GCTA), are consistent with genetic interpretations of twin studies; (7) Burt and Simons’ call to abandon heritability studies is wrong, and amounts to a “de facto form of censorship”; (8) Burt and Simons are politically motivated outsiders to twin research; (9) Burt and Simons “cherry picked” studies that support their arguments; (10) Burt and Simons’ arguments rely on “highly questionable sources” and “aggressive” politically motivated critics who attack the work of (presumably unbiased and non-politically motivated) “scholars.”

Points 1-7 range from very questionable to clearly false.[34] Points 8-10 question Burt and Simons’ knowledge, scientific objectivity, and integrity, and we will refrain from addressing them here, while noting that this tactic is sometimes used in behavioral genetics in an attempt to discredit the arguments of its critics.[35]

Regarding Point 1, if the null hypothesis stating that humans carry no genes predisposing them to criminality or ASB is true, then there is no assortative mating bias because mating patterns would have no genetic influence on these behaviors, and any observed MZ-DZ correlational differences would be caused entirely by non-genetic factors.[36] In order for Barnes and colleagues to be able to claim that assortative (non-random) mating patterns “downwardly bias heritability estimates” for criminality and ASB, they had to assume in advance that there are genes predisposing people for criminality and ASB. In other words, Barnes and colleagues assumed an important role for genetics as a means of concluding in favor of the very same thing, and like supporters of Argument A, their conclusion was based on illogical circular reasoning because it merely restated their premise in slightly different terms.

Previous behavioral genetic researchers also speculated that false twin research assumptions somehow cancel each other out in favor of genetics, usually without producing any “mathematical proof” in support of this claim.[37] The researchers carrying out the famous yet greatly flawed Minnesota Study of Twins Reared Apart (MISTRA) made a similar argument, writing that some of the assumptions underlying their models and conclusions “are likely not to hold,”[38] and “are generally oversimplifications of the actual situation, and their violation can introduce systematic distortions in the estimates.” However, they concluded that “several combinations of violations of assumptions can act to offset each other.”[39] Such unscientific speculation has not prevented the MISTRA studies from being widely cited in support of major genetic influences on intelligence and behavior.

Point 2 merely reflects the opinion of researchers supporting behavioral genetic and psychiatric genetic positions, but critics of course continue to highlight what they see as massive problems in twin research. We note that Barnes and colleagues felt the need to produce a 61-page article in response to their critics in a supposedly settled debate, a response that required the collaboration of no fewer than 24 people.[40]

Point 3 is very questionable, as several critics have subjected the EEA-test literature to critical review and have found major problems in this body of research.[41] As Richardson and Norgate concluded, in the context of IQ, “these studies do not support the validity of the EEA.”[42] Barnes and colleagues cited Joseph’s 2006 book The Missing Gene in both of their publications, yet they failed to mention that Chapter 9 of that book consisted of a detailed critical analysis of the EEA-test literature, where Joseph concluded, “The flawed and narrowly focused EEA-test literature provides little support for the EEA, regardless of how twin researchers have defined it. Moreover, any false theory or assumption can be ‘tested’ and upheld as long as the ‘testers’ (1) determine the hypotheses to be tested, (2) perform the tests, (3) draw the conclusions, and (4) remain blind to obvious real-world refutations of their conclusions.”[43] Instead, Barnes and colleagues chose to highlight sociologist Jacob Felson’s analysis, where he concluded that environmental bias in the twin method “is likely modest.”[44]

Although Barnes and colleagues and other supporters of behavioral genetic positions argue that the EEA has been tested and upheld, they completely overlook the best-replicated and longest running EEA-test studies ever performed, which consist merely of all the behavioral twin studies ever published. Nine decades of such studies have shown consistently that pairs experiencing similar environments and high levels of identity confusion and attachment (MZs) resemble each other more for behavior and behavioral disorders than do pairs experiencing less similar environments and much lower levels of identity confusion and attachment (DZs). The results of these EEA-test studies strongly suggest that the assumption is false.

Barnes and colleagues’ Point 4 argument that the validity of the EEA can be demonstrated mathematically is clearly flawed, and their computer simulations, which again are based on assuming genetic influences, make no more sense than a similarly failed attempt by twin researchers in political science two years earlier.[45] Barnes and colleagues believe that the twin method “rests on a foundation of testable assumptions,”[46] and that “there is no room for subjective opinion….There is only algebra.”[47] The EEA is indeed testable, but the only way that twin researchers can test it is simply to determine—based on sociological and psychological information—whether MZ and DZ twin pairs grow up experiencing roughly equal environments. As Barnes and colleagues acknowledged, the results show that MZ and DZ environments are not equal.

The EEA debate has nothing to do with “algebra,” and has everything to do with the actual lives and experiences of people, or more specifically, the childhood and adult social and familial environments of twins, and the levels of identity confusion and attachment they experience. In their computer simulations Barnes and colleagues produced percentage figures for the “amount of” common environment twins share (or theoretically could share), but these types of experiences are not easily quantified. As a leading American psychiatric genetic researcher noted almost 50 years ago, genetically oriented researchers “like to look at numbers” and “produce and analyze statistics,” whereas environmentally oriented researchers like to look at people, and the impact of the environments in which they grow up and live.[48]

In Beaver’s 2009 Biosocial Criminology: A Primer, the EEA was defined in the Argument B sense that the only way that the more similar environments of MZ pairs (such as being dressed alike and looking more alike) would “bias heritability estimates is if they actually increased the similarity of MZ twins on the phenotype.”[49] However, there cannot be a “foundation of testable assumptions” without agreement on what the specific and exclusive trait-relevant environmental factors actually are. To the extent that biosocial criminologists recognize relevant environmental factors (see Beaver, Chapter 5), they must demonstrate that only these factors are trait-relevant for such varying behaviors as aggression, shoplifting, tax evasion, murder, embezzlement, robbery, prostitution, and so on, and then must show that MZ and DZ pairs are equally exposed to these specific and exclusive environmental factors.

Moving on to Point 5, there are major problems with previous criminal and ASB adoption studies, many of which were discussed by Burt and Simons,[50] and reared-apart twin studies are greatly flawed on several critical dimensions (see note 8). Moreover, the heritability concept is controversial in and of itself, with some critics arguing that it is highly misleading and valid only for its original purpose as a breeding statistic,[51] that it does not measure the “strength” of genetic influences, and that its use should be discontinued in the social and behavioral sciences.[52]

Regarding Point 6, in their 2015 rejoinder Burt and Simons cited three studies in which GCTA heritability estimates and twin method estimates differed greatly.[53] One of these was a 2013 study by Trzaskowski, Dale, and Plomin, who compared GCTA and twin method results in a study of “childhood behavior problems,” which included autistic, depressive, hyperactive, anxiety, and conduct symptoms. The title of their article read, “No Genetic Influence for Childhood Behavior Problems from DNA Analysis.”[54] The researchers used a large sample of over 2,000 twin pairs, and over 2,000 individuals for the GCTA analysis, and found a large difference between twin method and GCTA results. The twin study findings reflected the usual behavioral genetic conclusions based on the acceptance of the EEA and heritability estimates, with the researchers calculating heritabilities in the .40 to .60 range. The GCTA estimates, however, “are non-significant and mostly zero for self-report and parent measures of behavior problems.”[55] Rather than conclude that something is wrong with twin method assumptions, Trzaskowski and colleagues decided to attribute this discrepancy to what they called “missing GCTA heritability.” Charney discussed several potential biases in GCTA studies, including the failure to adequately account for genetic differences based on variation found among differing populations (population stratification), which introduces a potential environmental confound into GCTA studies. He concluded that the GCTA search for thousands of genetic variants of tiny effect “is the last gasp of a failed paradigm.”[56]

Twin Method at the Crossroads

Burt and Simons recommended “end[ing]…heritability studies in criminology.”[57] Indeed, if the twin method is unable to disentangle the potential influences of genes and environments, as critics have charged since the 1930s, it certainly follows that the method should be discarded, or that its results should be reinterpreted. Family studies and family pedigree diagrams showing that behavioral characteristics and disorders “run in the family” were once widely, yet incorrectly, seen as providing “conclusive proof” in favor of heredity.[58] Nowadays, leading behavioral geneticists such as Robert Plomin and colleagues correctly recognize that “family studies by themselves cannot disentangle genetic and environmental influences,” and seek no mathematical proof to determine otherwise.[59] We argue that a similar conclusion holds true for twin method data in the social and behavioral sciences, and in many areas of medicine as well.

It was good science and not “censorship” when earlier scientists called for ending studies based on craniometry, phrenology, and physiognomy, and any contemporary criminologist calling for the use of astrological charts to predict whether certain people will commit violent crimes would be justifiably ridiculed. Whether or not the twin method will eventually join these pseudosciences remains an open question, one that will be decided by rigorous public and scientific examination and debate. It is our hope that the recent controversy in American criminology marks a new beginning of this debate, and we look forward to taking part in it.



Jay Joseph, Psy.D., is a Clinical Psychologist in Oakland, California; Claudia Chaufan, M.D., Ph.D., is Associate Professor, University of California; Ken Richardson, Ph.D., is an Independent Researcher and Former Senior Lecturer, The Open University; Doron Shultziner, Ph.D., is Assistant Professor, Department of Politics and Communications, Hadassah Academic College, Jerusalem, Israel; Roar Fosse, Ph.D., is at Vestre Viken Hospital Trust, Norway; Oliver James, M.A., Ph.D. (honorary), is a psychotherapist and author in London; Jonathan Latham, Ph.D., is Executive Director, The Bioscience Resource Project; and John Read, Ph.D., is Professor of Clinical Psychology, Swinburne University, Melbourne.

