Surnames are inherited in much the same way as biological traits. Since surnames were generally adopted (in Europe, during medieval times), their distribution has become very uneven: analysis of the present geographic patterns provides an insight into the kind of redistribution of genes that has resulted from all the migrations of the intervening years. Using non-technical language and a minimum of mathematics, this book presents a lucid description and evaluation of these studies of the genetic structure of human populations. A special feature is the appendix, which presents computer-generated maps and distribution diagrams of 100 common surnames in England and Wales.
Modern mainstream United States populations have been little investigated by surname analyses. A sample of names in a suburban Detroit telephone directory yields a low proportion of random isonymy, as one would expect from the heterogeneous ethnic origins of such populations. However, they are not true breeding populations and actual matings may be considerably more inbred. The diverse origins of Americans and consequent large numbers of different surnames – said to be over a million – imply low levels of isonymy. Because of considerable polyphyly of surnames and the high rate of name-changing that has characterized immigrants to the United States, the random component of inbreeding is probably even less than implied by random isonymy. However, in the early stages of immigration of each new group, the group seeks to keep its identity and members tend to marry among themselves, so the inbreeding rates may be similar to those of the country of origin. In fact, if the immigrant group is small, inbreeding may be more intense than in the place of origin. This would be reflected in the non-random component of inbreeding in the first few generations. Except for a few groups with very strong religious sanctions against intermarriage, such immigrant enclaves in the United States dissolve after one, two or three generations in the melting pot of urban America.
The classic study of Arner (1908) in the United States has already been cited.
Geneticists have long been interested in human inbreeding because of the known deleterious effects of close inbreeding (such as repeated sibling matings in domestic cattle). The effect of the lower levels of inbreeding that occur in human populations is less clear. Rare recessive genetic traits are found principally in the offspring of consanguineous marriages: for instance most cases of fructosuria occur in offspring of related parents (M. Lasker, 1941). Because such conditions are very rare, however, they have little importance. The significant question is whether inbreeding between first and second cousins (the usual levels in human populations) increases the risk of morbidity and mortality from common disorders. There is some controversy about it but the best large studies seem to show a small but significant effect (Schull and Neel, 1965).
The interest of biological anthropologists in inbreeding has usually been in the basic questions of the kinds and amounts of inbreeding that occur in the isolated populations that traditionally interest them. This has led to studies of small societies, such as that by Simmons et al. (1962, 1964) of the Aborigines of several islands in the Gulf of Carpentaria, Australia, that had been cut off so completely from each other and from mainland Australia that they had quite divergent gene frequencies. In South America, Ward and Neel (1970) found that seven villages of Makiritare Amerindians were virtually as diverse in frequencies of blood group genes as are the tribes of Central and South America from each other.
The application of models based on surnames to the study of the genetic structure of the human population would seem to call for some justification. Any such application involves the assumption that the inheritance of surnames and biological inheritance are similar, or alternatively it must attempt to measure and allow for the differences between the inheritance of surnames and that of genetic traits.
One may begin to introduce the subject of surname models by an account of the scope of human biology, the place of human population structure in it, and the reason that models by analogy are needed. Human biology is concerned with the adaptive mechanism that makes human life possible. From one point of view this is controlled by those aspects of the genome that are shared by all humans and distinguish human beings from members of other animal species. Human life involves the response of human beings in various cultural and natural environments.
The other chief concern of human biology is human differences and the factors that account for them. Again these can be genetic at base, but they also involve interaction with the environment – which, for human beings, involves virtually the whole range of land habitats and is rendered much more varied still by the results of human activities and their variation from region to region.
Yasuda and Morton (1967) traced the history of the use of surname models for the study of human inbreeding to George Darwin's (1875) article in the Journal of the Statistical Society. Darwin's father, the famous naturalist Charles Darwin, and his mother, a member of the Wedgwood family of china pottery fame, were first cousins. Darwin was interested in the possible deleterious effects of consanguinity of parents and he wanted to know the frequency of cousin marriages in England. He therefore sought data on cousin marriages and on marriages between persons of the same surname in various sources such as Burke's Peerage and the Pall Mall social register. He then followed an ingenious line of thinking to estimate the proportion of marriages between first cousins. He reasoned that marriages to a person of the same surname who was not a first cousin would be proportional to the frequency of the surname in the population. This would be frequent only for common surnames. The Registrar General (1853) had published the frequency of the 50 most common surnames in the marriage registers and from the sum of the squares of these frequencies (0.0009207) Darwin estimated that marriages between unrelated persons of the same surname would be not much different from one per thousand. The excess over this of marriages of persons of the same surname was ascribed to cousin marriages and this was divided by the fraction of cousin marriages that were same-name marriages to give the number of cousin marriages in the population.
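Darwin's arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not his published calculation: the function names are hypothetical, and the observed same-name rate and the same-name share of cousin marriages used in the example are made-up placeholders. Only the sum of squares, 0.0009207, comes from the text.

```python
def random_same_name_rate(frequencies):
    """Chance that two unrelated spouses share a surname:
    the sum of the squared surname frequencies."""
    return sum(p * p for p in frequencies)


def cousin_marriage_rate(observed_same_name, random_same_name,
                         same_name_share_of_cousin_marriages):
    """Darwin's reasoning as described above.

    1. The excess of same-name marriages over the random expectation
       is ascribed to cousin marriages.
    2. Dividing that excess by the fraction of cousin marriages that
       are same-name marriages gives the overall cousin marriage rate.
    """
    excess = observed_same_name - random_same_name
    return excess / same_name_share_of_cousin_marriages


# Sum of squares for the 50 most common surnames, as quoted in the text.
RANDOM_RATE = 0.0009207

# Hypothetical inputs, chosen only to show the calculation.
example_observed = 0.0125      # assumed share of same-name marriages
example_same_name_share = 0.25  # assumed share of cousin marriages that are same-name

estimate = cousin_marriage_rate(example_observed, RANDOM_RATE,
                                example_same_name_share)
print(f"Estimated share of first-cousin marriages: {estimate:.4f}")
```

The estimate is only as good as the assumed same-name share of cousin marriages, which Darwin had to obtain from separate data on known cousin marriages.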
In 1955 I interrupted my teaching of anatomy for a year in order to teach physical anthropology at the University of Wisconsin. I had already been introduced to the subject of population genetics and at that time occasionally met with and discussed the subject with J. F. Crow and N. E. Morton. Back in Michigan the members of the Department of Human Genetics at the University of Michigan and other geneticists continued to stimulate these interests of mine. However, some of the mathematical models, which are common in population genetics, seem abstract and in my research I have preferred more comprehensible (even if somewhat less elegant) formulations when they are adequate to explain the empirical data. About two years ago, G. A. Harrison of the Department of Biological Anthropology at the University of Oxford expressed his pleasant surprise at how much interesting information on the structure of human populations had been garnered by studies of the distribution of surnames, and he suggested that I write a little book on the genetic structure of human populations as seen in this way. He implied that such a book should be directed at an audience who may not be at home with the erudite algebra of population genetics. I have therefore avoided discussion of some minor theoretical differences between mathematical models. Those who wish to pursue such issues will find additional definitions in the Glossary and appropriate references cited in the text.
The population structure of part of Northumberland has been studied by the method of surname analysis. Dobson (1973) examined the parish records of four ecclesiastical parishes that lie along the valley of the River Coquet, and found that the retention of surnames in the population was similar in the four parishes. Of the surnames present in 1780–1809, 47–53% had been present in the same parish in 1720–49 and 39–41% had been present in 1690–1719. The further apart a pair of parishes, the fewer surnames they shared. Analysis of the surnames plus an examination of marital distances led Dobson to conclude that the population structure, and amount and distance of gene flow, led to effective population sizes too large to permit local genetic differentiation. Nevertheless, there was much less gene flow into these parishes than into that of Horringer in East Anglia where of the surnames present at the end of the eighteenth century only 2% had been present in the latter part of the seventeenth century (Buckatzsch, 1951).
One of the Northumbrian parishes, Warkworth, was further studied by Rawling (1973). After merging different spellings of the same surnames there were 22 instances of isonymous unions in the period from 1686 to 1812. Eighteen of these were endogamous marriages and four exogamous marriages. The amount of inbreeding, and especially the size of the non-random component, increased with time. However, the random component actually decreased, although perhaps not statistically significantly.
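The random and non-random components mentioned here are conventionally computed with the isonymy formulas of Crow and Mange (1965), which this passage does not spell out. The sketch below assumes that formulation; the function name and the input figures are hypothetical and are not Rawling's data.

```python
def isonymy_components(observed_isonymy, male_freqs, female_freqs):
    """Split inbreeding estimated from isonymy into two components.

    observed_isonymy: proportion of marriages between same-surname spouses.
    male_freqs, female_freqs: dicts mapping surname -> relative frequency
    among grooms and brides respectively.

    Assumes the Crow and Mange (1965) formulation:
      random isonymy  I  = sum over surnames of p_i * q_i
      F_random           = I / 4
      F_nonrandom        = (P - I) / (4 * (1 - I))
      F_total            = F_nonrandom + (1 - F_nonrandom) * F_random
    """
    random_isonymy = sum(
        p * female_freqs.get(name, 0.0) for name, p in male_freqs.items()
    )
    f_random = random_isonymy / 4
    f_nonrandom = (observed_isonymy - random_isonymy) / (4 * (1 - random_isonymy))
    f_total = f_nonrandom + (1 - f_nonrandom) * f_random
    return f_random, f_nonrandom, f_total


# Hypothetical two-surname illustration (not parish data).
grooms = {"A": 0.1, "B": 0.9}
brides = {"A": 0.1, "C": 0.9}
f_r, f_n, f_t = isonymy_components(0.02, grooms, brides)
print(f_r, f_n, f_t)
```

A negative non-random component would indicate that same-name marriages are rarer than chance predicts, that is, avoidance of marriage between namesakes.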
The simplest use that can be made of surname data in human biology is to record and compare frequency distributions. If the surnames of all the members of a population (or of a random sample of it) are arranged in rank order from the most to the least frequent, a series of statements can be made about such matters as the frequency of the most frequent name (or several of the most frequent names). Statements can also be made about rare names, for instance the number of names that occur only once. If the samples are large, the frequency of each common surname should vary only by sampling error and should not be appreciably affected by sample size. The number and proportion of surnames that occur once and only once, however, are very dependent on the sample size: as one begins sampling, every name will be unique, but, since the number of surnames is finite, as one continues sampling one will approach a condition in which all surnames in the sample have been encountered a second time. The same applies, although less markedly, to surnames that are listed only twice or three times and to other rare surnames. Thus the shape of the curve of the distribution of surname occurrences is dependent on sample size. In comparative studies a very simple solution would be to draw samples of equal size, but I do not think that this has yet been done in surname studies.
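The kind of tabulation described here, and its dependence on sample size, can be illustrated with a short Python sketch. The function name and the tiny surname list are hypothetical; a real study would use thousands of records.

```python
from collections import Counter


def surname_summary(surnames):
    """Rank surnames by frequency and report simple summary figures."""
    counts = Counter(surnames)
    ranked = counts.most_common()  # most frequent first
    top_name, top_count = ranked[0]
    singletons = sum(1 for c in counts.values() if c == 1)
    return {
        "sample_size": len(surnames),
        "distinct_names": len(counts),
        "top_name": top_name,
        "top_frequency": top_count / len(surnames),
        "singletons": singletons,
        # Share of distinct surnames that occur once and only once.
        "singleton_share": singletons / len(counts),
    }


# Hypothetical records, in the order they were sampled.
records = ["Smith", "Jones", "Smith", "Taylor",
           "Brown", "Jones", "Smith", "Evans"]

small = surname_summary(records[:3])   # early in the sampling
full = surname_summary(records)        # after more sampling

print(small)
print(full)
```

Comparing the two summaries shows that the singleton figures shift as the sample grows, which is why samples of equal size are needed before the rare-name tail of two populations can be compared.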
Coleman (1979, 1980a, 1982) has done a great deal to shed light on the population structure and marriage pattern of a modern English city and its environs. Most of the population of England is in urban areas, but almost all the other studies of population structure have been of villages. Coleman analysed the pattern in Reading, Berkshire, and its environs by a study of all the marriage records for a 12-month period in 1972–3, using copies of the registration records in the districts of Reading, Wokingham and Henley-on-Thames. A questionnaire seeking much additional information was circulated to a sample and returns were received from 946 of the 2396 couples married during the specified period. The average couple had been born 103 kilometres from each other – about the same distance as their parents. The average distance from their place of residence when single to the place where they met their future spouses was small, however, compared with the mean distance from the meeting place to the place where the couple resided after marriage. Thus migration after, or at the time of, marriage was an important element in inter-generational migration. As Coleman puts it, the people become individually more heterogeneous but collectively more homogeneous with regard to ancestry (and hence genetic constitution). There is enough migration that in only three generations the population of the built-up area of Reading would be 95% homogeneous.
Well over a million different surnames exist. This is true despite the fact that the largest society, that of China, has only a few hundred surnames and there are still some societies with none. The system of hereditary surnames (usually passed from father to offspring of both sexes – but of equal value in surname models where some other system prevails) can be thought of as a gene with over a million alleles. With that many alleles the system is much richer in informational content than any biological gene. The HLA system of tissue compatibility antigens, which is the most variable human genetic system so far investigated, has about four orders of magnitude fewer known variants than the system of surnames.
The first method to be considered in the present context is the mere counting of surnames. How a surname will be defined for this depends on the use that is to be made of the results. To the extent that the purpose is to simulate genetics, there is no point in grouping together two surnames that may have had their origin from the same profession (Mueller, Miller), personal attribute (Blanc, Bianco, White, Weiss) or saint's name (Martinez, Martin, Martini) if they were originally attached to unrelated families. On the other hand, since one is interested in considering together families that stem as branches from the same trunk, one may wish to merge two different surnames that derive from the same place name (Rotherham, Rudram).
The surnames of Scotland, Ireland, Wales and England have somewhat different histories and therefore vary in time of origin and in geographic distribution patterns.
Highland Scotland had clan names rather than hereditary surnames. After the defeat of the clans at the battle of Culloden in 1746, however, the transition was rapid; many Scots in the border region took English-sounding surnames, and Scottish names became more frequent in England. Many Scottish surnames are indistinguishable from Irish ones and the difference between Mac and Mc is of little help. These prefixes indicate that the surnames of which they are a part arose from clan names or patronymics (that is, from a father's given name).
In Ireland the clan names apparently go back throughout historic times. Eventually some were Anglicized and the O' was often dropped. Many names of Norman origin also took root in Ireland. More recent migrants to Ireland were often Protestants and they carried different surnames that are usually distinct from those derived from the old clans. The clan names, at first closely associated with particular counties, spread to other areas but are still sometimes more frequent in their area of origin than elsewhere in Ireland. Names of non-Irish origin were usually introduced into large towns and cities, such as Dublin, and of course names of Protestants from Scotland are most frequent in Ulster.
There are two types of model of the geography of population structure on which theoretical constructions are based: the island model and isolation by distance. Both have been used by population geneticists to model inbreeding in human populations. In actuality human populations tend to fit one or the other type of model better but almost always have elements of both. The island model assumes that individuals aggregate in tight clusters with essentially random mating within the cluster (‘island’) and with greater genetic differences from all other such clusters than among themselves. In this model the individuals share in the gene pool of their own ‘island’ in equal degree. This is a model which fits human isolates reasonably well and is implicit in most of the studies cited in the previous chapter.
The alternative model, of isolation by distance, assumes that space (conceived as a two-dimensional surface in the case of terrestrial organisms) is more or less evenly peopled by individuals whose ancestry and genetic constitution vary according to their position on the surface. The difficulty with this model when applied to humans is the obvious clustering of people in cities, towns and villages. In addition, genes and surnames are discrete entities with all-or-nothing occurrence. Since any point in space can have only one individual at it, the probabilities of occurrences at each point have to be calculated from other occurrences not at that point but considered relevant because of their spatial relationship to it.
Human migration has been studied from many points of view. In using a surname model to study its effects, however, one is concerned with migration from a single angle: as the mechanism that redistributes genes geographically. Human migration draws pedigree lines on maps. The pattern of such lines depicts an aspect of human population structure of significance to population genetics: it is the obverse of inbreeding. Such mapping of pedigree lines can be used to help explain distributions of human genetic polymorphisms and even, perhaps, to predict future redistributions – or, more exactly, to describe the conditions that would lead to alternative outcomes. Human genes cannot move except by the movements of people who carry them. At least that was true before the invention of artificial insemination. Therefore, historically, human migration accounted for all the movement of genes.
Gene movement may be seen in the distances from birthplaces of parents to the birthplaces of their children. A certain amount of tracing of individual pedigrees has been done by geneticists and others. Such studies inevitably have a geographic aspect. Pedigrees, however, are not representative of the population as a whole: they are more likely to be complete with respect to higher social classes, successful and noteworthy individuals and patients with hereditary diseases. Male ancestors are usually easier to identify and trace than female ones, so the male line is usually more complete than female and mixed lines in pedigrees.