Chapter 3:

Demographics, Genetic Diversity and Breed Structure of the Cleveland Bay Horse Assessed By Pedigree Analysis

1. Introduction

In the previous chapters we have been introduced to the importance of understanding the nature of livestock biodiversity in order to identify and develop strategies to counter its erosion.

In recent years much effort has been put into understanding breed diversity and structure through molecular methods. Whilst there can be no doubt this has advanced knowledge, it is only very recently that the trend in research has come full circle back to appreciation of the importance of analysis of pedigree data, to uncover the nature of founder contributions and the levels of inbreeding brought about through historic breeding practices. With the increase in availability of appropriate software there has been a growth in the number of livestock breeds being successfully described in genetic terms, including specific closed populations of different equine breeds such as the Lipizzan (Zechner et al., 2002), Polish Arabian (Glazewska and Jezierski, 2004), Andalusian (Valera et al., 2005) and Friesian (Ducro et al., 2006).

In this chapter we look in detail at what can be learned from analysis of the Cleveland Bay Horse Society Studbook Data, and gain a more complete understanding of why the breed is placed on the Critically Endangered watch-list of the Rare Breeds Survival Trust.

2. Materials and Methods
2.1. Data Acquisition

The Cleveland Bay Horse Society published its first studbook in 1884. Volume One contained retrospective pedigrees of 567 stallions foaled prior to January 1880. The Society has maintained its studbook since that time, now comprising 38 volumes and containing pedigree information on 5757 animals.

Since Volume III, which included retrospective data for numerous mares as well as additional stallions, the studbook has remained effectively closed, and now requires eight-generation pedigrees for animals to be eligible for inclusion in the pure bred register. In addition to the pure register, since 1934, the Society has also maintained a Grading Register, in which mares of “Cleveland Type” can be listed and be used in an upgrading scheme. This scheme was originally conceived at a time when breed numbers were heavily depleted as a result of losses in the Great War of 1914-1918, and breeders recognised that there were still unregistered mares of probable Cleveland Bay origin on farms in the North East of England. In recent years very few mares have been added to the Grading Register, with numbers in single figures over the last decade.

In 1985 the Society published its 29th volume - the Centenary studbook, which contained a summary of all animals recorded in previous volumes. Until the mid 1990's all society studbook records were in paper format, with no electronic data available. The data for the present study was extracted from Volume 29 and the subsequent 9 editions of the studbook. The information was digitised in Filemaker™ database format, listing the unique identities of each animal, along with those of the two parent animals (where known); the sex of each animal and date of birth.

Subsequent to this first complete digitisation, the electronic data was made available to the Cleveland Bay Horse Society, and now forms the basis for their own electronic studbook, which it maintains using XIS systems software.

At the outset of this study very little software was publicly available with which to conduct appropriate pedigree analysis. In previous investigations of a restricted dataset the Fortran software COWMATE 93 had been used to calculate rates of inbreeding (Walling, 1994). However, the software GENES (Lacy, 1998) was being widely used by studbook keepers in zoos around the world and was freely available under the terms of the GUI licence. In order to facilitate genetic analysis using GENES, the data was first entered into the commercially available SPARKS (Single Population Animal Record Keeping System) software, distributed by The American Zoological Society/International Species Information System (ISIS). The size of livestock studbooks is generally greater than that of any zoological collection and this was the case with the Cleveland Bay data. In order for the data to run successfully under the SPARKS system a specially extended version was produced by ISIS, who report that, at the time, the Cleveland Bay dataset was the third largest database to run under this framework.

In recent years research into Livestock Biodiversity has caught up with that conducted by wildlife conservationists, and there have been a number of useful software developments and a wider range of programmes freely available for research purposes. Of particular note are ENDOG (Gutierrez and Goyache, 2005); EVA (Berg et al., 2007); CFC (Contribution, Inbreeding [F], Co-ancestry) (Sargolzaei et al., 2006), and POPREP (Groeneveld et al., 2009). Each of these offers slightly different routines and where they have been used they are identified in the text.

Previous analysis of a partial Cleveland Bay Studbook dataset (Walling, 1994) centred on the level of inbreeding and coancestry within the breed. Calculation of an individual's inbreeding coefficient (Wright, 1922; Malécot, 1948) depends on the extent to which its ancestry is known and the number of defined generations in the animal's pedigree. Whilst the absolute level of inbreeding may not be of importance, its rate of increase certainly is, being the prime parameter in assessing the additive genetic variation within the population (Groeneveld et al., 2009). The rate of inbreeding has a direct effect on one of the major statistics of breed diversity - the effective population size - and we will see that a variety of different assumptions and methods lead to different estimates of this. The broad spectrum of software now available has allowed detailed analysis, utilising a number of different assumptions, and each contributes to our combined knowledge of the Cleveland Bay breed.

2.2 Data Quality and Pedigree Completeness

It is now accepted that the quality and validity of any genealogical analysis from studbook pedigree data is directly related to the accuracy and completeness of the dataset (Oliehoek and Bijma, 2009).

Importation of the electronic studbook data into SPARKS revealed many pedigree errors, the majority of which were infinite loops, caused by incorrect identification of sires or dams. Each of these was investigated and the most likely solution found - often the cause being two animals bearing the same name but different studbook numbers and birth dates. The necessary corrections were agreed with the Society and both studbook datasets amended, with corresponding corrections made to the paper records when the Society published its Millennium compendium edition.
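The loops described above, in which an animal ends up recorded as its own ancestor, can be detected mechanically. The following is a minimal Python sketch and not the routine used by SPARKS; the function name `find_pedigree_loops`, the dictionary layout and the example identifiers are all hypothetical.

```python
def find_pedigree_loops(pedigree):
    """Return the set of animals that appear among their own ancestors.

    `pedigree` maps animal id -> (sire id, dam id); None marks an unknown parent.
    (Assumed layout, chosen for illustration only.)
    """
    looped = set()
    for animal in pedigree:
        # Walk upwards through the ancestry, starting from the animal's parents.
        stack = [p for p in pedigree.get(animal, (None, None)) if p is not None]
        seen = set()
        while stack:
            current = stack.pop()
            if current == animal:
                looped.add(animal)  # the animal is its own ancestor
                break
            if current in seen:
                continue
            seen.add(current)
            stack.extend(p for p in pedigree.get(current, (None, None)) if p is not None)
    return looped


# Hypothetical records: "B" is wrongly recorded as sired by its own grandson "D".
example_pedigree = {
    "A": (None, None),
    "B": ("D", "A"),
    "C": ("B", None),
    "D": ("C", None),
}
loops = find_pedigree_loops(example_pedigree)
```

Every animal on the faulty cycle is reported, which narrows the search to the handful of records that need checking against the paper studbook.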

A number of different parameters may be used to evaluate pedigree “quality”. The most popular of these is that proposed by MacCluer et al., who defined a weighted index to measure the completeness of a pedigree (MacCluer et al., 1983). This index summarizes the proportion of known ancestors in each ascending generation. It quantifies the chance of detecting inbreeding in the pedigree (Sorensen et al., 2005). Calculation of Pedigree Completeness was made using PopRep (Groeneveld et al., 2009). The following formula was used to compute pedigree completeness:

$$I_d = \frac{4\, I_{d_{pat}}\, I_{d_{mat}}}{I_{d_{pat}} + I_{d_{mat}}}$$

and

$$I_{d_k} = \frac{1}{d} \sum_{i=1}^{d} a_i$$

where k represents the paternal (pat) or maternal (mat) line of an individual, $a_i$ is the proportion of known ancestors in generation i, and d is the number of generations measured when calculating the pedigree completeness. Values for pedigree completeness range from 0 to 1. Where all of the ancestors of an individual are known to some specified generation (d) then $I_d = 1$. However, where one of the parent animals is unknown, $I_d = 0$ (Groeneveld et al., 2009).

2.3 Population Demographics

The software POPREP (Groeneveld et al., 2009) was used to calculate:

i. The number of breeding males and females per year. The number of breeding animals at any one time will determine the genetic structure of the population in subsequent generations. The number of breeding females is of particular importance because the FAO and the RBST use assessment of the number of breeding females as a measure of vulnerability or “endangerment” of livestock breeds.

ii. The Age Structure of Parents by Birth Year. Variations in breeding practice and bottleneck events may be indicated by wide variation in the age of animals being brought into or kept in reproduction.

iii. Distribution of Parity of Dams at Birth of Offspring. The rate of genetic progress of a population will be influenced by the turnover of breeding stock. Where non-random mating conditions exist, and animals stay in production for a long time, the turnover of breeding stock will be affected. This is particularly the case in equine breeds, where the females are non-menopausal and can remain in production well into their twenties where other age-related conditions do not preclude breeding.

iv. Family Size. This relates to the number of offspring of an individual that go on to become breeding animals themselves in the next generation (Falconer and MacKay, 1996). In an idealised situation all parents have an equal chance of contributing offspring to the next generation. However, in practice this is rarely the case in production livestock. The unequal contributions of individuals, together with the sex bias between sires and dams, lead to variation in family size. A consequence of this is an increase in the rate of inbreeding and a reduction in effective population size. In calculating family size offspring were categorised into four groups - All Offspring; Selected Offspring (going on into production); Selected Sons; Selected Daughters (Groeneveld et al., 2009).
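The distinction between all offspring and selected offspring can be illustrated with a short Python sketch. It is not the POPREP routine: the function name `family_sizes`, the pedigree layout and the example identifiers are hypothetical, and an offspring is treated as "selected" simply if it is itself recorded as a parent.

```python
from collections import Counter


def family_sizes(pedigree):
    """Return (all_offspring, selected_offspring) counts per parent.

    `pedigree` maps animal id -> (sire id, dam id); None marks an unknown parent.
    An offspring counts as selected if it appears as the parent of another animal.
    """
    parents = {p for sire_dam in pedigree.values() for p in sire_dam if p is not None}
    all_offspring = Counter()
    selected_offspring = Counter()
    for animal, sire_dam in pedigree.items():
        for parent in sire_dam:
            if parent is None:
                continue
            all_offspring[parent] += 1
            if animal in parents:
                selected_offspring[parent] += 1
    return all_offspring, selected_offspring


# Hypothetical example: S and D have two foals, only one of which breeds on.
example = {
    "S": (None, None),
    "D": (None, None),
    "F1": ("S", "D"),
    "F2": ("S", "D"),
    "G1": ("F1", None),
}
all_counts, selected_counts = family_sizes(example)
```

Here S and D each have two offspring but only one selected offspring, which is the quantity that matters for contributions to later generations.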

2.4 Average Generation Interval

The generation interval (GI) is one of the most influential factors affecting the rate of genetic progress over time. In its simplest form GI is assessed from the average age of males and females in the studbook. However, it is more usefully defined as the average age of the parent animals at the time of birth of their offspring, where an offspring is an animal that subsequently goes on to have progeny of its own in the studbook (Falconer and MacKay, 1996).

Transfer of genes from parents to offspring will occur through four pathways - sire to son; sire to daughter; dam to son and dam to daughter. The generation interval is calculated for each of these four pathways and the results averaged for each year group using PopRep (Groeneveld et al., 2009).
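The four-pathway calculation can be sketched in Python as follows. This is an illustration under simplifying assumptions rather than the PopRep implementation: it pools all years instead of averaging by year group, and the function name `generation_intervals`, the record layout and the example animals are hypothetical.

```python
def generation_intervals(records):
    """Mean parental age at offspring birth for each of the four pathways.

    `records` is a list of dicts with keys: id, sex ("M" or "F"), birth_year,
    sire, dam (None for an unknown parent). Only offspring that are themselves
    recorded as a parent are used, following the definition in the text.
    """
    by_id = {r["id"]: r for r in records}
    breeders = {r[k] for r in records for k in ("sire", "dam") if r[k] is not None}
    ages = {"sire-son": [], "sire-daughter": [], "dam-son": [], "dam-daughter": []}
    for r in records:
        if r["id"] not in breeders:
            continue  # offspring never bred, so it does not define a generation
        child = "son" if r["sex"] == "M" else "daughter"
        for role in ("sire", "dam"):
            parent = by_id.get(r[role])
            if parent is not None:
                ages[f"{role}-{child}"].append(r["birth_year"] - parent["birth_year"])
    means = {path: sum(v) / len(v) for path, v in ages.items() if v}
    overall = sum(means.values()) / len(means) if means else None
    return means, overall


# Hypothetical records for illustration.
example_records = [
    {"id": "S1", "sex": "M", "birth_year": 1980, "sire": None, "dam": None},
    {"id": "D1", "sex": "F", "birth_year": 1982, "sire": None, "dam": None},
    {"id": "S2", "sex": "M", "birth_year": 1990, "sire": "S1", "dam": "D1"},
    {"id": "D2", "sex": "F", "birth_year": 1992, "sire": "S1", "dam": "D1"},
    {"id": "X", "sex": "M", "birth_year": 2000, "sire": "S2", "dam": "D2"},
]
pathway_means, overall_interval = generation_intervals(example_records)
```

Animal X contributes nothing to the estimate because it has no recorded progeny, whereas S2 and D2 each contribute to two pathways.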

2.5 Inbreeding Analysis

The accumulation of inbreeding within the Cleveland Bay breed has previously been reported (Walling, 1994) and has also been a topic of debate at subsequent breed conferences. Inbreeding is the mating of related individuals, which results in some loci bearing alleles that are identical by descent. This occurs because alleles from a common ancestor, appearing in both the maternal and paternal sides of a pedigree, pass through multiple offspring. The proportion of genes that are identical by descent is designated as the Inbreeding Coefficient (Wright, 1922).

The Inbreeding Coefficient is defined as the probability that an individual has two identical alleles by descent. It is calculated from the formula:

$$F_X = \sum \left(\frac{1}{2}\right)^{n} \left(1 + F_a\right)$$

where the sum is taken over all such pathways, n is the number of animals in any pathway connecting the two parents of animal X back through a common ancestor, and $F_a$ is the inbreeding of the common ancestor.

Inbreeding coefficients for each individual animal were calculated using GENES, Endog, EVA and PopRep with a high consistency of results across the range of programmes.
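For illustration, inbreeding coefficients can be obtained from a small pedigree with a recursive coancestry calculation, which gives the same values as counting pathways through common ancestors. The sketch below is not the algorithm of GENES, Endog, EVA or PopRep; the function name, the pedigree layout and the example animals are hypothetical, and it assumes every animal has an entry and that `birth_order` lists parents before their offspring.

```python
from functools import lru_cache


def inbreeding_coefficients(pedigree, birth_order):
    """Return {animal: F} where F is the coancestry of the animal's parents.

    `pedigree` maps animal id -> (sire id, dam id); None marks an unknown parent,
    which is assumed to be unrelated to every other animal.
    """
    rank = {animal: i for i, animal in enumerate(birth_order)}

    @lru_cache(maxsize=None)
    def coancestry(a, b):
        if a is None or b is None:
            return 0.0
        if a == b:
            sire, dam = pedigree[a]
            return 0.5 * (1.0 + coancestry(sire, dam))
        # Always expand the younger animal so the recursion moves up the pedigree.
        if rank[a] < rank[b]:
            a, b = b, a
        sire, dam = pedigree[a]
        return 0.5 * (coancestry(sire, b) + coancestry(dam, b))

    return {animal: coancestry(*pedigree[animal]) for animal in birth_order}


# Hypothetical full-sib mating: C and D share both parents, E is their offspring.
example = {
    "A": (None, None),
    "B": (None, None),
    "C": ("A", "B"),
    "D": ("A", "B"),
    "E": ("C", "D"),
}
F = inbreeding_coefficients(example, ["A", "B", "C", "D", "E"])
```

For E there are two pathways of three animals each (C-A-D and C-B-D), giving 2 × (1/2)^3 = 0.25, which is the value the sketch returns. A pedigree as deep as the full studbook would need an iterative rather than a recursive implementation.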

The unavoidable mating of related animals in closed populations such as the Cleveland Bay will lead to accumulation of inbreeding and loss of genetic diversity (Falconer and MacKay, 1996). This loss of heterozygosity is often expressed as an accumulation of homozygosity.

The Increase in Inbreeding or Rate of Inbreeding (∆F) was calculated for each generation by means of the formula

$$\Delta F = \frac{F_t - F_{t-1}}{1 - F_{t-1}}$$

where $F_t$ and $F_{t-1}$ are the average inbreeding of offspring and their parents respectively (Falconer and MacKay, 1996).

The Average Relatedness Coefficient (AR) of an individual is the probability that two alleles at a given locus belonging to two different individuals, taken at random, are identical by descent (De Braekeleer et al., 1996). It equates to twice the mean coancestry between any individual and all of the other animals in the reference population, including the individual under consideration. The Average Relatedness of a founder is a good measure of its genetic contribution to the entire pedigree (Cervantes et al., 2008). Average Relatedness was calculated from the formula

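$$\mathbf{c}' = \frac{1}{n}\,\mathbf{1}'\mathbf{A}$$

where c' is a row vector in which each element represents the relatedness of one individual to every animal in the population, including itself, 1' is a row vector of ones, and A is the numerator relationship matrix of size n × n (Gutierrez and Goyache, 2005).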

Average Relatedness was calculated using ENDOG 4.6 (Gutierrez and Goyache, 2005).
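Read literally, the expression c' = (1/n) 1'A takes the mean of each column of the numerator relationship matrix. The Python sketch below shows this for a three-animal example; it is not the ENDOG code, and the function name and the example matrix are hypothetical.

```python
def average_relatedness(A):
    """Column means of the numerator relationship matrix A (a list of n rows)."""
    n = len(A)
    return [sum(A[i][j] for i in range(n)) / n for j in range(n)]


# Hypothetical matrix: two unrelated, non-inbred founders and their offspring.
A_example = [
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]
ar = average_relatedness(A_example)  # one AR value per animal
```

Each founder has AR = (1 + 0 + 0.5) / 3 = 0.5 and the offspring has AR = (0.5 + 0.5 + 1) / 3, roughly 0.67, reflecting its relationship to both parents.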

2.6 Effective Population Size

The Effective Population Size (Ne) is the number of breeding animals in an idealized population that would lead to the same rate of calculated or observed inbreeding (∆F) as observed in the real population (Falconer and MacKay, 1996). It is a measure of diversity within a population and can be calculated in a number of different ways, principally based on either the rate of inbreeding or on the number of parents (Falconer and MacKay, 1996).

The Effective Population Size from the rate of inbreeding is computed using the classic equation

$$N_e = \frac{1}{2\,\Delta F}$$

where the rate of inbreeding per generation is calculated using

$$\Delta F = \frac{F_t - F_{t-1}}{1 - F_{t-1}}$$

$F_t$ and $F_{t-1}$ being the average inbreeding of offspring and parents respectively (Falconer and MacKay, 1996).

The Effective Population Size from the number of parents is computed as

$$N_e = \frac{4\, N_m N_f}{N_m + N_f}$$

where $N_m$ and $N_f$ are the number of male and female parents respectively (Falconer and MacKay, 1996). This method allows for unequal numbers of male and female parents, but makes the assumption that mating is random and that all individuals of each sex have an equal opportunity to contribute to the next generation. This is seldom the case in livestock populations and the tendency is for this method to overestimate effective population size by a considerable margin (Groeneveld et al., 2009).

Calculations of effective population size were made using both Endog 4.6 (Gutierrez and Goyache, 2005) and POPREP (Groeneveld et al., 2009).
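The two approaches to effective population size can be compared with a short Python sketch. It uses the standard expressions Ne = 1 / (2∆F), with ∆F = (Ft - Ft-1) / (1 - Ft-1), and Ne = 4 Nm Nf / (Nm + Nf). The function names and all input values are hypothetical and are not results from the Cleveland Bay studbook.

```python
def rate_of_inbreeding(f_offspring, f_parents):
    """Delta F = (F_t - F_{t-1}) / (1 - F_{t-1})."""
    return (f_offspring - f_parents) / (1.0 - f_parents)


def ne_from_inbreeding(delta_f):
    """Ne = 1 / (2 * Delta F); returned as infinity when inbreeding is not rising."""
    if delta_f <= 0:
        return float("inf")
    return 1.0 / (2.0 * delta_f)


def ne_from_parents(n_males, n_females):
    """Ne = 4 * Nm * Nf / (Nm + Nf)."""
    return 4.0 * n_males * n_females / (n_males + n_females)


# Hypothetical inputs for illustration only.
delta_f = rate_of_inbreeding(f_offspring=0.10, f_parents=0.08)
ne_inbreeding = ne_from_inbreeding(delta_f)
ne_parents = ne_from_parents(n_males=10, n_females=40)
```

With these inputs the rate of inbreeding is about 2.2% per generation, giving an effective size of 23, while 10 sires and 40 dams give 32 from the parent-count formula. Note that 50 parents yield an effective size of only 32 because the rarer sex dominates the result.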

2.7 Founder and Ancestor Representation

Studbook information was analysed in order to identify stallion and dam lines, defined respectively as unbroken descent through male or female animals only from an ancestor to a descendant (Cunningham et al., 2001). The results of this analysis were compared with those previously reported (Emmerson, 1984; Walling, 1994).

Detailed founder and ancestor analysis was carried out using Endog 4.6 (Gutierrez and Goyache, 2005), to determine:

The Number of Founders, where a founder is defined as an animal that has contributed by breeding to the living population, but there is no record in the studbook of its own parent animals. Calculations were made of founder contributions to each animal in the Reference Population and of the number of living descendants of each founder.

All animals with two unknown parents are regarded as founders for this analysis. Where only one of an animal's parents is unknown, the unknown parent is also treated as a founder. In many studbooks the total number of founders can be misleading because many animals are recorded with sire or dam unknown when the parent is in fact otherwise present in the data. This is almost certainly the case in the recording of many dams in Cleveland Bay Horse pedigrees. In addition, some founders have been used more intensely and therefore contribute more to the current population than other founders. The effective number of founders, ƒe, has been designed to correct for this second shortcoming (Gutierrez and Goyache, 2005).

Effective Number of Founders (ƒe), defined as the number of equally contributing founders that would be expected to produce the same genetic diversity as in the population under study (Lacy, 1989). This is computed from the genetic contributions of the f founders:

$$f_e = \frac{1}{\sum_{i=1}^{f} q_i^2}$$

where $q_i$ is the genetic contribution of the ith founder to the reference population and f is the real number of founders (Sorensen et al., 2005).

In a scenario where every founder makes an equal contribution, the effective number of founders will equal the actual number of founders. However, it is far more usual for founders to contribute unequally, and then the effective number of founders will be smaller than the true number. The genetic contributions will have converged after 5 to 7 generations (Bijma and Woolliams, 1999). Once this convergence occurs the effective number of founders will have limited usefulness as it will remain constant irrespective of later changes in the population. Pedigrees of more than 7 generations can be characterized with a high effective number of founders even after a severe, recent bottleneck.
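The effect of unequal founder contributions can be illustrated in Python. The sketch below derives expected founder contributions by halving at each generation and then applies ƒe = 1 / Σ q². It is not the Endog routine, and it assumes each animal has either both parents known or both unknown; single unknown parents, which the analysis above treats as founders, are not handled. All names and the example pedigree are hypothetical.

```python
def founder_contributions(pedigree, reference):
    """Expected genetic contribution of each founder to the reference animals.

    `pedigree` maps animal id -> (sire id, dam id); founders are (None, None).
    """
    cache = {}

    def contrib(animal):
        if animal in cache:
            return cache[animal]
        sire, dam = pedigree[animal]
        if sire is None and dam is None:
            result = {animal: 1.0}  # a founder carries only its own genes
        else:
            result = {}
            for parent in (sire, dam):
                for founder, q in contrib(parent).items():
                    result[founder] = result.get(founder, 0.0) + 0.5 * q
        cache[animal] = result
        return result

    totals = {}
    for animal in reference:
        for founder, q in contrib(animal).items():
            totals[founder] = totals.get(founder, 0.0) + q / len(reference)
    return totals


def effective_number(contributions):
    """1 / sum of squared contributions."""
    return 1.0 / sum(q * q for q in contributions)


# Hypothetical pedigree: sire A is used twice, dams B and C once each.
example = {
    "A": (None, None), "B": (None, None), "C": (None, None),
    "X": ("A", "B"), "Y": ("A", "C"),
}
q = founder_contributions(example, reference=["X", "Y"])
fe = effective_number(q.values())
```

Founder A contributes 0.5 and B and C contribute 0.25 each, so three actual founders give an effective number of about 2.67.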

Whilst the effective number of founders is not a perfect measure of genetic diversity, it does form a basis for comparison of the effective population size and the effective number of ancestors which we qualify in the following paragraph. In a population with minimum inbreeding, the effective number of founders would be expected to be one-half the effective population size. Where the effective number of founders differs from this, there is evidence that the breeding structure has been changed since the founder generation (Sorensen et al., 2005).

The Effective Number of Ancestors (ƒa) supplements the effective number of founders (Boichard et al., 1997).

It is calculated from the genetic contributions of ancestors with the largest marginal genetic contributions. Whilst genetic contributions of founders are independent and sum to one, this is not the case for genetic contributions of ancestors. The dam of a highly used sire has at least 50% of the contribution of her son, because the same genes are represented in both generations. To deal with this redundancy Boichard et al. introduced the concept of marginal contributions (Boichard et al., 1997). The ancestors contributing most to the reference population are considered, each in turn, in a recursive process. With each iteration the ancestor with the highest contribution is chosen, and the contributions of all others are calculated conditional on the contribution of the chosen ancestor. The marginal contribution thus takes into account the prior genetic contribution of ancestors already included in the recursive calculations. The marginal contributions of all ancestors sum to 1. Ancestors will have a large marginal contribution to the reference population when their genes have passed through a large number of descendants, for example a sire whose progeny themselves have a large number of offspring (Sorensen et al., 2005).

The effective number of ancestors helps to account for the losses of genetic variability produced by the unbalanced use of reproductive individuals, which is the norm in domestic equines, and also takes into account bottlenecks in the pedigree.

The parameter ƒa is computed as

$$f_a = \frac{1}{\sum_{j} q_j^2}$$

where $q_j$ is the marginal contribution of an ancestor j.

The ratio of the effective number of founders and the effective number of ancestors gives an indication of the significance of any bottleneck events in population development. When the two are close to equal the population will have been relatively stable and balanced in contributions. However, where the effective number of founders substantially exceeds the effective number of ancestors there is a high probability that bottleneck events have played a significant part in population development (Sorensen et al., 2005).

The Effective Number of Founder Genomes (ƒg) was proposed by Lacy (1989) to account for unequal founder contributions, random loss of alleles caused by genetic drift and for bottleneck events. It is computed by the equation:

$$f_g = \frac{1}{\sum_{i=1}^{c} \left( p_i^2 / r_i \right)}$$

where $p_i$ is the expected proportional genetic contribution of founder i; $r_i$ is the expected proportion of founder i's alleles that remain in the current population, and c is the total number of contributing founders (Lacy, 1989). This gives an indication of the number of equally contributing founders with no loss of founder alleles that would produce the same amount of diversity as found in the reference population (Lacy, 1995). The effective number of founder genomes will be smaller than both the effective number of founders and the effective number of ancestors, even under minimum inbreeding, and will also be less than half of the effective population size. The scale of these differences is indicative of the degree of random loss of alleles. Alleles will be lost with every generation of a pedigree and thus the number of founder genomes will decrease as the depth of pedigree increases (Sorensen et al., 2005).
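A small numerical illustration of how allele loss pulls the effective number of founder genomes below the effective number of founders is given below in Python, using ƒg = 1 / Σ (p² / r) and ƒe = 1 / Σ p². The function name and the contribution and retention values are hypothetical; in practice the retention values are estimated from the pedigree rather than supplied by hand.

```python
def founder_genome_equivalents(contributions, retentions):
    """f_g = 1 / sum(p_i^2 / r_i) over contributing founders.

    `contributions` are the proportional founder contributions p_i and
    `retentions` the expected proportions r_i of each founder's alleles that
    remain. Founders with p_i == 0 are skipped as non-contributing.
    """
    total = sum(p * p / r for p, r in zip(contributions, retentions) if p > 0)
    return 1.0 / total


# Hypothetical values for three founders.
p = [0.5, 0.25, 0.25]
r = [0.5, 0.5, 0.25]
fg = founder_genome_equivalents(p, r)
fe = 1.0 / sum(x * x for x in p)
```

With these values ƒe is about 2.67 but ƒg is only about 1.14, the gap reflecting the alleles assumed lost since the founder generation. If every retention value were 1 the two measures would coincide.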

2.8 Largest Genetic Contributions

The genetic contribution of an ancestor is the average genetic relationship of that ancestor with its descendants in a later generation (Woolliams and Thompson, 1994). The programme CFC (Sargolzaei et al., 2006) was used to assess the ancestors making the greatest contribution to the reference population.

In addition, in order to identify unbalanced contributions of male and female animals to the Cleveland Bay Studbook, the programme POPREP (Groeneveld et al., 2009) was used to identify the Dams and Sires with the most progeny in the population; the number of progeny per Dam or Sire and also the Dams and Sires with the greatest number of progeny selected (going on to produce offspring themselves).

2.9 Population Structure

Genetic structure of a population can be assessed using F-statistics (Wright, 1978). F (fixation) statistics extend the study of inbreeding coefficients to the case of sub-divided populations and consist of three parameters. $F_{IT}$ is the inbreeding of an individual relative to the total population. $F_{IS}$ is the inbreeding of an individual relative to its own subpopulation. $F_{ST}$ is the average inbreeding of the subpopulation relative to the whole population.

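The three indices are obtained as:

$$F_{IS} = \frac{\tilde{F} - \tilde{f}}{1 - \tilde{f}}, \qquad F_{ST} = \frac{\tilde{f} - \bar{f}}{1 - \bar{f}}, \qquad \text{and} \qquad F_{IT} = \frac{\tilde{F} - \bar{f}}{1 - \bar{f}}$$

where $\tilde{f}$ and $\tilde{F}$ are, respectively, the mean coancestry and the inbreeding coefficient for the entire metapopulation, and $\bar{f}$ the average coancestry for the subpopulation, so that

$$(1 - F_{IT}) = (1 - F_{IS})(1 - F_{ST})$$

(Caballero and Toro, 2002).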

All F-statistics were calculated using ENDOG 4.6.
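The relationship between the three indices can be checked numerically. The Python sketch below uses the standard coancestry-based expressions of Caballero and Toro (2002), taking as inputs the mean inbreeding coefficient (`F_tilde`) and the two mean coancestries (`f_tilde` and `f_bar`). It is not the ENDOG code, and the function name and input values are hypothetical.

```python
def f_statistics(F_tilde, f_tilde, f_bar):
    """Return (F_IS, F_ST, F_IT) from mean inbreeding and mean coancestries."""
    f_is = (F_tilde - f_tilde) / (1.0 - f_tilde)
    f_st = (f_tilde - f_bar) / (1.0 - f_bar)
    f_it = (F_tilde - f_bar) / (1.0 - f_bar)
    return f_is, f_st, f_it


# Hypothetical averages for illustration only.
f_is, f_st, f_it = f_statistics(F_tilde=0.20, f_tilde=0.22, f_bar=0.18)

# The three indices satisfy (1 - F_IT) = (1 - F_IS)(1 - F_ST).
identity_gap = (1.0 - f_it) - (1.0 - f_is) * (1.0 - f_st)
```

With these inputs F_IS is slightly negative (about -0.026), F_ST is about 0.049 and F_IT about 0.024, and the identity holds to within floating-point error.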

Structure between subpopulations was also calculated by use of genetic distances. Nei's minimum distance (Nei, 1987) was calculated as the genetic distance between subpopulations i and j given by the equation

$$D_{ij} = \frac{f_{ii} + f_{jj}}{2} - f_{ij}$$

where $f_{ii}$ and $f_{jj}$ are the average coancestries within subpopulations i and j, and $f_{ij}$ is the average coancestry between them.

Using the data obtained in matrix format from the previous calculation, the programme TREX (Makarenkov, 2001) was used to construct phylogenetic trees to illustrate the structure and relationships between subpopulations.

The analysis of subpopulations from distance matrices assumes pre-determined subgroups. In the case of the Cleveland Bay these are based on Female Ancestry lines (Emmerson, 1984).
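A distance matrix of the kind passed to a tree-building programme can be derived from a matrix of average coancestries within and between subpopulations. The Python sketch below computes Nei's minimum distance in its coancestry form, D = (f_ii + f_jj) / 2 - f_ij. The function name and the two-subpopulation example are hypothetical, and this is not the ENDOG implementation.

```python
def nei_minimum_distance(coancestry):
    """Pairwise distances from a square matrix of average coancestries.

    coancestry[i][i] is the mean coancestry within subpopulation i and
    coancestry[i][j] the mean coancestry between subpopulations i and j.
    """
    n = len(coancestry)
    return [
        [(coancestry[i][i] + coancestry[j][j]) / 2.0 - coancestry[i][j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical coancestries for two female-line subpopulations.
example = [
    [0.10, 0.04],
    [0.04, 0.08],
]
distances = nei_minimum_distance(example)
```

The diagonal of the result is zero and the off-diagonal distance here is about 0.05. The distance grows as the between-line coancestry falls relative to the within-line values.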

3. Results
3.1 Data Quality and Pedigree Completeness

In October 2009 the Cleveland Bay Horse Society published the 38th Volume of its studbook. Like the centenary and millennium editions, this was a compendium of recent and all existing studbook registrations. Subsequent to the publication of Volume 38, a number of new registrations have been approved by the Council of the Cleveland Bay Horse Society, so that by February 2010, the Cleveland Bay Studbook included a total of 5757 animals, of which 2763 were male and 2552 were female. In addition 230 animals were listed in the grading register and a further 212 held overseas registrations.

Society data on the death dates of animals relies exclusively on notification from breeders or owners and is far from complete. Thus it is difficult to determine with absolute confidence the make-up of the living population. However, after analysis showed the average generation interval for the dataset to be 10 years, the studbook data was revised to reflect a reference population from the most recent complete generation (1997 to 2006) of 402 individual animals. This reference population was also selected on the basis that it was the most recent for which 100% of its members had microsatellite DNA parentage testing data on file, thus providing the opportunity of comparative genealogical and molecular analyses. The analysis of microsatellite data is reported in Chapter 4.

Pedigree Completeness

The pedigree file was analysed to assess the number of fully traced generations for each individual, the maximum number of generations traced and the equivalent complete generations for each animal. The maximum number of traced generations was 39.

The average pedigree completeness was assessed for each animal in the studbook, for 1 to 6 and 15 generations. Completeness assessed using POPREP for animals born between 1997 and 2006 was found to be: 1 generation deep 100%; 2 generations deep 100%; 3 generations deep 99.9%; 4 generations deep 98.6%; 5 generations deep 92.6%; 6 generations deep 83.7%.

Whilst POPREP selects only animals that go on to produce progeny that continue in production themselves for completeness calculations, Endog looks at the whole studbook, and thus the completeness index for the lower generations does not reach 100% as shown below.

The nature of the Cleveland Bay studbook is somewhat complicated by overlapping generations, and so assessment of pedigree completeness by year-group alone cannot be considered as the only appropriate analysis. Pedigree completeness by maximum traced generations was also assessed, and was found to decrease from 89% where maximum generations was 1, to 50% at 6 generations and under 10% at 17 generations.

3.2 Population Demographics

Population demographics and breeding patterns were assessed for both the whole studbook and the reference population. Of the 402 animals in the reference population, 193 animals were male and 209 female. Of the 193 males only 38 animals were registered as having been neutered but the true total is likely to be far higher. The reference population was sired by 83 unique stallions, out of 219 unique mares.

Over the three most recent breeding seasons for which complete records are available (2005 - 2007), 153 animals are registered in the studbook, sired by 52 different stallions, out of 120 different mares. These latter figures most closely represent the current actively breeding global population. In 2008 there were 88 male animals holding Society stallion licences. Forty of these licensed animals have progeny registered in the studbook.

Figure 5 shows the pattern of annual registrations in the studbook. Registrations prior to 1885 will have been retrospective, with those prior to around 1830 based on anecdotal evidence or family breeding records, as they will have predated the living memory of the vast majority of early Society members and breeders. The peak between 1885 and 1900 probably reflects the enthusiasm brought about by the newly formed CBHS, which was clearly already in decline before the start of World War 1 in 1914. In the interwar years annual registrations seldom reached double figures. Between 1945 and 1960 the number decreases further, such that the breed goes through a genetic bottleneck that has previously been identified (Walling, 1994).

Figures 6 and 7 show the number of animals in active reproduction, listed by the year of their birth, with peaks corresponding to the pattern of registrations already described. Figure 7 reveals that of the stock being produced only about 33% of females go on into reproduction themselves whereas the figure for males is closer to 50%. This may be masked by the practice of some breeders not registering male foals that are not intended to be kept entire, but go on to be neutered and sold on, in the past as working animals or in more recent years for sport and leisure purposes. European Union legislation, brought into force in the late 1990's, now requires every equine to have passport documentation. This will have brought to a halt the practice of non-registration of neutered animals, although not all of this group will continue to be registered through the CBHS (for social or political reasons), with other general registry bodies being available.

Figure 8 focuses on the pattern of breeding since 1950 and shows the acute nature of the bottleneck, with no male progeny being produced or selected between 1952 and 1954, at a time when there were only 4 registered stallions. The number of offspring being produced shows a significant increase in the early 1970's, possibly associated with the enhanced general awareness of rare breeds brought about by the founding of the Rare Breeds Survival Trust, and the identification of the endangered status of the Cleveland Bay horse.

The Rare Breeds Survival Trust (RBST) maintains a Watch List of endangered animals, in which Category 1 Critical is the most endangered status. Three equine breeds are listed in this category: the Cleveland Bay Horse, the Suffolk Punch and the Eriskay Pony. To be in this list a breed has to have fewer than 300 breeding females. To assess this the RBST takes a “breeding female” to be a female who has produced a registered foal. To calculate the number of adult breeding females for the watch-list, the Trust uses the number of female registrations averaged over the last three full years and applies a multiplier (6.67) to give an estimate of the number of adult breeding females. By adopting this system, females over a certain age which may still have produced registered progeny are not discounted from the analysis.

Using the data provided by the CBHS (late in 2009 for the 2010 RBST Watch-list) the above calculation gives an average of approximately 24 female registrations per year over 2006-2008, to which the multiplier of 6.67 is applied, resulting in an estimate of 162 breeding females.
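The watch-list arithmetic can be reproduced with a few lines of Python. The function name and the yearly counts below are hypothetical, chosen only so that they average 24; the actual annual registrations are not given in the text.

```python
def estimated_breeding_females(female_registrations, multiplier=6.67):
    """Mean annual female registrations multiplied by the RBST multiplier."""
    mean_registrations = sum(female_registrations) / len(female_registrations)
    return mean_registrations * multiplier


# Hypothetical counts for three years, averaging exactly 24 per year.
hypothetical_counts = [23, 24, 25]
estimate = estimated_breeding_females(hypothetical_counts)
```

An average of exactly 24 gives an estimate of about 160, so the reported figure of 162 would correspond to an unrounded average of roughly 24.3 female registrations per year. Either value is well short of the threshold of 300 breeding females.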

Whilst there has been an increase in the number of breeding females since the early 1970s, with the calculated numbers approaching the figure of 300 in the mid 80's and again in the 90's, the recent trend is for a decline in numbers. The most recent figure of 162 breeding females indicates that a doubling of the breeding stock would be required for the breed to be “downgraded” from its present “Critical” status.

Figure 10 shows that the mean age of reproduction for both males and females is 5 years old. The studbook shows some animals in reproduction at 1 year of age, but this is certainly an anomaly caused by retrospective registration practices at the formation of the studbook, and by attempts to remove loops from pedigrees in which animals had recorded birth dates before those of their parents. Whilst some of the records of 2 year old animals in production may also be erroneous, it has certainly been a practice amongst some Cleveland Bay breeders to use young entire males to cover their mares, and this practice will no doubt be of benefit in balancing contributions of overused stallions.

From its peak at 5 years old the number of females in production decreases steadily through to 15 years. There are certainly a number of mares going on to produce progeny well into their twenties and these are seen clustered together in the graph as 16+. Similarly the pattern of males in reproduction decreases from 5 years of age but also extends beyond 16 years of age, with recent records in the studbook of one “rediscovered” stallion being successfully brought back into reproduction by natural covering at 29 years of age.

Figures 11 and 12 illustrate the parity of females as best as can be done, with no records of service being available. Using the number of foals registered may not show the true pattern, as animals lost before full term gestation and during or soon after birth will not be included in the data.

Family Size

The breeding pattern revealed by this analysis shows that even whilst the breed was going through a bottleneck during the 1950s and 1960s some diehard breeders were successfully getting 6 or more pure bred foals from some mares. Whilst well over 200 mares have successfully bred 5 pure bred foals, there are a significant number producing 10 or more, with one mare being recorded as having 15 pure bred foals.

Some of the mares included in the analysis will no doubt have been bred to non-Cleveland Bay stallions at some time during their breeding life, and these foalings will not be reflected in the tables. This practice of non pure, or “part-bred” breeding has been in existence throughout the history of the Society, with considerable demand for Cleveland Bay x Thoroughbred animals as coaching horses prior to the mechanisation of transport, and in more recent years as sport and competition horses.

Figure 13 shows the comparative analysis of number of progeny per sire. Whilst over 250 individuals are recorded as having sired only one progeny, and 125 as having only two, a significant number of stallions are recorded as having over 20 progeny. A small number of stallions are recorded as having over 60 progeny, with the most prolific animal having in excess of 250 foals. This unbalanced pattern of breeding is of importance in explaining the loss of diversity in the breed and is explored in more detail later in this chapter, in the analysis of Genetic Contributions.
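A family-size tabulation of this kind can be sketched as below. This is an illustration only: the mapping `sire_of` and the animal identifiers are hypothetical, and the analysis software actually used may tabulate the data differently.

```python
from collections import Counter


def progeny_per_sire(sire_of):
    """Map each sire to its number of registered progeny."""
    return Counter(s for s in sire_of.values() if s is not None)


def family_size_distribution(progeny_counts):
    """Map family size to the number of sires with that many progeny."""
    return Counter(progeny_counts.values())


# Hypothetical registrations: foal -> sire (None where the sire is unknown).
sire_of = {"f1": "S1", "f2": "S1", "f3": "S1", "f4": "S2", "f5": None}
counts = progeny_per_sire(sire_of)
distribution = family_size_distribution(counts)
print(counts.most_common(1), dict(distribution))
```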

The influence of parent animals on subsequent generations can only be properly shown by analysis of their progeny that go on into production and have live offspring themselves. Figures 14 and 15 illustrate the number of “selected” offspring being bred by dams and sires respectively.

Figure 16 shows the overall pattern of breeding since 1900. By selecting only the post 1900 data, the erratic breeding pattern of the early Society years is ignored, and a significant increase in the number of progeny per sire in the post bottleneck period is revealed. This trend peaks again in the mid 1960s and late 1970s, then begins to decrease. However, the graph shows that there is still a very unbalanced breeding pattern, with individual animals, particularly males, making large contributions to subsequent generations.

3.3 Average Generation Interval

The average generation interval for each breeding year was found to range between 5.5 and 13 years, being at a minimum in the immediate post WW2 period 1946 to 1950, which coincides with the previously identified genetic bottleneck (Walling, 1994).

The average generation interval for the whole population comprises four component parent-progeny pathways. When the separate components are examined, wide variations are evident, particularly in the sire-son pathway, with a maximum of 25 years in 1904 and a value of 19.9 years as recently as 2003.

Figures 18 and 19 illustrate the generation interval by each of the four selection pathways in both the whole and reference populations. The average generation interval for both populations is 10 years, which is commensurate with that reported for other equine breeds (Zechner et al., 2002, Hamann and Distl, 2008).
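The four pathways can be made concrete with a short sketch. It assumes a hypothetical record layout (sex, birth year, sire, dam) and that the input has already been restricted to offspring retained for breeding. It illustrates the definition and does not reproduce the routine used by the analysis software.

```python
def generation_intervals(animals):
    """animals: id -> (sex, birth_year, sire_id, dam_id).

    Returns the mean age of parents at the birth of their offspring for
    each of the sire-son, sire-daughter, dam-son and dam-daughter pathways.
    """
    totals = {}
    for sex, year, sire, dam in animals.values():
        child = "son" if sex == "M" else "daughter"
        for parent, label in ((sire, "sire"), (dam, "dam")):
            if parent not in animals:
                continue  # parent unknown or without a record
            age = year - animals[parent][1]
            total, n = totals.get(f"{label}-{child}", (0, 0))
            totals[f"{label}-{child}"] = (total + age, n + 1)
    return {path: total / n for path, (total, n) in totals.items()}


# Hypothetical four-animal pedigree.
animals = {
    "S": ("M", 1990, None, None),
    "D": ("F", 1994, None, None),
    "x": ("M", 2000, "S", "D"),
    "y": ("F", 2002, "S", "D"),
}
intervals = generation_intervals(animals)
overall = sum(intervals.values()) / len(intervals)
print(intervals, overall)
```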

3.4 Inbreeding Analysis

Calculations show mean inbreeding across the whole population of 7.8% with an associated mean average relatedness of 8.3%.

The graph shows a near linear trend in accumulation of average inbreeding between 1885 and 1985, by which time it was approximately 20%. Since 1985 the rate of accumulation has slowed, with an average in the reference population of approximately 21%. The pattern of inbreeding in the Reference Population, over the period 1997 to 2006 is shown in Table 1 (source POPREP).


[Table 1 surviving column headings: No Animals; Min Inbreeding; Max Inbreeding]

Table 1: Inbreeding Coefficients F of Reference Population by birth year

Figure 21 shows the rate of change of both the Inbreeding Coefficient and the Additive Genetic Relationship between 1901 and 2009. The rate of change of the average inbreeding coefficient, based on slope regression, was 0.00214 per year, which represents a ΔF per generation of 0.02709. The rate of change in the Average Genetic Relationship (also based on slope regression) was calculated as 0.00202 per year, which results in a Δf per generation of 0.02629. From these results the average Effective Population size for the Cleveland Bay Horse breed over the period 1901 to 2009 was 19 based on Δf and 18 based on ΔF (source POPREP).
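These figures follow from the standard relationship Ne = 1/(2Δ), where Δ is the rate of change per generation. A minimal check using the per-generation rates quoted above (the function name is an illustrative assumption):

```python
def effective_size_from_rate(delta_per_generation):
    """Effective population size from a per-generation rate: Ne = 1 / (2 * delta)."""
    return 1.0 / (2.0 * delta_per_generation)


ne_from_inbreeding = effective_size_from_rate(0.02709)    # from delta F
ne_from_relationship = effective_size_from_rate(0.02629)  # from delta f
print(round(ne_from_inbreeding), round(ne_from_relationship))
```

Rounded to whole animals these reproduce the values of 18 and 19 reported by POPREP.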

Significant difference between the inbreeding coefficient and average genetic relationship may be masked by overlapping generations, and to identify any such trend Table 2 presents the data by complete generations.

[Table 2 surviving column heading: Complete Generations]

Table 2: Change in inbreeding coefficient and average relatedness for 9 fully traced generations. (Source ENDOG 4.6)

Note that with an average inbreeding of 0 for animals with no traced generations it is not possible to calculate the Effective Population size from the rate of inbreeding.

Figure 21 shows the discrepancy between Average Genetic Relationship and Inbreeding across a maximum of 39 generations. The scale of the difference is indicative of the deviation from random mating and unbalanced genetic contributions.

3.5 Effective Population Size

Effective population size was calculated by two different methods (rate of change of inbreeding and number of parents), both of which are illustrated in Figure 22.

This illustrates the wide variation in Effective Population size brought about by the fluctuations in inbreeding at the time of the start of the studbook, during two world wars and the post war bottleneck. The Effective Population Size calculated from the number of parents appears to be more stable, with two peaks at about Ne = 100. The first occurs in about 1888, shortly after the founding of the studbook. The second occurs in more recent times, approaching a maximum of Ne = 105 in 2006 as shown in Table 3.
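The text does not state which formula underlies the number-of-parents estimate. A common choice, assumed here purely for illustration, is Wright's sex-ratio formula Ne = 4NmNf/(Nm + Nf), where Nm and Nf are the numbers of male and female parents. The parent counts in the example are hypothetical.

```python
def effective_size_from_parents(n_sires, n_dams):
    """Wright's sex-ratio formula: Ne = 4 * Nm * Nf / (Nm + Nf)."""
    return 4.0 * n_sires * n_dams / (n_sires + n_dams)


# Hypothetical parent counts, showing how few sires cap Ne despite many dams.
balanced = effective_size_from_parents(60, 60)
unbalanced = effective_size_from_parents(30, 90)
print(balanced, unbalanced)
```

With the same total of 120 parents, the unbalanced case yields a smaller effective size, which is why heavy use of a few stallions depresses Ne.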

3.6 Founder and Ancestor representation


Gene drop analysis on the whole population using GENES (Lacy, 1998) identified 182 founders with a mean retention of 0.033. The number of founder genomes surviving was 6.015 and the number of Founder Genome Equivalents 2.219. The fraction of source diversity retained was 77.5% with a corresponding source diversity loss of 22.5%.
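The retained diversity quoted here is consistent with the usual relationship between founder genome equivalents (fge) and gene diversity, GD = 1 - 1/(2 fge). A minimal check (the function name is an illustrative assumption):

```python
def diversity_retained(founder_genome_equivalents):
    """Fraction of source gene diversity retained: 1 - 1 / (2 * fge)."""
    return 1.0 - 1.0 / (2.0 * founder_genome_equivalents)


retained = diversity_retained(2.219)
lost = 1.0 - retained
print(f"retained {retained:.1%}, lost {lost:.1%}")
```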

The proportion of ancestry known was 0.330 reflecting the fact that in early volumes of the studbook it was often the case that only a record of the sire of an individual animal was made, with its dam recorded as “unknown”.

Effective Number of Founders / Ancestors

For analysis of The Effective Number of Founders/Ancestors (Boichard et al., 1997) the base population was taken as the animals with both parents known. This was found to be 4947 animals, and was substantially smaller than the number of individuals considered in other whole population analyses. The number of ancestors contributing to this base population was 614.

For the base population, the Effective Number of Founders was found to be 77 and the Effective Number of Ancestors 24. Eight ancestors explain 50% of the genetic make-up of the base population.

Analysis showed that only 30 Ancestors contribute to the 402 individuals in the Reference Population (1997 to 2006). In the Reference Population the Effective Number of Founders was reduced to 61 and the Effective Number of Ancestors was down to 9. Half of the genetic make-up of the Reference Population was explained by only 3 Ancestors.
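Both effective numbers are the reciprocal of the sum of squared contributions (Boichard et al., 1997). For ancestors, the contributions must be the marginal contributions produced by the iterative selection procedure. The sketch below assumes those contributions are already available; the example values are hypothetical and chosen only to show how unequal contributions reduce the effective number.

```python
def effective_number(contributions):
    """Effective number of founders or ancestors: 1 / sum(p_i ** 2)."""
    return 1.0 / sum(p * p for p in contributions)


def number_explaining(contributions, share=0.5):
    """Smallest number of contributors whose summed contributions reach share."""
    running = 0.0
    for count, p in enumerate(sorted(contributions, reverse=True), start=1):
        running += p
        if running >= share:
            return count
    return None


# Hypothetical contributions of four founders (each set sums to 1).
equal = [0.25, 0.25, 0.25, 0.25]
unequal = [0.4, 0.3, 0.2, 0.1]
print(effective_number(equal), effective_number(unequal), number_explaining(unequal))
```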

Ancestors were selected following Boichard et al. (1997), while founders were selected by their individual Average Relatedness coefficient (AR; Gutierrez and Goyache, 2005).

Effective Number of Founder Genomes

Calculations on the base population using CFC show the founder genome equivalent (Lacy, 1989) to be 2.366.

Sire Lines

A total of 11 stallion lines were identified in the pedigree. However, as previously reported (Walling, 1994), only one paternal ancestry line is present in the living population. This line had previously been traced back and attributed to Wonderful Lad (SB No 361), but more detailed analysis, made possible by complete digitisation of the Cleveland Bay Studbook, traces this line a further five generations to the stallion Skyrocket (SB No 280), born in 1805. Further research of sources outside the Cleveland Bay Studbook indicates that this animal is probably descended in the male tail line from the thoroughbred founder Byerley Turk. The male pathway for this is Byerley Turk - Jigg - Croft's Partner - Tartar - Herod - Highflyer - Skyrocket (TB) - Skyrocket (CB).
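Tracing a male tail line amounts to following sire records until none remains. The sketch below encodes the probable pathway given above as a simple sire lookup (the dictionary layout and function name are illustrative assumptions) and guards against the pedigree loops mentioned earlier in the chapter.

```python
def male_tail_line(animal, sire_of):
    """Follow sire links from animal back to the earliest recorded male ancestor."""
    line = [animal]
    seen = {animal}
    while sire_of.get(line[-1]) is not None:
        sire = sire_of[line[-1]]
        if sire in seen:
            raise ValueError(f"pedigree loop detected at {sire}")
        line.append(sire)
        seen.add(sire)
    return line


# The probable pathway from the text, recorded as offspring -> sire.
sire_of = {
    "Skyrocket (CB)": "Skyrocket (TB)",
    "Skyrocket (TB)": "Highflyer",
    "Highflyer": "Herod",
    "Herod": "Tartar",
    "Tartar": "Croft's Partner",
    "Croft's Partner": "Jigg",
    "Jigg": "Byerley Turk",
}
line = male_tail_line("Skyrocket (CB)", sire_of)
print(" - ".join(reversed(line)))
```

The same function applied to a dam lookup would trace the female tail lines discussed in the next section.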

The pedigree of Highflyer is illustrated in Figure 23 and shows that all three of the imported horses said to be founders of the thoroughbred (Byerley Turk, Darley Arabian and Godolphin Arabian) are present to some degree in the pedigree of the Cleveland Bay founder Skyrocket (Studbook No. 280).

In his study Walling linked Wonderful Lad 361 to what were identified as 4 “bottleneck founders” (Walling, 1994). In fact these are not strictly founders but are more appropriately described as ancestors (Boichard et al., 1997).

Figure 25 shows the pathway from the founder Skyrocket to Walling's 4 Bottleneck Founders. Also highlighted are the two stallions, Newton 216 and Reform 653, that Emmerson has identified as “founders” of male lines 1a and 1b. There is a single male pathway back to Skyrocket from the common sire of these two animals (Sportsman 299).

Dam Lines

Analysis of the female members of the studbook identified a total of 17 dam lines (Table 5). As previously reported (Emmerson, 1984, Walling, 1994) only nine maternal ancestry lines are present in the reference population. Three of these lines (2, 4 & 9) are only represented in the living population, in direct female descent, by one or two individual animals.

The three most common of the maternal lines (1, 5 & 6) make up 70% of the present female population.

Interestingly, analysis of the relative contributions of the most influential maternal ancestry lines to the genome of the reference population reveals that some of the lines least well represented in direct descent actually continue to make a substantial genetic contribution as shown in Table 6.

Table 5: Female Tail Lines / Founders

[Table 6 surviving column headings: Maternal Line; Contribution to Reference Population. Surviving row labels, in order: TWO - Depper 39; ONE - Stainthorpes Star; SIX - Trimmer 268; FIVE - Depper 42; THREE - Dais(y) 318; FOUR - Marvellous 72; EIGHT - Church House Queenie]

Table 6: Contribution of Female Founders to Reference Population

Of the 2793 female registrations some 1271 animals fall outside the 17 female lines. This can be explained by the fact that in the early days of the stud book many horses only had their sires recorded. In comparison, of the 2831 registered males, only 176 do not belong to one of the eleven male lines.

Genetic Contributions

The relative genetic contributions of each of the female tail lines to the whole and reference populations are shown in Table 7.

[Table 7 surviving column headings: Maternal Line; Whole Pop; Evol Rate Of Whole Pop; Ref Pop; Evol Rate Of Ref Pop]

Table 7: Relative contributions of maternal ancestry lines to the evolution of the whole and reference (1997-2006) populations.

Note that the population sizes reported here include both male and female representatives of the Maternal Lines, so N is higher than shown in Table 4, where only female members are shown. The reason for this is that, when considering a female tail line, some of the genetic material (in particular mitochondrial DNA) can only be passed down the female to female pathway. Thus whilst a son will carry his dam's mitochondrial DNA he will not pass it on to his own progeny.

Figures 26 and 27 show the 30 sires with most progeny and most selected progeny (those that go on to reproduce themselves) in the whole population. Whether the stallion Prince George M235 actually produced 285 progeny is debatable. As he was a fashionable stallion at the time, his progeny would have been highly marketable and sold for a premium. Whilst not questioning the integrity of breeders of the day, it is not improbable that some animals were registered as by him that were otherwise bred, and without modern day parentage testing and registration requirements the truth may never be revealed.

However, there is more certainty about the prolific use of certain modern day stallions, particularly Forest Superman M1925 and Storth House Temptation M2054. In the full version of his dissertation Walling comments on the problems of overuse of stallions like Storth House Temptation. This possibly sensitive observation was omitted when reported in the Cleveland Bay Horse Society Studbook Vol 33 (Walling, 1994), the stallion in question still being alive and in use at the time.

Figure 28 illustrates the 30 most prolific dams in the whole population, whilst Figure 29 shows them ranked by the number of progeny that go on into reproduction. Interestingly, both Depper 42 and Dais 318 feature in this list. They are the founding mares of lines 5 and 3 respectively.


Table 8 details the 10 founders making the greatest contribution to the reference population as evaluated by Endog. In making these calculations the base Population (one or more parents unknown) was assessed at 1117 animals, and the Actual Base Population (one unknown parent = half founder) as 655. The effective population size of founders was assessed to be 101.00.


[Table 8 surviving column headings: Studbook Number; Year of birth. Surviving row labels, in order: Festivity FSB; Bilsdale Violet; Cleveland Champion; Park Polly; Barley Harvest]

Table 8: The 10 founders contributing the most to the reference population.

V1 = Contributions of genes of founders to the average inbreeding coefficient

V2 = Contributions of genes of founders to the average coancestry

Ancestors were selected following Boichard et al. (1997).

Table 9 sets out the male ancestral contributions to the reference population assessed using CFC (Sargolzaei et al., 2006).

Of the four “modern day founders” identified by Walling only three appear as ancestors in the table. The most significant of these is Lord Fairfax M1875, who makes the greatest contribution of just below 22%. Also appearing in the rankings are Apollo M1857 at 11.23% and Cholderton Druid M1859 ranking 31st with a contribution of 6.50%.

Interestingly, Walling's other “modern day founder”, Cholderton Cockade M1858, does not appear in this table. However, also of note is that a number of pre-bottleneck stallions make a significant contribution to the reference population, including Morning Star (17.03%), Aislaby Lad (16.20%) and Cholderton Ryecroft (15.35%).

Table 9: Male ancestral contributions to the reference population

Table 10: Female ancestral contributions to the reference population

Table 10 sets out the corresponding largest female contributions. Of note is the fact that a number of these ancestors fall outside the 17 identified female ancestry lines, including Star of Hope and her daughter Star of the Sea, Woodland Starlight and the mare Beauty, who sits much earlier in the studbook. These proportional contributions are shown by ancestry line in Figure 30. Note that the sum of genetic contributions of ancestors can be >1 as there may be sharing of genes with other ancestors.

3.7 Population Structure

Within population genetic differentiation was assessed using F-statistics (Wright, 1931, Weir and Cockerham, 1984) computed from the pedigree subdivided by female ancestry line. Fst reached 0.039427; Fis = 0.003891; and Fit = 0.035689. This indicates a significant differentiation of the population at this level and a departure from Hardy-Weinberg equilibrium.
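Wright's three F-statistics are linked by the identity (1 - Fit) = (1 - Fis)(1 - Fst), which provides a simple consistency check on any reported set of values. In the sketch below (function name assumed for illustration), the Fst and Fit quoted above are reproduced by the identity when Fis is taken as -0.003891, whereas a positive Fis of the same magnitude would imply an Fit of about 0.0432. This is offered as an arithmetic observation only, not as a correction of the reported values.

```python
def fit_from(fis, fst):
    """Wright's identity: 1 - Fit = (1 - Fis) * (1 - Fst)."""
    return 1.0 - (1.0 - fis) * (1.0 - fst)


FST = 0.039427
fit_if_negative_fis = fit_from(-0.003891, FST)
fit_if_positive_fis = fit_from(0.003891, FST)
print(f"{fit_if_negative_fis:.6f} {fit_if_positive_fis:.6f}")
```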

In order to visualise this differentiation and the possible relationship between the different lines, two different paired distance matrices were constructed. The first was based on Fis-Fst relationships and the second on standard distance (Nei, 1973).

These data were input into the web based version of the programme T-Rex (Makarenkov, 2001) to produce visual phylogenetic trees (Figures 31 to 34).

A variety of tree drawing methods were implemented, with consistency between methods regarding both the number of sub populations at the root of the tree and the identity of those populations. The diagrams show that lines 3, 1 and 5 consistently appear at the root of three clusters or clades in each of the trees.


The overall results presented here highlight the significant losses of founder representation that have occurred in the Cleveland Bay Horse population. Approximately 91% of the stallion and 48% of the dam lines are lost in the reference population. The unbalanced representation of the founders is illustrated by the effective number of founder animals (fe) and the effective number of ancestors (fa). The parameter fe constitutes over a third of the equivalent number of founder animals for the RP, whilst the ratio fa/fe is 22.5%. This ratio is substantially lower than that reported in other horse breeds such as 41.7% in the Andalusian (Valera et al., 2005), 54.4% in the Lipizzan (Zechner et al., 2002) and 38.2% for the endangered Catalonian donkey (Gutierrez et al., 2005).

The average inbreeding computed for the Cleveland Bay Horse, at 20.64% in the RP, is substantially higher than most of the values reported in the literature (Valera et al., 2005), with typical values ranging from 6.5% to 12.5%. Although most of these inbreeding values have been computed in breeds with deep pedigrees such as Andalusian, Lipizzan or Thoroughbred, there are significant differences in population sizes, and the accumulation of inbreeding in populations of restricted size will occur at a greater rate.

The smaller the number of individuals in a randomly mating breed the greater will be the accumulation of inbreeding because of the restricted choice of mates. The Cleveland Bay horse is therefore predisposed to inbreeding and associated loss of genetic variation. In the reference population of 402 individuals the Effective Population Size (Ne) computed via individual increase in inbreeding was 27.84.

The Effective Population Size (Ne) computed via regression on equivalent generations was 26.29. Inbreeding and genetic loss under random mating will occur at a rate of 1/(2Ne) per generation. Thus in the Reference Population, where Mean Ne is 32.32, under random mating we can expect inbreeding to accumulate at 1.5% per generation. It is generally accepted that livestock breeds should be managed to keep the accumulation of inbreeding below 1% per generation.
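The expected rate, and the effective size that would be needed to meet the 1% guideline, follow directly from the 1/(2Ne) relationship. A minimal sketch (function names are illustrative assumptions):

```python
def inbreeding_rate(effective_size):
    """Expected increase in inbreeding per generation: 1 / (2 * Ne)."""
    return 1.0 / (2.0 * effective_size)


def minimum_effective_size(max_rate):
    """Smallest Ne that keeps the per-generation rate at or below max_rate."""
    return 1.0 / (2.0 * max_rate)


rate_reference = inbreeding_rate(32.32)
ne_for_one_percent = minimum_effective_size(0.01)
print(f"{rate_reference:.2%} per generation; Ne needed for 1%: {ne_for_one_percent:.0f}")
```

On this basis an effective population size of about 50 would be required to hold the accumulation of inbreeding at 1% per generation.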

The results obtained in this investigation are a clear indication of the need for proactive breed management in order to maintain founder representation and maximise genetic diversity. This is reflected by the genealogical FIS values. This parameter characterises the mating policy, expressing the departure from random mating as a deviation from Hardy-Weinberg proportions. Positive FIS values mean that the average F value within a population exceeds the between-individuals coancestry, thus indicating that matings between relatives have taken place (Caballero and Toro, 2000, Gutierrez et al., 2005). Moreover, the average AR values computed for the nine complete generations shown in Table 2 are approximately equal to the value of F. In an ideal scenario with random matings and no population subdivision, AR would be approximately twice the F value of the next generation (Goyache et al., 2003). However, until the year 2002, F continued to increase in the Cleveland Bay Horse population.

Departure from random mating will have been influenced by a number of factors common to restricted populations of domesticated equines. These include selection by breeders for particular lines of descent; natural differences in fertility between individuals; a restricted number of male animals leaving significantly more offspring than females and geographic distribution of animals and breeders making some matings far easier than others. This is indicative of the typical practice of the larger studs, where breeding tends to be done in pasture by live cover, with stallions running with the mares. Often only one stallion is used per year per herd and the same stallion may be retained for several breeding years. This practice is compounded by breeders with only a small number of breeding females sending their animals to run with these herds or to be covered in hand by the same stallion.

This pattern of stallion use has different implications for the genetic diversity of the Cleveland Bay Horse compared to the alternative of mares travelling to stud to be covered in hand by a wider range of stallions that do not have their own herds of mares (Luis et al., 2007). Although this latter practice has clear benefits in conservation programmes, there is the danger of inappropriate matings bringing the more common and less frequent alleles together. Whilst such matings increase the frequency of the rarer alleles they simultaneously increase the frequency of the more common ones (Lacy, 2000), highlighting the need for in depth understanding of the genetic diversity of any rare breed and for an effective management plan for its maintenance.

What is evident from this analysis is that inbreeding has accumulated at an undesirable rate in the Cleveland Bay breed. In addition there has been a significant loss of founder representation in the living population. This loss of diversity is reflected in the low effective population size for the breed. Taken in combination this provides substantial evidence that the uncoordinated random breeding practice that has taken place in the past is not the best strategy for the preservation or genetic health of the breed.

In the next chapter we shall examine what can be understood about the genetic diversity of the breed at molecular level, and learn whether the use of microsatellite DNA analysis can shed further light on within-breed population structure.


BERG, P., SORENSEN, M. K. & NIELSEN, J. 2007. EVA Interface Program. University of Aarhus.

BIJMA, P. & WOOLLIAMS, J. A. 1999. Prediction of Genetic Contributions and Generation Intervals in Populations With Overlapping Generations Under Selection. Genetics, 151, 1197-1210.

BOICHARD, D., MAIGNEL, L. & VERRIER, E. 1997. The value of using probabilities of gene origin to measure genetic variability in a population. Genet. Sel. Evol., 29, 5-23.

CABALLERO, A. & TORO, M. A. 2000. Interrelations between effective population size and other pedigree tools for the management of conserved populations. Genet Res, 75, 331-43.

CABALLERO, A. & TORO, M. A. 2002. Analysis of genetic diversity for the management of conserved subdivided populations. Conservation Genetics, V3, 289.

CERVANTES, I., MOLINA, A., GOYACHE, F., GUTIERREZ, J. P. & VALERA, M. 2008. Population history and genetic variability in the Spanish Arab Horse assessed via pedigree analysis. Livestock Science, 113, 24-33.

CUNNINGHAM, E. P., DOOLEY, J. J., SPLAN, R. K. & BRADLEY, D. G. 2001. Microsatellite diversity, pedigree relatedness and the contributions of founder lineages to thoroughbred horses. Animal Genetics, 32, 360-364.

DE BRAEKELEER, M., DAIGNEAULT, J., ALLARD, C., SIMARD, F. & AUBIN, G. 1996. Genealogy and geographical distribution of CFTR mutations in Saguenay Lac-Saint-Jean (Quebec, Canada). Annals of Human Biology, 23, 345.

DUCRO, B., BOVENHUIS, H., NEUTEBOOM, M. & HELLINGA, I. 2006. Genetic diversity in the Dutch Friesian horse. Proceedings of the 8th World Congress on Genetics Applied to Livestock Production: 13-18 August 2006.

EMMERSON, S. 1984. Cleveland Bay Horse Society Centenary Studbook, Cleveland Bay Horse Society.

FALCONER, D. S. & MACKAY, T. F. C. 1996. Introduction to Quantitative Genetics, Essex, England, Longman.

GLAZEWSKA, I. & JEZIERSKI, T. 2004. Pedigree analysis of Polish Arabian horses based on founder contributions. Livestock Production Science, 90, 293.

GOYACHE, F., GUTIERREZ, J. P., FERNANDEZ, I., GOMEZ, E., ALVAREZ, I., DIEZ, J. & ROYO, L. J. 2003. Using pedigree information to monitor genetic variability of endangered populations: the Xalda sheep breed of Asturias as an example. Journal of Animal Breeding and Genetics, 120, 95-105.

GROENEVELD, E., WESTHUIZEN, B., MAIWASHE, A., VOORDEWIND, F. & FERRAZ, J. 2009. POPREP: a generic report for population management. Genet. Mol. Res., 8, 1158-1178.

GUTIERREZ, J. P. & GOYACHE, F. 2005. A note on ENDOG: a computer program for analysing pedigree information. Journal of Animal Breeding and Genetics, 122, 172-176.

GUTIERREZ, J. P., MARMI, J., GOYACHE, F. & JORDANA, J. 2005. Pedigree information reveals moderate to high levels of inbreeding and a weak population structure in the endangered Catalonian donkey breed. Journal of Animal Breeding and Genetics, 122, 378-386.

HAMANN, H. & DISTL, O. 2008. Genetic variability in Hanoverian warmblood horses using pedigree analysis. J. Anim Sci., jas.2007-0382.

LACY, R. 1989. Analysis of founder representation in pedigrees: Founder equivalents and founder genome equivalents. Zoo Biol, 8, 111.

LACY, R. C. 1995. Clarification of genetic terms and their use in the management of captive populations. Zoo Biology, 14, 565-577.

LACY, R. C. 1998. GENES version 1.1.8.

LACY, R. C. 2000. Management of limited animal populations. Report. AZA Marine Mammal Taxon Advisory Group , Silver Spring, MD, Pages 75-93.

LUIS, C., COTHRAN, E. G. & OOM, M. D. M. 2007. Inbreeding and Genetic Structure in the Endangered Sorraia Horse Breed: Implications for its Conservation and Management. J Hered, esm009.

MACCLUER, J. W., BOYCE, A. J., DYKE, B., WEITKAMP, L. R., PFENNING, D. W. & PARSONS, C. J. 1983. Inbreeding and pedigree structure in Standardbred horses. J Hered, 74, 394-399.

MAKARENKOV, V. 2001. T-Rex: reconstructing and visualizing phylogenetic trees and reticulation networks. Bioinformatics, 17, 664-668.

MALÉCOT, G. 1948. Les Mathématiques de l'Hérédité, Paris, France, Masson et Cie.

NEI, M. 1973. Analysis of gene diversity in subdivided populations. Proc Natl Acad Sci USA, 70, 3321 - 3323.

NEI, M. 1987. Molecular Evolutionary Genetics, New York, Columbia University Press.

OLIEHOEK, P. & BIJMA, P. 2009. Effects of pedigree errors on the efficiency of conservation decisions. Genet Sel Evol, 41, 9.

SARGOLZAEI, M., IWAISAKI, H. & COLLEAU, J. J. 2006. CFC (Contribution, Inbreeding (F), Coancestry).

SORENSEN, A. C., SORENSEN, M. K. & BERG, P. 2005. Inbreeding in Danish dairy cattle breeds. J Dairy Sci, 88, 1865-72.

VALERA, M., MOLINA, A., GUTIERREZ, J. P., GOMEZ, J. & GOYACHE, F. 2005. Pedigree analysis in the Andalusian horse: population structure, genetic variability and influence of the Carthusian strain. Livestock Production Science, 95, 57.

WALLING, G. 1994. Cleveland Bay Horse Society Studbook Vol XXXIII, Cleveland Bay Horse Society.

WEIR, B. S. & COCKERHAM, C. C. 1984. Estimating F-statistics for the analysis of population structure. Evolution, 38, 1358-1370.

WOOLLIAMS, J. A. & THOMPSON, R. 1994. A theory of genetic contributions. In: SMITH, C., GAVORA, J. S., BENKEL, B., CHESNAIS, J., FAIRFULL, W., GIBSON, J. P., KENNEDY, B. W. & BURNSIDE, E. B., eds. 5th World Congress on Genetics Applied to Livestock Production, 12th August 1994, Guelph, Canada. Guelph, Ontario: Organising Committee, 127-134.

WRIGHT, S. 1922. Coefficients of Inbreeding and Relationship. The American Naturalist, 56, 330-338.


WRIGHT, S. 1978. Evolution and the Genetics of Populations: Variability Within and Among Natural Populations, Chicago, USA, University of Chicago Press.

ZECHNER, P., SOLKNER, J., BODO, I., DRUML, T., BAUMUNG, R., ACHMANN, R., MARTI, E., HABE, F. & BREM, G. 2002. Analysis of diversity and population structure in the Lipizzan horse breed based on pedigree information. Livestock Production Science, 77, 137.