The importance of historically popular sires on the accuracy of genomic predictions of young animals in the US Holstein population

Graphical Abstract Summary: The US dairy industry is known for its extensive use of reproductive biotechnology, which has allowed a small number of bulls to have a disproportionately large genetic contribution to the population. The objectives of this study were to investigate whether historically extensively used sires still contribute to the accuracy of genomic predictions of young animals. Results show that old sires still capture some of the genetic variation of recent populations, but contribute less to the accuracy of prediction compared with more recently used sires. The sires that were identified as genetically more important (Shottle, Goldwyn, and Planet) did not have the highest number of registered progeny.


Graphical Abstract Summary
The US dairy industry is known for its extensive use of reproductive biotechnology, which has allowed a small number of bulls to have a disproportionately large genetic contribution to the population. The objectives of this study were to investigate whether historically extensively used sires still contribute to the accuracy of genomic predictions of young animals. Results show that old sires still capture some of the genetic variation of recent populations, but contribute less to the accuracy of prediction compared with more recently used sires. The sires that were identified as genetically more important (Shottle, Goldwyn, and Planet) did not have the highest number of registered progeny. T he dairy industry is known for its wide use of a relatively small number of bulls. Yue et al. (2015) found that all AI bulls in the United States in 2015 were descendant from only 2 bulls born in 1880. These lineages were mostly distributed in the population through 3 extensively used bulls, namely Pawnlee Farm Arlinda Chief (Chief), Round Oak Rag Apple Elevation (Elevation), and Penstate Ivanhoe Star (Ivanhoe Star) (Yue et al., 2015). The results of their study were based on tracing the Y-chromosome, which is a small fraction of the genome. However, evaluating autosomal chromosomes would include variation from both male and female sources. For many years, the Holstein sire summaries have reported that Elevation and Chief have had the highest proportion of genes in common with the available proven bulls when based on pedigree relationships. Historically popular sires were the best animals at the time. However, these previously superior sires are expected to compare unfavorably with modern animals based on net merit and production traits (De Vries, 2017), since remarkable genetic progress has been made over generations of selection. A recent study aimed to re-incorporate lost genetics from older bulls to improve both diversity and fitness. Progeny resulting from these old bulls showed average or better performance for daughter pregnancy rate, but below average for production traits (Dechow et al., 2020). This is evidence that many decades of strong selection created a modern animal that is considerably different compared with historic populations, both phenotypically and genotypically. Considering this difference, the relative importance of old bulls could be lower than modern animals. Although the above-mentioned sires may be related to almost all modern animals, they are not necessarily strongly related. Parent contributions halve in every generation, which makes a specific parent irrelevant to descendants decades later. However, in the dairy industry, popular sires breed sons that also become popular sires. Additionally, they are used over long periods of time instead of simply one generation. This may increase their genetic representation in later generations.
The breeding values of animals can be conditioned on the breeding values of a subset of animals based on the relationships between this subset and all remaining animals using a recursive formula (Misztal et al., 2014a). When the subset size is large enough (~15K in US Holstein), the prediction accuracy is similar to using all animals, regardless of the choice of animals (Fragomeni et al., 2015). When the subset is small, differences in accuracy are indicative of the relevance of the subset. The recursion coefficients indicate the impact of specific bulls. The objective of this study was to determine whether old, influential sires have greater genetic contributions to the prediction of genomic breeding values of young animals compared with popular sires with fewer progeny or sires born more recently.

The importance of historically popular sires on the accuracy of genomic predictions of young animals in the US Holstein population
Yvette Steyn, 1 * Thomas J. Lawlor, 2 Daniela Lourenco, 1 and Ignacy Misztal 1 Institutional animal care and use or equivalent approval was not required since data were already recorded by the industry. The data were obtained from the Council on Dairy Cattle Breeding (CDCB) and contained information up to 2014. It included 9,817,252 animals in the pedigree, of which 330,837 were sires and 5,471,039 were dams. Phenotypes started in 1983 and included 6,550,442 animals with 10,067,745 measures for stature and 10,067,730 for fore udder attachment (FUA). These traits were used because they have shown to be type traits with substantial genetic change over time . There were 569,404 genotypes with 58,990 SNP markers. This included 121,634 genotyped males, of which 21,824 were sires and 15,908 were sires with phenotyped progeny. The genotyped sires included 8,730 animals with more than 50 progeny, with the maximum number of progeny being 58,266 from Marshfield Elevation Tony (Mars).
The traditional BLUP evaluation depends on the recursion equation In this study, the genomic relationship matrix (G) was obtained using all genotyped animals and the formula where M is a matrix of SNP content centered by twice the current allele frequencies, and p i is the allele frequency for SNP i ( Van-Raden, 2008). To reduce bias due to the different genetic level of genotyped and nongenotyped animals, G was tuned to be compatible with the pedigree relationship matrix between genotyped and nongenotyped animals (A22) using the method described by Chen et al. (2011). To avoid singularity, 5% of A22 was combined with 95% of G.
The inverse of the resulting genomic relationship matrix can be approximated by using a subset as the core animals in the algorithm for proven and young (APY) (Misztal et al., 2014a). Five different subsets (cores) of 100 sires were used based on their birth year and the total number of registered progeny as of 2014. These different subsets were the most used sires overall (top 100) regard-less of when the sire was born, and the most used sires born within different time periods : before 1981, 1981 to 1990, 1991 to 2000, and 2001 to 2010. To allow comparison of the impact of popular sires over all time periods, all 400 were combined in a subset. The relative impact of an animal was calculated as where n is the number of animals in the subset, i is the individual sire, m is the number of validation animals, j is the validation animal, and p ij is the recursion coefficient corresponding to the 400 animals in the combined subset and the validation population. Stature and FUA were analyzed in a 2-trait model as described by Tsuruta et al. (2021). Single-step GBLUP (ssGBLUP) was used to accommodate genotyped and nongenotyped animals with the BLUPf90iod2 package within the blupf90 software suite (Misztal et al. 2014b). The pedigree depth was 30 generations, starting from the phenotyped animals. A traditional pedigree-based BLUP was used to obtain adjusted phenotypes (y adj ) using the PREDICTF90 software package (Misztal et al., 2014b). These are phenotypes adjusted for all effects in the model other than the animal effect. Validation animals were genotyped animals born after 2010 with only one measurement, which totaled 10,153 animals. The empirical prediction accuracy was measured as the Pearson correlation between the genomic breeding value and y adj divided by the square root of the heritability (0.45 for stature and 0.18 for FUA). Figure 1 shows the accuracies obtained when only the 100 most used sires in different time periods were used as a subset. Although using the overall 100 most popular sires also delivers high accuracies, it is still lower than using only sires born from 2001 to 2010. The accuracies obtained when using sires born before 1981 are the lowest (0.54 for stature and 0.47 for FUA). This is still an impressive accuracy reached when only 100 animals are used for the conditioning of the inverse of the genomic relationship matrix of the remaining population. This indicates that these old sires still capture independent chromosome segments (ICS) of the current population. The highest accuracies were reached using the 100 sires from the period 2001 to 2010 (0.69 for stature and 0.61 for FUA). This is approximately 30% more than using the older sires. Thus, the ICS are better captured by young sires. It has been shown that the accuracy of genomic breeding values is influenced by the generational distance between the reference and validation population (Hidalgo et al., 2021;Hollifield et al., 2021). Although the old sires may be related to the majority of the current population, they are not highly related. Many generations of recombination have occurred between old sires and the recent population. Consequently, they may share smaller haplotypes with the validation population and different epistatic effects. Allele substitution effects can differ between the modern population and ancestral population due to other nonadditive effects and different allele frequencies (Legarra et al., 2021). In contrast, young animals are closely related to the validation population and share larger haplotypes. Figure 2 shows boxplots of the relative impact of sires within each 100-animal subset when the recursion coefficients are based on all 400 animals combined, and the validation population. This . The total number of progeny of an animal contributes to its genetic impact on the population, but is not the best indicator of true genetic impact. Many highly popular sires were used in both registered and commercial herds as a sire of dams, while specific sires were used more frequently as a sire of sires. Mars is the bull with the most progeny in the overall population, yet he does not appear among the top 4 most important sires in his time period. Madawaska Aerostar (Aerostar) is the second most important sire within the time period 1981 to 1990, even though he only had 9,805 progeny-considerably fewer than the 42,291 progeny of the most important in that time period (To-mar Blackstar). However, Aerostar is the maternal grandsire of the 2 most important sires overall: Shottle and Goldwyn.
Elevation, Chief, and Ivanhoe Star are responsible for the limited number of Y-chromosome lineages in North American AI sires (Yue et al., 2015), yet only Elevation is among the top 4 most important sires before 1981. Elevation is 1.54 times more important than the average of the 400 selected sires, which is far lower than Shottle, which is 4.38 times more important. Although Chief and Ivanhoe Star do not appear among the top 4, Chief is the sire of 2 of the top 4 (Mark and Valiant), while Mark is the maternal grandsire of Durham-the fourth most important sire born within the period 1991 to 2000. Ivanhoe Star is the sire of Bell (the second most important before 1981). He has sired Elton, which is the sire of Durham and the maternal grandsire of O Man.
O Man was commonly used to improve health and fitness qualities. Goldwyn is known as a bull with outstanding conformation qualities, while Planet has been popular for his production merit. The Holstein Association USA regularly releases a sire summary, which also lists the bulls with the highest average percentage of genes in common with the available proven bulls. In December 2014, Shottle ranked third, O man twelfth, Goldwyn fifteenth, and Planet seventeenth. By August 2022 when the young sires had the opportunity to produce more proven bulls, O man ranked third, Planet fifth, and Shottle sixth, while Goldwyn was outside the top 20. The top 2 bulls were Elevation and Chief.
Since younger, popular sires capture more of the genetic variability in the recent population based on our study, it may not be necessary to still include historical records in genetic evaluations when many genotypes are available. It has been shown that truncating pedigree information beyond 3 generations does not decrease the accuracy of prediction for recent animals (Lourenco et al., 2014). Generational distance between training and target populations has also shown to lead to a decay in accuracy of genomic predictions in poultry and pig populations (Hidalgo et al., 2021;Hollifield et al., 2021). Clear visual differences can be observed when comparing modern animals of a breed to those alive decades ago when strong selection has been applied. A comparison of productivity of the Holstein population in 2007 and 2017 showed that the same unit of energy-corrected milk could be produced using 74.8% of the cattle, 82.7% of the feedstuffs, 79.2% of the land, and 69.5% of the water (Capper and Cady, 2020). When dairy animals in 2007 were compared with dairy in 1944, 21% of animals, 23% of feedstuffs, 35% of the water, and only 10% of the land were required to produce 1 billion kilograms of milk (Capper et al.,262 Steyn et al. | Importance of old US Holstein sires  2009). This remarkable progress is not due to management and nutrition alone. Genetic trends in the US Holstein have shown that production traits have improved favorably while fitness traits have deteriorated (De Vries, 2017). Changes in allele frequencies of the Holstein population in the 1960s have also been observed compared with the population in the 2010s and attributed to ge-netic selection (Ma et al., 2019). Considering these differences, information of old animals is irrelevant. In practice, most available information is included in genetic evaluations (with or without genomic information), whether they are relevant for prediction of current selection candidates or not. The main purpose is to present genetic trends to quantify and visualize genetic change over time.
263 Steyn et al. | Importance of old US Holstein sires Figure 2. The relative importance of the 100 sires within each time period on the accuracy of genomic prediction of the validation population. The line above the box is the first quartile of the data (up to 25th percentile), the upper part of the box is the second quartile (25th to 50th percentile), the line in the box is the median, the lower part of the box is the third quartile (51 to 75 percentile), and the line below the box is the fourth quartile (76 to 100th percentile). The dots indicate outliers that are outside 1.5 times the interquartile range from the 25th to 75th percentile, starting from the edge of the box. A compromise between disregarding old information and using all available information would be to remove only their genotypes or reduce the time period for genetic trends (or both).
The dairy industry has a unique breeding structure where single bulls have many offspring that, in turn, may have many offspring. The sire with the most progeny in this data set (Mars), had fewer than 60,000 direct progeny. However, popular dairy sires are known to have hundreds of thousands of progeny. The bulk of the sires' offspring are in herds that do not participate in US animal improvement programs, such as US commercial herds and foreign herds.
The "Millionaires Club" refers to sires that have produced over 1 million semen doses. One member is Jenny-Lou Mrshl Toystory (Toystory), born in 2001. At the time of his death in 2015, he had produced more than 2 million semen doses that were used in more than 20 countries. Despite his commercial popularity, he only had 14,726 registered progeny in the United States as of 2014. Based on our results, Toystory's relative impact ranks at 90 among the 100 sires with the most registered progeny. Currently, Toystory has more than 162K total daughters with milk production records in over 27 countries, of which ~45K are recorded in the United States (CDCB, 2022). Due to the lack of information from these commercial and foreign herds, the true impact of specific sires toward the gene pool of the whole US population cannot be determined.
This study was limited to conformation traits, which are generally less commonly measured than production. As of 2022, O Man had a total of 61K daughters measured for production in the United States but only 15K for type traits (CDCB, 2022). Therefore, the effect of sires on the accuracy of prediction may be different for other traits. However, the relative importance of sires to capture the ICS of the population will be the same for all traits since genetic relationships are trait-independent. Additionally, this study used only information from registered US animals, while bulls often have many (if not most) daughters measured abroad. For example, as of 2022, O Man had a total of 103K daughters measured for production worldwide, of which 61K were in the United States. Goldwyn had 9K US daughters measured for production in the United States, but a total of 90K worldwide. Shottle had 19K in the United States but 147K worldwide (CDCB, 2022). Therefore, the impact of these bulls go beyond the US border and their relative importance may differ in other countries.
In conclusion, historically popular bulls still capture some of the ICS in the recent population, but have a relatively low impact on the accuracy of predictions compared with younger sires. Consequently, including their information in genetic evaluations may not be necessary. The total number of progeny a sire has is not the only determinant of importance in the population. Specific bulls can have a greater impact when they are used as a sire of sires. The true impact of sires on the US Holstein population cannot be determined due to the lack of participation of commercial herds.