There is now a plethora of biological “omics” and new high-throughput measurements.
Total variability of the trait is split into omics-mediated and heritable components.
From the theory, reliabilities can be derived for ideal cases.
For selection purposes, it is better to have heritable omics than highly explanatory ones.
Gene expression is supposed to be an intermediate between DNA and the phenotype, and it can be measured. Thus, for a trait, we may have intermediate measures, which are in fact a series of genetically controlled traits. Similarly, several traits may be measured or predicted using infrared spectra, accelerometers, and similar high-throughput measures that we will call “omics.” Although these measurements have errors, many of them are heritable, and they may be more accurate or easier to record than the trait of interest. It is therefore important to develop methods to use intermediate measurements in selection. Here, we present methods and perspectives for selection based on massively recorded intermediate traits (omics). Recent developments allow a hierarchical integrated framework for prediction, in which a trait is partially controlled by omics. In addition, the omics measures are themselves partly controlled by genetics (“mediated breeding values”) and partly by environment or residual factors. Thus, a part of the genetic determinism of a trait is mediated by omics, whereas the remaining part is not mediated, which results in “residual breeding values.” In such a framework, genetic evaluations consist of 2 nested genomic BLUP-based models. In the first, the effect of omics on the trait (which can be seen as an improved estimate of the phenotype) and the residual breeding values are estimated. The second model extracts the mediated breeding values from the improved estimate of the phenotype, considering that omics themselves are heritable. The whole procedure is called GOBLUP (genomics omics BLUP) and it allows measures in only some individuals; that is, it is a “single-step”-like method. In this model, heritability is split into “mediated” and “not mediated” parts. This decomposition allows us to predict how accurate the omics measure of the trait would be compared with the direct measure. 
The ideal omics measure is heritable and explains a large part of the phenotypic variation of the trait. Ideally, this could be the case for some traits with low heritability. However, even if the omics measure explains only a small part of the phenotypic variation, when omics measurements themselves are heritable, the use of such a model would lead to more accurate selection. Expressions for upper bounds of reliability given omics measurements are also presented. More studies are needed to confirm the usefulness of omics or high-throughput prediction. Usefulness of the technology likely needs to be checked on a case-by-case basis.
Before the genomic selection era, collecting phenotypes was an arduous experience, and adding new traits to the breeding objective implied a cost-benefit consideration unless those traits were recorded for management purposes. The breeder had to live with “cheap” recordings (that required a fair amount of organization and coordination) and computing procedures that were highly expensive at the time. Breeding objectives considered just a few traits. Second, the genomic revolution implied that high-throughput measurements could be dealt with by animal breeders through a mix of information flow (of genotypes), standardization (of DNA chips), computational power, and new or improved methods such as genomic (G)BLUP and single-step (ss)GBLUP. In addition, recent developments (machine learning in particular) have opened the door to predicting, in principle, almost anything from almost anything, which has prompted scholars to use more and more data. In the following, we use the word “omics” but we mean any complex set of measurements that could be seen as close to the biology of the trait of interest.
Assume, then, that we can do an excellent job of predicting traits from a myriad of closer or indirect omics measures, whether these are gene transcripts, operational taxonomic unit counts in rumen or feces, accelerometer data, or milk spectra. How can these be converted into something usable for selection? A phenotype per se cannot directly be used to select animals. A fundamental principle in genetics (that, in our view, is sometimes disregarded) is that an animal transmits half its genotype to its offspring. This is the reason why natural and artificial selection act additively. Anything that is not contained in the DNA or cytoplasm of the female is not transmitted. For instance, the transcriptome may explain a large portion of a given phenotype such as growth. However, this transcriptome may be affected by environmental factors (e.g., food and management), which are not transmitted. Second, only a random gamete, half of the genotype, is transmitted: all dominant or epistatic combinations are lost, and many possible gametes exist. Thus, the breeding value (BV) or PTA is, literally, an expectation over random events (meiosis, mates, environments).
Hence, in addition to being able to predict phenotypes from omics, we also need a theory to use omics in genetic improvement of livestock. Recent developments led to prediction of BV (not of phenotypes) using intermediate data, and these developments also clarified relationships between heritabilities and the variance ratio explained by omics (e.g., “microbiabilities”). Before these publications, these relationships were not well understood. In addition to helping our understanding, a theory, even if not perfect, sets the stage for a priori plans for using omics in selection schemes from a few basic parameters.
In this work, we will (1) present a sketch of the theory and how it can be used for BV prediction, (2) discuss the circumstances in which the use of omics is advantageous with respect to current prediction based on phenotypes, (3) present some illustrative examples of omics use in plant and animal breeding, and (4) present some thoughts on selection schemes that use omics features. This review does not contain any studies with human or animal subjects and did not require animal care approval.
, and we stick to its notation as much as possible. We use a linear model, which assumes that measurable observed covariates (belonging to a herd; temperature; omics; genotype at the marker, and so on) have measurable effects on a trait of interest. Whether these effects are “real” or “surrogates” of real effects (e.g., herd is a surrogate for farmer; SNP is a surrogate for QTL) is a question that we will not address here, where we will assume that effects are reasonably stable with respect to time (across a few generations) and space (from, say, Maryland farms to Georgia farms). This allows us to consider in the same framework “truly” biological effects (e.g., transcriptome) and surrogates of biology (e.g., infrared spectra).
A trait is classically decomposed (omitting fixed effects) as

$$y_i = a_i + e_i,$$

where $a_i$ is an overall BV and $e_i$ is a residual (the part unexplained by genetics). Alternatively, if we knew all omics (m) that define the outcome of a trait (y), a basic model for individual i is

$$y_i = m_i'\alpha + \varepsilon_i,$$

where $m_i$ contains (all relevant) omics measures for individual i and $\alpha$ contains their effects; we say that trait y is “mediated” by omics m. In addition, $\varepsilon_i$ is a residual, the part unexplained by omics. Note that $\varepsilon_i$ is different from $e_i$, as the 2 models are different. In any case, we cannot measure all relevant omics measures (e.g., some may happen during embryo development). Thus, we postulate a model in which the part unexplained by omics has some genetic determinism not mediated by omics, $a_r$ (where r indicates “residual”), leading to

$$y_i = m_i'\alpha + a_{r(i)} + \varepsilon_i.$$

From this, we define a single omics value $u_{m(i)} = m_i'\alpha$ (which is not a BV). In addition, omics measures are not transmitted to offspring; only genes controlling m are transmitted to offspring. Thus, omics ($m_i$) themselves need a decomposition into a genetic and a residual part, which leads to another step in the hierarchy of models:

$$m_{i,j} = g_{i,j} + e_{i,j}.$$

The contribution of the BV $g_{i,j}$ of omics j to the phenotype is $g_{i,j}\alpha_j$, whereas the contribution of the residual $e_{i,j}$ of omics j to the phenotype is $e_{i,j}\alpha_j$. Thus, we can define an “omics-mediated” BV, $a_{m(i)}$, as a sum over omics of $g_{i,j}\alpha_j$:

$$a_{m(i)} = \sum_j g_{i,j}\alpha_j,$$

which is, in fact, the genetic part of $u_{m(i)}$. So, for each individual, there is a single omics-mediated value $u_{m(i)}$ and a “residual” BV $a_{r(i)}$ that explains the genetic variation of the phenotype part not mediated by omics; the same individual i has, for each omics $m_{i,j}$, a BV $g_{i,j}$; and the sum of the BVs for omics $g_i$ times their effects $\alpha$ gives the mediated BV $a_{m(i)} = g_i'\alpha$; the overall BV is therefore

$$a_i = a_{m(i)} + a_{r(i)}.$$

It is worth noting that assumptions of the model lead to uncorrelated $a_r$ and $a_m$. This can be understood as follows. If gene A has action on the omics and the omics contribute to the trait, then gene A contributes to the genetic variation of $a_m$, but not to that of $a_r$. If gene B has no action on the omics yet it contributes to the trait (e.g., because the relevant pathway is not in the omics measurement), then gene B contributes to the genetic variation of $a_r$, but not to that of $a_m$. However, there is a correlation between each component $a_m$, $a_r$, and overall $a$, as shown later. Finally, the overall residual after discounting BV is

$$e_i = \sum_j e_{i,j}\alpha_j + \varepsilon_i.$$
The hierarchical model that we just presented is a generalization of models for genomic prediction: SNPs are omics measures with a heritability of 1. Alternatively, omics (m) can be seen as multiple traits, but instead of using massive multiple trait models with unstructured covariance matrices, we use a hierarchical model, which is actually a recursive model (a special case of simultaneous equation model;
used a recursive model to consider the relationship between metagenome and methane emission, but with only one measurement (relative abundance of a genus) at a time, with vague prior information on the regression coefficient. Instead of fitting one measurement at a time,
imposed a stricter prior information in which regression coefficient α values were drawn from a single distribution, as will be shown next. This allows simultaneous fitting and estimation of all omics measurements, and also an interpretation of associated variance components, as shown below.
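Next, we need models to predict both α and g. First, we assume that the omics effects $\alpha_j$ are independent draws from a single distribution with mean 0 and common variance $\sigma^2_\alpha$.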
It seems natural to assume that the effect of the transcript of one gene is a random effect. We also assume that the effect of the transcript of one gene is uncorrelated with that of another gene. However, assuming that the effect of a wavelength is different from that of a neighboring wavelength is more disputable. Second, we assume that omics measures are uncorrelated with each other; again, it is debatable whether this is reasonable or not and it needs to be verified with real data. Third, we assume constant heritability of omics (this assumption is easily removed at the cost of more complex algebra). The 3 assumptions lead to expressions for genetic evaluation that are quite easy to use and also interpretable in a quantitative genetics sense.
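The three assumptions can be checked numerically with a small simulation. The sketch below is illustrative only: the function names, the parameter names (`c2_m` for the share of phenotypic variance explained by omics, `h2_m` for the heritability of each omics measure, `h2_ar` for the share explained by nonmediated genetic effects), the default values, and the rescaling of α so that the omics share equals `c2_m` exactly are all assumptions made here, not part of the original method.

```python
import random
import statistics


def simulate(n=5000, p=20, c2_m=0.4, h2_m=0.5, h2_ar=0.1, seed=1):
    """Simulate y = m'alpha + a_r + eps with m = g + e (phenotypic variance 1)."""
    rng = random.Random(seed)
    # Assumption 1: omics effects are independent draws from one distribution.
    alpha = [rng.gauss(0.0, 1.0) for _ in range(p)]
    # Convenience (our choice): rescale alpha so that u = m'alpha has
    # variance c2_m when omics are uncorrelated with unit variance.
    scale = (c2_m / sum(a * a for a in alpha)) ** 0.5
    alpha = [a * scale for a in alpha]
    sd_g, sd_e = h2_m ** 0.5, (1.0 - h2_m) ** 0.5
    sd_ar, sd_eps = h2_ar ** 0.5, (1.0 - c2_m - h2_ar) ** 0.5
    a_m, a_r, u, y = [], [], [], []
    for _ in range(n):
        # Assumptions 2 and 3: uncorrelated omics, all with heritability h2_m.
        g = [rng.gauss(0.0, sd_g) for _ in range(p)]
        e = [rng.gauss(0.0, sd_e) for _ in range(p)]
        am_i = sum(gj * aj for gj, aj in zip(g, alpha))          # mediated BV
        um_i = am_i + sum(ej * aj for ej, aj in zip(e, alpha))   # omics value
        ar_i = rng.gauss(0.0, sd_ar)                             # residual BV
        a_m.append(am_i)
        a_r.append(ar_i)
        u.append(um_i)
        y.append(um_i + ar_i + rng.gauss(0.0, sd_eps))
    return a_m, a_r, u, y


def corr(x, z):
    """Pearson correlation of two equally long lists."""
    mx, mz = statistics.fmean(x), statistics.fmean(z)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    sxx = sum((a - mx) ** 2 for a in x)
    szz = sum((b - mz) ** 2 for b in z)
    return sxz / (sxx * szz) ** 0.5


if __name__ == "__main__":
    a_m, a_r, u, y = simulate()
    a = [x + z for x, z in zip(a_m, a_r)]  # overall BV
    var_y = statistics.pvariance(y)
    print("omics share of variance:", round(statistics.pvariance(u) / var_y, 3))
    print("overall heritability:", round(statistics.pvariance(a) / var_y, 3))
    print("corr(a_m, a_r):", round(corr(a_m, a_r), 3))
```

With the default values, the simulated heritability should be close to 0.4 × 0.5 + 0.1 = 0.3 and the correlation between mediated and residual BV close to 0, up to sampling noise.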
presented a method for prediction (GOBLUP or Genomic Omics BLUP) based on 2 successive mixed model equations (MME). This is not an approximation, because the information from each MME is disjoint.
In the first step, omics effects on data are estimated, either by estimating omics effects (similar to SNP-BLUP) or using omics similarities (similar to GBLUP):
For X and Z incidence matrices,
is a scaled omics similarity matrix,
and H is a genetic relationship matrix (pedigree A, genomic G, or single-step H). Parameters are the part of phenotypic variation explained by omics (written here as $c^2_m$) and the part of phenotypic variation explained by “nonmediated” genetic effects (written here as $h^2_{a_r}$); this model is not new. These equations yield the nonmediated part of the EBV, $\hat{a}_r$, and “improved phenotype predictions” $\hat{u}$, which are based on trait observations y and omics M, and can be seen as “y with less environmental noise,” or as a predictor trait such as SCS, which is a predictor of subclinical mastitis.
The notion of using a predictor of a trait instead of a direct measure is very old and is used, for example, for protein content (measured through milk spectra) or subclinical mastitis (measured through SCC). However, in contrast to these well-established uses, these phenotype predictions $\hat{u}$ may include animals with no phenotypes for y (which allows for early prediction of traits based on omics).
, in fact, suggested calibrating prediction equations that used near infrared or nuclear magnetic resonance and then use the prediction as a correlated trait. However, this implies that predictions are portable through environments, years, and genetic backgrounds; the
In the second step, once the phenotype predictors $\hat{u}$ are obtained, they are used as pseudo-traits in a second MME, with design matrices for omics records, to extract the heritable part, $\hat{a}_m$; the parameter of this second MME is the heritability of omics measurements (written here as $h^2_m$). Total EBV is $\hat{a} = \hat{a}_m + \hat{a}_r$.
The method has, in principle, been extended to single-step cases (not all animals are omics phenotyped), meaning that all cases are possible: animals with or without phenotypes, genotypes, or omics in all possible combinations. Extensions to more effects, multiple traits, and more complex covariance structures are immediate. Bayesian regressions such as Bayes B are also doable without much difficulty.
The whole procedure is called GOBLUP. Thus, the basic machinery for omics-based selection is there, even if omics features have not (yet?) been massively produced, with the possible exception of those in crop plants. The next sections will explore the a priori usefulness of omics-based selection and illustrate some results from existing studies.
The linear model above, with the simplifying assumptions, explains the variance decomposition of 2 more popular models. The first is GBLUP, in which the heritability is $h^2 = c^2_m h^2_m + h^2_{a_r}$; this is the classical analysis. The second is so-called GMBLUP or GTBLUP, where M stands for microbiome or metabolite and T for (gene) transcript, in which the heritability is $h^2_{a_r}$ (remaining heritability when omics are included). It has been empirically observed that moving from GBLUP to GTBLUP implied a drop in estimates of heritability (because omics are heritable) and a decrease in residual variance; in other words, omics capture a fraction $c^2_m$ of the total variability, which, times a fraction $h^2_m$ (heritability of omics measures), represents the genetic variation of the omics-mediated phenotype, whereas the nonmediated genetic part explains $h^2_{a_r}$. In contrast, the ratio of residual variance to total variance reduces from $1 - c^2_m h^2_m - h^2_{a_r}$ to $1 - c^2_m - h^2_{a_r}$; in other words, conditional on omics, the trait is better explained. All of this has implications for selection that we will detail later.
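The bookkeeping of this decomposition can be written out as a small helper. This is a minimal sketch of our reading of the text, with phenotypic variance scaled to 1; the function name, the argument names (`c2_m` for the share of total variability captured by omics, `h2_m` for the heritability of omics measures, `h2_ar` for the share explained by nonmediated genetic effects), and the example values are assumptions for illustration.

```python
def variance_shares(c2_m, h2_m, h2_ar):
    """Variance ratios under GBLUP and GTBLUP for phenotypic variance 1."""
    for name, value in (("c2_m", c2_m), ("h2_m", h2_m), ("h2_ar", h2_ar)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    if c2_m + h2_ar > 1.0:
        raise ValueError("c2_m + h2_ar cannot exceed the total variance")
    mediated = c2_m * h2_m          # genetic variation mediated by omics
    h2_gblup = mediated + h2_ar     # heritability when omics are ignored
    return {
        "mediated_genetic": mediated,
        "h2_gblup": h2_gblup,
        "h2_gtblup": h2_ar,                  # remaining heritability
        "resid_gblup": 1.0 - h2_gblup,       # residual ratio without omics
        "resid_gtblup": 1.0 - c2_m - h2_ar,  # residual ratio given omics
    }


if __name__ == "__main__":
    # Invented values, for illustration only.
    for key, value in variance_shares(c2_m=0.4, h2_m=0.5, h2_ar=0.1).items():
        print(f"{key}: {value:.2f}")
```

Because the heritability of omics cannot exceed 1, the residual ratio given omics can never be larger than the residual ratio without them, which matches the empirical observation described above.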
Use of SNP chips for selection raises no questions in dairy cattle, but for species with a lower ratio of reproducer value to genotype cost, its use had to be considered. Similarly, we need to evaluate whether omics-based selection is useful given the cost of omics phenotyping and selection plans. In other words, is this a technology worth betting on?
The case for omics-based selection is similar to that for SNP-based selection. The breeder wants a measurement of the BV that is either more accurate or available earlier. Note that this is somewhat different from plants or other uses (e.g., medical applications) where one is interested in the prediction of phenotype.
First, we want to know whether the omics-predicted phenotype is a good predictor of the actual phenotype; to give an example, can we predict the phenotype of feed intake based on phenotypes of MIR spectra? The squared correlation between the actual and omics-predicted trait is simply $c^2_m$, the part explained by omics. To complete the preceding expression, the squared genetic and residual correlations of the omics-predicted and the actual trait are derived. The squared genetic correlation is

$$r^2_g = \frac{c^2_m h^2_m}{c^2_m h^2_m + h^2_{a_r}},$$

where $h^2_m$ is the heritability of omics measures and $h^2_{a_r}$ is the part of phenotypic variation explained by nonmediated genetic effects. In other words, when $h^2_{a_r}$ tends to 0, the genetic correlation tends to 1. Note that $r^2_g$ is (also) the squared correlation between the omics-mediated BV $a_m$ and the overall BV $a$. As for the squared residual correlation, this is

$$r^2_e = \frac{c^2_m (1 - h^2_m)}{1 - c^2_m h^2_m - h^2_{a_r}}.$$
After an individual is phenotyped for omics, the omics measurements m are obtained. Plugging in estimates of omics effects $\hat{\alpha}$, a phenotypic prediction $\hat{u} = m'\hat{\alpha}$ is obtained. This is similar to indirect predictions in genomic selection based on markers. Then a prediction of BV can be obtained using y, $\hat{u}$, or both. In turn, this allows predictions for the trait of interest y and also BV prediction. We use this framework to characterize in which cases the omics feature is of interest using selection index theory. Assume that the unobserved omics trait u can be perfectly “predicted” conditionally on m; in other words, every $\alpha_i$ is perfectly estimated. This will be the case, loosely speaking, when the omics effects can be accurately estimated from records; that is, the trait of interest y has been recorded in a large number of individuals, and these individuals cover a large variation of the breed across herds, regions, and background genetics. In this case (α being perfectly estimated), phenotype prediction has reliability $c^2_m$, the part of phenotypic variation explained by omics. This is already the case for traits that are very well predicted from milk spectra, such as fat content.
To get some perspective on reliability using omics data, we derived upper bounds of reliabilities considering simple examples of single animals. Ultimately, accuracies of bulls with daughters are a function of the number of daughters and the accuracies of these daughters; the same applies for marker estimates.
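Cow Artxueta has a single record for y. Reliability of the EBV is simply the heritability,

$$\mathrm{Rel}_y = h^2 = c^2_m h^2_m + h^2_{a_r},$$

where $c^2_m$ is the part of phenotypic variation explained by omics, $h^2_m$ is the heritability of omics measurements, and $h^2_{a_r}$ is the part of phenotypic variation explained by nonmediated genetic effects. Heifer Bustintza has no record for y but has been properly phenotyped for omics, and α values are exactly known, so we have a perfect measure of u. The reliability of the phenotype prediction is $c^2_m$. However, reliability of the EBV for u is actually the heritability of omics measurements, $h^2_m$. In turn, the reliability of the EBV for y is the reliability of the EBV for u, which is actually its heritability, $h^2_m$, times the squared genetic correlation:

$$\mathrm{Rel}_m = h^2_m \, \frac{c^2_m h^2_m}{c^2_m h^2_m + h^2_{a_r}}.$$

In this case, we can see that the space in which recording omics m is more reliable than measuring y is as follows:

$$c^2_m (h^2_m)^2 > (c^2_m h^2_m + h^2_{a_r})^2.$$

The breeder is therefore interested in using a set of omics measurements conceived such that all the genetic variation is mediated through omics ($h^2_{a_r} \to 0$) because, in that case, the ratio $c^2_m h^2_m / (c^2_m h^2_m + h^2_{a_r})$ tends to 1, and this increases accuracy based on omics measurements. Also, having heritable omics (high $h^2_m$) is more important than omics explaining a lot (high $c^2_m$), but again, we assumed that data sets were so large that α was correctly estimated anyway.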
These ideas are reflected in Figure 1, which shows the reliability using omics ($\mathrm{Rel}_m$) for a low heritability ($h^2 = 0.10$), in which case $\mathrm{Rel}_y = 0.10$. The space in which omics are more accurate than the observation of the trait is wider when the heritability of omics, $h^2_m$, is high. This is exactly the case with genomic selection: SNPs have $h^2_m = 1$, and $\mathrm{Rel}_m = 1$ when they explain all genetic variation of the trait.
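The comparison between an own record (cow Artxueta) and omics alone (heifer Bustintza) can be sketched numerically. The code below follows our reading of the text: reliability from an own record is the heritability, and reliability from omics is the heritability of omics times the squared genetic correlation, assuming omics effects are perfectly known. Function and parameter names (`c2_m`, `h2_m`, `h2_ar`) and the two scenarios are invented for illustration; only the trait heritability of 0.10 comes from the text.

```python
def reliability_own_record(c2_m, h2_m, h2_ar):
    """Reliability of an EBV from one own record of y: the heritability."""
    return c2_m * h2_m + h2_ar


def reliability_from_omics(c2_m, h2_m, h2_ar):
    """Upper bound of EBV reliability for y from omics alone (alpha known)."""
    h2 = c2_m * h2_m + h2_ar
    if h2 <= 0.0:
        raise ValueError("the trait must have some genetic variance")
    r2_genetic = c2_m * h2_m / h2  # squared genetic correlation of u and y
    return h2_m * r2_genetic       # heritability of u times r2_genetic


if __name__ == "__main__":
    # Two hypothetical omics for a trait with h2 = 0.10 (invented values).
    scenarios = {
        "heritable, weakly explanatory": dict(c2_m=0.10, h2_m=0.80, h2_ar=0.02),
        "explanatory, weakly heritable": dict(c2_m=0.40, h2_m=0.20, h2_ar=0.02),
    }
    for name, params in scenarios.items():
        rel_y = reliability_own_record(**params)
        rel_m = reliability_from_omics(**params)
        print(f"{name}: Rel_y = {rel_y:.2f}, Rel_m = {rel_m:.2f}")
```

In both invented scenarios 80% of the genetic variance is mediated, yet the more heritable omics gives a reliability of 0.64 against 0.16 for the more explanatory one, because the heritability of omics enters the expression squared.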
Now consider cow Chinebral, which has both the record for y and the (perfect) prediction for u. According to selection index theory (
analyzed a trait (days to silking) with
for which the heritability estimate dropped to
after fitting transcriptome measurements, which were highly explanatory
and were themselves quite heritable
In a study in mice,
report for the trait BW10,
from which we deduced
Then we considered the case of a low-heritable trait
for which there are 2 options. An omics measure of low heritability
explains a good portion of the phenotypic variation
An alternative omics measure of high heritability
explains a small portion of the phenotypic variation
With these elements (presented in Table 1) and assuming that omics effects can be perfectly estimated, we can estimate the reliabilities using either an animal's own phenotype, omics data, or both (Table 2). For the real-data cases in mice and maize, using the omics record is not more accurate for EBV estimation than the phenotypic record, which is itself rather heritable. However, the EBV omics prediction is quite reliable and could be used if it were less expensive or could be measured earlier in life (which is often the case in crops). When variance components resemble the mice case, our results show that combining information from the actual phenotype and the omics record would yield more accurate predictions.
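The three reliabilities (own phenotype, omics, or both) can be sketched with standard selection index algebra. The combined formula below is our own derivation under the model assumptions, not an expression taken from the source: with omics effects perfectly known, the omics value u and the remainder y − u are uncorrelated information sources, so their contributions to reliability add. Parameter names (`c2_m`, `h2_m`, `h2_ar`) and the example values are hypothetical and are not the values of Table 1.

```python
def reliabilities(c2_m, h2_m, h2_ar):
    """Return (own record, omics only, both) reliabilities of the EBV for y.

    Phenotypic variance is scaled to 1 and omics effects are assumed known.
    """
    if not 0.0 < c2_m < 1.0:
        raise ValueError("c2_m must lie strictly between 0 and 1")
    if h2_ar < 0.0 or c2_m + h2_ar > 1.0:
        raise ValueError("h2_ar must be non-negative and c2_m + h2_ar <= 1")
    h2 = c2_m * h2_m + h2_ar
    if h2 <= 0.0:
        raise ValueError("the trait must have some genetic variance")
    rel_y = h2
    # Source 1: u, with cov(a, u) = c2_m * h2_m and var(u) = c2_m.
    rel_m = c2_m * h2_m ** 2 / h2
    # Source 2: y - u, with cov(a, y - u) = h2_ar and var(y - u) = 1 - c2_m.
    rel_both = rel_m + h2_ar ** 2 / ((1.0 - c2_m) * h2)
    return rel_y, rel_m, rel_both


if __name__ == "__main__":
    # Invented low-heritability trait (h2 = 0.10) with two alternative omics.
    for label, params in (
        ("heritable omics", dict(c2_m=0.10, h2_m=0.80, h2_ar=0.02)),
        ("explanatory omics", dict(c2_m=0.40, h2_m=0.20, h2_ar=0.02)),
    ):
        rel_y, rel_m, rel_both = reliabilities(**params)
        print(f"{label}: y = {rel_y:.3f}, omics = {rel_m:.3f}, both = {rel_both:.3f}")
```

By construction the combined reliability is never lower than either single source, and the gain from adding the own record shrinks as more of the genetic variance is mediated by omics.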
Table 1. Scenarios with different variance components for phenotype and breeding value prediction
The invented trait gives more insights. The omics measure that explains a large part of the phenotypic variation is quite reliable for phenotype prediction but not as reliable for BV prediction. In the case where omics explain less of the trait but are more heritable, the phenotype prediction is not particularly good but the BV prediction is quite accurate. (A caveat here is that this is somewhat misleading, because in practice omics effects α, which we assumed to be perfectly estimated, are estimated with error.)
In any case, Table 2 illustrates that for selection purposes, it is more important to have heritable omics measures than explanatory ones.
Finally, there is abundant literature related to phenotype prediction (
), obtaining biochemical measures from grains is easy. However, studies focus mainly on phenotypic prediction because, on the one hand, crop breeders tend to analyze single-generation experiments (unlike dairy cattle breeders) and, on the other hand, field trials are expensive and complicated to set up, so a phenotypic prediction is very useful. The literature in livestock genetics is less abundant because the only cheap available data are milk spectra (
suggested, in a microbiota context, that selecting mediated BV (am) will change microbiota composition (which may compromise rumen health), whereas selecting residual BV (ar) “will likely improve the trait by improved metabolic efficiency” (which may compromise overall health). These aspects could be taken into account for the construction of selection indices.
Overall, using omics or high-throughput measures may not be a “one size fits all” method but we consider it worth further exploration. The theory presented in this paper for BV prediction and the theory sketched for reliability of such predictions can help researchers determine when using omics or high-throughput measures is worthwhile for selection.
This study received no external funding.
The authors have not stated any conflicts of interest.