An alternative interpretation of residual feed intake by phenotypic recursive relationships in dairy cattle

Graphical Abstract Summary: Genetic evaluation on residual feed intake (RFI) often takes 2 stages. Combining these 2 modeling stages leads to one-step linear regression, eliminating the need to estimate the residuals as the RFI phenotypes specifically. However, fitting phenotypes as regressor variables in a standard linear regression is criticized because phenotypes are subject to measurement errors. Multiple-trait models have been proposed, which give the genetic values of RFI through a follow-up partial regression procedure. By rearranging the linear regression equation, we came across an alternative, causal RFI interpretation by phenotype recursiveness between DMI and energy sinks. In this technical note, we propose a Bayesian recursive structural equation model (RSEM) for directly evaluating RFI, extending its analytical capacity to multiple-trait analysis.

R esidual feed intake was initially proposed by Koch et al. (1963) as the residuals from linear regression (LR) of feed intake on various energy sinks. It represents a resource allocation theory, which partitions feed intake into the feed intake expected for the given production level and a residual portion (Herd, 2009). Genetic evaluation on RFI often takes 2 stages (Berry and Crowley, 2013). In the first stage, DMI is taken to be a linear function of variables (i.e., energy sinks) to account for body tissue mobilization. In dairy cattle, energy sinks often include metabolic body weight (MBW), milk net energy (MILKNE), and changes in BW (ΔBW) (e.g., Pryce et al., 2015;Tempelman et al., 2015;Løvendahl et al., 2018;Islam et al., 2020). In the second stage model, the computed RFI phenotypes are fitted by a mixed-effects model to estimate RFI genetic values and relevant genetic parameters. Combining these 2 modeling stages leads to one-step LR for RFI (e.g., Løvendahl et al., 2018), eliminating the need to estimate the residuals as RFI phenotypes specifically. Fitting phenotypes as regressor variables in LR has been criticized  because standard regression models assume that regressor variables have been measured precisely. In reality, however, phenotypes are subject to measurement errors. Multiple-trait models have been proposed that obtain RFI genetic values indirectly through followup partial regression (Kennedy et al., 1993;Lu et al., 2015;Islam et al., 2020;Tempelman and Lu, 2020). With various methods available, the biological implications for the computed RFI remain to be exploited (Martin et al., 2021).
By rearranging the LR equation, we came across an alternative, causal interpretation of RFI by phenotype recursiveness between DMI and energy sinks. Consider a single animal, say i. Let y i1 be a variable for DMI phenotypes and let y i2 , …, y ik be variables representing the phenotypes for k − 1 energy sinks, all measured on this animal. A simple energy model includes only energy sinks as fixed effects plus the residual (r i ) as a RFI phenotype (Løvendahl et al., 2018): where λ 1j quantifies the effect of energy sink j on DMI. The energy sink model may include additional covariates or factors (Pryce et al., 2015;Tempelman et al., 2015). Then, the residual is fitted by a mixed-effects model: where μ 1 is the overall mean, β 1 is a vector of fixed effects, a 1 is a vector of random animal additive genetic values, x i1 and z i1 are the corresponding incidence vectors for animal i, and e i1 is an error term. We did not consider random environmental effects in model [2] but they can be included similarly.
Note that in [3], y i1 is a phenotype for DMI, but the fixed and random effects (i.e., β 1 and a 1 ) pertain to the system phenotype, y y i t k t it 1 2 1 − = ∑ λ , which is RFI. This feature contrasts that of a multiple-trait mixed effects model in which the model parameters belong to DMI and energy sinks. Hence, the recursive model can directly estimate RFI genetic values and genetic parameters without taking follow-up partial regression or reparameterization. Next, each energy sink trait, say j, is described similarly by a mixed-effects model: An alternative interpretation of residual feed intake by phenotypic recursive relationships in dairy cattle , and β and a are vectors of appropriate lengths containing the fixed and random effects, respectively. The vectors of fixed and random effects are resorted for each effect by traits within individuals, and the incidence vectors are set up accordingly. Let the incidence vectors of fixed effects be the same between traits for each animal. That is, , where x ij is an incidence vector linking fixed effects to the jth phenotype and x i is the common incidence vector. Then, X i = x i ⊗ I, where I is a k × k identity matrix. Similarly, we have Z i = z i ⊗ I. Finally, the structural matrix (Λ) is defined as follows: This recursive model belongs to the broad category of recursive and feedback systems for describing the phenotypic relationships between diseases and production in animal breeding (Gianola and Sorensen, 2004;Wu et al., 2007Wu et al., , 2008. Jamrozik et al. (2017) applied recursive modeling to analyze ratio traits (e.g., y 2 /y 1 ), where y 1 is taken to be the baseline trait with an assumed recursive effect on y 2 . Lu et al. (2015) applied a modified Cholesky decomposition of covariance matrix between DMI and 2 energy sinks. The reparameterization implied fully recursive effects from energy sinks and DMI. Neverthelesss, they did not follow structural equation modeling but retained a multiple-trait mixed-effects model and obtained partial regression coefficients from estimated covariance matrices. The Bayesian implementation of the recursive model for RFI follows Gianola and Sorensen (2004) and Wu et al. (2007Wu et al. ( , 2010. A simplified algorithm is described below. A detailed description is available at https: / / redmine .uscdcb .com/ documents/ 259). The RFI phenotypes i.e., y y i t k t it 1 2 1 − ( ) = ∑ λ are uncorrelated with the phenotypes of energy sinks (Kennedy et al., 1993). According to the path theory, a zero phenotypic correlation (r p ) between RFI and an energy sink (indexed by j, for j = 2, …, k) implies either (1) RFI j or (2) r e e RFI j = 0 and r a a RFI j = 0,, where h is the square root of heritability, and r a a RFI j and r e e RFI j are the genetic and residual correlations, respectively, between RFI and energy sink j, assuming a total determination by these 2 components. The former relationship in (1) is a strong assumption, which states that genetic and residual correlations between RFI and energy sinks were highly coordinated and they do not necessarily equal zero. We took the latter approach, option (2), by forcing the genetic (G) and residual (R) covariance between RFI and energy sinks to be zeros, because we intended to have RFI as a measure of net feed efficiency, independent of energy sinks. That is, ..
The genetic and residual covariance matrices between DMI and energy sinks are given by .
The conditional posterior distribution of structural coefficients does not depend on any unknown parameters of energy sinks, assuming zero genetic and residual correlations between RFI and energy sinks. This feature drastically simplifies the posterior inference of structural coefficient matrix and unknown parameters for We assumed a multivariate normal prior distribution (MVN) for λ. That is, λ λ λ | ,~, , I is a (k -1) × 1 (k -1) identity matrix, and λ 0 and τ 2 are hyperparameters. Then, the conditional posterior distribution of λ is also an MVN distribution (Gianola and Sorensen, 2004;Wu et al., 2007), independent of the equations for energy sinks. The conditional posterior means of λ are   where E is the expectation, else represents the data and all other unknown model parameters, and w y , for i = 1, …, n. Similarly, the conditional posterior distribution of location parameters (i.e., fixed and random effects) and scaling parameters (variance components), respectively, for RFI does not involve any unknown parameters for energy sinks either.
To see the link between the recursive model and linear regression for RFI, consider equation [3] and replace the structural coefficients, λ 1j , by partial regression coefficients, b j , for j = 2, …, k. If we move all the fixed and random effects to the left-hand side of the equation and keep the energy sinks and the residual on the right-hand side, it becomes [8] Then, the least-squares solutions of the partial regression coefficients are as follows: where w y . Note that [7] coincides precisely with [9] if we let τ 2 → ∞ in [7], which is equivalent to assigning flat priors to structural coefficients in [9]. In other words, the conditional posterior means of structural coefficients agree with (or are asymptotically equivalent to) the partial regression coefficients in one-step LR, given μ 1 , β 1 , and a 1 , if we ignore the prior values (or the impact of priors diminishes when the data dominate the posteriors). Likewise, the same conclusion holds for the location and scaling parameters between the recursive model and one-step linear regression for RFI.
The data set consisted of 645 first-parity cows with phenotypes, derived from 125 sires and 477 dams, and raised in the USDA Beltsville Agricultural Research Center (BARC) Dairy Herd (Beltsville, MD). The phenotypic data included DMI, MBW, MILKNE, and ΔBW, all obtained as averages over a 42-d trial.
Their means (SD) were 28.9 (3.81) kg/d, 113.8 (6.71) kg 0.75 , 21.1 (2.18) Mcal, and 0.47 (0.22) kg. Phenotypes were standardized to means of zero and unit variance to facilitate comparing the estimated effects between traits not affected by the units of the traits. The data standardization did not change the phenotypic correlations, which were 0.441 (DMI vs. MBW), 0.556 (DMI vs. MILKNE), 0.166 (DMI vs. ΔBW), 0.132 (MBW vs. MILKNE), 0.193 (MBW vs. ΔBW), and −0.036 (MILKNE vs. ΔBW). We compared 2-stage models and one-step models for RFI, implemented by LR and Bayesian RSEM, respectively. Model LR1 was the stage-one model of the 2-stage linear regression, with MBW, MILKNE, and ΔBW as fixed effects. Model LR2 was a one-step linear regression with 3 energy sinks and DIM (DIM = 71,72,73,74,75,76,77) as the fixed effects and individual animal effects as random variables. Model LR3 had all the model parameters in LR2, plus test weeks (i.e., 143 levels) as an nongenetic random variable. Models RSEM1, RSEM2, and RSEM3 were the Bayesian recursive equation models of LR1, LR2, and LR3, respectively, but with phenotypic recursive effects assumed from energy sinks to DMI. For a Bayesian recursive model, we ran 30 parallel Markov chain Monte Carlo (MCMC) chains, each consisting of 2,200 iterations, with a burn-in of 2,000 iterations and thinned every 2 iterations. We also ran single-trait mixed-effects model analyses on each of these traits, and a multiple-trait, mixed-effects model (MT), with DIM as a fixed effect and test-week and animal effects as random effects. Markov chain Monte Carlo convergence was examined for the model parameters using the shrink factor (Gel- Phenotypic/Genetic = partial regression coefficients based on phenotypic and genetic variance-covariance matrices from a multiple-trait model (MT), respectively. DIM were included as fixed effects and TW and individual animal genetic values were included as random effects.
man and Rubin, 1992). The MCMC chains, which were initialized randomly, converged quickly. The shrink factor dropped below 1.1 after 200 iterations and approached 1.0 after 1,000 iterations (see the graphical abstract). Saved posterior samples after 1,000 iterations were pooled and used to make the posterior inference of unknown parameters. The estimated effects from energy sinks to DMI (Table 1) agreed well between LR and recursive models with similar settings (e.g., between LR1 and RSEM1). On standardized phenotypic scales, MILKNE had the largest effects on DMI (0.51 to 0.53), followed by MBW (0.31 to 0.35), and ΔBW had the smallest effect on DMI (0.12 to 0.13). Including different sets of fixed and random effects led to varied RFI definitions, and the estimated partial regression coefficients (or structural coefficients) varied accordingly. Nevertheless, the estimated RFI genetic values agreed very well between a 2-stage model and a one-step model. The Spearman correlation of the estimated RFI genetic values was close to 1 between LR1 and LR3 and between RSEM1 and RSEM3, and rerankings happened rarely. The Spearman correlation between LR3 and RSEM3 was 0.998 ( Figure 1, panel A). The differences were primarily due to Monte Carlo errors.
The MT model allows for distinguishing between genetic and residual effects. The genetic partial regression coefficients did not agree with the partial regression coefficients (or structural coefficients) obtained from single-trait LR (or recursive models). Nevertheless, the partial regression coefficients estimated from phenotypic (co)variance agreed very well with the partial regression coefficients from LR1 (or the structural coefficients from RSEM1). The multiple-trait, mixed-effects model assumed correlational relationships between the traits, which has no causal interpretation. Nevertheless, a fully-recursiveness system can be assumed based on the modified Cholesky decomposition . Consider the phenotypic relationships, for example, in the present example. The L L Σ ′ decomposition implies fully recursive relationships for the traits (ordered by y i4 , y i3 , y i2 , and y i1 ). Here, L is the unit lower triangular matrix, which corresponds to the structural coefficient matrix in RSEM, as follows: where b jj ′ is the effect (i.e., partial regression coefficient) from trait identical to those for λ 12 , λ 13 , and λ 14 in [7]. However, they differed in the assumed relationships between energy sinks. For example, the model by Lu et al. (2015) assumed recursive effects from milk energy on MBW, and the relationships between energy sinks are correlational in a recursive model. The estimated partial regression coefficients from the MT model based on the phenotypic covariance matrix coincided precisely with the structural coefficients, showing differences only after the third decimal point, but they can vary depending on the data. Overall, the heritability estimates obtained from RSEM3 and single-trait LR were moderate to high for DMI (0.40-0.49) and MBW (0.59) but low for ΔBW (0.002-0.04; Table 2). The heritability estimate for RFI was 0.392 by one-step LR and 0.240 by RSEM3. The heritability estimates for energy sinks were within comparable ranges of previous studies (e.g., Berry and Crowley, 2013;Tempelman et al., 2015). The RFI heritability estimates were similar to those reported by Connor et al. (2013). They reported an RFI heritability of 0.36 using only the USDA AGIL (Animal Genomics and Improvement Laboratory) data for early lactation cows. Tempelman et al. (2015) reported lower RFI heritability estimates (0.18 ± 0.02) and country-specific estimates ranging from 0.06 to 0.24. Residual feed intake heritability was 0.16 in data sets that included the AGIL data Li et al., 2020). Genetic correlations were moderate to high between DMI and RFI, MBW, and MILKNE (0.44 to 0.72), and low between DMI and ΔBW (0.13-0.20) and between MBW and MILKNE (0.15-0.16) ( Table 2). The genetic correlations between RFI and energy sinks were forced to be zeros with RSEM3, but they had small values (−0.03 to 0.01) based on the multiple-trait model.
The present study assumed a single, homogeneous structural coefficient matrix. Model expansion to account for heterogeneous structural coefficient matrices is straightforward, where the conditional distributions for structural coefficients take the same formula but are sampled separately for each subpopulation (Wu et al., 2010). The MT model also allows for distringuishing between genetic and residual relationships. Which assumption is more plausible remains a topic for further investigation. Modeling simultaneous effects between energy sinks and DMI and between energy sinks is possible, subject to the model identifiability.

Notes
This study had no external sources of funding.