Daily milk yield correction factors: What are they?

Graphical Abstract Summary: Cows are typically milked 2 or more times on a test-day, but not all those milkings are measured. Statistical methods have been proposed to estimate daily yields, centering on various yield correction factors in 2 broad categories: additive correction factors (ACF) and multiplicative correction factors (MCF). This research note presented a technical review of statistical models for estimating daily milk yields concerning their statistical interpretations, mode assumptions, and challenges. An exponential regression model was proposed as an alternative tool for estimating daily milk yields. The features of ACF and MCF are illustrated using simulation datasets, and their performance was evaluated by 10-fold cross-validations. The methods were described explicitly for estimating daily milk yield in morning (AM) and evening (PM) milking plans, but the principles are generally applicable to cows milked more than 2 times a day, and apply to estimating daily fat and protein yields, with some necessary modifications.

A ccurate milking data are essential for herd management and genetic improvement in dairy cattle. Cows are typically milked 2 or more times on a test-day, but not all these milkings are sampled and weighed. This practice started to supplement the standard supervised twice-daily monthly testing scheme in the 1960s, motivated by reducing the visits by a national DHIA supervisor and lowering the costs to dairymen (Putnam and Gilmore, 1968). The initial AM-PM milking plan alternately sampled the morning (AM) or evening (PM) milking on a test-day throughout the lactation. Daily yield (milk, fat, and protein) was estimated by 2 times the yield from single milkings on each test-day, assuming equal AM and PM milking intervals (Porzio, 1953). Let x ij be a known AM or PM yield for animal i on a test-day, where j = 1 for AM milking or j = 2 for PM milking. Then, the total test-day yield is estimated by Various methods have been proposed afterward, mainly to deal with varied milking intervals (see graphical abstract). The landmark developments date to the 1980s and 1990s, focusing on yield correction factors in 2 broad categories: additive correction factors (ACF) and multiplicative correction factors (MCF). In AM-PM plans, ACF provide additive adjustments to 2 times AM or PM milk yield as the estimated daily yield, computed specifically for each milking interval class (MICL), say k. That is, Here, Δ jk represents an ACF for milking interval k of milking j, and ŷ ijk is the estimated daily yield for cow i.
By noting that a daily milk yield equals the sum of the AM and PM milk yield, we obtain the following 2 equations given AM and PM milk yield: [4] Thus, ACF are evaluated by the population averages of the differences between the AM and PM milk yield, coupled with other categorical variables (Everett and Wadell, 1970). By adding equation [3] to equation [4], it shows that the sum of AM and PM ACF specific to each MICL equals zero: [5] A general form of an equivalent ACF model with daily yield as the response variable is the following: Daily milk yield correction factors: What are they?
Similarly, a linear regression (LR) model can be implemented as an ACF model in which f(θ) is a linear function involving milking interval and DIM, and the regression coefficient for AM or PM yield is estimated from the data. Let t ij be a milking interval time and d ij is the DIM for the test date when x ij is sampled and weighed; all are pertaining to milking j. Then, the LR model involving milking interval time and DIM alone is the following: Here, α j is an overall mean specific to milking j, and β, γ, and b are common regression coefficients for milking interval, DIM, and single milking (AM or PM) yield, respectively. Note that LR models can be defined with varying complexity by adding or dropping regression variables as appropriate (Liu et al., 2000). As a typical LR approach, daily yield is estimated directly given the estimated model parameters in [8]. Alternatively, ACF can be obtained by evaluating the expected values of f(θ) for discretized MICL locally. Assume that E(d ij − d 0 ) = 0. Let there be k = 1, …, K classes for AM (or PM) milking. Then, ACF are computed for 2 × K classes. [9] The above holds assuming E t t where E t t j k j k ( ) ( ) ( ) = is a midpoint of each MICL. Throughout this paper, we use a superscript k to indicate a discretized MICL because it is not a variable index in the data model. In contrast, a subscript index k is reserved for an index for a categorical variable in the data model. Then, the daily yield is given by the sum of b x ij and the ACF, ∆ j k ( ) , explicitly computed for the kth MICL. That is, Within each MICL, the following holds in a joint analysis using AM and PM milking records, assuming a common regression coefficient for AM or PM yield.
where y k ( ) is the average daily milking yield for all the cows in MICL k. Hence, if we force b = 2, the above relationship [11] is reduced to [5].
Multiplicative correction factors, also referred to as ratio factors, are ratios of daily yield to yield from single milkings, com-puted for various MICL (e.g., Shook et al., 1980;DeLorenzo and Wiggans, 1986;Wiggans, 1986). Denote AMP and PMP for bulk AM and PM yield, respectively, such that AMP + PMP gives the test-day yield. Then, the AM and PM MCF, denoted by F 1 and F 2 , respectively, are defined as follows (Shook et al., 1980): F 2 = + AMP PMP PMP . [13] Confined to MICL k, we show the following relationship holds based on equations [12] and [13]: [14] The above brings convenience to computing. For example, given the computed PM MCF (F 2k ), AM MCF can be obtained indirectly as follows: [15] Shook et al. (1980) utilized a quadratic regression of the PM portion of daily yield on milking interval time to obtain smoothed estimates of MCF and dealt with MICL having no or insufficient milking records. DeLorenzo and Wiggans (1986) proposed an LR model without intercept to derive MCF for cows milked twice a day, as follows, assuming heterogeneous means and variances and fitted separate LR models for different MICL: [16] The above regression coefficient F jk coincides by definition with the MCF specific to each MICL (Shook et al., 1980), assuming E ijk ε ( ) = 0, because MCF for MICL k is also computed by A covariate for DIM can also be included to account for the variation of the lactation curve (DeLorenzo and Wiggans, 1986). Wiggans (1986) proposed deriving yield factors for cows milked 3 times a day through regressing the AM or PM proportion of daily yield (x ij /y ij ) on milk interval (t ij ), as follows: [18] By taking the expected value on both sides of equation [18], and letting E( Here, E t t ij j k ( ) = ( ) is evaluated locally as the midpoint of each MICL for milking j. Then, MCF are obtained as The statistical interpretations of MCF vary slightly. First, according to DeLorenzo and Wiggans (1986), an MCF is a regression coefficient specific to each MICL, defined in [16]. Second, an MCF is a ratio of the expected value of daily yield to the expected value of the yield from single milkings, defined in [17] and computed for each MICL (Shook et al., 1980;DeLorenzo and Wiggans, 1986). Third, an MCF is the expected value of the ratio of daily yield to a single milking yield according to the Wiggans (1986) model, defined in [20]. Note that the third interpretation can also be derived from the regression smoothing models by Shook et al. (1980) and DeLorenzo and Wiggans (1986), because they fitted the same or similar ratio variable in the linear or quadratic smoothing models. [21] Multiplicative correction factor models are statistically challenged by the well-known "ratio problem" because they have a ratio variable (e.g., AM or PM proportion of daily yield) as the dependent variable in the data density (Wiggans, 1986) or the smoothing functions (Shook et al., 1980;DeLorenzo and Wiggans, 1986). The consequences included possible biases in 2 aspects: omitted variable bias and measurement error bias (Lien et al., 2017). The former happens because the main model effects are missing if the model is re-arranged by multiplying the denominator variable to both sides of the equation. The latter occurs when there are measurement errors with the denominator variable of the response.
Here, we propose an alternative model by taking the logarithm of daily to single milking (say AM or PM) yield ratio as a response variable. That is, where (y ij /x ij ) is a ratio of daily yield to single milking (say AM or PM) yield. With some re-arrangements, equation [22] becomes [23] Here, log(y ij ) is the response variable, log(x ij ) and t ij are the dependent variables (i.e., main effects), and b = 1 is a constant regression coefficient for log(x ij ). In the present study, however, we relax the restriction for b = 1 in [23] and estimate it from the data. Then, taking the exponential on both sides of equation [23] gives The above is recognized as an exponential regression model. [25] Next, taking the exponential on both sides of equation [25], with some re-arrangements, gives , and E y y and AM (or PM) yield, respectively. A simulation study was conducted. Daily milk yields were simulated based on a modified Michaelis-Menten function (Klopcic et al., 2013). Daily milk yield curves were simulated for 3,000 cows (see graphical abstract), where the values for y 720 and k were simulated from truncated normal (TN) distributions: y 720 ~ TN (12, 2) and k ~ TN (0.8, 0.1). The AM milking intervals were simulated following a TN distribution with a mean equaling 12 h and a standard deviation of 1.12 h. PM milking intervals for the same cows were 24 h minus the AM milking intervals. Approximately 98.6% of the cows had AM (PM) milking intervals between 9 and 15 h (see graphical abstract). Deviations due to DIM and other system variables were all ignored.
The performance of each model was evaluated based on mean squared errors (MSE) and accuracies of estimated daily milk yields obtained from 10-fold cross-validations, each replicated M = 30 times. The R 2 accuracy (Liu et al., 2000) was computed per cross-validation replicate and per individual animal except that the MSE was obtained from the testing sets. Hence, it can be reviewed as predictive R 2 accuracy. To further infer the origin of errors, MSE was decomposed into variance and squared bias, computed as an average of all animals per cross-validation replicate or as the average for each animal across the 30 replicates.
The variances of estimated daily milk yields were all close to zero for these methods ( Table 1), suggesting that they all had high precision for the estimates. Overall, ACF and MCF models considerably outperformed M0 (double AM or PM yields) regarding biases and accuracies (Table 1). The 2 ACF models, M1 and M2B, had larger MSE and lower R 2 accuracies than M2A. For the MCF models, M6A and M6B (Wiggans, 1986) performed slightly better than M4 (Shook et al., 1980) andM5 (DeLorenzo andWiggans, 1986). The latter 2 models, M4 and M5, performed similarly. Model M6B estimated daily milk yield based on MCF (Wiggans, 1986). The 2 exponential function models, M7A and M7B, had the smallest squared bias (and MSE) and the largest accuracies in the methods. The ranges of accuracies between cross-validation replicates were very narrow. The accuracies evaluated per animals were slightly higher than those assessed per cross-validation replicates, and the ranges of accuracies in the latter case were drastically larger. The accuracy ranges were between 0.356 and 1.00 (M7B) and between 0.363 and 1.00 (M7A), respectively, based on the exponential regression models, whereas the accuracy ranges were larger for other methods. M0 had the lowest accuracy (0.730) and the most extensive range (0.153-0.788) per animal (Table 1).
Estimated model parameters for 4 models were obtained in one cross-validation replicate and shown in Table 2. The correlation coefficients between actual and estimated daily milk yields were high for all the models. Yet, the fitted linear regressions between actual and estimated daily milk yields varied considerably between these models. The ACF model (M1) had larger intercepts than the MCF model. The LR model with continuous milking interval (M2A) had a significantly smaller intercept. The exponential regression model, M7A, had the smallest intercept, and the regression coefficient was close to 1. Therefore, the ACF model M1 had the largest biases and the worst accuracies. The exponential model M7A had the least biases and the greatest accuracies. Hence, correlations are not an appropriate measure of accuracy for estimating daily milk yields because biases are not considered.
The impact of discretizing milking intervals on estimating daily milk yields was evaluated through 4 pairs of models. Each pair had the same model settings, except daily milk yields were calculated using different strategies. The models labeled A (M2A, M3A, M6A, and M7A) estimated daily milk yields directly based on estimated model parameters. Instead, the models labeled B (M2B, M3B, M6B, and M7B) first computed ACF or MCF for discretized MICL. Then, daily milk yields were estimated based on the computed ACF or MCF per discretized MICL. The models in group A consistently had smaller MSE and greater predictive R 2 accuracies than their counterparts in group B. We thus concluded that discretizing milking interval time led to increased biases and, therefore, loss of accuracies in estimated daily milk yields. How systematic biases arose from discretizing milking interval is ana-  Shook et al. (1980); M5 = MCF model according to DeLorenzo and Wiggans (1986); M6A = MCF model implemented as linear regression according to Wiggans (1986); M6B = MCF model according to Wiggans (1986) Biases due to discretizing milking interval also existed with the model M6B (Wiggans, 1986). Nevertheless, the biases from discretizing milking intervals were relatively small in the present study.
Additive and multiplicative correction factors were characterized and compared in Figure 1. The ACF computed from the 2 ACF models, M1 and M2B, were comparable, except that ACF from model M2B were smoothed. But they did not agree with ACF computed from the LR model M3B. The average difference of ACF per MICL between M2B and M3B was 1.402/2 = 0.701. Numerically, we approximated the results by deriving a similar average difference of ACF between these 2 models: Multiplicative correction factors were computed and compared between 4 models: M4 (Shook et al., 1980), M5 (DeLorenzo and Wiggans, 1986), M6B (Wiggans, 1986), and M7B. Overall, MCF computed from different models are highly comparable for milking intervals between 9 and 14 h, but they showed relatively larger differences out of this range (Figure 1). All the computed MCF provided estimates of the ratios of daily yield to yields from single milkings, though their precise statistical interpretations varied. ACF from the model M1 and M2B were zero when AM and PM milking intervals were both approximately 12 h, meaning that close to zero adjustments were added to 2 times AM or PM milk yield as the estimated daily milk yields. MCF were all close to 2.0 with approximately 12-12 h equal AM and PM milking intervals. The MCF became smaller or larger than 2.0 as AM or PM milking interval departure from 12 to 12 h. Hence, doubling AM or PM milk yield gave an approximate estimate of daily milk yield with equal AM and PM milking intervals. Still, it was subject to large errors when AM or PM milking interval deviated from 12 h.  By noting e ≈ 2.718, we show that the exponential function is analogous to an exponential growth function: where y 0 = x b is the initial value, r = 1.718 is the rate of change, tuned by a time function as a linear function of milking interval and DIM. Thus, the proposed model (M7A and M7B) postulated exponential growth dynamics between daily milk yield and milking interval time, given known AM or PM yield as the initial value.
Finally, we reviewed and evaluated ACF and MCF models using simulation data, yet to be verified by actual milking data. In a continuing effort, large-scaled high-resolution milking data are being collected for follow-up studies, jointly supported by the US Council on Dairy Cattle Breeding, the USDA Agricultural Genomics and Improvement Laboratories, and the National Dairy Herd Information Association. The methods were explicitly described to estimate daily milk yield in AM and PM milking plans. Still, the principles generally apply to cows milked more than twice daily and apply similarly to the estimation of daily fat and protein yields with some necessary modifications.