Re-sequencing of the casein genes in Swedish Red cattle giving milk with diverse protein profiles and extreme rennet coagulation properties

Graphical Abstract Summary: Impaired rennet coagulation properties in milk could lead to prolonged processing times and production losses. Heritability for impaired coagulation has previously been estimated to be 0.28 to 0.45, indicating that genetic selection can be used to manipulate this trait. To better understand the effect of genetic variation in the casein genes on milk coagulation, blood samples from 30 cows with divergent coagulation properties were sequenced. We found 121 genetic variants, including 12 that have not been previously described and 1 synonymous variant in CSN1S1. Three genetic variants were found in the untranslated regions of exons in 3 caseins. Genetic variants could be potential markers for protein composition and coagulation.

R ennet coagulation of milk is crucial to cheese production and poorly as well as non-coagulating (NC) milk could lead to prolonged processing times and production losses.In Swedish Red Dairy Cattle (RDC) up to 37% of the cows produce poor or NC milk (Gustavsson et al., 2014;Nilsson et al., 2019).It has previously been shown that the heritability for NC is 0.28-0.45for NC milk (Duchemin et al., 2020;Gustavsson et al., 2014) and 0.32-0.56for other coagulation traits (Duchemin et al., 2020).Both Gustavsson et al. (2014) and Duchemin et al. (2020) showed moderate to strong phenotypic and genetic correlations between relative proportions of the different caseins (CN) in the milk and the coagulation traits.There are 4 different CNs in bovine milk, α S1 -, β-, α S2 -and κ-CN, with the corresponding genes CSN1S1, CSN2, CSN1S2, and CSN3 all located in the CN cluster on bovine chromosome 6.There are several genetic variants of each CN and the prevalence of these varies among different cow breeds (Caroli et al., 2009).Identification of genetic protein variants can be done by protein analysis relying on differences in electric charge, molecular weight, or their isoelectric point.However, in some cases it is not possible to distinguish between 2 genetic protein variants, such as β-CN F and G that have the same mass and isoelectric point.
There is also exon variation in the CN genes which does not affect the mature protein perse, such as synonymous variants (differences in DNA sequence which will not lead to a change in the translated AA sequence), SNPs in the untranslated regions (UTR) of the exons which can affect protein expression, and SNPs in splice regions which can lead to splice variants of the proteins (Boeva, 2016).UTR are part of the mRNA but occur before the start codon signaling for translation of mRNA to protein or occur after the end codon.These regions can affect mRNA stability, mRNA localization, and translational efficiency (Barrett et al., 2013).Nilsson et al. (2020) studied different milk protein variants and their posttranslational modifications in relation to rennet coagulation using liquid chromatography high resolution mass spectrometry (LC-HRMS).It was shown that protein variants β-CN A2 and κ-CN A was more common in NC milk and that β-CN A1 and κ-CN B was more common in coagulating milk.These effects were mainly believed to be indirect as the CSN2 genotype influences relative concentrations of α S2 -and κ-CN and the CSN3 genotypes affects relative concentrations of α S1 -and κ-CN.Meanwhile, Lewerentz et al. (2023) found that relative concentrations of κ-CN explained much of the variation in rennet-coagulation properties.To better understand the effect of genetic variation in the CN genes on milk coagulation, blood and milk samples from 30 Swedish red dairy cattle (RDC) with divergent coagulation properties is chosen from a previous observational study on 600 RDC cows (Duchemin et al., 2020).The DNA will be sequenced for the CN genes to determine the theoretical AA sequence and to look for genetic variation in introns and untranslated exon regions.The aim of this study is to confirm the protein-based genetic variants previously reported, while also searching for additional genetic variation, specifically synonymous variants in the exons of 30 cows.Furthermore, we hypothesize that the discovery of SNPs in the untranslated regions (UTR) of exons and introns could have an effect on protein expression.
Of the 600 RDC cows included in our previous observational study in December 2015 to April 2017 (Duchemin et al., 2020), 30 cows were selected based on their milk properties with regards to rennet coagulation properties, protein genetic variants and posttranslational modifications (Nilsson et al., 2020).The selected cows came from 26 of the 31 conventional farms in the South of Sweden included in the previous study (Duchemin et al., 2020) where 2 cows/farm were selected in 4 farms and one cow/farm was selected in the rest of the farms.All 30 cows had different sires and dams.For the 30 cows, milk samples were collected at d 194 ± 78 (average ± standard deviation) in the lactation cycle and in parity 2.2 ± 1.6 which is in range compared with what was found in the full data set (Nilsson et al., 2020).Of the cows investigated in this study, 13 cows gave milk that did not coagulate within 40 min after rennet addition (RCT >40 min) and 17 cows gave milk with intermediate to good coagulation properties defined by gel strength (G' max = 412 ± 226 Pa) and rennet coagulation time (RCT = 10 ± 1.1 min).Furthermore, previous information of genetic protein variants and post-translational modification variants in Nilsson et al. (2020) was used to choose a subset with a large genetic variation regarding protein.Hence, the allele frequency of the more rare genetic protein variants is somewhat higher in the subset than in the full sampleset of 600 cows from Nilsson et al., 2020.The methods for measuring rennet coagulation and protein composition can be found in Nilsson et al. (2020).DNA from the 30 cows was available in the bio-bank at the Swedish University of Agricultural Sciences, Uppsala, and was sequenced using primer sequences from the exons of all 4 CN genes according to Gallinat et al., (2013) resulting in 34 sequencing fragments.Sanger sequencing was done using the BigDye® Direct Cycle Sequencing kit (Applied Biosystems, Thermo Fisher Scientific, Waltham, MA, USA).We followed the sequencing protocol as described in the BigDye® instruction manual.Purification of the sequencing reactions was performed with BigDye® XTerminator kit (Applied Biosystems, Thermo Fisher Scientific).The sequencing reactions were then run in the Genetic Analyzer 3500xL (Applied Biosystems, Thermo Fisher Scientific).Sequence data was checked and edited manually using the free trace viewer Chromas 2.5 (Technelysium Pty Ltd., Brisbane, QLD, Australia) and BioEdit 7.2.(Hall, 1999).This was also the step where the variant calling was performed through manual inspection of the Sanger sequences.The obtained sequences were aligned across all animals and the reference sequence in BioEdit.This allowed the identification of any sequence variants among the animals in relation to the reference sequence.All exons in the CN genes which are translated to proteins are covered in the sequencing.Introns directly preceding and follow-ing translated exons are covered giving an average of 37% intron coverage per gene.The sequences were aligned and compared with the latest version of the bovine genome sequence ARS-UCD1.2and the reference genes ENSBTAG00000007695 (CSN1S1), ENSBTAG00000005005 (CSN1S2), ENSBTAG00000002632 (CSN2) and ENSBTAG00000039787 (CSN3).The positions of the detected SNPs were checked against the Ensembl genome database (release 109, Feb 2023; www .ensembl.org/Bos _taurus/ ) to determine if they had been previously described and find the correct reference SNP number (rsID).Chi-squared analysis was performed using Minitab 20 (Minitab Ltd.) to investigate if there were any association between the allele distribution of the newly identified SNPs and the 2 coagulation groups.
The sequencing of the casein genes in the 30 individual RDC yielded 116 SNPs where each SNP was present in at least one of the individuals.Furthermore, there were 5 SNPs identified where all 30 individuals differed from the reference sequence ARS-UCD1.2.The majority of the SNPs, 84%, are located in the introns on positions without any expected consequence to transcription or translation (Figure 1).Furthermore, 6% of the SNPs can be classed as splice polypyrimidine tract variants which are SNPs that falls in the polypyrimidine tract at the 3′ end of intron between 17 and 3 bases from the end (SO: 0002169; Eilbeck et al., 2005).The rest of the SNPs are situated in the exons where 2.5% of the SNPs present in the subset can be found in the 5′-or 3′-UTR of the exons, 1.7% of the SNPs are synonymous variants and 6% of the SNPs are missense variants concurring with the known protein variants for CSN1S1, CSN2 and CSN3.
The SNPs found in exons are presented in Table 1.The variants CSN1S1 C , CSN2 B and CSN2 F had an allele frequency of 0.03, 0.02 and 0.02 in the subset whereas the frequency was substantially lower in the collected sample set where the protein variants were determined using the protein phenotype (Nilsson et al., 2020).The allele frequency of the protein variants of the rest of CSN1S1 and CSN2 as well as all protein variants of CSN3 was very close to that of the full sample set presented in (Nilsson et al., 2020).As the 30 animals used in this study were chosen with regards to extreme coagulation properties as well as achieving a large variation in genetic protein variants it is likely that the more uncommon genetic protein variants is overrepresented compared with the RDC population.
In exon 17 of CSN1S1 we confirmed the presence of the, for RDC, rare variant CSN1S1 C in the 2 animals which was previously observed in the protein analyses (Nilsson et al., 2020).The remaining animals were homozygous for CSN1S1 B .The 2 animals which were heterozygous for CSN1S1 C also showed a mutation in the 5′ UTR in exon 2 in CSN1S1, located 1 bp upstream the start of the translated sequence (rs110899610T > C, Table 1).This genetic variant was also found in a study on Beninese cattle breeds by Vanvanhossou et al. (2021).CSN1S1 C was present in the data set but the authors do not present any data regarding if there is any linkage between rs110899610C and CSN1S1 C .Vanvanhossou et al. (2021) performed an analysis to investigate SNPs in binding sites of transcription factors (TF) or microRNA (miRNA).They found that the change from C to T in rs110899610 could cause a loss of binding of miRNAs bta-miR-2420 and bta-miR-453 as well as a gain of binding of bta-miR-376a.As miRNA can affect gene expression, the mutation in the 5′ UTR region of the CSN1S1 could affect the composition of produced milk (Boeva, 2016).Interestingly, a synonymous variant was found on exon 3 in CSN1S1, position 6:85415215 (ss9732325453, Table 1), changing the DNA sequence from T to C in one of the 30 animals.In both genetic variants the AA sequence remains the same with His on position 4 in the mature protein.This synonymous variant is not present in the Ensembl or NCBI databases and has, to our knowledge, not previously been reported.
All animals were found to be homozygote for the CSN1S2 B , which was consistent with the protein analyses (Nilsson et al., 2020).There was one SNP (rs109900747T > C, Table 1) found in the 5′-UTR region on exon 2 in CSN1S2, positioned 5 bp upstream the translated sequence.This genetic variant was also found in the study by Vanvanhossou et al. (2021) and the authors suggest that a change from rs109900747T to C results in a gain of binding bta-miR-452 and loss of binding of several TF.There is considerable variation among the animals in this study regarding this SNP as 47% of the animals are heterozygous, 23% are homozygous for the T allele and 30% are homozygous for the C allele (Table 1).
As in the full sample set (Nilsson et al., 2020), CSN2 A1 and CSN2 A2 where the most common β-CN protein variants in the subset of RDC with a frequency of 0.40 and 0.56, respectively (Table 1).The DNA sequencing confirmed the presence of the, in RDC, rare protein variants CSN2 B and CSN2 F caused by SNPs rs43703013C > G and rs433954503C > T, located on exon 7.In CSN2 B , rs43703013G results in an Arg on AA position 122 and in CSN2 F , rs433954503T results in a Leu on position 152 in the mature protein.rs43703011A results in a His in AA position 67 in the mature protein in variants CSN2 F and CSN2 A1 whereas rs43703011C result in a Pro in the same position in variants CSN2 B and CSN2 A2 .Especially the protein variants CSN2 A1 and CSN2 A2 have caught attention in research in the latest years.In a systematic review of randomized controlled trials, Daniloski et al. (2021) conclude that milk with β-CN protein variant A2 can have a positive effect on human health compared with the A1 variant.Furthermore, Daniloski et al. (2022) reviewed several studies on the effect of CSN2 genetic variants on milk functionality and concluded that protein variant β-CN A2 have better foaming and emulsification properties but develops poorer rennet and acid milk gels compared with β-CN A1.
On exon 4 in CSN3, the 3 SNPs rs43703015C > T, rs43703016A > C and rs43703017A > G describes protein variants CSN3 A , CSN3 B has been associated with higher total CN but also higher κ-CN contents in milk.The loss and gain of miRNA and TF could theoretically be part of an explanation for these superior qualities associated with CSN3 B .As presented in Figure 1 there is also quite a large amount of genetic variation in the introns of the CN genes.Twelve new SNPs and indels were found with variation in the data set.Furthermore, 2 indels were the same in all 30 animals but were different from the reference gene.All in all, 109 intronic genetic variants were found.In Table 2, 9 of the new SNPs and indels are presented with their assigned ssIDs together with some known SNPs and indels which were of specific interest in relation to intron position as well as potential correlation with protein composition or rennet coagulation.In total, half of the genetic variants identified in introns were found at a very low frequency (0.02-0.08) in the 30 animals whereof some are presented in Table 2.The rest of the SNPs and indels presented in Table 2 showed more variation among the animals with allele frequencies between 0.22 and 0.67.
Half the new SNPs could be found in the CSN1S1 gene.Both rs137349211C > T and rs379988962G > T on intron 13 in CSN1S1 and intron 5 in CSN1S2 are situated in the splice polypyrimidine tract.On introns 1, 2, 4, 13 and 14 in CSN1S1 (Table 2) there are deletions which have not previously been reported whereof 3 is situated in the splice polypyrimidine tract of the intron.Furthermore, the new SNPs ss9732325462, ss9732325455, ss9732325458 and ss9732325457 coincides with protein variant CSN1S1 C .The cows having CSN1S1 C differ from the reference sequence in 79 of 117 SNPs and seems to form a very clear haplotype.
The potential significance of the SNPs found in this study was investigated through a straightforward Chi-squared test and through comparisons with previous GWAS studies.Grouping the cows with regards to the rennet coagulation properties of their milk, a Chi-Square test could give indications of any possible association between individual SNPs and coagulation properties.Twelve SNPs were found to be associated with coagulation properties at a nominal significance level of P < 0.05 whereof 8 of these SNPs are presented in Tables 1 and 2 together with the allele frequency for the group of cows (n = 13) giving NC milk.The SNP rs109900747, presented in Table 1, was found to be associated with coagulation (P < 0.03) where the frequency of the C allele was higher than the expected frequency in the cows giving NC milk.It is plausible that the suggested SNP variation (Vanvanhossou et al., 2021) could have an effect on α S2 -CN expression which in turn could affect  coagulation properties.Furthermore, the Chi-squared analysis showed that SNPs rs110808655, rs133836703 and rs110466181 (Table 2) were associated with coagulation (P < 0.03) and that the frequency of the T allele was higher than expected in NC cows.Also the SNPs rs110291532 and rs136419748 (Table 2), were found to be associated with milk coagulation (P < 0.03) where the insertion of A (rs110291532) and the T allele (rs13641974) is more frequent in NC cows.As the Chi-Square test was performed on a large number of SNPs and a small number of animals these results should be seen as an indication and need to be confirmed in a larger sample set.We have found that SNPs found in this study has been shown to be significant for protein composition and rennet coagulation traits in several GWAS studies.Buitenhuis et al. (2016) who performed a GWAS on milk from Danish Holstein found that the SNPs rs110808655 and rs110122319 were associated with the relative concentration of α S2 -CN.Furthermore, rs110808655 has been shown to be associated with the concentration of β-CN in another study (Huang et al., 2012).Pausch et al. (2017) and Sanchez et al. (2017) found that the SNP rs109193501 was associated with total protein and α S2 -CN concentration respectively.Dadousis et al. (2017a) as well as Pegolo et al. (2018) found that rs110440863 was associated with relative concentrations of κ-and β-CN.Interestingly, the same SNP has also been shown to be associated with rennet coagulation time (Dadousis et al., 2017b) and curd firmness (Dadousis et al., 2016).This suggests that some of the SNPs in this small subset could be potential markers for protein composition and rennet coagulation.It must be noted that the Chisquared analyses presented here is only a preliminary analysis with limited statistical power that also cannot account for the variation between the different farms.We will do an in-depth study of the phenotypes by proteomic studies of the resulting CNs and their post-translational modifications using LC-HRMS which will be published in a later stage.In a small sample set of 30 cows with divergent phenotypes, genetic variation was shown in the CN genes and the variation predominantly occurs in the introns.Genetic variation in the exons confirmed the genetic protein variants of the CNs that we have previously reported from analyzing the phenotypic variation in the milk from the same animals.The genetic variants in the 5′-and 3′-UTR in CSN1S1 and CSN3, rs110899610T and rs109787476T, are found together with CSN1S1 C and CSN3 B , respectively.As both these variants are associated with gain and loss of different miRNA and TF it could explain differences in expression of the genetic protein variants.Chi-squared analysis as well as comparisons with previous GWAS studies show potential connections between the identified SNPs and coagulation properties of milk.Further research will focus on phenotypic variation in the milk protein sequences with investigations of detailed variation in posttranslational modifications.
Figure 1.Distribution of the different DNA variant types in the 121 found SNPs in the 30 RDC cows.

Table 1 Description of the genetic variants found in exons of the four casein genes CSN1S1
Lewerentz et al. | Sequencing of casein genes in Red cattle , CSN1S2, CSN2 and CSN3.SNPs are identified with rsID/ssID, genome position, exon, and allele frequency (AF) in 30 animals, allele frequency of the 13 animals producing non coagulating milk (NC) together with the resulting protein variants numbers retrieved from the Ensembl database; ss-accession numbers of new variants identified from the present sequencing assigned by the European Variation Archive; 2 bp = SNP position according to the ARS-UCD1.2genome; 3 ref/alt = reference/alternative allele; 4 Amino acid (AA) position in mature protein; 5 5′-untranslated (UTR) variant; 6 Synonymous variant with the same AA but different alleles; 7 3′-untranslated (UTR) variant.* Significant association between coagulation trait and allele frequency (P < 0.05).

Table 2 Genome positions for SNPs found in introns of the four casein genes
Lewerentz et al. | Sequencing of casein genes in Red cattle CSN1S1, CSN1S2, CSN2 and CSN3 including

allele frequencies (AF) in 30 animals and in the 13 animals producing non coagulating (NC) milk.
The presented SNPs in this table are either new SNPs with ssID, classed as splice polypyrimidine (pol.)tract variants, or SNPs of specific interest discussed in the present paper with rsID 1Variant ID: rs-numbers retrieved from the Ensembl database; ss-accession numbers of new variants identified from the present sequencing assigned by the European Variation Archive;.