Major Element Composition of Sediments in Terms of Weathering and Provenance: Implications for Crustal Recycling

The elemental composition of a sediment is set by the composition of its protolith and modified by weathering, sorting, and diagenesis. An important problem is deconvolving these contributions to a sediment's composition to arrive at information about processes that operate on the Earth's surface. We approach this problem by developing a predictive and invertible model of sedimentary major element composition. We compile a data set of sedimentary rock, river sediment, soil, and igneous rock compositions. Principal component analysis of the data set shows that most variation can be simplified to a small number of variables. We thus show that any sediment's composition can be described with just two vectors of igneous evolution and weathering. We hence define a model for sedimentary composition as a combination of these processes. A 1:1 correspondence is observed between predictions and independent data. The log ratios ln(K2O/MgO) and ln(Al2O3/Na2O) are found to be simple proxies for, respectively, the model's protolith and weathering indices. Significant deviations from the model can be explained by sodium‐calcium exchange. Using this approach, we show that the major element composition of the upper continental crust has been modified by weathering, and we calculate the amount of each element that it must have lost to modify it to its present composition. By extrapolating modern weathering rates over the age of the crust, we conclude that it has not retained a significant amount of the necessarily produced weathering restite. This restite has likely been subducted into the mantle, indicating a crust‐to‐mantle recycling rate of 1.33 ± 0.89 ×1013 kg·year−1.


Introduction
The elemental composition of fine-grained sediments is principally controlled by the composition of the rocks from which they formed (provenance) and subsequent reactions with fluids, which transform these source rocks into solutes and neoformed minerals in the surface environment (weathering) (Nesbitt, 1979;Nesbitt et al., 1980Nesbitt et al., , 1996Taylor, 1985). Sediment composition is further altered during transport by hydrodynamic sorting (Bouchez et al., 2011;Cullers et al., 1987;Garzanti et al., 2009;von Eynatten et al., 2012von Eynatten et al., , 2016, cation exchange with natural waters (Cerling et al., 1989;Sayles & Mangelsdorf, 1977) and during burial by reaction with porewater fluids (Fedo et al., 1995;Nesbitt & Young, 1989). A challenge for sedimentary geochemistry is to identify the contribution of each of these factors to the formation of any given sediment. Elemental sedimentary compositions can then be used to derive useful information about past climates, crustal processes, and sedimentary source regions.
Unravelling these factors is made harder by the nature of compositional data. All compositions are subject to the constraints that all components are strictly positive and must sum to a total "closure" value (e.g., 100%) (Pawlowsky-Glahn, 2015). These constraints mean that raw compositional data are not Euclidean variables and therefore, standard statistical techniques cannot be applied to them. If such techniques are used on raw data, the results can be misleading (Buccianti, 2013). Aitchison (1986) introduced log-ratio transformations of compositional data, which allow standard techniques to be applied. Once compositional data are transformed into log-ratio space, simple linear trends can be used to predict geological processes, for example, weathering, sorting, and igneous differentiation (Bloemsma et al., 2012;Ohta & Arai, 2007;von Eynatten et al., 2003). Furthermore, Ohta and Arai (2007) indicate that the majority of compositional variation in a large data set of soils can be explained considering only two processes: weathering and igneous differentiation. These findings suggest that, despite apparent complexity, sedimentary compositions are predictable and determined by only a relatively small number of processes.
In this study we build on these advances to create a general model for the composition of fine-grained sediments in terms of the composition of the rocks from which they derive and the weathering processes that modify them. First, we compile a data set of major element compositions from fine-grained sedimentary rocks, river muds, and soils. This data set is assumed to be a representative population of sedimentary compositions.
To investigate to what extent the dimensionality of this compositional data can be simplified, we use principal component analysis (PCA) on the log-ratio-transformed sediment compositions. We hence find that this data set can be well described using a small number of latent variables. This motivates us to describe compositions explicitly in terms of just two carefully calibrated compositional trends: the trends of igneous evolution (to describe sediment protoliths) and chemical weathering. To constrain the compositional trends of weathering and igneous differentiation, we gather data from an individual soil profile and an igneous rock suite, respectively. Hence, we derive a two-dimensional model, which explains the majority of the total variability in the data set of sediment compositions. The fitness of this model is then evaluated by analyzing residuals between the theoretical and observed sedimentary compositions. Finally, we demonstrate the utility of our model by analyzing the major element composition of the upper continental crust (UCC) in terms of the processes that have helped form it.

Data
All elements in this study are represented as weight percent (wt.%) oxides (SiO 2 , Al 2 O 3 , Fe 2 O 3 , MgO, Na 2 O, CaO, and K 2 O), where Fe 2 O 3 represents total iron. We exclude oxides, which were below detection limit or not analyzed to overcome the issue of zero values in log-ratio transformations (Martín-Fernández et al., 2003). Given that a full suite of major elements are typically measured, few samples are excluded from our compilation.
The composition of UCC is a useful reference point to compare to other compositions (e.g., weathering products and igneous protoliths) and calculated compositional vectors (e.g., weathering and protolith trends). Moreover, determining where the composition of UCC sits within the range of compositions observed is a useful guide to the processes that have generated and modified UCC. We take the composition of UCC defined by Rudnick and Gao (2003) on a carbonate free basis. The data utilized, as well as information about the original sources, are freely available (see Acknowledgements for details).

Sedimentary Rocks
We compiled 2,822 major element compositions of fine-grained sedimentary rocks from diverse geographical sources and of Archean to Cenozoic ages. To minimize the effect of grain size, we include only fine-grained sediments. We note that in most cases grain size, if recorded, is given qualitatively in the original source (e.g., "fine grained" and "mudrock") and that quantitative grain-size data are rarely available. To minimize the effect of carbonate contamination, we exclude samples that are reported to contain carbonate. However, because it is not always possible to detect carbonate without a mineralogical analysis, we also exclude samples with a CaO of >10 wt.% (defined with the major elements normalized to 100%).

River Sediments
Preexisting river sediment compilations (Gaillardet et al., 1999;Martin & Meybeck, 1979;Viers et al., 2009), and additional literature data are compiled to produce a data set of 261 samples (see Acknowledgements for data availability and sources). As a qualitative control for grain size, we only include suspended sediment compositions and deposited sediments described as fine grained (e.g., "bank muds") or those that were filtered to exclude the coarse fraction (e.g., <150 μm). Data collected from river depth profiles are used to explore the compositional effect of grain size (Bouchez et al., 2011(Bouchez et al., , 2012Lupker et al., 2012). These depth profiles sample river sediment at different heights in the water column where sediment grain size varies systematically as well as bed load.

Soils
A geochemical data set of 130 soil compositions is compiled from sources listed in Ohta and Arai (2007) and includes samples from a range of different climatic zones and igneous rock protoliths. A compositional trend for weathering is isolated using samples taken from different depths of a single soil profile, which are assumed to have the same protolith and therefore be affected only by differing amounts of weathering. One such sample suite is the Toorongo Granodiorite weathering profile in Southern Australia, first studied by Nesbitt (1979). This soil profile is formed on a Devonian granodiorite, dominated by chemical weathering and contains corestones. Using data from Nesbitt and Markovics (1997), who take soil samples from a pristine corestone to highly weathered material infilling joints, von Eynatten et al. (2003) found that the compositional trend defined by samples from this profile is able to reproduce the compositional variability of river sediments from across the globe. This result suggests that this trend might be a globally applicable weathering signal. Consequently, we use this same data set to define the compositional trend of weathering. It contains 15 samples collected at different locations in the Toorongo weathering profile.

Igneous Rocks
We also need to isolate the compositional trend defined by evolution from a mafic primary melt to an evolved felsic melt. This provides the compositional array of unweathered protoliths, which weather to form sediments. To do this, we compile igneous rock compositions from Crater Lake (Mount Mazama) in Oregon, USA. Crater Lake is a volcanic cluster within the Cascades volcanic arc, with well-studied igneous petrology and showing a wide range of igneous rocks from tholeiitic to calc-alkaline suites and ranging from 47-to 75-wt.% SiO 2 (Bacon, 1990). Data were obtained from the GEOROC (http://georoc.mpch-mainz.gwdg.de/ georoc/) database, which was queried with the location names "MOUNT MAZAMA(CRATER LAKE), OREGON" or "MOUNT MAZAMA, OREGON" for whole rock analyses of major elements. Samples with incomplete compositions were removed to yield 135 samples of major element compositions. Making the assumption that all the igneous rocks from Crater Lake are derived from an identical primary melt, which has not varied in time, the only process differentiating the compositions is igneous evolution. The compositional trend then will represent igneous evolution.

Centered Log-Ratio Transformation
Formally, we define a composition and indicates the closure value to which compositions are normalized for intersample comparison. In this study we use = 100 because we present compositions in terms of percentages. We emphasize that our analysis is independent of the specific value of and that the transformation we apply removes the need for samples to be normalized prior to analysis. We utilize the centered log-ratio (clr) transformation defined by Aitchison (1986). The clr transformation transforms a D-part compositional vector into a vector of D coordinates in a real space. Each initial component of a composition (e.g., CaO) corresponds to a clr coefficient, indicated by the notation c(x i ), for example, c(CaO). The transformation is described by the following expression: with g(x) representing the geometric mean of x given by The inverse operation to recover the initial composition x from the clr coefficients x ′ is simply where  is the closure operation, indicating normalization to the closure value .
As an example, Table 1 shows the raw composition (in wt.%) and also the clr coordinates for the composition of UCC. The coordinates represent the relative, not absolute, variation of each component with respect to the geometric mean of the composition, which in this instance is 6.47 wt.%. Components with values greater than the geometric mean have positive coordinates (e.g., SiO 2 ). Components with values less than the geometric mean have negative coordinates (e.g., MgO). The value of the clr coordinates is independent of the closure value chosen.
The clr transformation has two particularly important characteristics. First, it removes the effect of the closure constraint, which precludes the application of statistical procedures designed for Euclidean variables.
For example, Figure 1a shows the raw SiO 2 and Al 2 O 3 wt.% of the compiled sedimentary rock composition data set plotted against each other. Because of the closure constraint, SiO 2 and Al 2 O 3 can never sum to more than 100%, meaning that the points in Figure 1a can never lie above the line labeled 100%. In contrast, the clr coordinates can take any value and occupy the entire space shown in Figure 1b. Second, clr coordinates are sensitive to relative not absolute variation. To demonstrate the importance of relative variation, consider the following example: SiO 2 in a suite of samples varies from 60 to 65 wt.%, and CaO varies from 1 to 6 wt.%. Both SiO 2 and CaO have an absolute variance of 5 wt.%, but SiO 2 has a relative variance of 1.08, in contrast to 5 for CaO. In this instance the relative variance is important, and this is captured in the clr coordinates.
A drawback of the clr transformation is that the covariance matrix of the transformed data is 0, making some analyses, in particular formal correlation analysis, challenging (Filzmoser & Hron, 2009). An alternative transformation, which does not suffer from this limitation, is the isometric log-ratio (ilr) transformation (Egozcue et al., 2003). We note that in this study using the ilr and clr transformations yielded identical results. We choose the clr transformation as it retains a direct link between an individual compositional component The constant sum constraint means that SiO 2 and Al 2 O 3 can never sum to more than 100%, indicated by the black line. This constraint prevents application of standard statistical techniques. (b) Crossplot of centered log ratio, clr, SiO 2 and Al 2 O 3 coordinates from the same data set as Figure 1a. Unlike the raw wt.% data, the clr coordinates can occupy the entire sample space without restrictions, in this case highlighting a correlation between c(SiO 2 ) and c(Al 2 O 3 ), as opposed to the apparent negative correlation in the raw data. and a transformed coordinate, making it more intuitive to interpret than the ilr transformation. We use the clr transformation and inverse as implemented in R by the package compositions (R Core Team, 2018;van den Boogaart et al., 2015).

Principal Component Analysis
In many data sets variables covary in response to underlying processes, and it is frequently possible to simplify data sets in terms of fewer variables than the initial data set. Dimension-reducing techniques are useful ways to visualize compositional data sets of many parts and are increasingly used in the geosciences, for example, in sedimentary provenance analysis (Aitchison, 1983;Aitchison & Egozcue, 2005;Vermeesch & Garzanti, 2015). PCA performs dimension reduction by rotating the data set onto a series of orthogonal vectors along which the variance is maximized. These are known as principal components. By isolating components, which contain the most variance, the data set can be simplified. If PCA is successful in reducing the dimensionality of a data set, a large amount of variance will be contained on a small number of principal components. An alternative approach is to standardize each clr variable to a constant variance (i.e., "whitening") prior to applying PCA. However, because the variables in this study have the same units, they are comparable without standardization. Furthermore, because we want to determine the "strength" of processes that impart variance to sedimentary compositions, we are concerned that whitening the data would result in information loss. We apply PCA to the clr-transformed data set via eigenvalue decomposition using the princomp function in R (R Core Team, 2018).

Results
After applying PCA to the clr-transformed sedimentary rock compositions, we find that 84% of the total variation is contained in the space defined by the first three principal components ( Figure 2a). This means that the data set can be visualized in three dimensions with 16% loss of information ( Figure 2b). An interactive 3D version of these figures (Figures 2b and 3), which is easier to interpret than the 2D projections shown here, can be found in the supporting information. From these visualizations, we draw the following key observations: • The compositional trend of igneous evolution, as defined by the Crater Lake suite, is approximately linear in clr space (green circles in Figure 2b). • The compositional trend of weathering defined by the Toorongo soil profile is also approximately linear in clr space (hot pink circles in Figure 2b). • Soils (blue circles), sedimentary rocks (gray circles), and river sediments (light pink circles) are translated parallel to the Toorongo soil profile trend from a point of origin along the Crater Lake igneous rock trend (Figures 3b and 3c). This offset corresponds to decreasing c(Na 2 O) and c(CaO). • The sediments are also distributed parallel to the igneous evolution trend corresponding to increasing c(K 2 O), c(SiO 2 ), c(Al 2 O 3 ), and c(Na 2 O) relative to other components (Figures 3a and 3c). • Modern-day soils lie on a plane fit through UCC (black square) that is parallel to the igneous and weathering trends (e.g., the blue plane in Figures 3b and 3c). Sedimentary rocks deviate from this plane ( Figures  3a and 3b). This deviation involves an increase in c(Na 2 O) and a decrease in c(CaO). • River sediments occupy a similar space to sedimentary rocks but are only present on the side of the fitted plane higher in c(CaO) (Figure 3e). • Present-day UCC lies close to the igneous trend but translated a small amount along the weathering trend (Figures 3c and 3d).

A Model for Sediment Composition
Based on the observations presented in the previous section, we derive a predictive model for the composition of a sediment. The structure of the data shown in Figure 3 indicates that sedimentary composition is strongly controlled by the composition of its protolith and alteration during chemical weathering. At this stage we proceed assuming that these are the two most important processes that control sedimentary composition. The validity of this assumption will be subsequently tested by its ability to recover structure from the data.
We define the following equation for a clr-transformed sediment composition, x ′ , in terms of linear, unit-length, weathering,ŵ, and protolith,p, vectors the composition of UCC, and a misfit term, E. Note that the vectorsŵ andp are not required to be orthogonal to each other. Thus, This model is visualized in Figure 3a.
The vectorsŵ andp must be independently calibrated to the processes they represent, that is, weathering and protoliths, respectively. The calibration is performed by applying PCA to the Toorongo data set, for weathering, and separately performing PCA again to the Crater Lake data set, for provenance, and extracting the loadings for the first principal components of each data set. These components contain 98% and 95% of the total variation for the Toorongo and Crater Lake data sets, respectively (Table 2). To confirm that these vectors are good universal descriptors of weathering and protolith variation, we compare them to independent data sets. Forŵ we find that soil profiles on diverse lithologies are all distributed parallel to theŵ vector derived from the Toorongo profile (Figures S1a and S1b in the supporting information), which  (1) and relates sediment composition to weathering and protolith. Composition of upper continental crust, UCC, taken from Rudnick and Gao (2003). See supporting information for an interactive 3D version of this figure, which is easier to interpret than the 2D projections shown here.
is a necessary (but not sufficient) condition forŵ to be approximately the same at each locality. A further test is performed by calculating comparableŵ vectors on two other densely sampled soil profiles (Turner et al., 2003;White et al., 2001). This procedure produces similar vectors to that derived from the Toorongo soil profile ( Figure S1c). To confirm the universality ofp we calculate the and values of more than 5,000 recent North American igneous rocks from the NAVDAT data set and find that they lie parallel to our value ofp, suggesting that it is generally applicable (see also section 7.1, Figure 10). Furthermore, ifp is derived by extracting the first principal component of the NAVDAT data set, the resulting vector is practically identical to that derived from the Crater Lake suite ( Figure S2), which suggests that the calculated value is universal. We note thatp is not an attempt to model any particular genetic process (e.g., fractional crystallization) by which igneous rocks evolve their composition. Instead, it is a statistical trend derived from observations of the igneous rocks that are the most important protoliths for sedimentary rocks.
In equation (1), represents the intensity of weathering experienced by a sediment, and corresponds to the composition of the igneous protolith. The model described by equation (1) allows us to characterize any sediment composition simply in terms of the coefficients and . We choose UCC as an origin, that is, ( , ) coordinates of (0, 0), which means that primary igneous rocks have less than 0 (i.e., less weathered than UCC), the implications of which are discussed in section 7. The misfit term E is the perpendicular vector between x ′ and the model plane (i.e., E ·ŵ = E ·p = 0). Factors not explicitly considered by the model, which affect composition, will cause the misfit magnitude to rise. This could include any other sedimentary process or protolith rock compositions not described by our simplified mafic to felsic trend.

Calculating and
We can quantitatively determine the different contributions of weathering and provenance to a sediment's composition by calculating and . This solution, which calculates the coefficients for any major element composition, has been implemented in R and is freely available (see Acknowledgements for details).
We consider a measured major element sediment composition x and the corresponding clr coordinates x ′ . We want to find the values of and such that where x ′ UCC is the clr-transformed composition relative to UCC. The optimal solution to this equation is one where and minimize the misfit, E. This minimization occurs when E is perpendicular to the plane defined byŵ andp.ŵ andp are of unit length and not perpendicular to each other.
We now define a pair of orthonormal vectors,â andb, which lie in theŵ,p plane.
We can rewrite equation (2) in terms of these orthonormal vectors as follows:  (1). Changing the composition of the sediment's protolith acts to move a composition up (more felsic) and down (more mafic). Weathering a sediment causes a composition to move to the right. Green circles are compositions from Crater Lake suite of igneous rocks. Pink circles are compositions from the Toorongo soil profile. Compositional contours are calculated for the model plane (visualized in Figure 3). The origin corresponds to the composition of the upper continental crust, UCC.
Given that â ·b =b · E = â · E = 0, and can be found with However, the goal is to find and . Substituting equations (3)-(6) into equation (7) gives Comparing equation (8) to equation (2), we can find and by Using the above analytical solution, we can calculate and for any major element composition. The calculated values of and for sedimentary rocks are shown in Figure 4. Each point in this − space corresponds uniquely to a composition as indicated by the contours. Projection of samples in this space allows compositions to be related to weathering intensity and provenance, relative to the reference point of UCC.

Evaluation of Model
From equation (1), the modeled clr vector is and thus, the modeled composition x ′ fit is simply clr −1 (x ′ fit ). Modeled compositions are compared against observed compositions for sedimentary rocks in Figure 5. These compositions lie close to the 1:1 trend, indicating a good fit.
Figures 5 and S3 indicate that some elements are better explained by the model than others. CaO and Na 2 O are best fit. In other words elements like Na 2 O, which are distributed along the 1:1 (observation:prediction) line, appear to be mostly controlled by weathering and provenance. An explanation for the relatively poor fit of predicted SiO 2 and Al 2 O 3 is that their distributions are principally controlled by processes not considered in our model, such as sorting. We note that the elements that are most poorly fit by the provenance-weathering model typically have low total logarithmic variances; they do not vary significantly from their central tendency. In other words, their distributions are relatively small and as such are unlikely to deviate significantly from any model prediction.
Another way to evaluate the success of the model is shown in Figure 6. This figure demonstrates the effect that changing and has in terms of the raw compositional components of the soil data set. The observed and predicted soil values are shown to be well fitted.

Coefficients of Determination
A useful measure that quantifies effectiveness of a model in describing a data set is the coefficient of determination, R 2 . For a given data set, the R 2 is the proportion of variance explained by the model relative to the total observed variance of a data set. It is calculated to be the ratio of the total sum of the squares of modeled variation over the total sum of the squares of observed variance, both relative to the data set mean.
The R 2 values for different data sets are shown in Table 3. In all instances the model captures the majority of observed variation. The soil data set has the highest R 2 value, indicating that our model captures 88% of the total variation in soils. River sediments and sedimentary rocks have lower R 2 values consistent with observations of the distribution of the different data sets relative to the model plane in Figure 3. Soils lie close to the plane, but the river sediments and the sedimentary rocks both show significant out-of-plane variation. Of river sediments and sedimentary rocks, the latter shows more out-of-plane variation consistent with a lower R 2 . That these data sets have different coefficients of determination might indicate that some other geological processes are acting on sedimentary compositions. To further investigate this possibility we analyze the misfit between the modeled and observed values, the residuals, for the different data sets.

Residual Analysis
Analysis of the model residuals indicates the nature of the variation that is not being captured by the model and may indicate what processes cause the different data sets to have different R 2 . For a given x ′ , the residuals Note. The R 2 value is the proportion of the total observed variance that is explained by the model, ranging between 0 and 1. are given by Kernel density estimates of residuals for each element calculated from the sedimentary rocks are shown in Figure 7. For c(Al 2 O 3 ), c(SiO 2 ), and c(K 2 O), the residuals are approximately normally distributed around 0, but this is not the case for the other elements.
Plotting the oxide residuals against each other indicates a series of systematic relationships between the residuals of c(CaO), c(Na 2 O), c(MgO), and c(K 2 O) ( Figure 8). These observations suggest that there is a process that systematically affects the composition of sediments but which is not explicitly accounted for in the model described in equation (1). We consider two possible processes that might create these trends: calcite addition and cation exchange.

Calcite Addition
Equation (1) only considers the siliciclastic portion of a sediment; however, detrital and authigenic carbonate minerals, in particular calcite (CaCO 3 ), are important constituents of many sediments. Although addition of calcite adds only calcium to the major elements of a sediment, through the closure constraint, this process also affects the observed value of the other components. The modeled effect of calcite addition on the residuals is shown in Figure 8. The modeled calcite addition trend is close to the observed trends in both river sediments and sedimentary rocks for positive c(CaO). However, given that UCC is calculated on a carbonate free basis (Rudnick & Gao, 2003), it is meaningless to consider a negative amount of calcite. Hence, it is impossible to generate the observed negative c(CaO) residuals for the sedimentary rocks and some river sediments. The strong relationships between residuals are surprising given a lack of clear relationship between the raw clr variables (e.g., Figure 8e). These residual relationships are compared to the modeled effect of calcite addition and cation exchange (green and purple lines, respectively). First, we consider a sediment, which lies on the model plane. To model calcite addition we systematically vary the amount of Ca 2+ in compositions. For cation exchange, Ca 2+ and Na 2+ were exchanged in molar quantities according to a 1:2 stoichiometry. The residuals of these new modified compositions are then calculated. The cation exchange trend closely fits the observed residuals for both positive and negative c(CaO) residuals. The modeled trends are independent of the initial composition of the sediment. Figure 8b is a close-up of Figure 8a.

Cation Exchange
An alternative process is cation exchange. Cations adsorbed to the surface of clays exchange with those in solution to maintain equilibrium with the surrounding fluid. Such reactions are important components of the geochemical cycling of a number of elements (Cerling et al., 1989;Sayles & Mangelsdorf, 1977). In natural sediments these reactions primarily, but not exclusively, occur when a fluvial sediment is deposited in seawater. Due to the sudden change in salinity and solute chemistry, clay-bound cations exchange with those in the seawater. The clays in the sediment are observed to exchange their Ca 2+ for Na + , K + , and Mg 2+ from seawater (Lupker et al., 2016;Sayles & Mangelsdorf, 1977). As a result, a sedimentary rock deposited in a marine setting will have an excess of Na, Mg, and K relative to the river sediment from which it derived.
While all cations can exchange, the primary reaction observed in nature is an exchange of Na + in solution for Ca 2+ in clays according to the reaction: The modeled effect of this reaction upon the residuals closely matches the observed trends ( Figure 8). Unlike carbonate addition, this can also explain negative c(CaO) residuals. This process also explains the distinct residual distributions of river sediments and sedimentary rocks.

Effect of Grain Size
Hydrodynamic sorting of sediments fractionates different mineral phases and changes the composition of a sediment (e.g., Bouchez et al., 2011;Garzanti et al., 2009). We want to know what effect sorting and grain size have on a composition as described by our model. For example, sorting might aliasŵ orp or contribute to out-of-plane misfit. We can constrain this effect using river sediments sampled at different depths in the water column, which will have experienced natural hydrodynamic sorting (Bouchez et al., 2011(Bouchez et al., , 2012Lupker et al., 2012). Samples collected from the same depth profile are assumed to have the same provenance and to have been differentiated only by sorting. Figure 9 shows the observed ranges for and within a single depth profile, including the bed load. This is a measure of the intrasite variability introduced by sorting. On average, is most strongly affected by sorting while is more weakly affected by sorting. In general the suspended sediment appears more weathered than the bed load. Both of the intrasite variances are relatively small compared to the total observed variation in and (∼4 and ∼6, respectively). We therefore conclude that sorting does affect composition but that it can be considered as a relatively minor effect superimposed on top of the larger trends described by our model.

Residual Magnitude
There is a systematic variation of the total error magnitude with ( Figure S4a). This variation indicates that the value ofŵ as derived from the Toorongo profile is not perfectly calibrated for representing average weathering. For highly weathered samples the reliability of the calculated value of will therefore be reduced. The miscalibration may be due toŵ being derived from just one individual site, which increases the likelihood of random noise distorting the signal. The value used in this study, however, probably captures most of the important variability ( Figure S1). There is no observed systematic variability with changing ( Figure S4b). . Dashed line indicates a mixing trend between a rhyolite and a basalt ( coefficients of −2 and +2, respectively, black squares). Arrow indicates the offset between UCC and pristine igneous rocks. NAVDAT samples with > 0.5 and < −1, as well as from two studies with high numbers of anomalous samples (Heliker, 1995;Ogden, 1979), are indicated here but excluded from further analysis. (b) Black line = kernel density estimate of the NAVDAT values, excluding anomalous samples indicated in Figure 10a. Vertical dashed line = mean NAVDAT value (−0.304), gray band = 1 confidence region for mean. Colored vertical bands are the values for different estimates of UCC. Differences between these estimates are discussed in the main text. All estimates of UCC lie offset positively, that is, more weathered, than the NAVDAT mean. Arrow indicates the offset between the value of UCC we use and pristine igneous rocks.

Application of Model: Composition of the Upper Continental Crust
We now demonstrate the utility of our model (equation 1) by describing the average composition of the UCC in terms of the processes from which it formed. The present-day composition of UCC is felsic, in contrast to the mafic composition of the oceanic crust. This felsic UCC is argued to have existed since ∼2.5 Ga, prior to which it was more mafic in compostion (Tang et al., 2016), although others have suggested an early felsic crust (Greber & Dauphas, 2019;Greber et al., 2017). The present-day felsic composition however is difficult to reconcile with the mafic composition of mantle primary melts (Rudnick, 1995). One theory is that a primary basaltic crust undergoes intracrustal separation driven by density contrasts, resulting in a mafic lower crust and a felsic composition for the UCC (Hawkesworth & Kemp, 2006). However, it has also been proposed that the composition of UCC has been modified through time by chemical weathering (e.g., Lee et al., 2008). Chemical weathering enriches the weathering residue (i.e., sediments) in relatively insoluble elements, which may then be incorporated into the crust by accretion at subduction zones or by incorporation into metamorphic rocks, such as S-type granites (Rudnick, 1995). Recycling of weathered sediments back into the crust thus causes the continental crust to become progressively enriched in insoluble elements over time. This hypothesis is supported by independent isotopic proxies (Liu & Rudnick, 2011; Note. x 1 is the present-day composition of UCC. x 0 is the predicted pristine precursor to the upper crust calculated using the model. f is the fractional mass loss of each estimate assuming Al 2 O 3 immobility. M 0 is the estimated mass of each element of the unweathered continental crust. ΔM is the calculated mass loss of each element given in both oxide and elemental form. Details of the calculation are given in the main text. Simon & Lácuyer, 2005). Our model can in principle disentangle the effects of igneous and weathering processes using purely the major element composition.

Weathering of the Upper Continental Crust
If we want to interrogate whether UCC has been affected by weathering, we need to define where pristine unweathered igneous rocks lie in , space. The Crater Lake suite of igneous rocks likely lies approximately on this value, but to buffer our value against possible local effects, we gather a much larger database of igneous rocks. We analyze a data set of North American igneous rocks extracted from the NAVDAT database (http://www.navdat.org/). We query for whole rock analyses of major elements for recent (<1 Ma) plutonic and volcanic rocks from anywhere in North America. This yielded 5,398 compositions once incomplete compositions were removed. The and values for these values are shown in Figure 10. As expected, these compositions lie on a linear trend of constant , consistent with the Crater Lake suite. There are a number of outliers to this trend, many of which come from two particular studies, which are treated as anomalous and excluded (Figure 10a). The distribution of the values for these samples is shown in Figure 10b, alongside the calculated values for different estimates of UCC (Condie, 1993;Rudnick & Gao, 2003;Taylor, 1985;Wedepohl, 1995).
In clr space, mixing of compositions is nonlinear, and therefore, mixing trends will map onto a curved path in -space (Weltje, 2012). This curvature means that mixing of different pristine protoliths could introduce spurious changes in . This effect is important to constrain as mixing between primary mafic and more evolved melts is a possible cause for the intermediate composition of the UCC (Rudnick, 1995). A synthetic mixing trend between basaltic and rhyolitic compositions is shown in Figure 10a indicating that while this curvature does introduce a slight bias to higher values, the magnitude of this artefact is small.
The mean value for the pristine igneous rocks is −0.309 ± 0.137. This is less than the values for all the different estimates of UCC considered here (Figure 10b). All of these estimates of UCC therefore appear more weathered than the average pristine igneous rock. There are differences between the UCC estimates however. The estimates of Taylor (1985) and Wedepohl (1995) lie within one standard deviation of the mean pristine igneous rock. In contrast, the estimates of Condie (1993) and Rudnick and Gao (2003) have higher values, with the latter lying two standard deviations from the mean pristine igneous rock values.
The differences between the estimates partly reflect different methods in calculating the composition of UCC. Rudnick and Gao (2003) calculate UCC by averaging a large number of surface outcropping rock compositions, including sedimentary cover. Condie (1993) also includes sedimentary rocks in their calculation but introduces a weighting to the average composition by areal outcrop of different rock types. In contrast, Taylor (1985) and Wedepohl (1995) calculate their compositions of UCC using the same large-scale sampling survey of the Canadian shield. That these estimates appear less weathered than the others may reflect sampling of a smaller proportion of sedimentary rocks due to the geology of the Canadian shield (Rudnick & Gao, 2003). We use the estimate of Rudnick and Gao (2003) as the canonical composition for UCC as it is calculated by averaging the largest number of samples from across different continents.

Quantification of Mass Loss due to Chemical Weathering
Taking the estimate of Rudnick and Gao (2003) as the value of UCC, we can estimate how much mass must have been removed from UCC by chemical weathering (i.e., purely in dissolved form, not mass loss due to physical erosion). We define a pristine precursor rock for UCC by projecting the UCC composition to the value defined by the pristine igneous rock trend. This gives a composition, x 0 , as shown in Table 4 alongside the observed present-day UCC composition, x 1 . This composition suggests a dacitic (SiO 2 = 65.7%) igneous precursor to the present UCC, which is not a primary mantle melt. Therefore, we assume that a process of igneous differentiation acted upon a primary mantle melt, to create this dacitic precursor, prior to it being weathered to its present-day composition. Changing the order in which these processes occur does not affect the results.
The compositions of the UCC and its pristine precursor only contain information about the relative mass of its components. To calculate the absolute mass difference between the two compositions we need to identify an element to act as a reference point. Aluminum is typically not mobilized during chemical weathering, remaining almost totally in the weathering restite, so we make the assumption that the total mass of Al 2 O 3 in the UCC has been constant. Making this assumption, for an observed composition, x 1 , we can calculate the fractional mass change of each element, f, relative to the initial composition x 0 as follows: where M 0 is a vector of the initial mass of each element, i, and M is a vector of the absolute mass change of each element. We note that this is an identical construction to the mass transfer coefficient, , and comparable to the chemical depletion factor (Brimhall & Dietrich, 1987;Riebe et al., 2003). The calculated values of f for the upper crust are shown in Table 4.
To calculate the absolute amount of mass lost from the crust due to chemical weathering, we require an estimate of the initial mass of the crust prior to weathering. To calculate this value we multiply estimates of the volume of the UCC by an appropriately chosen density. The initial composition of the precursor rock, x 0 , is felsic, so we assume an initial density of 2,700 kg·m −3 . The volume of the total continental crust is given by Cawood et al. (2013) to be 7.2 × 10 18 m 3 of which UCC makes up the upper 30% of the thickness. This means that UCC has a volume ∼0.3 times that of the total continental crustal volume, giving 2.16 × 10 18 m 3 . Using a density of 2,700 kg·m −3 , we estimate that UCC had an initial mass of 5.83 × 10 21 kg, assuming a constant crustal volume (i.e., isochoric weathering). The initial mass of each element can be calculated by multiplying this value by the composition x 0 , shown in Table 4. The absolute change in mass for each element, M i , can thus be calculated by The calculated mass loss for each oxide is shown in Table 4, also displayed converted to elemental mass losses.
This estimate makes the assumption of negligible mass contribution of other minor elements. Given that the major elements, by definition, make up the bulk composition of a sediment, this assumption is probably appropriate. In addition, this estimate only considers the silicate portion of the UCC. Much of the calcium, and some of the magnesium, lost from the silicate crust by weathering is reincorporated as carbonate sediments, which are not considered in the bulk composition of UCC considered here (Rudnick & Gao, 2003).

Implications for the Fate of Crustal Sediments
We have calculated the mass of each element lost by weathering required to evolve a juvenile felsic igneous protolith to the modern UCC composition. Now, we compare these values to the present-day mass fluxes derived from silicate weathering shown in Table 5 (data for Fe and Al were not available). We can use these two values to calculate a "weathering age" of the UCC for each element, which is the time it takes, at present weathering rates, to modify the UCC composition to its present state. This age is calculated by dividing the mass loss for each element by the dissolved load flux for that element. However, when these values are calculated, they are considerably less than the expected age of the crust. For example, Chauvel et al. (2014)  Note. The total weathering mass loss is the amount of mass removed from crust in solution at present weathering rates using a model crustal age of 1.8 Ga (Chauvel et al., 2014). The "unbalanced" dissolved mass is the difference between the total mass loss and the mass loss that can be explained by the weathered present-day UCC composition (Table 4). Assuming a characteristic ratio for how much of each element is produced in solution relative to solid restite, we can thus calculate the net transport of each element off the upper continental crust as weathering restite, that is, sediments.
Using the average composition of the sedimentary rocks in our data set, we can turn each of these estimates into a total mass of the "missing" sediment. All the estimates agree within an order of magnitude giving an average value of 2.41 ± 1.60 × 10 22 kg. a Values calculated from Berner (2012) derived from global river compilations, correcting for human influence. b Calculated from Berner (2012) using the average ratio of mass transported in solution (silicate-derived fraction) relative to suspended sediment in rivers globally. This assumes steady-state weathering. c Calculated using the arithmetic mean of the raw wt.% elemental composition for the sedimentary rock data set.
provide a model crustal age based on Nd isotopes of 1.8 Ga, but the weathering ages of the UCC range from ∼800 Ma for Na to ∼120 Ma for Mg. Ca, Si, and K return ages of ∼550, ∼480, and ∼200 Ma, respectively.
The discrepancy between these ages of the order 100 Ma and the model crustal age of 1.8 Ga could be a result of the assumption in the calculation that all of the solid weathering restite is retained on the crust during weathering. If the restite, however, is transported with the dissolved weathering products off the continents, the net composition of the crust does not change. Should this be the case the weathering ages calculated above will be an underestimate, because this makes weathering less efficient in evolving crustal composition than the dissolved elemental fluxes alone imply. Solid weathering products, restite, are evidently transported to ocean basins by rivers and glaciers. This fact likely explains why our weathering age estimates do not agree with externally calibrated ages of the crust. This hypothesis is supported by the fact that most sediments stored on the continental crust are Phanerozoic in age, consistent with the weathering ages of order 100 Ma that we derive above (Peters & Husson, 2017). The next logical step is to estimate the amount of this restite that has not been incorporated into the crust.
The mass of dissolved material required to modify the present-day composition of the UCC (Table 4) can be considered "balanced" by a restite retained on the continents and contributing to its present composition. The amount of "unbalanced" dissolved weathering product is therefore the difference between these balanced values and the total amount of dissolved weathering product produced since the extraction of the crust at ∼1.8 Ga, that is, the amount of weathering where both the dissolved load and the restite were transported to the oceans. For this calculation we only consider a model crustal age, that is, the average time since the crust was extracted from the mantle. This instantaneous crustal formation scenario is a simplification, and more detailed growth scenarios can be produced (see Hawkesworth et al., 2019). As these scenarios are broadly consistent with a model age of 1.8 Ga, the simplification will not significantly affect our results. An estimate of the time-integrated dissolved product for each element is shown in Table 5, calculated by simply extrapolating present-day silicate weathering rates over 1.8 Ga.
This unbalanced dissolved weathering product can be used to estimate the amount of unbalanced solid weathering restite, by assuming a characteristic ratio of dissolved to solid products produced during silicate weathering. In this instance we calculate the ratios using the average amount of mass of each element carried in suspension relative to solution in rivers globally, using values from Berner (2012). These ratios are used to calculate the total amount of restite (i.e., terrigenous sediments) in excess of that restite, which has been incorporated into the upper crust, as shown in Table 5. These can be used to estimate the total mass of the restite, if we assume that it has the typical composition of a sedimentary rock. For this typical composition we use the arithmetic mean of the compositions in our sedimentary rock data set. The total amount of mass is found by dividing the modeled mass of each restite element by the proportion within a sediment. All of the elements independently give similar values, with an average value of 2.41 ± 1.60 × 10 22 kg.

Implications of UCC Composition for Crustal Recycling
Using the composition of UCC, we have estimated the size of a reservoir of weathering restite that has been produced by chemical weathering over the age of the crust but has not been incorporated into the upper crust itself. This still leaves the location of this sedimentary reservoir unresolved. One possibility is that the sediment has simply been deposited in the ocean basins. The volume of sediment deposited on continental margins has been estimated by Straume et al. (2019) to be 1.43×10 17 m 3 . Assuming a density of 2,500 kg·m −3 , this corresponds to a mass of 3.58 × 10 20 kg, which is only 1.5% of our estimate of the size of the reservoir. We thus discount the ocean basins as a potential long-term sink. Furthermore, the oldest oceanic crust is ∼200 Ma in age, precluding it as a long-term sink for sediments in the context of the 1.8-Ga model age of the continental crust. Consequently, there is ∼10 22 kg of sediment that has been incorporated neither into the upper continental crust nor at present stored in the ocean basins. We suggest then that the sediment has likely been removed from the Earth's surface via subduction zones.
For such a volume of sediment to be subducted is striking, as the mass we calculate is approximately four times our estimate of the present-day mass of the UCC (Table 4). This would suggest that since its extraction from the mantle, on average 1.8 Ga, the upper crust has been recycled multiple times, with the solid product subducted into the mantle (or at least isolated from the upper crust, e.g., by incorporation into the lower crust). How does this value compare to other estimates of crust-to-mantle recycling rates? Assuming the sediment reservoir is recycled uniformly over 1.8 Ga, this gives a recycling rate of 1.33±1.00×10 13 kg·year −1 . This estimate is slightly higher than but, within error, comparable to other estimates of crust-to-mantle recycling rates, which range from <0.11 to 0.84 × 10 13 kg·year −1 (McLennan, 1988).
Is it possible to subduct sedimentary material at this rate? Assuming that the subducted ocean crust has a plausible thickness of 1 km of crustal derived sediments deposited on it, this requires an oceanic crust subduction rate of 5.32 km 2 ·year −1 . Bird (2003) estimates that present-day subduction zones are 6 × 10 5 km in length. Assuming this value is approximately stable through time, to subduct the material requires a plate convergence rate of 8.9 cm·year −1 . This convergence rate is comparable to average subduction velocities calculated from plate and geodynamic models (e.g., Behr & Becker, 2018). This simple calculation shows that the mantle could be the repository for our sediment reservoir.
It is important to consider alternative explanations to significant subduction of sediments, especially explanations that could explain why our estimate of crust-mantle recycling rate is larger than previous estimates. One possibility is that the calculated composition of UCC in Rudnick and Gao (2003) is not representative and in reality, the UCC has incorporated more restite, but it is not represented by the composition at the surface. This could occur if the restite is preferentially incorporated below the surface. Given that sediments however are generally deposited in sedimentary basins and stored near the surface of the crust, this seems unlikely.
Another possibility is that the value of UCC we use is not representative of the upper crust. In other words, the restite is incorporated heterogeneously laterally and has so far not been sampled for estimates of the average composition of UCC. Such large-scale heterogeneity is conceivable, as prior estimates of UCC composition, which sampled smaller areas of the surface, are observably different to the Rudnick and Gao (2003) estimate we use because of its greater data inclusion (Figure 10b). For this effect to explain the totality of the missing reservoir, the unsampled portion of UCC would have to contain the total amount of restite we consider "missing," which is implausible. However, it is possible that our estimate of recycling rate could be be brought closer to alternate estimates, should future estimates of UCC composition be found to be more intensely weathered than Rudnick and Gao's (2003) estimate.
Mixing between pristine basaltic and rhyolitic compositions introduces a small bias to higher values, which results in the resulting mixture appearing slightly more weathered than it is in reality (Figure 10a). While we have shown that this effect is unable to explain the observed value of UCC, it does mean that the value of could be a slight overestimate, which would reduce the size of the subducted sediment reservoir. A further possibility is that present-day weathering rates cannot be extrapolated over 1.8 Ga and that they were lower in the past. Such a conclusion is difficult to rule out because quantitative constraints on weathering rates are lacking in deep time. However, one constraint arises from the role weathering plays as a long-term carbon sink. To maintain a stable climate over 1.8 Ga, the chemical weathering rate must broadly be set by the rate of mantle CO 2 degassing (Broecker & Langmuir, 1985). Consequently, unless CO 2 degassing rates were much lower in the past, it is difficult to conceive how the weathering rate also was at a lower level for sustained periods of time. While this seems difficult to conclusively rule out, on the basis of a broadly stable climate, we consider that sustained low weathering rates are unlikely.
Finally, we test the robustness of our finding against different values of crustal model age. Varying the crustal model age from 1 to 3 Ga results in an estimate of subducted material changing linearly from 1 to 4.5×10 22 kg ( Figure S5). This range of crustal model ages is larger than typically considered, and so this range of subducted masses should be considered as conservative (see Hawkesworth et al., 2019).
Thus, while we consider the potential limitations of the data used, if one assumes that the calculated composition of UCC is correct and also that global silicate weathering rates have remained approximately constant since 1.8 Ga, the conclusion is that there is an amount of sediment, equal in mass to four times the present day upper crust, that has likely been removed from the Earth's surface and inserted into the mantle by subduction.

Quantifying Confidence Intervals
The approach we define here makes predictions of sediment composition by projecting observed data onto a plane defined by weathering and provenance vectors. It is therefore important to test the uncertainties of these predictions. The root-mean-square (RMS) misfit of each element can be considered as the standard deviation of observed values relative to the predicted value in clr space, if a normal distribution of residuals is assumed. When these values are converted back to wt.% space, we can observe the uncertainty ranges for the model predictions for a particular element.
These uncertainty ranges are displayed for Na 2 O and SiO 2 in Figure 11, using the composition of UCC as an arbitrary initial composition and the clr RMS misfit of the sedimentary rock data set. The 1 variability clearly lies within the observed range for each element, although this demonstrates that some elements are relatively better fitted by this model than others (see also Figure 5).
While this approach of varying individual elements within its individual range of uncertainties is an easy way to visualize confidence intervals, it has drawbacks. Our model makes use of relationships between elements to extract information about the origin of a sediment, which provides more information than elements treated in isolation can provide. For example, while the uncertainty of predicted SiO 2 may be relatively large, if the predicted Na 2 O and K 2 O wt.% values are high (and with a smaller confidence interval), we can be more confident that the sediment is derived from a felsic rock than we could if using the SiO 2 prediction alone.

Recycling and Metamorphism
Sediments derive from a range of different rocks, which can broadly be divided into primary igneous rocks, recycled sedimentary rocks and metamorphic rocks formed by melting of preexisting crustal rocks. Each of these groups shows internal compositional variability. How can this complexity be reconciled with the observation that assuming a one-dimensional protolith vector successfully explains much of the compositional variation in sediments? First, we have shown primary igneous variation to be well approximated by a single vector (Figure 10a). This suggests distinctions such as alkaline/subalkaline are minor variations superimposed on the larger igneous evolution trend. Second, sedimentary recycling can be considered in our framework in terms of linear addition. Consider, for example, a sediment that lies on the 2D plane defined byŵ andp. If this sediment is recycled, any derived sediment will also lie on this plane, possibly translated along theŵ vector. Thus, it is not possible (from major elements alone) to distinguish a recycled sediment from a "primary" sediment. This inability to identify recycled sediments is not a limitation with our approach specifically but rather a larger problem with using purely elemental geochemistry. Finally, metamorphic processes occurring in a closed system will not affect the protolith's major element chemistry and are therefore compatible with our assumptions. However, if the metamorphosing sediment melts and this melt was extracted (e.g., to produce an S-type granite), this could cause a change in major element chemistry not accounted for in our model. That the model is still successful, despite such processes, may be explained first, by partial melting during prograde metamoprhism evolving compositionally parallel to the primary igneous vector,p. Second, it could indicate that partial melting is not ubiquitous enough to generate significant misfit when considering geographically extensive data sets. Finally, it could be that metamorphic partial melts remain close to their source rocks on the regional scale of sedimentary routing systems, such that during erosion and transportation, the separated compositions are recombined.

Alternative Methods of Sedimentary Geochemical Analysis
Compositions only contain information about the relative values of their components, as opposed to absolute values (Aitchison, 1982). Consequently, a common approach to analyze sedimentary compositions, and geochemical data in general, is to use indices derived from elemental ratios as proxies for specific processes. For example, numerous weathering indices have been previously developed and used, such as the Chemical Index of Alteration (CIA), the Chemical Index of Weathering (CIW), and the " " group of weathering mobility indices (Gaillardet et al., 1999;Harnois, 1988;Nesbitt & Young, 1982). Comparing the value that we derive to the CIA and CIW shows a good correspondence, as expected, if all these indices reliably track weathering (Figures 12a and 12b). Our index in design is most similar to the W index of Ohta and Arai (2007), who use PCA on a suite of soil compositions to define weathering and provenance indices, which have a number of advantages over elemental ratios. However, Ohta and Arai (2007) directly interpret the first and second principal components of their data set as weathering and provenance vectors. While this might be a useful approximation, it is inappropriate if the vectors are not orthogonal, which our analysis indicates to be the case. A novel advantage of our methodology is the ability to restore the full major element composition of a sediment's unweathered protolith, something not possible using indices constructed from ratios alone.
While our methodology has a number of advantages, we appreciate the convenience and ease of use, which geochemical ratios offer as data exploration tools. We have therefore identified three log ratios of wt.% data, which are closely linearly related to weathering, provenance, and cation exchange, respectively (Figures 12c-12f, Table 6). These are easy to calculate and can act as approximate "rules-of-thumb" indicators and data exploratory tools.

Conclusions
Weathering and provenance are the dominant controls on the major element composition of sediments. We have developed a model that describes a sediment's composition in terms of these processes. When applied to fine-grained sediments, the model successfully explains the majority of observed compositional variation in all types of sediment. Deviations from model predictions indicate what secondary processes affect sediment compositions. Inspection of these deviations suggests that cation exchange of sodium and calcium could play a role in setting sediment composition. The model can be used to calculate the composition of a sediment's protolith and the intensity of weathering it has experienced.
The utility of our approach is demonstrated by showing that the composition of the upper continental crust has been modified by weathering, relative to a pristine igneous precursor. We quantify the mass of each element that has been removed by weathering to sufficiently modify the composition of UCC to its present state