The Crossover Time as an Evaluation of Ocean Models Against Persistence

A new ocean evaluation metric, the crossover time, is defined as the time it takes for a numerical model to equal the performance of persistence. As an example, the average crossover time calculated using the Lagrangian separation distance (the distance between simulated trajectories and observed drifters) for the global MERCATOR ocean model analysis is found to be about 6 days. Conversely, the model forecast has an average crossover time longer than 6 days, suggesting limited skill in Lagrangian predictability by the current generation of global ocean models. The crossover time of the velocity error is less than 3 days, which is similar to the average decorrelation time of the observed drifters. The crossover time is a useful measure to quantify future ocean model improvements.


Introduction
Improvements in ocean current modeling are important for numerous applications, from tracking marine pollution (Marta-Almeida et al., 2013) to understanding the global circulation (Melsom et al., 2009). Benchmarking ocean current forecasts is therefore important for quantifying these improvements.
Persistence has been used extensively as a performance benchmark in meteorology (Van den Dool, 2007) but less so in oceanography (Farrara et al., 2013; Oey et al., 2005; Rowe et al., 2016; Shriver et al., 2007; Solabarrieta et al., 2016; Vandenbulcke et al., 2009). For quantifying improvements in ocean current forecasts, particularly the Lagrangian predictability, persistence has typically been taken as either the last known position (Barron et al., 2007; Kuang et al., 2012; Schmidt & Gangopadhyay, 2013; Ullman et al., 2006) or the last known velocity (Paldor et al., 2004; Rixen & Ferreira-Coelho, 2007) of a surface-drifting buoy, held constant. Surface-drifting buoys provide unique observations of the ocean currents, capturing various scales of motion (Lumpkin et al., 2017). The observed trajectories can be compared with trajectories simulated from the model, providing a stringent test given the accumulation of errors and the chaotic nature of Lagrangian flow (Özgökmen et al., 2000).
There are a number of knowledge gaps we aim to fill. All of these studies were done for specific regions; here we present a first global analysis of ocean model performance against persistence. So far, the evaluation against persistence has been given as an error or skill score at different times, which lacks an easy physical interpretation. Here we propose a new diagnostic for quantifying improvements in ocean modeling, denoted the crossover time. This represents the time scale after which the model performs better than persistence for any chosen metric.
The paper is organized as follows: the crossover time, persistence, and the evaluated ocean model are presented in section 2, the crossover time analysis in section 3, and the discussion in section 4, followed by the conclusion in section 5.

Methodology
The crossover time provides a way of understanding the relative predictability of a model with respect to the benchmark forecast of persistence. For this study, persistence is formulated as keeping the last known velocity, $V_0$. This constant velocity is then persisted throughout the prediction, representing perhaps the simplest forecast that can be made from the drifters themselves, and is a natural first forecast for a numerical model to outperform. We then define the crossover time as the time, $T_c$, at which the model is just about to outperform persistence, that is, when the model and persistence performance are equal:
$$M_p(T_c) = M_m(T_c) \qquad (1)$$
where $M$ is a measure of goodness as a function of time, which can be flexibly defined and based on any chosen error or skill metric. The subscripts $p$ and $m$ denote that $M$ has been estimated using the persistence and model results, respectively, and $T_c$ is the time at which the persistence ($M_p$) and model ($M_m$) performance are equal, that is, the moment the model is about to "cross over" and become the better predictor.
The better the model performs, the less time it takes to outperform persistence and thus the shorter $T_c$ is. Conversely, the better persistence performs (linked to the memory of the local circulation), the more challenging it is for the model and thus the longer $T_c$ is.
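As a minimal sketch of how the crossover time could be extracted from two sampled error curves, the function below looks for the first time at which the model error falls to the persistence error and interpolates linearly between samples. The function name, the treatment of M as an error (lower is better), the linear interpolation, and the example numbers are all assumptions made for illustration; they are not taken from the analysis itself.

```python
import numpy as np

def crossover_time(t, m_persistence, m_model):
    """Estimate the crossover time from sampled error curves.

    t, m_persistence, m_model are 1-D sequences on the same time axis,
    with M treated as an error (lower is better). Returns t[0] if the
    model is never worse than persistence, and NaN if the model never
    catches up within the sampled period.
    """
    t = np.asarray(t, dtype=float)
    diff = np.asarray(m_model, dtype=float) - np.asarray(m_persistence, dtype=float)
    if np.all(diff <= 0):
        return t[0]
    for k in range(1, len(t)):
        # Persistence strictly ahead at step k-1, model level or ahead at step k.
        if diff[k - 1] > 0 and diff[k] <= 0:
            # Linear interpolation of the zero crossing of (M_m - M_p).
            frac = diff[k - 1] / (diff[k - 1] - diff[k])
            return t[k - 1] + frac * (t[k] - t[k - 1])
    return np.nan

# Hypothetical separation distances (km) on a daily axis, for illustration only.
days = [0, 1, 2, 3, 4, 5, 6, 7]
persistence_km = [0, 5, 12, 21, 32, 45, 60, 77]
model_km = [0, 10, 20, 30, 40, 50, 60, 70]
print(crossover_time(days, persistence_km, model_km))  # 6.0 for these made-up numbers
```

Returning NaN when no crossover is found mirrors the case where a forecast never reaches a crossover within the available prediction span.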
An example of a reasonable choice of $M$ is the commonly used Lagrangian separation distance (Barron et al., 2007; Berta et al., 2015; Castellari et al., 2001; Liu et al., 2014; Muscarella et al., 2015; Phillipson & Toumi, 2017; Rixen & Ferreira-Coelho, 2007), that is, the separation distance between simulated trajectories (computed via an advection scheme) and observed drifters (km):
$$d(t) = \sqrt{\left(S_x(t) - O_x(t)\right)^2 + \left(S_y(t) - O_y(t)\right)^2} \qquad (2)$$
where the subscript $x$ represents the longitudinal position and $y$ the latitudinal position at time $t$. $S$ is the simulated trajectory for either the model ($M_m$) or persistence ($M_p$), and $O$ is the observations.
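The separation distance can be sketched in code as follows. How longitude and latitude differences are converted to kilometres is not stated in the text, so this sketch assumes a great-circle (haversine) distance on a sphere of mean radius 6371 km; the function name and that constant are assumptions for illustration.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def separation_distance_km(lon_s, lat_s, lon_o, lat_o):
    """Great-circle distance (km) between simulated and observed positions.

    Inputs are in degrees and may be scalars or arrays of equal shape,
    for example one value per time step along a trajectory.
    """
    lon_s, lat_s, lon_o, lat_o = (
        np.radians(np.asarray(a, dtype=float)) for a in (lon_s, lat_s, lon_o, lat_o)
    )
    dlon = lon_s - lon_o
    dlat = lat_s - lat_o
    # Haversine formula.
    a = np.sin(dlat / 2) ** 2 + np.cos(lat_s) * np.cos(lat_o) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))
```

For separations of tens of kilometres the haversine result is very close to a locally flat approximation, so either choice would serve for a comparison between model and persistence trajectories.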
To demonstrate the flexibility of the concept, $T_c$ is also calculated for the root-mean-square error (RMSE) between the model (or persistence) velocities and the velocities inferred from the observed drifters (m/s):
$$\mathrm{RMSE}(t) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(S_i(t) - O_i(t)\right)^2} \qquad (3)$$
where $S$ no longer denotes a trajectory as in equation (2) but instead denotes either the model-simulated velocity fields (U and V), interpolated spatially to the location of the drifters and temporally to the 6-hourly output frequency ($M_m$), or the persistent velocity ($M_p$); $O$ is the velocity inferred from the observed drifters; and the sum runs over the $N$ drifters $i$ at time $t$.
$T_c(d)$ therefore refers to the crossover time of the Lagrangian separation distance $d$ (equation (2)), and $T_c(\mathrm{RMSE})$ refers to the crossover time of the velocity RMSE (equation (3)).
For illustration, Figure 1 is a schematic of the crossover time calculated using the separation distance, $T_c(d)$, for a model, a persistence, and a drifter trajectory. Here $T_c(d)$ occurs where the separation distance for the model equals the separation distance for persistence, at about 3 days. A perfect model would reproduce the drifter path exactly and would therefore always outperform persistence, giving $T_c(d)$ equal to 0.
To quantify and test the concept of $T_c$, the global 1/12° resolution MERCATOR ocean model forecast and analysis (http://www.mercator-ocean.fr) are assessed for July 2015 and January 2016 using Surface Velocity Program (SVP) drifters maintained by the Global Drifter Program's (GDP) Drifter Data Assembly Centre (Lumpkin & Pazos, 2007). The model forecast is run out to 7 days, so we compare four model forecast and analysis cycles, 7 days apart, in each month. This provides over 8,000 independent trajectories globally, allowing for robust statistics of the metric. The model analysis assimilates satellite sea surface temperature (SST) and sea surface height (SSH).
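A minimal sketch of the velocity RMSE for one velocity component is given below, together with a helper that builds the persistent velocity by repeating each drifter's first observed value. The array layout (drifters along the first axis, time along the second) and both function names are assumptions made for illustration.

```python
import numpy as np

def persistence_velocity(obs):
    """Repeat each drifter's first observed velocity along the time axis.

    obs has shape (n_drifters, n_times) and holds one velocity component.
    """
    obs = np.asarray(obs, dtype=float)
    return np.repeat(obs[:, :1], obs.shape[1], axis=1)

def rmse_over_drifters(sim, obs):
    """RMSE across drifters at each time step; returns shape (n_times,).

    NaNs (for example, drifters with missing samples) are ignored.
    """
    err = np.asarray(sim, dtype=float) - np.asarray(obs, dtype=float)
    return np.sqrt(np.nanmean(err ** 2, axis=0))
```

Calling `rmse_over_drifters` once with model velocities and once with `persistence_velocity(obs)` gives the two curves whose intersection defines the velocity crossover time.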
To compute the separation distance (equation (2)), the trajectories for both persistence and the MERCATOR model are advected using a fourth-order Runge-Kutta advection scheme (Abdel Karim, 1966). For simplicity, the simulated trajectories are driven by advection alone (i.e., by the ocean currents only).
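The advection step can be sketched with a classical fourth-order Runge-Kutta integrator. This is an illustrative sketch only: it assumes planar coordinates in kilometres, velocities in km per day, and a user-supplied `velocity(pos, t)` callable standing in for the interpolated model field. None of these details are specified in the text.

```python
import numpy as np

def rk4_step(pos, t, dt, velocity):
    """Advance a particle position by one classical RK4 step."""
    k1 = velocity(pos, t)
    k2 = velocity(pos + 0.5 * dt * k1, t + 0.5 * dt)
    k3 = velocity(pos + 0.5 * dt * k2, t + 0.5 * dt)
    k4 = velocity(pos + dt * k3, t + dt)
    return pos + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def advect(pos0, t0, dt, n_steps, velocity):
    """Integrate a trajectory; returns an array of shape (n_steps + 1, 2)."""
    traj = np.empty((n_steps + 1, 2))
    traj[0] = np.asarray(pos0, dtype=float)
    t = t0
    for k in range(n_steps):
        traj[k + 1] = rk4_step(traj[k], t, dt, velocity)
        t += dt
    return traj

# Persistence: the last known velocity (hypothetical value, km/day) held constant.
v0 = np.array([10.0, -5.0])
persistence_traj = advect([0.0, 0.0], 0.0, 0.25, 4, lambda pos, t: v0)
```

With a constant velocity the scheme reduces to a straight line, which is why the persistence trajectory in this framework is linear in time.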

Global Summary
The persistence trajectories outperformed both the model analysis and forecast trajectories for the majority of the prediction, as measured by separation distance. The instantaneous persistence (last known velocity) performed significantly better than the model analysis for up to 6 days and better than the model forecast for longer than 6 days. Calculating an average persistence by averaging prior observed drifter velocities degrades the persistence performance (not shown); the last known velocity is typically the optimal choice.
The improvement of the persistence trajectories over the model analysis trajectories initially increases with time, peaking at approximately 3 days with a separation distance as much as 13.5 km (Figure 2a). The separation distance increases with time, and so does the relative difference between the model analysis and persistence. However, the separation distance of persistence increases at a slower rate, so persistence is a more effective predictor on short time scales. Beyond 3 days, the model analysis steadily gains on persistence and is the best predictor on average after about day 6 (Figure 2a). This $T_c(d)$ of 6 days represents the limit of persistence as a useful predictor compared to the model analysis. The model forecast never reaches a crossover time within the available prediction span. During the study period the average decorrelation time of the drifter velocities was found to be 1.2 days (not shown), in approximate agreement with Poulain et al. (1996), LaCasce (2008), Döös et al. (2013), and Lumpkin and Johnson (2013). This time scale is very similar to the e-folding time scale of persistence (black line in Figure 2a).
It is worth noting that while we concentrate on the average of the trajectory error sample, the spread is often larger than the value of the median. This large spread is expected for chaotic trajectories. Nevertheless, the average is useful to demonstrate the concept of the crossover time.
For the velocity error and $T_c(\mathrm{RMSE})$ (Figures 2b and 2c), we find a significantly shorter crossover time of about 1-3 days, close to the average decorrelation time of the drifter velocities (1.2 days). Beyond this time, both the model analysis and forecast velocities were significantly more skillful than the persistent velocity, by 0.05 m/s on average (25-30%). The velocity RMSE of the model forecast and analysis is very large (0.15-0.2 m/s) and does not increase as the prediction horizon increases. This behavior suggests that the error may represent the unresolved subgrid-scale processes of the model, that is, the model noise.
The $T_c(d)$ of 6 days for the model analysis trajectories corresponded to a drifter displacement (a measure of how far on average the drifters have moved from their initial location) of approximately 90 km (Figure 2a). A second time of interest is when the model analysis first begins to gain on persistence; this occurs near day 3, when the typical drifter displacement was about 50 km, or nearly 5 times the global model resolution. However, for $T_c(\mathrm{RMSE})$ (Figures 2b and 2c), at the shorter crossover time (1-3 days) the drifter displacement was about 20-30 km, approximately 2-3 times the model resolution.

Spatial Distribution
The spatial variability of the crossover times is next explored. Although a whole year would be preferable, 2 months is a computational compromise to capture adequate variability and demonstrate the concept further.
On day 3 the model analysis trajectories (Figure 3a) generally have more skill in the open ocean and less skill in the more energetic regions such as the boundary currents. Similarly, the persistence trajectories (Figure 3c) also have larger relative separation distances in energetic regions. However, in a direct comparison of Figures 3a and 3c, persistence shows more skill than the model analysis, independent of location to first order.
On day 6 the model analysis trajectories (Figure 3b) perform significantly better relative to the persistence trajectories (Figure 3d) than at day 3, with a similar proportion of regions of lower separation distance. At this later stage in the prediction, the inhomogeneity of the differences becomes more noticeable. Regions of higher kinetic energy (Ishikawa et al., 1997) were better captured by the model analysis, especially the energetic region of the Antarctic Circumpolar Current (ACC). However, even at this later prediction time some regions are still strongly favored by persistence, such as the Peru Basin, the South Equatorial Current, the subpolar North Pacific, and the central subtropical North Atlantic. Figure 3e shows the spatial distribution of $T_c(d)$. Over the majority of the ocean, $T_c(d)$ is greater than 3 days. However, large local variability is evident, for example in the Pacific Ocean. Qualitatively, some regions can be highlighted where $T_c(d)$ is consistently large or small. The Peru Basin, for instance, has the most uniform cluster of larger $T_c(d)$. Conversely, groups of shorter $T_c(d)$ were located in parts of the ACC between South America and Africa.

Geophysical Research Letters
For $T_c(\mathrm{RMSE})$ (not shown), the majority of the ocean has a shorter crossover time of less than 3 days. Areas of comparatively larger $T_c(\mathrm{RMSE})$ closely resemble those of larger $T_c(d)$ (Figure 3e), such as the Peru Basin.

Discussion
The globally averaged $T_c(d)$ is 6 days for the model analysis and even longer for the model forecast. Therefore, on average the persistence trajectories outperform both the model analysis and the model forecast for the majority of the available days. Because persistence is defined as the last known velocity held constant, its skill relies on the memory of the local circulation. By testing the averaging of the drifter velocities back in time (over the previous 1 to 3 days, for example), we found that taking the last known velocity with no prior averaging was the optimal choice for persistence skill. This averaging could be degrading persistence skill because of the short average decorrelation time (1.2 days) and sampling inconsistencies.
Persistence uses the observations directly with no interpolation. Thus, persistence can initially capture scales of motion not represented at the numerical model's horizontal grid resolution of about 8 km. Many studies (Griffa et al., 2004; Huntley et al., 2011; Putman & He, 2013) have noted the effect of the horizontal grid resolution in ocean models on Lagrangian predictability. Recently, Carlson et al. (2016) investigated Lagrangian transport in the Adriatic Sea using the Regional Ocean Modeling System (ROMS) with a high horizontal resolution of 2 km. They noted separation distances between simulated trajectories and observed drifters that were still around 5-6 km d$^{-1}$ but approximately 60% lower than for the global model evaluated in this study (12-14 km d$^{-1}$). This highlights the importance of unresolved subgrid-scale processes for Lagrangian predictability.
Despite the well-resolved initial conditions, a major limitation of persistence is the simple linear nature of the prediction: it has no dynamical circulation and thus depends completely on retaining the memory of the local ocean velocity. Persistence typically has very high skill at the start of the forecast, $t_0$, which naturally degrades as the correlation between $t_0$ and $t_{0+i}$ reduces. The extent of this correlation can be quantified via the decorrelation time of the drifter velocities (1.2 days on average). This is similar to both the e-folding time scale of the Lagrangian persistence trajectories (Figure 2a) and the average crossover time of the velocity error, $T_c(\mathrm{RMSE})$ (1-2 days). This similarity is perhaps unsurprising and highlights the link between the memory of the ocean circulation and the performance of the persistent velocity as compared to the model noise.
At 6 days, $T_c(d)$ is much longer than both $T_c(\mathrm{RMSE})$ and the decorrelation time. This difference could be the result of the accumulation of errors when integrating velocities (Huntley et al., 2011). The advantage of the persistent velocity over the model velocities for the first 1-2 days (approximately the length of the decorrelation time) is thus accumulated, translating to a much longer advantage in the Lagrangian framework, so that $T_c(d)$ is on average 6 days. An alternative interpretation is that $T_c(d)$ represents the scale at which the observed drifters are no longer following a straight line. The influence of larger-scale circulation patterns, which can be captured by the dynamical model, is then much more important.
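The text does not state how the decorrelation time was computed. One common convention, assumed here purely for illustration, is the lag at which the velocity autocorrelation first falls below 1/e. The sketch below estimates the autocorrelation of a single velocity time series and interpolates that e-folding lag; both function names are hypothetical.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Biased sample autocorrelation of a 1-D series for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    var = np.dot(x, x) / n
    return np.array([np.dot(x[: n - k], x[k:]) / (n * var) for k in range(max_lag + 1)])

def e_folding_time(acf, dt):
    """Lag (in units of dt) at which the autocorrelation first drops below 1/e.

    Uses linear interpolation between samples; returns NaN if the
    autocorrelation never drops below 1/e within the supplied lags.
    """
    acf = np.asarray(acf, dtype=float)
    threshold = np.exp(-1.0)
    below = np.nonzero(acf < threshold)[0]
    if below.size == 0:
        return np.nan
    k = below[0]  # k >= 1 because acf[0] == 1
    frac = (acf[k - 1] - threshold) / (acf[k - 1] - acf[k])
    return (k - 1 + frac) * dt
```

For 6-hourly drifter velocities, `dt` would be 0.25 days. Other definitions, such as integrating the autocorrelation to its first zero crossing, would give somewhat different values.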
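The accumulation argument can be illustrated with a deliberately simple toy model. Everything in it is an assumption made for illustration, not a result of the analysis: the persistence velocity error is taken to grow linearly in time with rate $a$, the model velocity error is taken to be a constant $c$ (in the spirit of the non-growing model RMSE noted earlier), and errors are assumed to accumulate coherently along the trajectory.

```latex
% Toy velocity errors (assumed forms)
e_p(t) = a\,t, \qquad e_m(t) = c
% Velocity crossover: e_p = e_m
T_c(\mathrm{RMSE}) = \frac{c}{a}
% Accumulated position errors
d_p(t) = \int_0^t a\,s\,\mathrm{d}s = \tfrac{1}{2}\,a\,t^2, \qquad d_m(t) = \int_0^t c\,\mathrm{d}s = c\,t
% Trajectory crossover: d_p = d_m
\tfrac{1}{2}\,a\,T_c(d)^2 = c\,T_c(d) \;\Longrightarrow\; T_c(d) = \frac{2c}{a} = 2\,T_c(\mathrm{RMSE})
```

Even in this idealized case the trajectory crossover occurs later than the velocity crossover, because the early advantage of persistence is carried forward in the integrated position. The factor of 2 is specific to the assumed linear growth and should not be read as a prediction of the observed values.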
The displacement scale of the observations at $T_c(\mathrm{RMSE})$ is approximately 20-30 km, close to the model resolution or twice the size of the model grid separation (8 km). The persistent velocity is initially a more accurate approximation of the subgrid-scale motion than the model because it directly samples the observations. However, once the drifters have moved sufficiently far beyond the model resolution, this advantage is lost. At this point, the drifter is no longer within the initially sampled area, and the initial velocity is less likely to still hold true. The model has the advantage of a dynamically evolving large-scale circulation (capturing processes that persistence cannot capture) but has limited resolution of the subgrid-scale processes (Huntley et al., 2011).
The core advantage of persistence is the highly resolved sampling of the local motion, which is adequately representative of the actual drifter for the first few days. Therefore, with dramatic improvements in ocean model resolution, the model should perform better against persistence. The crossover time is a useful metric to demonstrate model improvements, particularly those from increasing resolution.
There are large spatial variations in the separation distance (Figures 3a-3d) and $T_c(d)$ (Figure 3e). A large $T_c(d)$ can be due to the actual local motion (high persistence) and/or model error. The model trajectory error may produce variability in $T_c(d)$ that cannot be explained in terms of local dynamical features. However, some dynamical features can be broadly recognized. The open ocean has a lower crossover time than the energetic boundary current regions. Gille and Kelly (1996) concluded that the ACC scales are determined by local instability mechanisms with short spatial decorrelation scales of about 85 km and longer time scales of 34 days. The shorter $T_c(d)$ present in many parts of the ACC could be linked to these shorter spatial scales. Lumpkin and Johnson (2013) have identified regions of "eddy deserts" with significantly low time-mean eddy speeds.
Here these regions, that is, the Peru Basin, the South Equatorial Current, the subpolar North Pacific, and the central subtropical North Atlantic, typically have larger $T_c(d)$. The local circulation memory is likely maintained for longer in slowly evolving dynamics, improving the performance of persistence trajectories. Therefore, the ocean model faces a tougher persistence benchmark where Lagrangian trajectories are inherently more predictable in the model, that is, in areas where the square root of the velocity variance is lower (Barron et al., 2007; Özgökmen et al., 2000).

Conclusion
A new evaluation metric, the crossover time, is introduced to examine the performance of a global ocean model with respect to persistence. Persistence represents the most straightforward forecast that can be made directly from the observations. The crossover time was estimated using the Lagrangian separation distance and the velocity error for a global ocean model forecast and analysis. For the Lagrangian separation distance, the persistence trajectories outperform the model analysis trajectories for the majority of the 7 day prediction, with a corresponding average crossover time of approximately 6 days. For the velocity error, the crossover time is significantly smaller at about 1-3 days, similar to the decorrelation time scales of the observed drifters.

10.1002/2017GL076075
The significant difference between crossover times in the separation distance and velocity error can be understood in terms of the nature of trajectory prediction and their inherent accumulation of velocity errors.
Spatial variations in the crossover time are explained in terms of the model error and some regional dynamical features.
The crossover time can be used to evaluate ocean models, giving simple, intuitive, and physical insight. The crossover time concept could also be useful for evaluating the forecast performance of other ocean variables such as sea surface temperature and the mixed layer depth. Persistence, a simple benchmark forecast, is a good short-term predictor of Lagrangian trajectories, but as numerical models improve (increased resolution, improved model physics, etc.) the crossover time is expected to shorten.