The Wide Swath Significant Wave Height: An Innovative Reconstruction of Significant Wave Heights From CFOSAT’s SWIM and Scatterometer Using Deep Learning

The accuracy of a wave model can be improved by assimilating an adequate number of remotely sensed wave heights. The Surface Waves Investigation and Monitoring (SWIM) and Scatterometer (SCAT) instruments onboard China‐France Oceanic SATellite provide simultaneous observations of waves and wide swath wind fields. Based on these synchronous observations, a method for retrieving the SWH over an extended swath is developed using the deep neural network approach. With the combination of observations from both SWIM and SCAT, the SWH estimates achieve significantly increased spatial coverage and promising accuracy. As evidenced by the assessments of assimilation experiments, the assimilation of this “wide swath SWH” achieves an equivalent or better accuracy than the assimilation of the traditional nadir SWH alone and enhances the positive impact when assimilated with the nadir SWH. Therefore, insights into the better utilization of wave remote sensing in assimilation are presented.

• An innovative method for extending significant wave height (SWH) from nadir to a wide swath is presented
• A deep neural network model is developed based on simultaneous observations from the nadir beam and wind scatterometer of CFOSAT
• Significant positive impacts are found in the assimilation of the "wide swath" SWH compared to the assimilation of the Surface Waves Investigation and Monitoring nadir only wave observations

[...] (Raghukumar et al., 2019), the spatial coverage of wave buoys is severely limited because buoys provide only point measurements. The rapid development of wave remote sensing is therefore highly valuable to wave forecasting and related studies. The spaceborne radar altimeter has become the major instrument for the acquisition of SWH observations (Dobson et al., 1987; Fedor et al., 1979; Hayne, 1980). Through decades of improvement in the related hardware and processing algorithms (Cotton et al., 1994; Liu et al., 2016), modern operational altimeters, such as the Jason series (Abdalla et al., 2010; Nerem et al., 2010), the HY2 series (Jiang et al., 2012) and SARAL/AltiKa (Abdalla, 2015; Jayaram et al., 2016), provide global, high-accuracy SWH observations of the ocean surface vertically below the satellite track, known as the "nadir" track. The significant increase in the quantity of wave observations from altimeters has led to valuable improvements in the impacts of data assimilation (Aouf et al., 2015; Bhatt et al., 2005; Breivik et al., 1994; Cao et al., 2015; Emmanouil et al., 2007; Lionello et al., 1992). However, altimeter observations are still confined to their nadir tracks, which limits the number of observations.
The Surface Waves Investigation and Monitoring (SWIM) instrument is carried by the China-France Oceanic SATellite (CFOSAT), which was launched in 2018. SWIM is a new and unique instrument that provides two additional sets of wave spectra observations, at wavelengths from 70 to 500 m, on each side of the altimeter nadir (Hauser et al., 2016; Xu et al., 2019). Moreover, CFOSAT also carries a microwave scatterometer (SCAT) that works simultaneously with SWIM to obtain a wind field with a wide swath of approximately 800 km and a standard deviation of wind speed of less than 1.2 m/s (Lin et al., 2018; Liu et al., 2020).
Because wind is the source of wave energy, the wind observations from SCAT contribute to the estimation of the wind sea. Combined with the swell information from SWIM, they make it possible to extend SWH observations from the nadir to a swath covered by SCAT, the so-called "wide swath SWH". The wide swath SWH, which has scarcely been investigated or discussed heretofore, is obtained by extracting synchronously observed information from both SWIM and SCAT. Therefore, together with the SWIM nadir, the addition of the wide swath SWH could significantly increase the quantity of wave observations compared with the originally designed SWIM products, which could further improve wave forecasts with data assimilation.
In this study, we present a novel method for obtaining the SWH over a wide swath from the synchronous observations of SWIM and SCAT. A retrieval model based on a deep neural network (DNN) is established and trained using the SWH acquired from Jason-3 and SARAL/AltiKa, which are state-of-the-art altimeters with high accuracy: biases of 0.05 and 0.08 m and root mean square errors (RMSE) of 0.23 and 0.20 m, respectively (Sepulveda et al., 2015; Yang et al., 2020). The method and data set used in the DNN to estimate the SWH over SCAT grid points at a swath distance of up to 200 km are described in Section 2. The wide swath SWH is validated in Section 3. In Section 4, the results of assimilation experiments are described to confirm the impact of the newly estimated wide swath SWH. The validation of the assimilation of the wide swath SWH reveals an improvement in the model accuracy compared with the assimilation of the nadir SWH only, which offers evidence for the benefit of the wide swath wave product.

Data Setup and Method
The setup of the observations from CFOSAT is indicated in Figure 1a. The unique synchronous observations from the SCAT and SWIM instruments used to obtain simultaneous wind and wave information are described here. First, SWIM can provide the nadir total SWH similar to traditional altimeters. In addition, SWIM observes two additional "boxes" containing wave directional spectra distributed on either side of the nadir track. However, only the waves whose wavelengths range from 70 to 500 m can be observed in these boxes (Hauser et al., 2020). The distance between the boxes and the nadir track is approximately 50 km. Second, a wide swath of the wind field, including both wind speed and wind direction data, can be obtained from SCAT.
The wind observations cover a larger region than the SWIM nadir and boxes. The key idea of retrieving the wide swath SWH is to acquire the total SWH by extracting the information from both wind and wave observations, in other words, to obtain more SWH observations by converting some "wind grids" from SCAT into "SWH grids" from the combination with SWIM observations. Being a widely used method in the field of deep learning, the DNN is a powerful technique for refining features and information from big data, and the efficiency and robustness of DNN have been demonstrated in classification, data mining and other fields (Najafabadi et al., 2015). DNN has been shown to be effective when applied to wave remote sensing (Wang et al., 2020). Thus, we build the wide swath SWH retrieval model based on DNN.
The structure of the DNN model is presented in Figure 1b. Seven parameters are used as the inputs. Wind speed data from the SCAT grid can be seen as information highly related to the wind sea. The equivalent SWH and peak period from the SWIM boxes provide the wave information (over wavelengths from 70 to 500 m) for the DNN model, approximately compensating for the wave energy that would be missing if the wave information were obtained only from the SCAT wind speed grid. The nadir total SWH from SWIM is used as an important reference for the estimation of the wide swath SWH. The sigma0 (backscatter cross section) from the nadir track is also an important parameter: it is highly related to the sea surface roughness and therefore informs the model about the sea state, as it is related to both the wind speed and the SWH. R1 and R2 are also included in the DNN model as indicators of the impacts from the SWIM boxes and nadir.
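The network described in this section (seven inputs, six layers of neurons, ReLU activations) can be sketched as a plain feed-forward pass. This is a minimal illustration and not the authors' implementation: the feature names, the reading of "six layers" as one input layer, four hidden layers and one output layer, the layer widths, the weight initialization and the linear output neuron are all assumptions made here.

```python
import numpy as np

# The seven inputs named in the text; names and order are illustrative assumptions.
FEATURES = [
    "scat_wind_speed",  # wind speed at the SCAT grid point
    "box_swh",          # equivalent SWH from the SWIM box
    "box_peak_period",  # peak period from the SWIM box
    "nadir_swh",        # total SWH from the SWIM nadir beam
    "nadir_sigma0",     # sigma0 from the nadir track
    "r1",               # R1 as shown in Figure 1
    "r2",               # R2 as shown in Figure 1
]

# Hypothetical widths: 7 inputs, four hidden layers, 1 output (six layers in total).
LAYER_SIZES = [len(FEATURES), 32, 32, 16, 8, 1]


def relu(x):
    """Rectified linear unit."""
    return np.maximum(x, 0.0)


def init_params(layer_sizes, seed=0):
    """Randomly initialize one (weights, bias) pair per connection between layers."""
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        w = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
        b = np.zeros(n_out)
        params.append((w, b))
    return params


def forward(params, x):
    """Map a (n_samples, 7) input array to (n_samples,) SWH estimates."""
    h = x
    for w, b in params[:-1]:
        h = relu(h @ w + b)
    # Assumption: the output neuron is left linear in this sketch.
    w, b = params[-1]
    return (h @ w + b).squeeze(-1)
```

In practice the weights would be fitted by supervised training against collocated altimeter SWH rather than left at their random initial values.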
The model comprises six layers of neurons. The rectified linear unit (ReLU, Nair et al., 2010) is used as the activation function in each neuron of the DNN model. The parameters of the DNN, such as the weights between the neurons of adjacent layers, are determined by "supervised training", that is, an iterative algorithm that updates these parameters according to the "loss" calculated between the DNN output and the truth. The SWH data from the collocated observations of the Jason-3 and SARAL/AltiKa altimeters are used as the truth to train the wide swath DNN model. The distance between the SCAT grid point and the altimeter observation is limited to less than 12.5 km, and the time window is ±30 min. The collocated data cover April to June 2019 and January to February 2020. There are 6,090 match-ups between CFOSAT and Jason-3 or SARAL/AltiKa; 75% of these match-ups are used to train the DNN model, while the other 25% are used as the independent data set for validation.
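The collocation criterion and the 75%/25% partition described above can be sketched as follows. This is an illustrative sketch under stated assumptions: great-circle (haversine) distance on a spherical Earth, times expressed in minutes, and a random shuffle for the split are choices made here, since the text does not say how distances were computed or how the match-ups were partitioned. All function names are hypothetical.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius
MAX_DIST_KM = 12.5        # spatial criterion from the text
MAX_DT_MIN = 30.0         # time window of +/-30 min from the text


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def is_matchup(scat_lat, scat_lon, scat_t_min, alt_lat, alt_lon, alt_t_min):
    """True where a SCAT grid point and an altimeter sample are collocated."""
    dist = haversine_km(scat_lat, scat_lon, alt_lat, alt_lon)
    dt = np.abs(np.asarray(scat_t_min) - np.asarray(alt_t_min))
    return (dist < MAX_DIST_KM) & (dt <= MAX_DT_MIN)


def split_indices(n, train_fraction=0.75, seed=0):
    """Shuffle n match-up indices and split them into training and validation sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)
    n_train = int(round(train_fraction * n))
    return order[:n_train], order[n_train:]


# Partition the 6,090 match-ups reported in the text.
train_idx, valid_idx = split_indices(6090)
```

Keeping the validation indices disjoint from the training indices is what makes the later accuracy assessment independent of the fitted weights.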

Wide Swath SWH Estimation and Accuracy
An example of the geographical coverage of actual CFOSAT observations is presented in Figure 2 to show the distributions of nadir and non-nadir data. The SWH estimates from the SWIM nadir beam are oriented along the nadir track (blue line in Figure 2), and spectral wave information is given up to approximately 50 km on either side of the nadir track (red squares in Figure 2), while wind speeds and directions are provided from SCAT measurements along a swath of approximately 800 km. As shown in Figure 2, some SWIM and SCAT data points are missing because they have been rejected during quality control. As is typical for the spatial criterion of collocation during the assessment of altimeter measurements, we assume that the SWH observations inside a 50 km radius are highly related. Therefore, considering the relevant radius of wave impact, we limit R1 to 50 km, which makes R2 equal to 100 km. Under this setting, the SCAT wind grids within a distance of 100 km from nadir are converted into wave grids by the DNN model. We thus obtain a 200 km swath of wave observations at a resolution of 25 km (the spatial resolution of SCAT), which are marked as gray circles in Figure 2a. The spatial coverage of wave observations is clearly increased compared to the original SWIM observations. The distributions of wide swath SWH observations acquired over a 24-h period are presented in Figure 2b. There are 42,176 samples from the SWIM nadir and 47,560 samples from the wide swath SWH. The wide swath SWH therefore not only provides a number of samples comparable to that of the SWIM nadir but also covers a larger area of the ocean surface.
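The selection of SCAT grid points that become wave grid points can be sketched as a distance test against the nadir track. This is a minimal sketch and not the authors' procedure: it assumes great-circle distances on a spherical Earth and approximates the distance to the track by the distance to the nearest nadir sample. The function names are hypothetical.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius
R2_KM = 100.0             # maximum distance from nadir, from the text


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def within_swath(grid_lat, grid_lon, nadir_lat, nadir_lon, max_km=R2_KM):
    """Boolean mask of SCAT grid points within max_km of the nearest nadir sample."""
    grid_lat = np.asarray(grid_lat, dtype=float)[:, None]
    grid_lon = np.asarray(grid_lon, dtype=float)[:, None]
    nadir_lat = np.asarray(nadir_lat, dtype=float)[None, :]
    nadir_lon = np.asarray(nadir_lon, dtype=float)[None, :]
    # Pairwise distances: one row per grid point, one column per nadir sample.
    dist = haversine_km(grid_lat, grid_lon, nadir_lat, nadir_lon)
    return dist.min(axis=1) <= max_km


# A short north-south nadir segment crossing the equator at 0 degrees longitude.
nadir_lat = np.arange(-1.0, 1.01, 0.25)
nadir_lon = np.zeros_like(nadir_lat)
# SCAT grid points on the equator at increasing cross-track offsets.
mask = within_swath([0.0, 0.0, 0.0, 0.0], [0.0, 0.5, 1.0, 2.0], nadir_lat, nadir_lon)
```

With a 100 km limit on each side of the track, the retained points span the 200 km swath quoted in the text.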
In addition to the significant improvements in the amount of data and spatial coverage, the accuracy of the wide swath SWH is also validated against the independent match-ups with Jason-3 and SARAL/AltiKa. Five statistical parameters, namely, the bias of difference (BD), mean absolute difference (MAD), root mean square difference (RMSD), normalized root mean square difference (NRMSD) and scatter index (SI), are used in the validation. As indicated in Figure 2c, a good scatter pattern of 1,580 samples is achieved with a BD of only 0.001 m and small MAD, RMSD, NRMSD and SI values of 0.181 m, 0.257 m, 8.2%, and 8.2%, respectively. An unbiased SWH and a reasonable RMSD are achieved over most of the SWH range, as presented in Figure 2d. Satisfactory NRMSD and SI values (both under 10%) are achieved when the SWH is above 1 m. The validation of the wide swath SWH demonstrates an accuracy equivalent to that of the SWIM nadir SWH or state-of-the-art altimeters.
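The five validation statistics can be computed as in the sketch below. The text names the statistics but does not give their formulas, so the definitions here are assumptions based on common conventions: NRMSD is taken as the RMSD divided by the mean reference SWH, and SI as the standard deviation of the differences divided by the mean reference SWH. The function name is hypothetical.

```python
import numpy as np


def validation_stats(estimate, reference):
    """Difference statistics between estimated and reference SWH (both in m)."""
    s = np.asarray(estimate, dtype=float)
    r = np.asarray(reference, dtype=float)
    d = s - r
    bd = d.mean()                   # bias of difference
    mad = np.abs(d).mean()          # mean absolute difference
    rmsd = np.sqrt((d ** 2).mean()) # root mean square difference
    nrmsd = rmsd / r.mean()         # assumed normalization by the mean reference
    si = d.std() / r.mean()         # assumed definition: bias-removed scatter
    return {"BD": bd, "MAD": mad, "RMSD": rmsd, "NRMSD": nrmsd, "SI": si}


# Small made-up example (not data from the study).
stats = validation_stats([2.1, 2.9, 4.2, 0.8], [2.0, 3.0, 4.0, 1.0])
```

Under these assumed definitions, NRMSD and SI coincide when the bias is zero, because the RMSD then equals the standard deviation of the differences.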
Consequently, the wide swath SWH, which is retrieved by combining the observations of both SWIM and SCAT from the DNN model, provides not only a significantly improved spatial coverage but also an accuracy comparable to that of altimeter observations.

Impact of Wide Swath SWH on Data Assimilation
With the increased spatial coverage and good accuracy, wide swath SWH data have the potential to enhance the assimilation effect in wave models compared with the assimilation of nadir SWH only. Therefore, a set of assimilation runs is performed to investigate this topic. The assimilation experiments are implemented using the wave model MFWAM, which is a third-generation numerical wave forecast model used for the operational forecasts of Météo-France. The assimilation system of the MFWAM model can jointly use altimeter SWH and directional wave spectra parameters from SAR or CFOSAT observations (Aouf [...]).
As indicated in Table 1, five runs are performed: four runs with the assimilation of wide swath and nadir SWH (Run A), nadir SWH and SWIM wave spectra (Run B), wide swath SWH only (Run C), and nadir SWH only (Run D), and a control run without any assimilation (CTRL). The SWH observations are assimilated into MFWAM by using optimal interpolation (Aouf et al., 2015) with a 3-h time window (±1.5 h). The model runs globally with a spatial resolution of 0.5° and is forced by 3-hourly wind and sea ice fraction fields provided by the IFS-ECMWF atmospheric system. The time period of the model experiments is May 2019. The SWH observations from National Data Buoy Center (NDBC) buoys are used as the reference to assess the accuracy of each run. A total of 45 NDBC buoys with offshore distances in excess of 60 km are selected for the assessment. The Standard Deviation of Difference (SDD) is used to discuss the scatter without accounting for the bias. The SDD is defined as follows:
$$\mathrm{SDD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left[\left(S_i - \bar{S}\right) - \left(R_i - \bar{R}\right)\right]^2}$$
where $S_i$ and $R_i$ are the SWH from MFWAM and from the references (SWH from NDBC buoys or Jason-3 and SARAL/AltiKa), $\bar{S}$ and $\bar{R}$ are the means of the SWH from MFWAM and from the references, and $N$ is the number of matchups.
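The SDD can be illustrated with a short sketch that uses the symbols from the text: the mean of each series is removed before the differences are squared, so a constant offset between model and reference does not contribute. The normalization by the number of samples (rather than by one less) is an assumption made here, and the function name is hypothetical.

```python
import numpy as np


def sdd(model, reference):
    """Standard deviation of difference between model and reference SWH."""
    s = np.asarray(model, dtype=float)
    r = np.asarray(reference, dtype=float)
    # Remove each series' own mean so that a constant bias does not contribute.
    anomalies = (s - s.mean()) - (r - r.mean())
    return np.sqrt(np.mean(anomalies ** 2))


# Made-up values (not data from the study).
model_swh = np.array([1.2, 2.1, 3.3, 1.9])
buoy_swh = np.array([1.0, 2.0, 3.0, 2.0])

sdd_value = sdd(model_swh, buoy_swh)
# Adding a constant 0.5 m bias to the model leaves the SDD unchanged.
sdd_biased = sdd(model_swh + 0.5, buoy_swh)
```

With this normalization, the squared RMSD equals the squared SDD plus the squared bias, which is why the SDD isolates the scatter from the bias.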
From the 7,377 matchups with buoys, the results of the validations are presented in Table 1. First, all four runs with assimilation achieve improved difference statistics compared to the control run, reflecting the positive impacts of assimilation. In the comparison between Run C and Run D, which employ an approximately equal number of SWH observations, the assimilation of the wide swath SWH results in a lower SDD, NRMSD and SI than the assimilation of the nadir SWH only and is degraded only in the BD. Therefore, with the increased number of observations and acceptable accuracy, the newly retrieved wide swath SWH obtains an assimilation effect almost equivalent to that of the nadir observations. Assimilating the SWIM spectra in addition to the nadir SWH (Run B) lowers the SDD, NRMSD and SI with respect to Run D, albeit only slightly. The assimilation of nadir and wide swath SWH (Run A) achieves better accuracy than both Run B and Run D. Specifically, Run A achieves better values of the SDD (improved from 0.317 to 0.299 m), NRMSD (from 18.88% to 17.85%) and SI (from 18.74% to 17.65%) than the assimilation of nadir SWH only (Run D), although the BD is degraded, not significantly, from −0.038 to −0.044 m. Consequently, the addition of wide swath SWH enhances the positive impact of the assimilation of traditional nadir observations.
Table 1 Setups of the Assimilation Runs and Their Validations Against National Data Buoy Center Buoys. Note: SWH, significant wave height.

As the NDBC buoys are located mainly in the Northeast Pacific and Northwest Atlantic, the assimilation impact on the global wave system is further investigated by using the Jason-3 and SARAL/AltiKa altimeters. As the significant positive impact of assimilation is clearly seen in Table 1, we focus here on the improvements between Run A and Run D. To illustrate the positive assimilation effect of the addition of the wide swath SWH more clearly, the improvements in the BD and SDD are defined as follows:
$$\mathrm{BD}_{\mathrm{imp}} = \left|\mathrm{BD}_{D}\right| - \left|\mathrm{BD}_{A}\right|, \qquad \mathrm{SDD}_{\mathrm{imp}} = \mathrm{SDD}_{D} - \mathrm{SDD}_{A}$$
where the subscript "imp" indicates an improvement and the subscripts "A" and "D" indicate the parameters from Run A and Run D, respectively.
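As a worked illustration, the improvement measures can be evaluated with the buoy statistics quoted in the text for Run A and Run D. The sketch assumes that an improvement is the Run D value minus the Run A value, with the absolute value taken for the BD, so that a positive number means Run A is closer to the reference; this reading follows the statement that a positive improvement corresponds to a lower BD or SDD. The function name is hypothetical.

```python
def improvement(bd_a, bd_d, sdd_a, sdd_d):
    """Return (BD_imp, SDD_imp); positive values mean Run A is better than Run D."""
    bd_imp = abs(bd_d) - abs(bd_a)  # assumed: compare bias magnitudes
    sdd_imp = sdd_d - sdd_a         # assumed: reduction in scatter
    return bd_imp, sdd_imp


# Buoy-based values quoted in the text (Run A: BD -0.044 m, SDD 0.299 m;
# Run D: BD -0.038 m, SDD 0.317 m).
bd_imp, sdd_imp = improvement(bd_a=-0.044, bd_d=-0.038, sdd_a=0.299, sdd_d=0.317)
```

For these buoy statistics the SDD improvement is positive (about 0.018 m) while the BD improvement is slightly negative (about −0.006 m), matching the description of a lower scatter and a marginally degraded bias.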
The global distributions of the improvements due to the addition of the wide swath SWH are presented in Figure 3. Red represents a positive improvement, that is, a lower BD or SDD, while blue reflects degraded accuracy. With the assimilation of both wide swath SWH and nadir SWH, improvements can be found in most of the global ocean, as red is dominant in both Figures 3a and 3b, especially in the mid-latitude regions. As indicated in Figure 3a, the most significant improvement in the BD occurs in the mid-latitude oceans of the Southern Hemisphere between 40°S and 60°S, where the BD is reduced by an average of 0.1 m. Obvious BD improvements are also observed in the North Pacific and most of the Atlantic. Slight degradations in the BD appear mainly in the tropical oceans, where the SWH is lower than in the subtropical and mid-latitude oceans. The distribution of the SDD improvement shows general improvement globally. Specifically, positive impacts on the BD and SDD are achieved for 65.0% and 58.4% of the global ocean, respectively, when the wide swath SWH is assimilated with the nadir SWH. A slight degradation is observed in some ocean areas (e.g., the tropical region), which might be explained by the assumption of constant and equal errors of the wave model and the remotely sensed SWH (e.g., the wide swath SWH) in the optimal interpolation of the assimilation scheme. The assimilation may degrade the accuracy if this "equal-error" assumption clearly fails. This behavior might be rectified by implementing a spatial dependence of the model errors (Greenslade & Young, 2004) or by using an ensemble approach to estimate them.

Conclusions
The accuracy of wave simulations from numerical wave models can be effectively improved by assimilating available observations, including the remotely sensed SWH from spaceborne altimeters. The quantity of observations is always an important factor affecting the wave assimilation effect.
CFOSAT is a new and unique oceanographic satellite equipped with two major sensors, namely, SWIM and SCAT, which provide simultaneous observations of waves and surface winds. Although SWIM provides information along the nadir track in addition to two rows of "boxes" on each side of that track, the spatial coverage of SWIM data remains limited. SCAT, in contrast, provides wind observations over a wide swath. Because the wind speeds observed by SCAT describe the wind sea well and the SWIM wave observations capture swell-dominated seas, a retrieval method based on a deep neural network is constructed to retrieve the wide swath SWH from the simultaneous observations of SCAT and SWIM. The major inputs of the DNN model are the SWH and sigma0 from the SWIM nadir observations, the SWH and peak period from the wave spectra in the SWIM off-nadir boxes, and the wind speed from SCAT. The deep neural network is trained using collocated independent SWH altimeter observations. The model is then used to estimate the SWH at SCAT grid points to provide the SWH over an extended spatial coverage.
In addition to the significantly increased number of observations, the wide swath SWH has been shown to achieve good accuracy based on independent validations against altimeters. A set of assimilation runs is then implemented to assess the potential impact of the wide swath SWH. Promising results are found from a validation against NDBC buoy wave observations. The assimilation of the wide swath SWH achieves a positive impact equivalent to that of the assimilation of SWIM nadir SWH observations, with lower values of the SDD, NRMSD and scatter index. A global validation against the SWH from altimeters also shows that, together with traditional nadir SWH observations, the addition of the wide swath SWH does enhance the positive impact of the assimilation. The improvements in the bias of difference and SDD produced by assimilating the wide swath SWH occur mainly in the subtropical and mid-latitude oceans.

Discussion
The success of the wide swath SWH estimation comes from the setup of the CFOSAT payloads, which provide synchronous observations of waves and winds from SWIM and SCAT, respectively. To a certain extent, the wide swath SWH combines the advantages of both SWIM and SCAT, thereby obtaining significantly increased spatial coverage and an RMSD of 0.257 m with little bias of difference. As evidenced by the assimilation experiments, assimilating the wide swath SWH also enhances the positive impact on the SWH estimates of the wave model. Therefore, this research provides insight into how the positive impact of wave remote sensing can be increased.
It is also worth noting that CFOSAT is not the only satellite carrying both wind and wave instruments. The HY2 series, including the HY2A (Wang et al., 2013), the HY2B and the recently launched HY2C satellites, are all equipped with both an altimeter and a scatterometer, giving them the ability to monitor nadir waves and wide swath winds simultaneously. The wide swath SWH method has been preliminarily applied to HY2B and achieves a BD of 0.012 m, an RMSD of 0.290 m, and an SI of 10.3% when the swath is set to 200 km. It can therefore be inferred that the wide swath SWH method has valuable potential for obtaining more wave observations from other missions.
Although the acquisition of the wide swath SWH provides evidence for the potential of these synchronous observations, the wide swath SWH estimation needs further refinement. For instance, the width of the SWH swath is a critical factor in determining its accuracy. When the swath is set to 100, 350, 500, and 600 km for CFOSAT, the RMSD is 0.231, 0.267, 0.284, and 0.302 m, respectively. Although a wider swath would yield more observations, a swath that is too wide would also degrade the accuracy, because a greater distance from the nadir track may lead to larger deviations for the assimilation. Therefore, more work should be conducted to optimize the SWH swath to obtain the maximum positive impact on wave assimilation.

Data Availability Statement
Centre National d'Études Spatiales (CNES) provided the CFOSAT SWIM and SCAT data (accessible at ftp-access.aviso.altimetry.fr for the science team members of CFOSAT). Data from the Jason-3 and SARAL/AltiKa altimeters are accessible at aviso-data-center.cnes.fr. The National Data Buoy Center (NDBC) provided the buoy data (accessible at ndbc.noaa.gov).