The Tensor‐based Feature Analysis of Spatiotemporal Field Data With Heterogeneity

Heterogeneity is an essential characteristic of geographic phenomena. However, most existing research on heterogeneity is matrix-based, and the two-dimensional nature of the matrix cannot adequately support the multidimensional analysis of spatiotemporal field data. Here, we introduce an improved tensor-based feature analysis method for spatiotemporal field data with heterogeneous variation, which combines similarity measurement in multidimensional space with the feature capture of tensor decomposition. In this method, the heterogeneous spatiotemporal field data are first reorganized according to the similarities and differences within the data. Tensor decomposition is then applied to obtain a feature analysis that integrates spatiotemporal coupling. Because the reorganized data have a more consistent internal structure than the original data, the feature analysis bias caused by heterogeneous variation in tensor decomposition can be effectively reduced. We demonstrate the method on climatic reanalysis field data released by the National Oceanic and Atmospheric Administration. Comparison with conventional tensor decomposition shows that the proposed method approximates the original data more accurately in both global and local regions. The improvement in approximation accuracy is most pronounced in areas influenced by complex modal aliasing and during periods of climatic anomaly events. The proposed method also reveals the zonal variation of the temperature gradient and abnormal air temperature variations that are overlooked by the conventional tensor method.


Introduction
A geographical phenomenon is the combined result of interactions among various geographical elements. Under the different effects of space, time, and attributes, geographical phenomena exhibit heterogeneity, which reflects the uneven distribution, spatial inhomogeneity, and temporal nonstationarity of geographical data within an area (Mashhoodi et al., 2019; Shi et al., 2019; Wang et al., 2017). However, heterogeneity also makes geographic data a hybrid mixture of various feature signals, which complicates extracting these signals from the original data. Thus, the feature analysis of geographical data with heterogeneous variation has become a hot topic in current research (Li et al., 2016; Pradhan et al., 2014).
Existing methods for exploring data features with heterogeneous variation mainly take the perspective of the continuous spatiotemporal field. They can be roughly divided into geostatistical analysis and statistical regression-based methods (Huysmans et al., 2014). In geostatistical analysis, such as Kriging (Kleijnen, 2009), generalized Kriging (Xu & Shu, 2015), and Bayesian Maximum Entropy (Yu & Wang, 2013), spatiotemporal heterogeneous variation is modeled by treating temporal variation as a function of time distance and spatial variation as a function of spatial distance, and a covariance function is used to describe the structure of heterogeneity (de Marsily et al., 2005). However, constructing a spatiotemporal covariance often runs into problems such as the inconsistent dimensions of space and time (i.e., the difficulty of unifying the "distance" units of time and space) (Kleijnen, 2009; Xu & Shu, 2015). Several spatiotemporal covariance models, such as the separable model (Graeler et al., 2016), the nonseparable model (Ruiz-Medina et al., 2016), and the product-sum model (De Iaco et al., 2001), have been proposed to address these problems. However, their matrix basis still makes it challenging to capture spatial and temporal variations simultaneously, resulting in problems such as the loss of spatiotemporal interaction information and high construction complexity (Hristopulos & Tsantili, 2017).
Statistical regression-based methods, such as spatial autoregressive local estimation models and geographically weighted regression (GWR) (Lu et al., 2019; Suesse, 2018), embed the spatial location of the data into linear regression models. The spatial variation of the regression parameters estimated at each observation position is then analyzed to reflect spatial heterogeneity. However, these methods consider only the spatial structure; because they ignore temporal variation, they cannot adequately handle spatiotemporal field data with temporal heterogeneity (Peng et al., 2019). Geographically and temporally weighted regression (GTWR) extends GWR by taking temporal instability into account: a weight matrix based on spatiotemporal distance is constructed, and the variation of the parameter estimates is analyzed to explore heterogeneity (Chu et al., 2018). Yet in GTWR, spatial distance and temporal distance are measured separately. The underlying assumption is temporal independence, which means that the spatiotemporal field data are treated as a set of independent 2-D spatial data. Thus, it is difficult to sufficiently capture the spatiotemporal features (Liu et al., 2016).
Recently, tensor-based comprehensive spatiotemporal feature analysis of multidimensional data has been widely developed (Afra & Gildin, 2016; Kotov & Paelike, 2019). A tensor is a multidimensional array, which can be seen as an extension of the matrix to higher orders (Leibovici, 2010a). Tensor-based feature analysis mainly relies on tensor decomposition, which treats multidimensional data as a whole and extracts the data features along each mode together with their coupling relationships. Hence, tensor-based feature analysis methods can estimate the intrinsic structure of spatiotemporal field data that is ignored in matrix models (Kolda & Bader, 2009). In tensor-based geoscience applications, the tensor structure is first used to integrate spatial and temporal information in a unified framework. Then, by exploiting the capability of tensor decomposition for feature extraction, nonlinear signal extraction (Lu et al., 2018), feature-based compressed storage (Yuan et al., 2015), and dimensionality reduction (Gao et al., 2015) can be achieved by retaining the prominent spatiotemporal features and removing redundancy. These tensor-based spatiotemporal analyses show state-of-the-art performance for spatiotemporal data (Leibovici, 2010b).
Although tensor-based methods have been widely used in various studies of spatiotemporal field data, their capability for dealing with heterogeneity is limited. This is mainly because tensor models focus on the global tensor structure and implicitly assume that the observed data are globally low rank, with homogeneous relationships among different dimensions (Liu et al., 2013). For spatiotemporal field data with heterogeneous variation, the uneven distribution makes the data tend to be locally structured, so this assumption is hard to meet. Moreover, because such a global model treats the multidimensional data as a whole, local structure can easily be concealed in the process. The results tend toward a certain "average" over the research area, which biases the feature estimation (Scholz et al., 2018). Therefore, conventional tensor decomposition methods still provide insufficient support for spatiotemporal field data with heterogeneous variation.
In this paper, building on the multidimensional feature capture of tensor decomposition, we construct an improved tensor-based feature analysis method for spatiotemporal data with heterogeneous variation. The remainder of this paper is organized as follows. Section 2 gives the detailed construction of the proposed method. Section 3 presents a case study based on National Oceanic and Atmospheric Administration (NOAA) reanalysis data. Finally, section 4 provides the conclusions and discussion.

Method
Because spatiotemporal observations obtained at nearby locations and time stamps are similar to each other, spatiotemporal data tend to be locally structured but only weakly correlated globally. A feasible approach is therefore to apply tensor decomposition to local data with a relatively consistent structure, which removes the effect of heterogeneous variation within the data. This idea has been widely applied in feature extraction from complex two-dimensional data. For instance, for background estimation in complex scenes, Matteoli et al. proposed dividing each scene into local matrices and achieving accurate estimation by calculating the similarity among these local matrices (Matteoli et al., 2010). In feature studies of complex terrain, the terrain is divided into a series of grid units, and the topographic relief variation within each grid unit is used to characterize the overall morphological characteristics and variation rules (Nijzink et al., 2016). These works show that local processing can effectively reduce the feature estimation bias of global analysis for heterogeneous data. However, they are all designed for two-dimensional spatial data, and a multidimensional extension is needed for spatiotemporal field data.
For the feature analysis of spatiotemporal field data with heterogeneous variation, it is essential to partition the original heterogeneous data into local data with a relatively consistent structure. Feature analysis can then be achieved by applying tensor decomposition to each local data set. In this method, the data partition strategy must be carefully designed. Since geographical data in nearby regions are similar, we propose a similarity-based partition strategy: the original data are first partitioned into local data, which are then reorganized according to a similarity measurement so as to maximize within-group similarity and between-group difference. Because the internal structure of the reorganized data tends to be consistent, tensor decomposition can be applied to them while reducing the effect of heterogeneous variation. The whole process can thus be divided into (1) partitioning the original data, (2) measuring the similarity of the partitioned data, and (3) reorganizing the data and performing tensor analysis.

The Partition of Spatiotemporal Field Data
For convenience of expression, three-dimensional spatiotemporal field data $\mathcal{A} \in \mathbb{R}^2 \times T$ are adopted here, where $\mathbb{R}^2$ denotes the spatial domain and $T$ the temporal domain. Because spatiotemporal field data are often stored as multiway arrays, they can naturally be viewed as a multidimensional tensor $\mathcal{A} \in \mathbb{R}^{d_X \times d_Y \times d_T}$, where $d_X$ and $d_Y$ are the spatial dimensions and $d_T$ is the temporal dimension. To detect the similarities and differences within the data, the following terms are defined.

Definition 1 (Subtensor)
The tensor $\mathcal{A} \in \mathbb{R}^2 \times T$ can be regarded as being composed of a series of subtensors with the same spatiotemporal reference. These subtensors are defined as the smallest indivisible units of the original data and can be obtained as follows:

$$\{\mathcal{A}_i\}_{i=1}^{m} = \mathrm{Block}(\mathcal{A}) \tag{1}$$

Here, $\mathrm{Block}(\cdot)$ is the function that partitions the original tensor $\mathcal{A}$ into subtensors $\{\mathcal{A}_i\}_{i=1}^{m}$; each subtensor $\mathcal{A}_i$ contains local spatial and temporal information, and $m$ is the number of subtensors. Because the dimensions differ from one another, the number of subtensors along each dimension can be customized for different circumstances. To maintain the natural structure of the data, the partition of the time dimension is usually determined by the data update interval. For the spatial dimensions, the partition is often a set of regular, equally sized blocks in coordinate space. The whole process is shown in Figure 1.
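As a minimal sketch of equation (1), assuming the tensor is a NumPy array ordered as (longitude, latitude, time), the partition function can be written as below. The function name `block`, its arguments `t_edges`, `x_blocks`, and `y_blocks`, and the use of nearly equal spatial blocks via `np.array_split` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def block(A, t_edges, x_blocks=1, y_blocks=1):
    """Hypothetical Block(.) for a (d_X, d_Y, d_T) array.

    t_edges  -- time indices at which a new temporal block starts
                (e.g. month boundaries of daily data; 0 and d_T are implied)
    x_blocks, y_blocks -- number of (nearly) equal spatial blocks per axis
    Returns the subtensors in time-major order: all spatial blocks of the
    first time block come first.
    """
    subtensors = []
    for a_t in np.split(A, t_edges, axis=2):                    # temporal blocks
        for a_x in np.array_split(a_t, x_blocks, axis=0):       # longitude blocks
            for a_xy in np.array_split(a_x, y_blocks, axis=1):  # latitude blocks
                subtensors.append(a_xy)
    return subtensors

# Example: a toy 4 x 3 x 12 field cut into 4 time blocks and 2 longitude blocks.
A = np.arange(4 * 3 * 12, dtype=float).reshape(4, 3, 12)
subs = block(A, t_edges=[3, 6, 9], x_blocks=2)
print(len(subs), subs[0].shape)  # 8 (2, 3, 3)
```

`np.array_split` is used for the spatial axes so that dimensions such as 73 latitudes, which are not divisible by every block count, still partition cleanly.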
Thus, to measure the similarities and differences within the data, the original data are first partitioned into a series of subtensors, and the similarity among these subtensors is then used to reflect the similarities and differences within the data.

The Similarity Measurement for Subtensor
To partition the original heterogeneous data into local data with a relatively consistent structure, the similarity among the above subtensors must be measured to distinguish the similarities and differences within the heterogeneous data.
Because the partitioned subtensors retain the integrity of the spatiotemporal reference, they are essentially multidimensional spatiotemporal field data, and traditional similarity measurements may be invalid for them. For example, commonly used similarity measurements are built on distance measures such as Euclidean or Manhattan distance. Since the relative difference between the nearest and farthest distances vanishes as the dimension increases, the validity of such measurements is difficult to guarantee for multidimensional data (Aggarwal, 2001). Another common approach first transforms the multidimensional data into a low-dimensional space by unfolding them into matrices or vectors along a certain dimension, and then applies a low-dimensional similarity measurement. However, this approach not only destroys the spatiotemporal coupling structure among the features but also yields high dimensionality (Khokher et al., 2019). Therefore, existing methods still provide insufficient support for measuring the similarity of spatiotemporal field data.
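The distance-concentration effect cited above (Aggarwal, 2001) is easy to reproduce numerically. The sketch below assumes uniformly distributed random points rather than real field data and computes the relative contrast $(D_{max} - D_{min}) / D_{min}$ between one query point and a set of points; the function name `relative_contrast` is hypothetical.

```python
import numpy as np

def relative_contrast(n_points, dim, seed=0):
    """(D_max - D_min) / D_min of Euclidean distances from one random query
    to n_points random points drawn uniformly from the unit hypercube."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, dim))
    query = rng.random(dim)
    dist = np.linalg.norm(points - query, axis=1)
    return (dist.max() - dist.min()) / dist.min()

for dim in (2, 10, 100, 1000):
    # The contrast falls steeply as the dimension increases.
    print(dim, round(relative_contrast(1000, dim), 3))
```

When the nearest and farthest points become almost equally distant, a plain Euclidean distance on vectorized subtensors no longer separates similar from dissimilar data well.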
To measure the similarity of spatiotemporal field data, the following term is first defined.

Definition 2 (Latent Factor)
Similar to the principal components obtained by matrix decomposition of two-dimensional data, the latent factors of a tensor decomposition can be seen as the principal components of multidimensional data. Here, we take the simplest tensor decomposition, CP decomposition, as an example. For the above subtensors $\{\mathcal{A}_i\}_{i=1}^{m}$, the latent factors can be obtained by CP decomposition as follows:

$$\mathcal{A}_i = \sum_{r=1}^{R} \lambda_{ir}\, \mathbf{x}_{ir} \circ \mathbf{y}_{ir} \circ \mathbf{t}_{ir} + \mathrm{Res}_i \tag{2}$$

Here, $\{\mathbf{x}_{ir}\}_{r=1}^{R}$, $\{\mathbf{y}_{ir}\}_{r=1}^{R}$, and $\{\mathbf{t}_{ir}\}_{r=1}^{R}$ are the vectors denoting the $r$th ($r = 1, 2, \ldots, R$) latent factors of subtensor $\mathcal{A}_i$ in the two spatial dimensions and the temporal dimension, respectively; $R$ is the number of factors; and $\{\lambda_{ir}\}_{r=1}^{R}$ and $\mathrm{Res}_i$ are the weight coefficients and the residual tensor of subtensor $\mathcal{A}_i$. The symbol $\circ$ denotes the vector outer product: each element of $\mathbf{x} \circ \mathbf{y} \circ \mathbf{t}$ is the product of one element from each vector, $(\mathbf{x} \circ \mathbf{y} \circ \mathbf{t})_{abc} = x_a y_b t_c$ (Acar et al., 2011).
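For readers who want to experiment with equation (2), the following is a minimal CP decomposition fitted by alternating least squares (ALS), using NumPy only. ALS is one standard way to fit a CP model (Kolda & Bader, 2009), but the text does not state which fitting algorithm was used, so the helpers `cp_als` and `cp_reconstruct`, the random initialization, and the fixed iteration count are assumptions. The factors are normalized to unit length and their scales collected in the weights, matching the form of equation (2).

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Kronecker product; row index j*K + k matches C-order unfolding."""
    return (B[:, None, :] * C[None, :, :]).reshape(-1, B.shape[1])

def cp_als(A, R, n_iter=300, seed=0):
    """Hypothetical helper: fit A ~ sum_r lam_r x_r o y_r o t_r for a 3-way array."""
    rng = np.random.default_rng(seed)
    I, J, K = A.shape
    X, Y, T = (rng.standard_normal((n, R)) for n in (I, J, K))
    # Mode unfoldings consistent with khatri_rao's row ordering.
    A0 = A.reshape(I, J * K)
    A1 = A.transpose(1, 0, 2).reshape(J, I * K)
    A2 = A.transpose(2, 0, 1).reshape(K, I * J)
    for _ in range(n_iter):
        # Each step solves a linear least-squares problem for one factor matrix.
        X = A0 @ khatri_rao(Y, T) @ np.linalg.pinv((Y.T @ Y) * (T.T @ T))
        Y = A1 @ khatri_rao(X, T) @ np.linalg.pinv((X.T @ X) * (T.T @ T))
        T = A2 @ khatri_rao(X, Y) @ np.linalg.pinv((X.T @ X) * (Y.T @ Y))
    lam = np.ones(R)
    for F in (X, Y, T):             # move column norms into the weights
        norms = np.linalg.norm(F, axis=0)
        lam *= norms
        F /= norms                  # in place, so X, Y, T end up unit-norm
    order = np.argsort(lam)[::-1]   # strongest factor first
    return lam[order], X[:, order], Y[:, order], T[:, order]

def cp_reconstruct(lam, X, Y, T):
    """sum_r lam_r * x_r o y_r o t_r, i.e. equation (2) without the residual."""
    return np.einsum("r,ir,jr,kr->ijk", lam, X, Y, T)

# Usage on an exactly rank-2 toy tensor.
rng = np.random.default_rng(1)
A = cp_reconstruct(np.ones(2), *(rng.standard_normal((n, 2)) for n in (6, 5, 8)))
lam, X, Y, T = cp_als(A, R=2)
res = A - cp_reconstruct(lam, X, Y, T)  # residual tensor Res_i
print(np.linalg.norm(res) / np.linalg.norm(A))
```

In practice a maintained library implementation with convergence checks would be preferable; this sketch only makes the structure of equation (2) concrete.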
With the latent factors of the CP decomposition, the dimensionality of the original data can be reduced by retaining the principal factors and removing the noise factors, and a similarity measurement can then be constructed in the reduced space. Take the first temporal latent factors $\{\mathbf{t}_{i1}\}_{i=1}^{m}$ as an example. Because they are essentially time series that may exhibit a certain delay or time lag, computing the similarity directly on the original series may be biased (Atluri et al., 2018). A common strategy is therefore to shift one of the series over a range of candidate time lags; with $T(\mathbf{t}_{i1}, h)$ denoting $\mathbf{t}_{i1}$ shifted by lag $h$, the lag that maximizes the similarity is chosen:

$$\rho_{i,i+1} = \max_{h} \mathrm{Cor}\big(T(\mathbf{t}_{i1}, h), \mathbf{t}_{i+1,1}\big) \tag{3}$$

The whole calculation process is shown in Table 1. Here, $\mathrm{Cor}(\cdot)$ is the correlation function for vectors, which can be chosen according to the application scenario and analysis requirements. The similarity sequence of the subtensors $\{\mathcal{A}_i\}_{i=1}^{m}$, composed of the similarities between adjacent subtensors, is then

$$\{\rho_{1,2}, \rho_{2,3}, \ldots, \rho_{m-1,m}\} \tag{4}$$
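The lag-maximized similarity and the similarity sequence described above, together with the shifting procedure of Table 1, can be sketched as follows. This sketch assumes that adjacent temporal factors have the same length, that shifts are circular (as in Table 1), and that $\mathrm{Cor}(\cdot)$ is the Pearson correlation coefficient; the paper leaves the choice of $\mathrm{Cor}(\cdot)$ open, and the function names are hypothetical.

```python
import numpy as np

def lagged_similarity(t_a, t_b, h=1):
    """rho_{i,i+1}: maximum Pearson correlation between circular shifts of t_a
    (by j*h samples, j = 1..T//h) and t_b. When h divides T, j = T/h is the zero lag."""
    t_a, t_b = np.asarray(t_a, float), np.asarray(t_b, float)
    if t_a.shape != t_b.shape:
        raise ValueError("this sketch assumes equal-length factors")
    d = len(t_a) // h                    # total number of moves
    best = -np.inf
    for j in range(1, d + 1):
        shifted = np.roll(t_a, -j * h)   # [t_(jh+1), ..., t_T, t_1, ..., t_(jh)]
        best = max(best, np.corrcoef(shifted, t_b)[0, 1])
    return best

def similarity_sequence(temporal_factors, h=1):
    """{rho_12, rho_23, ..., rho_(m-1)m} over temporally adjacent subtensors."""
    return [lagged_similarity(a, b, h)
            for a, b in zip(temporal_factors[:-1], temporal_factors[1:])]

# A lagged copy of a sine wave is recognized as perfectly similar.
t = np.sin(np.linspace(0, 2 * np.pi, 30, endpoint=False))
print(round(lagged_similarity(np.roll(t, 5), t), 6))  # 1.0
```

A larger step `h` trades lag resolution for fewer correlation evaluations.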

The Data Reorganization and Tensor Analysis
To reduce the bias in tensor decomposition caused by heterogeneous variation within the data, the partitioned subtensors can be reorganized to maximize within-group similarity and between-group difference, so that each reorganized subtensor has a relatively consistent structure. Tensor decomposition can then be applied to these reorganized subtensors for feature extraction.
Because the criterion for judging the degree of heterogeneity differs across analysis applications, a threshold-based judgment is used to construct the data reorganization strategy. That is, when the similarity between adjacent subtensors is greater than a given similarity threshold $\delta$, they are merged and processed as a whole; otherwise, they are treated separately:

$$\begin{cases} \mathcal{A}_i \oplus \mathcal{A}_{i+1}, & \rho_{i,i+1} > \delta \\ \mathcal{A}_i,\ \mathcal{A}_{i+1}, & \rho_{i,i+1} \le \delta \end{cases} \tag{5}$$

Here, $\oplus$ denotes a join operator that appends data along a specific dimension. After this process, the subtensors are reorganized into a set of subtensors with relatively consistent internal structures. The whole process is shown in Figure 2.
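A minimal sketch of the merge rule above, assuming the subtensors are NumPy arrays joined along the time axis (axis 2) and that merging chains through runs of consecutive pairs whose similarity exceeds $\delta$. The chaining behavior and the name `reorganize` are assumptions. This linear sketch also cannot merge the last subtensor with the first; merging December with January, as in the case study, would additionally require treating the annual sequence as cyclic.

```python
import numpy as np

def reorganize(subtensors, rho, delta):
    """Join temporally adjacent subtensors whose similarity exceeds delta.

    subtensors -- list of m arrays of shape (d_X, d_Y, t_i), in time order
    rho        -- m - 1 similarities rho_{i,i+1} between neighbours
    delta      -- similarity threshold
    """
    groups = [[subtensors[0]]]
    for nxt, r in zip(subtensors[1:], rho):
        if r > delta:
            groups[-1].append(nxt)   # similar: extend the current group
        else:
            groups.append([nxt])     # dissimilar: start a new group
    # The join operator: append along the time axis.
    return [np.concatenate(g, axis=2) for g in groups]

subs = [np.full((2, 2, 3), k, dtype=float) for k in range(4)]
merged = reorganize(subs, rho=[0.9, 0.1, 0.95], delta=0.5)
print([m.shape for m in merged])  # [(2, 2, 6), (2, 2, 6)]
```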
Since each reorganized subtensor is a multidimensional tensor with a relatively consistent structure, tensor decomposition can be used directly. Then, the tensor analysis, such as the feature extraction and data approximation, can be achieved based on decomposed results and tensor reconstruction.
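Taking CP decomposition as an example, each reorganized subtensor can be decomposed as in equation (2). Because the decomposed latent factors are obtained by considering the coupling relationships among the data in all dimensions, they are suitable both for extracting features in a specific combination of dimensions and for approximating the data in all dimensions. Based on the decomposed latent factors, the feature in a specific combination of dimensions, for example the $r$th latitude–time mode of the $i$th reorganized subtensor, can be reconstructed as follows:

$$\mathbf{F}_{ir}^{YT} = \lambda_{ir}\, \mathbf{y}_{ir} \circ \mathbf{t}_{ir} \tag{6}$$

The data approximation can be reconstructed as

$$\hat{\mathcal{A}}_i = \sum_{r=1}^{R} \lambda_{ir}\, \mathbf{x}_{ir} \circ \mathbf{y}_{ir} \circ \mathbf{t}_{ir} \tag{7}$$

Table 1. Similarity calculation for adjacent subtensors
Input: first temporal latent factors $\mathbf{t}_{i1}$ and $\mathbf{t}_{i+1,1}$ of length $T$; shift step $h$
1: $\mathbf{t}_{i1} = [t_{i1,1}, t_{i1,2}, \ldots, t_{i1,T}]$ // The time point marker for a sequence
2: $d = T/h$ // Total number of moves
3: for $j = 1:d$ // Move the sequence successively
4:     $\tilde{\mathbf{t}}_{i1} = [t_{i1,(jh+1)}, \ldots, t_{i1,T}, t_{i1,1}, t_{i1,2}, \ldots, t_{i1,(jh)}]$
5:     $c_j = \mathrm{Cor}(\tilde{\mathbf{t}}_{i1}, \mathbf{t}_{i+1,1})$ // Calculate the correlation coefficient
6: end
Output: $\rho_{i,i+1} = \max_j c_j$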
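Once each reorganized subtensor has been decomposed, the latitude–time mode $\lambda_{ir}\mathbf{y}_{ir}\circ\mathbf{t}_{ir}$ and the rank-$R$ data approximation are plain outer-product sums. The sketch below assumes the CP results are stored as NumPy arrays (weights `lam`, factor matrices `X`, `Y`, `T` for longitude, latitude, and time, one column per factor) and that the local approximations are joined along time to cover the whole study period; the function names are hypothetical.

```python
import numpy as np

def lat_time_mode(lam, Y, T, r=0):
    """r-th latitude-time mode lam_r * (y_r o t_r), shape (d_Y, d_T)."""
    return lam[r] * np.outer(Y[:, r], T[:, r])

def approximate(lam, X, Y, T):
    """Rank-R approximation sum_r lam_r * x_r o y_r o t_r of one subtensor."""
    return np.einsum("r,ir,jr,kr->ijk", lam, X, Y, T)

def approximate_all(factor_sets):
    """Join the local approximations of all reorganized subtensors along time."""
    return np.concatenate([approximate(*f) for f in factor_sets], axis=2)

# Toy rank-1 factors (lam, X, Y, T) for two subtensors with 3 and 2 time steps.
f1 = (np.array([2.0]), np.ones((3, 1)), np.array([[1.0], [2.0]]),
      np.array([[1.0], [0.0], [3.0]]))
f2 = (np.array([1.0]), np.ones((3, 1)), np.array([[1.0], [1.0]]),
      np.array([[4.0], [5.0]]))
print(approximate_all([f1, f2]).shape)  # (3, 2, 5)
```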

Research Data and Experiment Configuration
The 2.5° × 2.5° air temperature (Air) data from 1 January 1948 to 31 December 2010 released by NOAA were selected as the experimental data. These are global daily mean meteorological reanalysis data formed by fusing multisource data sets through climate models. In this study, the air temperature data are stored as a tensor $\mathrm{Air} \in \mathbb{R}^{144 \times 73 \times 365}$ with the dimensions of longitude, latitude, and time.
The following experiments were performed. (1) CP decomposition was used to verify the advantages of the proposed method in data approximation and feature extraction. For convenience, CP decomposition applied to the reorganized blocks is called local decomposition, and CP decomposition applied to the original data is called global decomposition. (2) The original data were partitioned both into blocks of balanced data size and into blocks according to the heterogeneous variation, and tensor decomposition was then applied to verify that the proposed partition strategy better supports tensor analysis.

Data Partition and Reorganization
The original data are multiyear averages and are susceptible to many common climate-forcing factors (such as terrain, or climate events in different periods of the year), showing significant heterogeneity in temporal behavior (Zhao et al., 2017). In the absence of prior knowledge, the original data are partitioned into 12 subtensors corresponding to the 12 months, which are then reorganized using the proposed data reorganization strategy. The original data $\mathrm{Air} \in \mathbb{R}^{144 \times 73 \times 365}$ are finally reorganized as $\{\mathrm{Air}_1 \in \mathbb{R}^{144 \times 73 \times 65}, \mathrm{Air}_2 \in \mathbb{R}^{144 \times 73 \times 30}, \mathrm{Air}_3 \in \mathbb{R}^{144 \times 73 \times 120}, \mathrm{Air}_4 \in \mathbb{R}^{144 \times 73 \times 60}, \mathrm{Air}_5 \in \mathbb{R}^{144 \times 73 \times 90}\}$; that is, the data of December and January are combined as the first subtensor $\mathrm{Air}_1$, the data of February form the second subtensor $\mathrm{Air}_2$, the data of March, April, May, and June are combined as the third subtensor $\mathrm{Air}_3$, the data of July and August are combined as the fourth subtensor $\mathrm{Air}_4$, and the data of September, October, and November are combined as the fifth subtensor $\mathrm{Air}_5$. The result is shown in Figure 4.

The Comparison of Data Approximation Performance
To verify the data approximation performance, the global residual tensor $\mathrm{Res}_G$ obtained by global decomposition and the local residual tensors $\{\mathrm{Res}_{L_i}\}_{i=1}^{5}$ obtained by local decomposition are calculated according to equation (2). The local residual tensors are merged as $\mathrm{Res}_L = \mathrm{Res}_{L_1} \oplus \mathrm{Res}_{L_2} \oplus \mathrm{Res}_{L_3} \oplus \mathrm{Res}_{L_4} \oplus \mathrm{Res}_{L_5}$. The data approximation bias, expressed as the overall and directional relative error ratios (RER), is then calculated as shown in Table 2, where $\mathrm{Air}_X$, $\mathrm{Air}_Y$, and $\mathrm{Air}_T$ denote the averages of tensor Air along longitude, latitude, and time, and $\mathrm{Res}_{GX}, \mathrm{Res}_{GY}, \mathrm{Res}_{GT}$ and $\mathrm{Res}_{LX}, \mathrm{Res}_{LY}, \mathrm{Res}_{LT}$ are defined in the same manner.
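Table 2's formulas are not reproduced in this text, so the sketch below should be read as one plausible definition rather than the paper's. It assumes the overall RER is the Frobenius-norm ratio $\|\mathrm{Res}\|_F / \|\mathrm{Air}\|_F$ and computes a directional RER for every index along one axis from the corresponding slices; the paper instead works with averages along each axis ($\mathrm{Air}_X$, $\mathrm{Air}_Y$, $\mathrm{Air}_T$), so values from this sketch need not match Figure 5. The local residuals are first joined along time, mirroring $\mathrm{Res}_L = \mathrm{Res}_{L_1} \oplus \cdots \oplus \mathrm{Res}_{L_5}$. Function names are hypothetical.

```python
import numpy as np

def rer(data, res):
    """Overall relative error ratio, assumed to be ||res||_F / ||data||_F."""
    return np.linalg.norm(res) / np.linalg.norm(data)

def directional_rer(data, res, axis):
    """Assumed slice-wise RER for every index along `axis`
    (0 = longitude, 1 = latitude, 2 = time)."""
    other = tuple(a for a in range(data.ndim) if a != axis)
    return np.sqrt((res ** 2).sum(axis=other) / (data ** 2).sum(axis=other))

# Local residuals are joined along time before comparison with Res_G.
data = np.ones((2, 3, 4))
res_local = [np.full((2, 3, 1), 0.1), np.full((2, 3, 3), 0.1)]
res_L = np.concatenate(res_local, axis=2)
print(round(rer(data, res_L), 4), directional_rer(data, res_L, axis=2).round(4))
```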
The relative error ratios of local and global decomposition are $\mathrm{RER}_L = 0.0174$ and $\mathrm{RER}_G = 0.0264$, respectively. This indicates that, over the global domain, the proposed method captures the latent factors more accurately than the conventional tensor method (global decomposition) and thus improves the accuracy of data approximation.

Earth and Space Science
To test the performance of the proposed method in local regions, the change in the relative error ratio between the two methods along each dimension is calculated as follows:

$$\Delta \mathrm{RER}_D = \mathrm{RER}_{GD} - \mathrm{RER}_{LD}, \quad D \in \{X, Y, T\}$$

For convenience, this change in the relative error ratio is called the accuracy improvement. The directional relative error ratios and accuracy improvements are depicted in Figure 5.
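As a worked example with the overall values reported above, and assuming the improvement is taken as global minus local RER (so that positive values mean the proposed method is more accurate), the accuracy improvement for the whole data set is

```latex
\Delta\mathrm{RER} = \mathrm{RER}_G - \mathrm{RER}_L = 0.0264 - 0.0174 = 0.0090,
\qquad
\frac{\Delta\mathrm{RER}}{\mathrm{RER}_G} = \frac{0.0090}{0.0264} \approx 34.1\%
```

That is, local decomposition removes roughly a third of the relative error left by global decomposition.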
Figures 5a–5c show that the relative error ratios in the spatial and temporal dimensions obtained by local decomposition are all smaller than those of global decomposition. Figure 5c also shows that the temporal relative error ratio of the proposed method is more stable than that of global decomposition. This may be because partitioning along the temporal dimension makes the local data more consistently structured, which reduces the effect of heterogeneous variation on the tensor decomposition. Thus, the proposed method also approximates the original data in local regions more accurately, and more stably, than the conventional tensor method.
Figures 5d–5f show that the accuracy improvement differs significantly among dimensions. The meridional distribution of accuracy improvement in Figure 5d shows that in the ranges (0°, 112°E) and (135°W, 93°W), the accuracy improvement gradually increases with longitude, reaching peaks of 0.014 and 0.011, respectively. In the ranges (112°E, 135°W) and (93°W, 0°), it gradually declines with longitude. These patterns appear to follow the global distribution of land and ocean: land occupies a larger proportion of the surface in the ranges (0°, 112°E) and (135°W, 93°W), and ocean occupies a larger proportion in the ranges (112°E, 135°W) and (93°W, 0°). Air temperature is known to be influenced by modal aliasing that includes land and ocean modes, and land data have more complex modes than marine data because of the difference in thermal properties between land and ocean (Blesic et al., 2019). In regions with a larger proportion of land, the air temperature structure therefore tends to be more heterogeneous. This suggests that, under the influence of complex modes, the proposed method captures the data features better than the conventional tensor method and thereby significantly improves the accuracy of data approximation.
The zonal distribution of accuracy improvement in Figure 5e shows that the high values all lie in high-latitude areas, whereas the low values are mainly in the midlatitudes of the Southern Hemisphere. High-latitude areas lie mainly in the cold zone and experience polar day and polar night, which cause significant heterogeneous variation of air temperature there. In contrast, the midlatitudes of the Southern Hemisphere are mainly covered by ocean, whose relatively simple modal structure results in weak heterogeneous variation of air temperature (Brazel, 2006). Therefore, the proposed method can significantly improve the approximation accuracy for data with significant heterogeneous variation, which makes it more applicable to the feature analysis of complex data.
The temporal distribution of accuracy improvement (Figure 5f) shows that low values occur in spring and autumn, whereas high values occur in the summer and winter months. Generally, owing to the effect of the subtropical anticyclone, the main continental regions of the Northern and Southern Hemispheres experience intense climatic activity in July–August and December–January (Holton, 1973). In summer especially, higher incident solar radiation causes pronounced convective activity and vertical heat fluxes over land (Jain et al., 1999). These abnormal climate activities lead to significant heterogeneous variation of air temperature, such as high variability in the ocean–land surface temperature contrast (Byrne & O'Gorman, 2013; Dommenget, 2012). Thus, for the complex temporal variation of air temperature, the proposed method captures the data features more accurately than the conventional tensor method.
These results suggest that conventional tensor decomposition is biased by heterogeneous variation caused by complex modes and anomalous climatic events. By accounting for heterogeneity, local tensor decomposition efficiently reduces the effect of heterogeneous variation and captures the air temperature features more accurately in both global and local regions.

The Comparison of Feature Extraction Performance
To test the feature extraction performance of the proposed method, Figure 6 shows the first latitude–time mode obtained by local and global decomposition, together with the global averaging result.
Figures 6a–6c all show a general decrease in temperature from the equator toward both poles, but some details differ. For example, the first mode obtained by global decomposition (Figure 6a) cannot reproduce the high-value areas near the equator, which are well represented in the global average (Figure 6b) and in the local decomposition (Figure 6c). The main reason for the difference is that global tensor decomposition tends to capture large-scale structures, while detailed information associated with local variation may be concealed.
Figure 6c shows that the isotherm pattern of the first mode varies with location and time. Especially in the second and third subtensors, the isotherms are dense and the temperature gradient is large in high latitudes, whereas in low latitudes the isotherms become sparse and the temperature gradient is small. This zonal variation of the temperature gradient is consistent with the reported equator-to-pole surface temperature gradient (Polichtchouk & Cho, 2016). Such zonal differences in the temperature gradient are related to the ocean–land fraction within each latitude zone: the middle and high latitudes are significantly more continental, and continental regions are more sensitive to temperature changes than the oceans (Lee, 2014).
The isotherms of different time periods are discontinuous, which reflects heterogeneous variation in the long-range behavior of the global temperature data. In particular, in July–August (the fourth subtensor in Figure 6c), the isotherm pattern differs significantly from those in other periods: the fluctuation range of the high-value isotherms is significantly reduced, and the isotherm regions are offset. This pattern may be partly affected by the summer anomalies caused by atmospheric circulation anomalies in the context of the El Niño event (Tao et al., 2016). Previous studies have reported that significant climate anomalies, such as increased tropical tropospheric temperature and sea surface temperature warming, persist through the summer (June–August) over the equatorial Pacific (Xie et al., 2009).
In summary, tensor decomposition that considers heterogeneity extracts a typical isotherm pattern roughly consistent with those obtained by common averaging and the conventional tensor method. In addition, the proposed method reveals the zonal variation of the temperature gradient and abnormal variations of air temperature that are overlooked by the conventional tensor method.

The Flexibility and Smoothness of the Method
For local tensor decomposition of spatiotemporal field data, a rational local data partition is vital for accurate feature estimation. To verify the effectiveness of the proposed partition strategy, we compared it with the commonly used uniform partition under the same number of partitions; that is, the original data are partitioned into five subtensors of equal size. A basic partition strategy, which partitions the original data by month, is also defined. The latent factors obtained by applying CP decomposition to each monthly data set separately are used as the baseline. The latent factors of each month obtained by applying CP decomposition to the heterogeneously partitioned data and to the uniformly partitioned data, respectively, are then compared with the baseline. Taking the first temporal latent factor as an example, the comparison results are shown in Table 3, where CC1 denotes the correlation coefficient between the baseline and the temporal latent factor of the heterogeneously partitioned data in the corresponding month, and CC2 denotes that between the baseline and the temporal latent factor of the uniformly partitioned data in the corresponding month.
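The comparison behind CC1 and CC2 can be sketched as below, under the assumption (not spelled out in the text) that a month's values are cut out of a block's temporal factor by day index and compared with that month's baseline factor using the Pearson correlation coefficient. The function name and the day indices in the example are hypothetical.

```python
import numpy as np

def month_cc(baseline_t, block_t, start, stop):
    """Pearson correlation between a month's baseline temporal factor and
    the days [start, stop) of a block's temporal factor."""
    segment = np.asarray(block_t, float)[start:stop]
    if segment.shape != np.shape(baseline_t):
        raise ValueError("month segment and baseline must have equal length")
    return np.corrcoef(baseline_t, segment)[0, 1]

# Hypothetical 60-day block whose days 30..59 form the month of interest.
block_t = np.sin(np.linspace(0, 3, 60))
baseline = 2.0 * block_t[30:60] + 1.0   # same shape, different scale and offset
print(round(month_cc(baseline, block_t, 30, 60), 6))  # 1.0
```

Because the Pearson coefficient is unaffected by positive rescaling and offsets, it compares the shape of two factors regardless of how the CP weights distribute the overall scale.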
Table 3 shows that CC1 and CC2 are very close to 1 in most months, indicating that the temporal latent factors of both the heterogeneously and the uniformly partitioned data are highly correlated with those obtained by applying CP decomposition directly to each monthly data set. However, for March, May, and October, CC1 is significantly higher than CC2. For further analysis, these latent factors are plotted in Figure 7.
Figure 7 shows that, compared with the latent factors of March, May, and October extracted from the heterogeneously partitioned data, the latent factors of the same months extracted from the uniformly partitioned data are biased and can even show the opposite trend. This is mainly because uniform partitioning only balances the amount of data in each block and can easily place local data with significant heterogeneous variation in the same block. With heterogeneous partitioning, by contrast, local data with similar structures are processed together. This not only requires fewer operations than applying tensor decomposition to each month separately but also makes the internal structure of the local data more consistent, reducing the bias caused by heterogeneous variation within the data.

Conclusion and Discussion
This study focused on the problems of applying conventional tensor methods to spatiotemporal field data with heterogeneity, such as estimation bias and the difficulty of revealing abnormal variations. We incorporated the local consistency of the data into the tensor decomposition process by applying tensor decomposition to local data with a relatively consistent structure for feature analysis. Because most spatiotemporal field data show considerable correlation in space and time as well as heterogeneity in long-range behavior, tensor decomposition on local data can capture the spatiotemporal coupling features well while reducing the bias that heterogeneous variation causes in conventional global tensor methods.
Comparison experiments on air temperature field data demonstrated that the proposed method approximates the data more accurately over both local regions and the global domain, and achieves a more stable approximation in the temporal dimension than the conventional tensor method. Analysis of the structure of the accuracy improvement showed that the significant improvements occur in regions where land accounts for a large proportion of the ocean–land fraction, as well as in periods of anomalous climatic events. This indicates that the proposed method can efficiently improve the performance of conventional tensor methods for complex variation structures. In addition, the isotherm pattern obtained by the proposed method revealed a zonal variation of the temperature gradient consistent with previous studies, as well as abnormal temporal variation of global air temperature. Consequently, tensor decomposition that considers heterogeneity not only performs better than the conventional tensor method but also reveals finer information that the conventional method overlooks.
Tensor methods are increasingly applied to the feature analysis of multidimensional data, for example in modal extraction and feature revelation. Most of these applications rely on the feature capture of tensor decomposition, so the fidelity of feature capture is key to successful tensor-based applications. This study integrates heterogeneity into the tensor decomposition process, which significantly improves the accuracy of feature capture. It should help to promote the application of tensor methods in geoscience analysis.
In this study, the original data are partitioned into local data with a consistent structure to reduce the effect of heterogeneous variation on tensor decomposition. The data partition is based on similarity measurement in multidimensional space, and the calculation process is somewhat complex. In the future, the parameterization of the partition strategy could potentially be improved by fusing additional knowledge. Moreover, this study mainly focuses on regular spatiotemporal observations, whereas many samples are observed irregularly. Extending the method to irregularly distributed data therefore remains a task for future research.