# Using Information Flow for Whole System Understanding From Component Dynamics

## Abstract

Complex systems that exhibit emergent behaviors arise as a result of nonlinear interdependencies among multiple components. Characterizing how such whole system dynamics are sustained through multivariate interactions remains an open question. In this study, we propose an information flow-based framework to investigate how the present state of any component arises as a result of the past interactions among interdependent variables, termed causal history. Using a partitioning time lag, we divide this into immediate and distant causal history components and then characterize the information flow-based interactions within these as self- and cross-feedbacks. Such a partition allows us to characterize the information flow from the two feedbacks in both histories as unique, synergistic, or redundant interactions by using partial information decomposition. We employ this causal history analysis approach to investigate the information flows in a short-memory coupled logistic model and in long-memory observed stream chemistry dynamics. While the dynamics of the short-memory system are mainly maintained by its recent historical states, the current state of each stream solute is sustained by self-feedback-dominated recent dynamics and cross-dependency-dominated earlier dynamics. The analysis suggests that the observed 1/*f* signature of each solute is a result of the interactions with other variables in the stream. Based on high-density data streams, the approach developed here for investigating multivariate evolutionary dynamics provides an effective way to understand how the components of a dynamical system interact to create emergent whole system behavioral patterns such as long-memory dependency.

## Key Points

- We develop a framework to characterize how the evolutionary dynamics in a multivariate system influence the present state of each variable
- In codependent long-memory processes, self-dependencies dominate recent dynamics, while cross-dependencies dominate earlier dynamics
- Partitioning information from the historical states in stream chemistry into different components shows signatures of stream solute origins

## Plain Language Summary

The observed dynamics of a variable in the natural environment are shaped by its interactions with several other variables. Our study provides a framework to determine the characteristics of these interactions. It shows whether the present state of a variable is strongly influenced by its immediate past or by interactions that happened in distant memory. Application of our approach to observed stream chemistry data shows that the fractal signatures observed in the data are shaped significantly by interactions in distant memory. Further, the interaction structure reflects the source origins of solutes: solutes of oceanic origin, delivered through atmospheric pathways and deposition, behave differently from those originating within the watershed. This research opens up new ways to understand how component interactions give rise to whole system behavior.

## 1 Introduction

The self-organized behavior and associated patterns of form and function of watersheds, such as solute dynamics in a stream or the ecohydrologic dynamics driving the subsurface-to-atmosphere continuum, are a collective behavior resulting from the interactions among a multitude of components. Present-day advances in sensor and communication technologies, together with declining costs, are allowing us to observe the dynamics of our environment at ever-increasing temporal frequency and spatial density. Simultaneous multivariate observations at high frequencies are opening up an unprecedented opportunity to understand and characterize deeply embedded interdependencies that govern process dynamics in our environment. How can we best use such high-dimensional data, arising from a number of simultaneously measured variables, to ask questions that take us beyond component-level relationships to expose whole system behavior and enable us to identify system-level attributes from component dynamics? Conversely, can we also understand how system-level constraints govern component-level dynamics? In this paper, we present a framework to address such questions by quantifying information flow among variables to characterize causal dependencies in complex systems (Balasis et al., 2013; Bollt et al., 2018). This draws upon the well-known idea that the whole is greater than the sum of its parts. A coherent understanding of whole system dynamics from time series observations of several components in such systems can provide a unique perspective on system evolution and its dynamical characteristics.

Information is encoded in patterns of fluctuations in a signal, such as those recorded as time series by in situ instruments in a watershed. These fluctuations may be externally instigated, such as through variations in rainfall, or internally generated through nonlinearities in the system. As fluctuations propagate through different components in a watershed system, that is, as one variable responds through its own fluctuation to that of another variable, we quantify this as a flow of information from the latter to the former, since the pattern of variability in one variable shapes that of the other. Thus, information flow serves as the currency of exchange between interacting variables that fluctuate within the constraints of the conservation of mass, momentum, and energy. In a watershed, or in dynamical systems in general, information flow captures the attenuation or amplification of fluctuations among variables, thereby revealing the dynamic connectivity between them (Goodwell et al., 2018). Analysis of such information flow can provide a unique vantage point for understanding watershed functions: that of quantifying multivariate causal interactions (Balasis et al., 2013; Bollt et al., 2018) that link across space and time scales of the watershed dynamics (Jiang & Kumar, 2019).

Consider a multivariate complex system with *N* variables, $\mathbf{X} = \{X^1, X^2, \ldots, X^N\}$, varying in time *t*. The current state of any variable, $Z_t \in \mathbf{X}_t$, is an outcome of interactions in the entirety of all the earlier dynamics in the system. We call this prior dynamics *causal history* (Jiang & Kumar, 2019). The interactions from the causal history can be parsed in a number of ways. The entire causal history can be divided into *immediate* and the complementary *distant causal histories*, partitioned by a time lag $\tau$ (Jiang & Kumar, 2019). The quantification of the influences from immediate and distant causal histories, as a function of the partition time lag $\tau$, provides insight into the interplay between the influences of recent and prior dynamics on $Z_t$, since the dynamics of $Z_t$ are sustained by the dependencies on its own past as well as the interactions with other variables. Here, we propose significant new advances that describe how such interactions can be computed from observed multivariate time series data. In essence, we characterize how the outcome or state of a variable at any time can be quantified in terms of the interaction between its own history and those of other variables, in both the immediate and distant causal histories. These interactions are captured as unique, synergistic, and redundant components through the framework of partial information decomposition (PID; Goodwell & Kumar, 2017; Williams & Beer, 2010). This novel framework allows us to characterize the joint influence of self- and cross-dependencies in determining the current state of each variable. This approach enables us to explore how component interactions shape the whole system behavior and how whole system dynamics determine component dynamics.

The rest of the paper is organized as follows. In section 2, we briefly review and synthesize recent developments pertaining to information flow. These developments serve as a foundation and background of the proposed approach. Specifically, we summarize the approach of using directed acyclic graph (DAG) representation for time series (Lauritzen, 1996; Runge et al., 2012a), and its use for quantifying the influence of immediate and distant causal histories on the outcome of a variable (Jiang & Kumar, 2019). Based on these prior developments, in section 3 we present new results to capture the interaction between self- and cross-feedbacks between the target and other variables in the system. We show that this results in a challenge that is often referred to as the curse of dimensionality (Bellman, 1957), requiring us to estimate high-dimensional probability distribution functions. To address this problem, we develop an approximation to reduce the dimensionality based on weighted transitive reduction (WTR; Bosnacki et al., 2010) using momentary information transfer (MIT; Runge et al., 2012b) as weights. In section 4 we illustrate the approach through two applications: (1) an observed long-memory stream solute dynamics using published stream chemistry data (Kirchner & Neal, 2013; Neal et al., 2013) and (2) a short-memory coupled logistic model. Using the two applications, we examine how the differences between short- and long-memory processes are dependent on the self- and cross-feedback in the immediate and distant causal histories. Section 5 provides a discussion and conclusion.

## 2 Review of Information Flow in Observed Dynamical Systems

In this section, we review the framework for understanding causal dependencies in multivariate time series using information flow, that is, the various ways in which different variables interact to determine the present state of a target variable. We first summarize its underpinning in information measures. Then, to represent the temporal dependencies of a system, a DAG representation for time series (Runge et al., 2012a; Spirtes et al., 2000) is described. Last, based on this representation of the dynamics, we summarize the information flow to the present state of a target variable through different pathways such as a directed edge, a causal path, a pair of separable causal paths, and causal history. This synthesis is then used for further developments presented in subsequent sections along with illustrative examples.

### 2.1 Background on Information Measures

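*Shannon's entropy* quantifies the uncertainty of a variable $X_t$ and is given by

$$H(X_t) = -\sum_{x_t} p(x_t) \log p(x_t), \tag{1}$$

where $p(x_t)$ is the probability of $X_t$. In a bivariate case of $X_t$ and $Y_t$, the uncertainty of $X_t$ that remains given the knowledge of $Y_t$ can be quantified as the corresponding *conditional entropy*:

$$H(X_t \mid Y_t) = -\sum_{x_t, y_t} p(x_t, y_t) \log p(x_t \mid y_t), \tag{2}$$

where $p(x_t, y_t)$ is the joint probability of $X_t$ and $Y_t$. Moreover, the shared dependency between $X_t$ and $Y_t$ can be measured by using the *mutual information* of the two variables, which is given by

$$I(X_t; Y_t) = \sum_{x_t, y_t} p(x_t, y_t) \log \frac{p(x_t, y_t)}{p(x_t)\,p(y_t)} = H(X_t) - H(X_t \mid Y_t) = H(Y_t) - H(Y_t \mid X_t). \tag{3}$$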

The last two equalities of equation (3) illustrate that *I*(*X*_{t};*Y*_{t}) symmetrically measures the shared dependency between *X*_{t} and *Y*_{t} or the reduced uncertainty of one variable given the knowledge of the other.
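Equations (1)–(3) can be checked numerically. The sketch below is illustrative only. It assumes discrete samples and plug-in (histogram) probabilities with natural logarithms, so values are in nats, whereas the analyses in this paper rely on kNN estimators for continuous data. The function names are hypothetical.

```python
import numpy as np
from collections import Counter


def entropy(*series):
    """Plug-in joint Shannon entropy (nats) of one or more aligned discrete series."""
    counts = np.array(list(Counter(zip(*series)).values()), dtype=float)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))


def conditional_entropy(x, y):
    """H(X | Y) = H(X, Y) - H(Y), an identity equivalent to equation (2)."""
    return entropy(x, y) - entropy(y)


def mutual_information(x, y):
    """I(X; Y) = H(X) - H(X | Y), the second form in equation (3)."""
    return entropy(x) - conditional_entropy(x, y)


rng = np.random.default_rng(0)
x = rng.integers(0, 4, size=10_000)
y = (x + rng.integers(0, 2, size=x.size)) % 4   # y is a noisy copy of x

# The two entropy-based forms in equation (3) agree, illustrating the symmetry of I.
print(mutual_information(x, y), entropy(y) - conditional_entropy(y, x))
```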

If $Z_t$ is considered as a target influenced by two sources $X_t$ and $Y_t$, the total uncertainty reduction of $Z_t$ due to both $X_t$ and $Y_t$ is measured as the mutual information between $Z_t$ and the union of $X_t$ and $Y_t$, that is, $I(Z_t; X_t, Y_t)$. To further characterize the different information contents in $I(Z_t; X_t, Y_t)$, PID (Williams & Beer, 2010) has been developed to decompose the total information into (1) *synergistic information*, the information jointly provided by both $X_t$ and $Y_t$ and not available from either variable alone, denoted as $S$; (2) *redundant information*, the overlapping information from the two sources, denoted as $R$; and (3) *unique information*, the information provided by each source individually, denoted as $U_X$ and $U_Y$, respectively. PID is then given by

$$I(Z_t; X_t, Y_t) = S + R + U_X + U_Y. \tag{4}$$

An approach for computing these quantities that is suitable for environmental time series data is discussed in Goodwell and Kumar (2017).
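The sketch below shows one way to evaluate equation (4) for discrete samples. It follows our reading of the rescaled redundancy of Goodwell and Kumar (2017), so treat it as an illustration and consult that paper for the authoritative definitions. The redundancy is interpolated between two values. The lower bound is $R_{\min} = \max(0, -II)$, where $II = I(Z_t;X_t,Y_t) - I(Z_t;X_t) - I(Z_t;Y_t)$ is the interaction information. The upper value is $R_{\mathrm{MMI}} = \min[I(Z_t;X_t), I(Z_t;Y_t)]$. The normalized source dependency $I_s = I(X_t;Y_t)/\min[H(X_t),H(Y_t)]$ sets the interpolation weight. The unique terms then follow from the Williams and Beer (2010) relations $I(Z_t;X_t) = U_X + R$ and $I(Z_t;Y_t) = U_Y + R$. The plug-in probability estimates and the function names are assumptions made for illustration.

```python
import numpy as np
from collections import Counter


def entropy(*series):
    """Plug-in joint Shannon entropy (nats) of aligned discrete series."""
    counts = np.array(list(Counter(zip(*series)).values()), dtype=float)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))


def mutual_information(a, b):
    return entropy(a) + entropy(b) - entropy(a, b)


def rescaled_pid(z, x, y):
    """Split I(Z; X, Y) into S, R, U_X, U_Y (equation (4)) with a rescaled redundancy."""
    i_zx = mutual_information(z, x)
    i_zy = mutual_information(z, y)
    i_total = entropy(z) + entropy(x, y) - entropy(z, x, y)   # I(Z; X, Y)
    interaction = i_total - i_zx - i_zy                          # equals S - R
    r_min = max(0.0, -interaction)
    r_mmi = min(i_zx, i_zy)
    h_min = min(entropy(x), entropy(y))
    # Normalized dependency between the two sources, used as the rescaling weight.
    i_s = mutual_information(x, y) / h_min if h_min > 0 else 0.0
    r = r_min + i_s * (r_mmi - r_min)
    u_x, u_y = i_zx - r, i_zy - r
    s = i_total - u_x - u_y - r
    return {"S": s, "R": r, "U_X": u_x, "U_Y": u_y, "total": i_total}


x = np.array([0, 0, 1, 1])
y = np.array([0, 1, 0, 1])
print(rescaled_pid(x ^ y, x, y))  # XOR target: the information is purely synergistic
```

With independent sources ($I_s = 0$) the redundancy falls back to its lower bound. With identical sources ($I_s = 1$) it reaches the minimum mutual information. This reflects how the rescaled approach accounts for dependency between the sources.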

### 2.2 Information Flow in DAG Representation for Time Series

To analyze how the historical dynamics shape the current state of a variable in a multivariate complex system, we represent the temporal dependencies of the system as a DAG for time series, as shown in Figure 1. The DAG, represented as $\mathcal{G} = (\mathbf{V}, \mathbf{E})$, is defined as follows. Each *node* in $\mathbf{V}$ refers to the state at time *t* of a variable in $\mathbf{X}$. A *directed edge* in $\mathbf{E}$ linking two nodes $Y_{t-\tau}$ and $Z_t$, written as $Y_{t-\tau} \to Z_t$, stands for a direct causal influence from $Y_{t-\tau}$ to $Z_t$, where $\tau$ is a positive time lag. All the nodes directly influencing the target node $Z_t$ form the *parent* set of $Z_t$, denoted as $\mathcal{P}_{Z_t}$. Besides, a source node $Y_{t-\tau}$ can be linked to a target node $Z_t$ indirectly through a *causal path* $\mathcal{C}_{Y_{t-\tau} \to Z_t}$. This is the set of nodes connected by a sequence of edges linking $Y_{t-\tau}$ to $Z_t$, that is, $Y_{t-\tau} \to \cdots \to Z_t$, with the source $Y_{t-\tau}$ included and the target $Z_t$ excluded. Based on the DAG representation for time series, we consider the different ways in which historical dynamics can affect a target node. Specifically, assuming that only the past influences the future, information flow to a target can be classified into the following categories:

*Information flow through a directed edge*: Consider a direct causal influence, in a Granger sense (Granger, 1969), between two nodes $Y_{t-\tau}$ and $Z_t$ through an edge, as shown in Figure 1a. That is, $Z_t$ and $Y_{t-\tau}$ are connected with a directed edge if and only if a disturbance in $Y_{t-\tau}$ results in a corresponding disturbance in $Z_t$ when conditioned on the remaining past states of the variables in the system. Mathematically, such influence can be measured as a *conditional mutual information* (CMI) and is given by (Runge et al., 2012a)

$$I(Y_{t-\tau} \to Z_t) = I\left(Z_t; Y_{t-\tau} \mid \mathbf{X}_t^- \setminus \{Y_{t-\tau}\}\right). \tag{5}$$

This quantifies the information flow from $Y_{t-\tau}$ to $Z_t$ conditioned on the knowledge of the rest of the dynamics of all interacting variables in the causal history, $\mathbf{X}_t^- = \{\mathbf{X}_{t-1}, \mathbf{X}_{t-2}, \ldots\}$, excluding $Y_{t-\tau}$.

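Under the Markov property of a DAG (Lauritzen, 1996), $Z_t$ is statistically independent of its nondescendants if its parents $\mathcal{P}_{Z_t}$ are given. Because only the past influences the future, every node in $\mathbf{X}_t^-$ is a nondescendant of $Z_t$. This property implies that $Z_t$ is independent of $\mathbf{X}_t^- \setminus \mathcal{P}_{Z_t}$ given the knowledge of $\mathcal{P}_{Z_t}$. Correspondingly, the CMI in equation (5) can be revised as

$$I^{\mathrm{MIT}}(Y_{t-\tau} \to Z_t) = I\left(Z_t; Y_{t-\tau} \mid \mathcal{P}_{Z_t} \setminus \{Y_{t-\tau}\}, \mathcal{P}_{Y_{t-\tau}}\right). \tag{6}$$

This is the MIT (Runge et al., 2012b). Its condition set reflects that $Z_t$ and $Y_{t-\tau}$ are independent of the rest of the historical states if conditioned on the parents of the two nodes. An example of the condition set is illustrated as the gray nodes in Figure 1a for the influence from $Y_{t-1}$ to $Z_t$. MIT quantifies the direct interaction between two nodes acting as source and target, by excluding any information from other nodes that may be flowing through the source or directly to the target.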
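To make the condition set in equation (6) concrete, the sketch below encodes a stationary time series DAG as a hypothetical dictionary that maps each variable to its (parent, lag) pairs. It then assembles the MIT condition set for one edge. The graph, the encoding, and the function names are illustrative assumptions. In practice the returned set would be passed to a CMI estimator.

```python
from typing import Dict, List, Set, Tuple

# A node (var, lag) stands for var_{t-lag}; e.g. ("Y", 2) is Y_{t-2}.
Node = Tuple[str, int]
# Stationary time series DAG: variable -> list of (parent variable, lag >= 1).
Graph = Dict[str, List[Tuple[str, int]]]


def parents_of(node: Node, graph: Graph) -> Set[Node]:
    """Parents of var_{t-lag}, obtained by shifting the stationary parent lags."""
    var, lag = node
    return {(p, lag + l) for p, l in graph[var]}


def mit_condition_set(source: Node, target: Node, graph: Graph) -> Set[Node]:
    """Condition set of equation (6): parents of the target (minus the source)
    together with the parents of the source."""
    target_parents = parents_of(target, graph)
    if source not in target_parents:
        raise ValueError("MIT is defined for a directed edge source -> target")
    return (target_parents - {source}) | parents_of(source, graph)


# Hypothetical three-variable graph: X -> Y -> Z, each with a self-feedback.
graph = {"X": [("X", 1)], "Y": [("Y", 1), ("X", 2)], "Z": [("Z", 1), ("Y", 1)]}
print(sorted(mit_condition_set(("Y", 1), ("Z", 0), graph)))
# [('X', 3), ('Y', 2), ('Z', 1)]
```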

*Information flow through a causal path*: In addition to a direct influence through an edge, a lagged source node $Y_{t-\tau}$ can also indirectly affect a target node $Z_t$ through the corresponding causal path $\mathcal{C}_{Y_{t-\tau} \to Z_t}$. An example is illustrated as the influence from $Y_{t-3}$ to $Z_t$ in the quadvariate system in Figure 1b. The quantification of the information flow through $\mathcal{C}_{Y_{t-\tau} \to Z_t}$ is the same as equation (5). However, the corresponding simplification of equation (5) based on the Markov property differs from equation (6) and is given by

$$I^{\mathrm{MITP}}(Y_{t-\tau} \to Z_t) = I\left(Z_t; Y_{t-\tau} \mid \mathcal{P}_{\{Z_t\} \cup \mathcal{C}_{Y_{t-\tau} \to Z_t}}\right). \tag{7}$$

This is termed *momentary information transfer along causal path* (MITP; Runge, 2015). Here $\mathcal{P}_{\{Z_t\} \cup \mathcal{C}_{Y_{t-\tau} \to Z_t}}$ denotes the parents of the nodes in $\{Z_t\} \cup \mathcal{C}_{Y_{t-\tau} \to Z_t}$ that lie outside this union. Note that the condition set in equation (7) is now defined by separating the union of $Z_t$ and $\mathcal{C}_{Y_{t-\tau} \to Z_t}$ from the rest of the prior dynamics. This is illustrated by the gray nodes in Figure 1b for the influence from $Y_{t-3}$ to $Z_t$. Therefore, equation (7) gives the information flow from a single lagged source $Y_{t-\tau}$ to a target $Z_t$ traversing only through the causal path $\mathcal{C}_{Y_{t-\tau} \to Z_t}$.
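The condition set of equation (7) can be assembled in the same spirit. This sketch is an illustration, not the authors' implementation. It assumes that a causal path collects every node that is both a descendant of the source and an ancestor of the target, with the source included and the target excluded. The search covers only lags no larger than the source lag. The graph encoding and the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Node = Tuple[str, int]                       # ("Y", 3) stands for Y_{t-3}
Graph = Dict[str, List[Tuple[str, int]]]     # variable -> [(parent, lag >= 1)]


def parents_of(node: Node, graph: Graph) -> Set[Node]:
    var, lag = node
    return {(p, lag + l) for p, l in graph[var]}


def ancestors_within(target: Node, graph: Graph, max_lag: int) -> Set[Node]:
    """All ancestors of `target` whose lag does not exceed `max_lag`."""
    found, stack = set(), [target]
    while stack:
        for par in parents_of(stack.pop(), graph):
            if par[1] <= max_lag and par not in found:
                found.add(par)
                stack.append(par)
    return found


def causal_path(source: Node, target: Node, graph: Graph) -> Set[Node]:
    """Nodes on causal paths source -> ... -> target (source included, target excluded)."""
    candidates = ancestors_within(target, graph, source[1])
    if source not in candidates:
        return set()
    on_path = {source}
    # Parents always lie further in the past, so visiting nodes from the
    # largest lag to the smallest lets each node inherit path membership.
    for node in sorted(candidates, key=lambda n: -n[1]):
        if parents_of(node, graph) & on_path:
            on_path.add(node)
    return on_path


def mitp_condition_set(source: Node, target: Node, graph: Graph) -> Set[Node]:
    """Condition set of equation (7): parents of {target} and the causal path,
    excluding the nodes of the path itself."""
    path = causal_path(source, target, graph)
    if not path:
        raise ValueError("no causal path from source to target")
    cond = set(parents_of(target, graph))
    for node in path:
        cond |= parents_of(node, graph)
    return cond - path


# Hypothetical chain Y -> W -> Z, with a self-feedback on each of Y and W.
graph = {"Y": [("Y", 1)], "W": [("W", 1), ("Y", 2)], "Z": [("W", 1)]}
print(sorted(causal_path(("Y", 3), ("Z", 0), graph)))         # [('W', 1), ('Y', 3)]
print(sorted(mitp_condition_set(("Y", 3), ("Z", 0), graph)))  # [('W', 2), ('Y', 4)]
```

In this toy graph, the condition set blocks the source's own past ($Y_{t-4}$) and the competing input to the intermediate node ($W_{t-2}$). Only information traversing $Y_{t-3} \to W_{t-1} \to Z_t$ is therefore counted.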

*Information flow through two separable causal paths*: A natural extension of equation (7) is quantifying the information flow from two lagged sources $X_{t-\tau_X}$ and $Y_{t-\tau_Y}$ to a target $Z_t$ through the corresponding causal paths $\mathcal{C}_X = \mathcal{C}_{X_{t-\tau_X} \to Z_t}$ and $\mathcal{C}_Y = \mathcal{C}_{Y_{t-\tau_Y} \to Z_t}$, respectively. For instance, Figure 1c illustrates the influence from $X_{t-3}$ and $Y_{t-2}$ to the target $Z_t$ through their corresponding causal paths. The total information given by the two sources can be quantified as the CMI between $Z_t$ and the union of $X_{t-\tau_X}$ and $Y_{t-\tau_Y}$ given the knowledge of the prior dynamics. It is given by (Jiang & Kumar, 2018)

$$I\left(Z_t; X_{t-\tau_X}, Y_{t-\tau_Y} \mid \mathcal{P}_{\{Z_t\} \cup \mathcal{C}_X \cup \mathcal{C}_Y}\right), \tag{8}$$

which decomposes as

$$I\left(Z_t; X_{t-\tau_X}, Y_{t-\tau_Y} \mid \mathcal{P}_{\{Z_t\} \cup \mathcal{C}_X \cup \mathcal{C}_Y}\right) = S_c + R_c + U_{X,c} + U_{Y,c}, \tag{9}$$

where the subscript $c$ in the synergy $S_c$, redundancy $R_c$, and unique contributions $U_{X,c}$ and $U_{Y,c}$ indicates that the decomposition is associated with causal paths. Whereas equation (4) characterizes the information contents given by two sources, equation (9) characterizes the *momentary partial information decomposition* (MPID; Jiang & Kumar, 2018). MPID focuses on the information going only through the pathways linking the sources and the target, with the influence from earlier dynamics excluded through conditioning. An example that illustrates the difference between equations (4) and (9) is provided in section 4.1 using observed stream chemistry data.

*Information flow from immediate and distant causal histories*: A further look at the current state of a target variable, $Z_t$, in Figure 1d reveals that $Z_t$ is in fact a result of the prior states of all interdependent variables in the system, that is, the causal history $\mathbf{X}_t^- = \{\mathbf{X}_{t-1}, \mathbf{X}_{t-2}, \ldots\}$, with information flowing through a multitude of different pathways in the DAG. The causal history can be partitioned into two complementary components. The first is the recent dynamics arising from all the previous states back to time step $t-\tau$, termed *immediate causal history*. It is represented by a subgraph whose node set $\mathbf{X}^{\mathrm{I}}$ consists of all the causal paths from the contemporaneous nodes $\mathbf{X}_{t-\tau}$ to $Z_t$. The second is the remaining earlier dynamics, termed *distant causal history*, $\mathbf{X}^{\mathrm{D}} = \{\mathbf{X}_{t-\tau-1}, \mathbf{X}_{t-\tau-2}, \ldots\}$. An example of immediate and distant causal histories is illustrated in Figure 1d. As the *partitioning time lag* $\tau$ grows, the information from the immediate causal history can be expected to increase, and that from the complementary distant causal history to decrease. In particular, the decay or asymptotic convergence of the information from the distant history illustrates the memory dependency of the system (Jiang & Kumar, 2019). For example, long-memory systems have persistent nonzero information from the distant causal history even with a large partitioning time lag $\tau$.

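The total information from the causal history, $\mathcal{T}$, can then be decomposed by the chain rule of mutual information into contributions from the immediate and distant causal histories, $\mathbf{X}^{\mathrm{I}}$ and $\mathbf{X}^{\mathrm{D}}$:

$$\mathcal{T} = I\left(Z_t; \mathbf{X}^{\mathrm{I}}, \mathbf{X}^{\mathrm{D}}\right) = \mathcal{J}^{\mathrm{I}} + \mathcal{J}^{\mathrm{D}}, \tag{10}$$

where

$$\mathcal{J}^{\mathrm{I}} = I\left(Z_t; \mathbf{X}^{\mathrm{I}} \mid \mathbf{X}^{\mathrm{D}}\right), \tag{11a}$$

$$\mathcal{J}^{\mathrm{D}} = I\left(Z_t; \mathbf{X}^{\mathrm{D}}\right). \tag{11b}$$

Based on the Markov property, these simplify to

$$\mathcal{J}^{\mathrm{I}} = I\left(Z_t; \mathcal{P}^{\mathrm{I}} \mid \mathcal{P}^{\mathrm{D}}\right), \tag{12a}$$

$$\mathcal{J}^{\mathrm{D}} = I\left(Z_t; \mathcal{P}^{\mathrm{D}}\right), \tag{12b}$$

with $\mathcal{P}^{\mathrm{I}}$ referring to the parent set of $Z_t$ that belongs to the immediate causal history (shown as the blue nodes in Figure 1). Likewise, $\mathcal{P}^{\mathrm{D}}$ refers to the parent set of both $Z_t$ and the immediate causal history that belongs to the distant causal history (shown as the orange nodes in Figure 1d). The simplifications of the immediate and distant causal histories into $\mathcal{P}^{\mathrm{I}}$ and $\mathcal{P}^{\mathrm{D}}$, respectively, based on the Markov property, illustrate the aggregation of information in the DAG for time series. That is, the information from the dynamics in the system is aggregated at the nodes directly affecting the node(s) of interest (Jiang & Kumar, 2019). Therefore, equation (12b) indicates that $\mathcal{P}^{\mathrm{D}}$ carries all the information from the distant history. Its mutual information with $Z_t$ thus characterizes the dependency of the target node $Z_t$ on the distant causal history. Similarly, equation (12a) indicates that, when conditioned on $\mathcal{P}^{\mathrm{D}}$, $\mathcal{P}^{\mathrm{I}}$ includes all the information from the immediate causal history flowing to $Z_t$.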
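The sets in equations (12a) and (12b) follow mechanically from a time series DAG. The minimal sketch below assumes a stationary graph encoded as a hypothetical dictionary of (parent, lag) pairs and a partitioning lag $\tau$ at least as large as the largest parent lag. It also approximates the immediate causal history by all ancestors of $Z_t$ within lags 1 to $\tau$. These choices are ours, made for illustration.

```python
from typing import Dict, List, Set, Tuple

Node = Tuple[str, int]                       # ("Z", 2) stands for Z_{t-2}
Graph = Dict[str, List[Tuple[str, int]]]     # variable -> [(parent, lag >= 1)]


def parents_of(node: Node, graph: Graph) -> Set[Node]:
    var, lag = node
    return {(p, lag + l) for p, l in graph[var]}


def partition_causal_history(target: Node, graph: Graph, tau: int):
    """Return (X_I, P_I, P_D) for a partitioning lag tau >= the largest parent lag.

    X_I: ancestors of the target within relative lags 1..tau (immediate history);
    P_I: parents of the target inside X_I (equation (12a));
    P_D: parents of {target} U X_I that fall in the distant history (lag > tau).
    """
    immediate, stack = set(), [target]
    while stack:
        for par in parents_of(stack.pop(), graph):
            if par[1] - target[1] <= tau and par not in immediate:
                immediate.add(par)
                stack.append(par)
    p_i = parents_of(target, graph) & immediate
    p_d = set()
    for node in immediate | {target}:
        p_d |= {p for p in parents_of(node, graph) if p[1] - target[1] > tau}
    return immediate, p_i, p_d


# Hypothetical graph: Z has a self-feedback and is driven by Y at lag 2.
graph = {"Z": [("Z", 1), ("Y", 2)], "Y": [("Y", 1)]}
X_I, P_I, P_D = partition_causal_history(("Z", 0), graph, tau=2)
print(sorted(P_I), sorted(P_D))
# [('Y', 2), ('Z', 1)] [('Y', 3), ('Y', 4), ('Z', 3)]
```

The cardinality of `P_D` is what drives the dimensionality of both $\mathcal{J}^{\mathrm{I}}$ and $\mathcal{J}^{\mathrm{D}}$. In a real graph with many variables and lags, this set grows quickly.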

As summarized above, the DAG representation for time series provides effective formulations for capturing a range of different types of information flow to the present state of a target variable, originating from one or two lagged sources or from all of the historical dynamics. In particular, the causal history analysis approach provides a new way to investigate the influence of the entire evolutionary dynamics from the perspective of multivariate causal interactions. While the previous study (Jiang & Kumar, 2019) partitions the causal history based on the time lag $\tau$, a more detailed examination is needed to explore the interplay between different variables in the causal history. Such exploration is crucial because it can provide more insight through a finer multivariate analysis of the whole system's dynamics. This detailed characterization is anchored on a physically reasonable segmentation of the causal history as well as the corresponding PID, which is addressed in the next section and serves as the primary contribution of this work.

## 3 Quantifying Multivariate Interactions

In addition to the temporal separation of the causal history through a variable partitioning time lag $\tau$, the analysis of multivariate time series using a DAG representation also allows the immediate and distant causal histories to be partitioned into self- and cross-dependencies, as shown in Figure 2a. A study of how self-feedback and the historical states of other related variables, from both recent and distant dynamics, jointly affect the current state of a target variable could help reveal how multivariate causal interactions lead to the evolutionary behavior of a system. Our goal now is to make this notion precise and demonstrate the application of the developed framework.

### 3.1 Interaction Through Information Flow

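Each of the immediate and distant causal histories involves two types of dependence: *self-dependence* and *cross-dependence*. The former considers how a variable's own history influences its present state, while the latter captures the influence of all other variables arising through interactions as depicted in a DAG. In practice, instead of the original distant causal history, $\mathbf{X}^{\mathrm{D}}$, we partition $\mathcal{P}^{\mathrm{D}}$, which contains the information from earlier dynamics that directly affects the immediate causal history and $Z_t$. It is split into (1) self-dependence, $\mathcal{P}^{\mathrm{D}}_{\mathrm{self}}$ (the orange box in Figure 2b), and (2) cross-dependence, $\mathcal{P}^{\mathrm{D}}_{\mathrm{cross}}$ (the dashed orange box in Figure 2b). The total information from the distant causal history, represented now by $\mathcal{J}^{\mathrm{D}} = I(Z_t; \mathcal{P}^{\mathrm{D}})$, is quantified as in equation (12b). The partitioning of $\mathcal{P}^{\mathrm{D}}$ into $\mathcal{P}^{\mathrm{D}}_{\mathrm{self}}$ and $\mathcal{P}^{\mathrm{D}}_{\mathrm{cross}}$ further allows the decomposition of $\mathcal{J}^{\mathrm{D}}$ into synergistic, redundant, and unique contributions by using the PID framework. This is given by

$$\mathcal{J}^{\mathrm{D}} = I\left(Z_t; \mathcal{P}^{\mathrm{D}}_{\mathrm{self}}, \mathcal{P}^{\mathrm{D}}_{\mathrm{cross}}\right) = S^{\mathrm{D}} + R^{\mathrm{D}} + U^{\mathrm{D}}_{\mathrm{self}} + U^{\mathrm{D}}_{\mathrm{cross}}. \tag{13}$$

Similarly, we partition $\mathcal{P}^{\mathrm{I}}$, the parents of $Z_t$ within the immediate causal history, into the self- and cross-dependencies, represented by $\mathcal{P}^{\mathrm{I}}_{\mathrm{self}}$ (the blue box in Figure 2b) and $\mathcal{P}^{\mathrm{I}}_{\mathrm{cross}}$ (the dashed blue box in Figure 2b), respectively. The corresponding PID from the two parts of the immediate causal history is given by

$$\mathcal{J}^{\mathrm{I}} = I\left(Z_t; \mathcal{P}^{\mathrm{I}}_{\mathrm{self}}, \mathcal{P}^{\mathrm{I}}_{\mathrm{cross}} \mid \mathcal{P}^{\mathrm{D}}\right) = S^{\mathrm{I}} + R^{\mathrm{I}} + U^{\mathrm{I}}_{\mathrm{self}} + U^{\mathrm{I}}_{\mathrm{cross}}. \tag{14}$$

Adding equations (13) and (14) according to equation (10), the PID of the information from the entire causal history is

$$\mathcal{T} = \left(S^{\mathrm{I}} + S^{\mathrm{D}}\right) + \left(R^{\mathrm{I}} + R^{\mathrm{D}}\right) + \left(U^{\mathrm{I}}_{\mathrm{self}} + U^{\mathrm{D}}_{\mathrm{self}}\right) + \left(U^{\mathrm{I}}_{\mathrm{cross}} + U^{\mathrm{D}}_{\mathrm{cross}}\right). \tag{15}$$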
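Splitting $\mathcal{P}^{\mathrm{D}}$ or $\mathcal{P}^{\mathrm{I}}$ into self- and cross-dependence only requires checking which variable each node belongs to, and equation (15) is a term-by-term sum. The sketch below shows both steps. The PID values are placeholder numbers, not results from this study, and the names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple

Node = Tuple[str, int]   # ("Z", 3) stands for Z_{t-3}


def split_self_cross(nodes: Iterable[Node], target_var: str) -> Tuple[Set[Node], Set[Node]]:
    """Split a parent set into the target's own past (self) and all other variables (cross)."""
    nodes = set(nodes)
    self_part = {n for n in nodes if n[0] == target_var}
    return self_part, nodes - self_part


@dataclass(frozen=True)
class PID:
    """Synergy, redundancy, and unique self/cross terms of one PID (nats)."""
    S: float
    R: float
    U_self: float
    U_cross: float

    @property
    def total(self) -> float:
        return self.S + self.R + self.U_self + self.U_cross

    def __add__(self, other: "PID") -> "PID":
        # Equation (15): the PID of T is the term-by-term sum over the two histories.
        return PID(self.S + other.S, self.R + other.R,
                   self.U_self + other.U_self, self.U_cross + other.U_cross)


p_self, p_cross = split_self_cross({("Z", 3), ("Y", 3), ("Y", 4)}, target_var="Z")
pid_distant = PID(S=0.10, R=0.05, U_self=0.02, U_cross=0.08)    # placeholder numbers
pid_immediate = PID(S=0.20, R=0.30, U_self=0.40, U_cross=0.05)  # placeholder numbers
pid_total = pid_immediate + pid_distant
print(sorted(p_self), sorted(p_cross), round(pid_total.total, 2))
```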

In this study, we employ the rescaled approach of Goodwell and Kumar (2017) for computing the PIDs of $\mathcal{J}^{\mathrm{D}}$, $\mathcal{J}^{\mathrm{I}}$, and $\mathcal{T}$ in equations (13)–(15). The rescaled approach estimates the redundant information by considering the mutual dependency between the two sources and ensures a nonnegative information partitioning. Further, all the information-theoretic measures are estimated empirically with the *k*-nearest-neighbor (*k*NN) method (Frenzel & Pompe, 2007; Kraskov et al., 2004). The number of nearest neighbors, *k*, is set to five to obtain a low bias in the estimated (conditional) mutual information (Frenzel & Pompe, 2007; Kraskov et al., 2004). In the analysis presented in section 4, we first compute $\mathcal{J}^{\mathrm{D}}$ and $\mathcal{J}^{\mathrm{I}}$, along with their PIDs, in equations (13) and (14), respectively. Then, the PID of $\mathcal{T}$ is obtained as the sum of those of $\mathcal{J}^{\mathrm{I}}$ and $\mathcal{J}^{\mathrm{D}}$ according to equation (15).
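The sketch below illustrates the *k*NN estimation mentioned above with a brute-force version of the Frenzel and Pompe (2007) conditional mutual information estimator. When no condition is given, it reduces to the Kraskov et al. (2004) estimator. It uses maximum-norm neighborhoods and the default *k* = 5. It is not the code used in this study. Because it stores full distance matrices, it is only practical for a few thousand samples. The function names are hypothetical.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at positive integers: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    n = np.asarray(n, dtype=int)
    harmonic = np.concatenate(([0.0], np.cumsum(1.0 / np.arange(1, n.max()))))
    return -EULER_GAMMA + harmonic[n - 1]


def _max_norm(a):
    """Pairwise max-norm distances for an (n, d) array."""
    return np.max(np.abs(a[:, None, :] - a[None, :, :]), axis=-1)


def knn_cmi(x, y, z=None, k=5):
    """kNN estimate (nats) of I(X; Y | Z), or of I(X; Y) when z is None."""
    x = np.asarray(x, float).reshape(len(x), -1)
    y = np.asarray(y, float).reshape(len(y), -1)
    n = len(x)
    d_x, d_y = _max_norm(x), _max_norm(y)
    d_z = np.zeros((n, n)) if z is None else _max_norm(np.asarray(z, float).reshape(n, -1))
    d_xz, d_yz = np.maximum(d_x, d_z), np.maximum(d_y, d_z)
    d_joint = np.maximum(d_xz, d_y)
    for d in (d_xz, d_yz, d_z, d_joint):
        np.fill_diagonal(d, np.inf)          # exclude each point from its own neighborhood
    eps = np.sort(d_joint, axis=1)[:, k - 1][:, None]   # distance to k-th joint neighbor
    # Count neighbors strictly closer than eps in each marginal subspace.
    n_xz = np.sum(d_xz < eps, axis=1)
    n_yz = np.sum(d_yz < eps, axis=1)
    n_z = np.sum(d_z < eps, axis=1)
    return float(digamma_int(k) - np.mean(
        digamma_int(n_xz + 1) + digamma_int(n_yz + 1) - digamma_int(n_z + 1)))


rng = np.random.default_rng(0)
z = rng.standard_normal(1000)
x = z + 0.5 * rng.standard_normal(1000)   # X and Y share the common driver Z
y = z + 0.5 * rng.standard_normal(1000)
print(knn_cmi(x, y), knn_cmi(x, y, z))     # large I(X;Y), near-zero I(X;Y|Z)
```

In this example the correlation between X and Y is 0.8. The unconditional estimate should therefore lie near the analytical value $-\frac{1}{2}\ln(1-0.8^2) \approx 0.51$ nats, while conditioning on the common driver should bring it close to zero.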

### 3.2 Dimensionality Reduction Using Momentary Information WTR

In addition to the availability of time series data, the validity of empirically estimating the information metrics in equations (12)–(15) also depends on the number of nodes involved in the DAG. The dimensionality required in the computation can remain high even after the reduction achieved by assuming the Markov property for the DAG (Runge et al., 2012a). For example, consider the node $Z_t$ in Figure 2. The nodes of the immediate and distant causal histories required in computing $\mathcal{J}^{\mathrm{I}}$ (equation (12a)) and $\mathcal{J}^{\mathrm{D}}$ (equation (12b)) are now reduced to those in $\mathcal{P}^{\mathrm{I}}$ and $\mathcal{P}^{\mathrm{D}}$ (blue and orange nodes, respectively). However, the dimensions of $\mathcal{P}^{\mathrm{I}}$ and $\mathcal{P}^{\mathrm{D}}$, as well as those associated with computing the PIDs, can still be too high for reliable estimation of the information-theoretic metrics in equations (12)–(15). In particular, $\mathcal{P}^{\mathrm{D}}$ is involved in computing both $\mathcal{J}^{\mathrm{I}}$ (as the condition set) and $\mathcal{J}^{\mathrm{D}}$. It contains the parents of the entire immediate causal history and accounts for most of the dimensionality in the computation of $\mathcal{J}^{\mathrm{I}}$ and $\mathcal{J}^{\mathrm{D}}$, as shown by the orange nodes in Figure 2b. This dimensionality can grow quickly as the number of variables and/or the number of lags that influence a target increases. To address this problem, we introduce a new method called the *momentary information weighted transitive reduction* (MIWTR) to further reduce the dimensionality involved in computing the information measures. It builds on the WTR (Bosnacki et al., 2010) for reducing the complexity of the DAG. Briefly, WTR removes a directed edge between two nodes if a stronger pathway links the nodes indirectly, where the strength of a path or edge is assessed by its associated weight. Here, we use WTR to simplify the time series graph, which reduces the dimensionality for computing the information-theoretic measures. The general idea is to first remove, through WTR, the edges linking $\mathcal{P}^{\mathrm{D}}$ with the immediate causal history, $\mathbf{X}^{\mathrm{I}}$. We then exclude any node in $\mathcal{P}^{\mathrm{D}}$ that is no longer directly linked to $\{Z_t\} \cup \mathbf{X}^{\mathrm{I}}$, thus obtaining a reduced $\widetilde{\mathcal{P}}^{\mathrm{D}}$. We use the MIT (Runge et al., 2012b), which reflects the degree of direct dependency between two nodes, as the weight for the transitive reduction. The details of the MIWTR approach are described in the appendix.
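The transitive-reduction step can be sketched as follows. This is our illustration of the general idea described above, not the appendix algorithm. It assumes that the strength of an indirect path is its weakest edge weight (a bottleneck rule), that edge weights are MIT values, and that the graph is acyclic. The hypothetical helper `prune_distant_parents` then keeps only the distant parents that still have a direct edge into $\{Z_t\} \cup \mathbf{X}^{\mathrm{I}}$.

```python
from typing import Dict, Hashable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def weighted_transitive_reduction(weights: Dict[Edge, float]) -> Dict[Edge, float]:
    """Drop edge u -> v when some indirect path u -> ... -> v is stronger.

    Assumption: a path's strength is its weakest edge weight (bottleneck), and
    the input graph is acyclic, as a time series DAG is.
    """
    nodes = {u for edge in weights for u in edge}
    best = dict(weights)  # strongest known path strength between node pairs
    # Floyd-Warshall on the (max, min) semiring yields widest-path strengths.
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if (i, k) in best and (k, j) in best:
                    via_k = min(best[(i, k)], best[(k, j)])
                    if via_k > best.get((i, j), float("-inf")):
                        best[(i, j)] = via_k
    reduced = {}
    for (u, v), w in weights.items():
        indirect = max(
            (min(best[(u, k)], best[(k, v)]) for k in nodes
             if k not in (u, v) and (u, k) in best and (k, v) in best),
            default=float("-inf"),
        )
        if indirect <= w:  # keep the edge unless an indirect path is stronger
            reduced[(u, v)] = w
    return reduced


def prune_distant_parents(distant: Set[Hashable], recent: Set[Hashable],
                          reduced: Dict[Edge, float]) -> Set[Hashable]:
    """Keep distant nodes that still have a direct edge into the recent node set."""
    return {u for u in distant if any((u, v) in reduced for v in recent)}


# Hypothetical MIT weights: d reaches c more strongly through e than directly.
weights = {("d", "c"): 0.1, ("d", "e"): 0.6, ("e", "c"): 0.5}
reduced = weighted_transitive_reduction(weights)
print(sorted(reduced), prune_distant_parents({"d", "e"}, {"c"}, reduced))
# [('d', 'e'), ('e', 'c')] {'e'}
```

A time series DAG has no cycles, so an indirect path from u to v can never reuse the edge u→v. All edges can therefore be tested against the same widest-path table.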

## 4 Characterizing Multivariate Interaction in Causal History

Thus far, we have characterized the information flow to the present state of a variable at time *t* through the framework of causal history (Figure 1d and equations (10)–(12)). Causal history can be decomposed into complementary components of immediate and distant causal histories as a function of a varying partition time lag $\tau$. We have shown that both immediate and distant causal histories can be further partitioned into self- and cross-dependencies (Figure 2). Each of these interactions can then be explored through PID (equations (13)–(15)). We can, therefore, ask the following:

- How does information flow, jointly provided through the entire causal history, sustain the multivariate interaction associated with the whole system dynamics?
- How does the characterization of such information at the system level reveal the unique contribution of each individual component in the system?

To address the above two questions, we apply the proposed causal history analysis approach to study the information flow in two different systems, using time series from either observational or synthetic data: a hydrobiogeochemical system represented by observed stream chemistry, known to exhibit long-term memory, and a short-memory coupled logistic model. We then summarize the insights obtained from these applications.

### 4.1 Stream Chemistry Dynamics

We first analyze a set of published stream solute data (Kirchner & Neal, 2013), recorded in the Upper Hafren catchment in the United Kingdom. The Hafren river is a tributary of the Severn River, located in mid-Wales around 20 km from the western coast. The Upper Hafren catchment is covered by grassland and has acidic soils (Neal et al., 1997). The coastal location and the vegetation in the catchment result in two origins of the stream solutes: a marine origin through atmospheric deposition and a terrestrial origin through both transport and biogeochemical processes in the subsurface. Stream solutes were sampled every 7 hr from March 2007 to January 2009 (Neal et al., 2013). Kirchner and Neal (2013) found that the stream solute data exhibit a 1/*f* fractal signature after correcting for the influence of flow rate on the observations. The fractal signature of the stream chemistry provides evidence of long-memory dependency (Kirchner & Neal, 2013), which is sustained by multivariate interactions within the catchment system (Jiang & Kumar, 2019). The data set therefore provides an ideal test bed for analyzing component dynamics and emergent system behavior by using the causal history analysis approach developed in section 3.

We utilize the logarithm of the flow rate and six stream chemistry variables, Na^{+}, Cl^{−}, Al^{3+}, Ca^{2+}, SO4^{2−}, and pH, for analysis. The time series for both the raw data and the flow rate-corrected data are plotted in Figures 3a and 3b, respectively. The causal history analysis is conducted on both data sets to investigate how the evolutionary dynamics influence the current state of each variable as well as how the flow rate affects these states. The DAGs for time series of the two data sets are estimated by using the Tigramite algorithm (Runge, 2015; Runge et al., 2012a, 2015, 2017). Tigramite is a modified PC algorithm (Spirtes et al., 2000) for constructing the DAG representation of time series. It uses MIT-based conditional independence tests to remove spurious relationships between two nodes. In this analysis, each conditional independence test is performed based on 100 samples with a significance level *α* = 0.05. The *k*NN approach is utilized to compute the corresponding CMI with *k* = 100, since a high *k* facilitates a low variance of the estimated CMI (Frenzel & Pompe, 2007). The maximum time lag for establishing the directed edges between lagged variables in the algorithm is set to 5 in constructing each graph. The resulting time series graphs are sketched in Figures 3c and 3d. The thickness of each edge is based on the coupling strength between the two connected nodes, computed by MIT in equation (6). It can be observed that while there are more cross-dependencies in the graph constructed from the raw data, self-feedback interactions are more dominant in the flow rate-corrected graph, which has fewer cross edges. The comparison between the two DAGs illustrates that flow rate is an important factor in creating cross-dependencies among the stream solutes, as expected. However, it is also remarkable that cross-dependencies exist between the variables in the absence of flow dependencies, which suggests that they may play a role in sustaining the long-memory property.

Based on the DAG estimated with the raw data in Figure 3c, we first illustrate the information interaction through causal paths. To do so, we characterize the joint influence of lagged Al^{3+} at time $t-2$ ($\mathrm{Al}_{t-2}$) and SO4^{2−} at time $t-2$ ($\mathrm{SO4}_{t-2}$) on the current state of pH (pH_{t}). Figure 4a shows the two causal paths (blue nodes) from the two sources to the target pH (black node). We first compute the PID in equation (4) by using the rescaled approach (Goodwell & Kumar, 2017) for estimating the different information components. The resulting information characterization is shown in Figure 4a. We observe a very dominant unique information from aluminum, *U*_{Al,c}. This is because $\mathrm{Al}_{t-2}$ influences pH_{t} directly, whereas $\mathrm{SO4}_{t-2}$ affects pH_{t} only indirectly and thus contributes less unique information, *U*_{SO4,c}. The remaining synergistic and redundant information illustrates the joint effect of the sources as well as their overlapping effect. This is explained by the dominance of the stream dynamics (e.g., flow rate) in governing the trivariate interactions (Jiang & Kumar, 2019). However, the resulting information decomposition differs when we block the information flow from the remaining graph by conditioning on the parents of the two causal paths (orange nodes in Figure 4b) and compute the corresponding MPID in equation (9). First, the total information drops significantly from 0.902 nats to 0.088 nats, because conditioning prevents the information from earlier dynamics from reaching the target. Second, the synergy now dominates, while the redundancy diminishes. The strong synergistic information implies a strong joint effect of aluminum and sulfate on stream pH. The decrease in redundancy is due to the conditioning blocking the dominant influence from other variables, such as flow rate. The above examples illustrate the formulations presented in the review section and provide the backdrop for the framework developed in this paper.

We now further compute the information flow from the causal history over the partitioning time lag
ranging from 5 to 400 using equation (12) for both DAGs in Figures 3c and 3d. We compute the corresponding PIDs in immediate and distant causal histories using equations (13)–(15). Note that
is obtained by the sum of the estimated
and
. The computations of
and
are based on the implementation of MIWTR. For instance, to compute the information transferred to Na^{+} from immediate and distant causal histories separated with time lag
by using the raw data, we first identify
in equation (12) based on the DAG generated in Figure 3c, shown as the orange nodes at the top of Figure 5. Then, MIWTR is implemented to remove the nodes in
whose edges to the corresponding immediate causal history are all excluded by WTR. The resulting simplified
is illustrated in the bottom of Figure 5, showing the reduction of the cardinality of
from 31 to 17. MIWTR is implemented for each variable at each partitioning time lag
in both DAGs in Figures 3c and 3d. The corresponding cardinalities with and without MIWTR are shown in Figure A6a. It can be observed that, for each variable, a significant dimension reduction of between 5 and 15 is achieved due to the simplification of
. The resulting information flows from causal history and their corresponding information partitioning for the systems with and without the influence of flow rate over time lag
are plotted in Figures 6 and 7, respectively.
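The partitioning step can be sketched as follows. This is a minimal illustration under stated assumptions, not the released code. It assumes the time series graph is stationary, with lagged parents given in a hypothetical `parents` dictionary. It takes the immediate history to cover lags 1 to `tau` and the distant history to cover lags beyond `tau`, up to a finite `horizon`. Nodes of the target variable are counted as self-dependencies and all other nodes as cross-dependencies.

```python
from collections import deque


def causal_history(parents, target, horizon):
    """Return all ancestors (var, lag) of (target, 0) with lag <= horizon.

    parents maps each variable to a list of (source_variable, lag) pairs with
    lag >= 1, assumed identical at every time step (stationary graph).
    """
    seen = set()
    queue = deque([(target, 0)])
    while queue:
        var, lag = queue.popleft()
        for src, dlag in parents.get(var, []):
            node = (src, lag + dlag)          # lag measured relative to the target time t
            if node[1] <= horizon and node not in seen:
                seen.add(node)
                queue.append(node)
    return seen


def partition_history(history, target, tau):
    """Split a causal history into immediate/distant and self/cross parts."""
    parts = {"imm_self": set(), "imm_cross": set(),
             "dist_self": set(), "dist_cross": set()}
    for var, lag in history:
        window = "imm" if lag <= tau else "dist"      # assumed split: lags 1..tau vs > tau
        kind = "self" if var == target else "cross"
        parts[f"{window}_{kind}"].add((var, lag))
    return parts
```

For example, with `parents = {"X": [("X", 1), ("Y", 2)], "Y": [("Y", 1)]}`, the history of `X` up to lag 4 has seven nodes. A split at `tau = 2` places `("Y", 2)` in the immediate cross-dependency set and `("Y", 3)`, `("Y", 4)` in the distant one.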

Figure 6 shows that, for each variable in both DAGs, the information from the entire causal history is almost invariant with the partitioning time lag. This is because the influence of the system's entire evolutionary dynamics does not depend on the time lag used to partition the distant and immediate histories. Furthermore, the nonzero convergence of the information from the distant causal history for almost every variable implies long-memory dependence in the stream chemistry dynamics. In addition, comparing the results from the raw data (blue region) and the flow rate-corrected data (red region) shows that the information flows drop significantly when the influence of flow rate is corrected, because flow rate provides a significant amount of information for the dynamics of each selected variable. All the above conclusions are consistent with the results estimated without MIWTR in Jiang and Kumar (2019), supporting the reliability of MIWTR for reducing the dimensionality of information flow computations.

Moreover, the characterization of the information from the entire causal history and from its immediate and distant partitions in Figure 7 provides a richer picture of how the system's whole evolutionary dynamics influence the present state of each target. The partitioning of the information from the entire causal history in Figures 7a and 7b shows that self- and cross-dependencies contribute different information components in both graphs. Unique information contributes most of the information from cross-dependencies for all variables, but the main contributor to the self-feedback influence differs between the two graphs. When the influence of flow rate is included, redundant information is the main contributor to the self-feedback influence (Figure 7a). When the flow rate influence is excluded, the unique information of self-dependency dominates instead (Figure 7b).

In addition to
, the information partitioning of
and
in Figures 7c and 7d delineates the contributions of different information components from the recent and the remaining earlier dynamics of the system. It can be observed that the dominance of self-dependency in the immediate history is independent of the partitioning time lag,
through either unique
or redundant
information. However, the influence of the distant history is attributed to both self- and cross-dependencies for small
but is dominated only by cross-dependency through its unique information
as
increases. This is because the self-dependency influence is concentrated in recent dynamics: when the partitioning time lag
is too small, part of the self-dependency influence appears in the distant causal history, and it shifts back into the immediate causal history as
grows. This also implies that self-feedback interactions dominate the recent dynamics of each target variable, while interactions with other variables dominate its long-term dynamics. This is especially insightful for understanding the 1/*f* fractal dynamics of these stream solutes. The important role of self-feedback interaction in generating a self-similar process is well accepted; however, the role of interactions with other variables of the system is usually not considered. The PID of causal history now allows us to quantify this role explicitly through information flow, and it shows that cross interactions are crucial in sustaining the long-memory behavior of the stream chemistry. This analysis leads us to postulate that, to sustain complex long-memory behavior with several interacting components, a complex system requires (1) self-feedback interactions in recent dynamics that guide the short-term trend of the system and (2) cross-dependencies in earlier dynamics that support the long-term trend.

Also, the PID results of information from the immediate and distant causal histories reveal different dynamics for the different solutes studied. In the flow rate-dominated system, most solutes show similar PID patterns as plotted in Figure 7c. That is, for each solute, the information from distant causal history,
, mainly consists of
and
, and the information from immediate causal history,
, mainly consists of
and
. Again, the larger redundant information in both the distant and immediate causal histories,
and
, is due to the influence of flow rate. Unlike the other solutes, SO
shows stronger unique information from its self-feedback interactions in recent dynamics,
. This implies that SO
is less susceptible to the influence of flow rate than the other solutes, as is evident from the negligible changes in SO
's PID between the results with and without flow rate (Figures 7c and 7d). However, when the influence of flow rate is removed, the PID patterns differ among the variables, as shown in Figure 7d. For instance, Na^{+} and Cl^{−} show stronger unique information from their self-feedback dynamics in both distant and immediate causal histories, represented by
and
, respectively. This is consistent with the fact that most of the sodium and chloride in the studied catchment, which is close to the coast, originates from the ocean and is brought inland through atmospheric deposition (Neal et al., 1997). Therefore, compared with other solutes, the states of Na^{+} and Cl^{−} are more influenced by their own dynamics and less by other variables. Meanwhile, for solutes with evenly mixed origins from both ocean and catchment, such as Ca^{2+} and SO
(Neal et al., 1997), there is higher redundant information in their distant causal histories,
. This illustrates the shared influences of catchment processes, oceanic origins, and atmospheric pathways on both Ca^{2+} and SO
, represented by
. Lastly, Al^{3+} shows dominant redundant information from both distant and immediate histories,
and
, respectively. This is consistent with its solely terrestrial origin (see Table S3 in Kirchner & Neal, 2013), which makes catchment processes a strong determinant of the state of Al^{3+}.

The PID approach, in conjunction with the causal history framework, has enabled us to characterize the whole system behavior of the solute dynamics through the DAG representation and associated information flow (Figures 7a and 7b), as well as the effect of interactions on the dynamics of each solute. We find that the whole system dynamics are mainly maintained by a self-dependency-dominated immediate causal history and a cross-dependency-dominated distant causal history. Also, the characterization of information from the immediate and distant causal histories reveals differences due to the origins of each solute.

### 4.2 A Short-Memory Dynamics: A Trivariate Chaotic Model

*ϵ* is its coupling strength. The coupled logistic model shows different degrees of synchronization as a function of *ϵ* (Atay et al., 2004; Paredes et al., 2013; Rosenblum et al., 1997). Owing to its symmetric structure with lag-one interactions, it is a short-memory process, as shown in a recent study (Jiang & Kumar, 2019). That is, the current state of each variable is controlled by only a finite set of historical states. Here, we analyze the influence of the causal history on a target variable, *X*_{3,t}, with a mild coupling strength (i.e., *ϵ* = 0.3). After partitioning the immediate and distant causal histories of *X*_{3,t} based on an earlier time step shown in Figure 8a, we identify as blue nodes, as orange nodes, and the self- and cross-dependencies of the two histories in solid and dashed boxes, respectively. The information from the immediate and distant causal histories and the corresponding information characterizations are calculated for partitioning time lags ranging from 1 to 50 based on equations (13) and (14), with 10,000 synthetic data points generated for the empirical estimation at each time lag.
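A symmetric trivariate coupled logistic map of the kind studied here can be simulated as below. The model form is an assumption, namely the common diffusively coupled form with f(x) = 4x(1 − x) and lag-one coupling of strength *ϵ*. It is not necessarily identical to the model equation used in this section. The function name, burn-in length, and random initial conditions are illustrative choices.

```python
import numpy as np


def logistic(x, r=4.0):
    """Logistic map f(x) = r x (1 - x); maps [0, 1] into [0, 1] for r = 4."""
    return r * x * (1.0 - x)


def coupled_logistic(n_steps, eps=0.3, n_vars=3, seed=0, burn_in=1000):
    """Simulate symmetrically coupled logistic maps with lag-one coupling.

    ASSUMED form (illustrative only):
        X_i(t+1) = (1 - eps) * f(X_i(t)) + eps / (n_vars - 1) * sum_{j != i} f(X_j(t))
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, n_vars)          # random initial state
    out = np.empty((n_steps, n_vars))
    for t in range(burn_in + n_steps):
        fx = logistic(x)
        # convex combination of own map value and the mean of the others
        x = (1.0 - eps) * fx + eps * (fx.sum() - fx) / (n_vars - 1)
        if t >= burn_in:                       # discard the transient
            out[t - burn_in] = x
    return out
```

Calling `coupled_logistic(10000, eps=0.3)` returns a 10,000 × 3 array, matching the sample size used above. Because each update is a convex combination of values in [0, 1], the simulated states remain in that interval.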

The characterization of the information flow from the distant and immediate histories is plotted in Figure 8b (similar to Figure 7). Unlike the long-memory stream chemistry dynamics, the information from the distant causal history of the logistic model (the area above the black dotted line) converges to zero with increasing partitioning time lag, indicating the short-term dependence of the process. Furthermore, we observe very strong overall redundant information, contributed by an increasing redundancy from the immediate causal history and a decreasing redundancy from the distant causal history as the time lag grows. These opposing changes illustrate the transfer of redundant information from the distant to the immediate causal history as more states are entrained into the immediate causal history. The strong overall redundancy is due to the symmetric structure of the model in equation (4.2): the dynamics of the three variables are similar to each other and therefore provide substantial overlapping information about one another. In addition, the influence of cross-dependence is now dominated by the immediate causal history rather than by the distant causal history, as was observed in the stream chemistry dynamics. This, again, is because the short-term dependence of the logistic system causes both the self- and cross-dependence contributions to originate from recent dynamics.

### 4.3 Insights From the Applications

Characterizing the information flow from the causal history in the two systems reveals the whole system behavior as well as the dynamics of each of its components. It therefore helps address the two questions raised at the beginning of this section.

First, the dynamics that sustain the whole system behavior vary between the systems. For a short-memory system, such as the trivariate logistic model, the present state of each variable is maintained by the recent dynamics, including both self-feedback interactions and the influence of the other variables. Meanwhile, for a long-memory system, the influences sustaining the whole system behavior are mainly contributed by a self-dependency-dominated immediate causal history and a cross-dependency-dominated distant causal history. This implies that while the self-feedback interaction from recent dynamics is critical for the short-term dynamics of each variable, the cross-dependency interaction from distant causal history is responsible for its long-term memory.

Second, the dynamics of each component in a complex system can be revealed by characterizing the information from the interactions between self- and cross-dependencies in the immediate and distant causal histories. This complements the first finding: the first conclusion depicts the dynamics that maintain complexity at the system level, whereas this conclusion details the unique dynamics of each variable through the information characterization of the system's dynamics. In the analysis of stream solute dynamics, although the solutes have been widely and consistently found to exhibit fractal behavior (Kirchner & Neal, 2013), their origins and the ways they interact with each other differ. The different origins of the solutes are reflected in the proportions of redundant and unique information from self-dependency when the dependency on flow rate is excluded.

## 5 Discussion and Conclusion

This paper presents an information flow framework for understanding the whole system behavior that arises from the multivariate interactions among component dynamics. A fundamental need driving this framework is to develop approaches for understanding how interactions between the parts create emergent whole system behavior. Our study shows that complexity or emergent dynamics, such as long-memory behavior, result from the multivariate interactions in the entire evolutionary dynamics of the system, or causal history.

Our approach blends the PID technique with causal history analysis to characterize the information flow to a target variable from its self-feedback interactions and cross-dependencies in both the immediate and distant causal histories (see the top of Figure 2). While there are many ways to partition the causal history, we find that the proposed partitioning into self- and cross-dependencies in recent and earlier dynamics is a reasonable way to reveal the key aspects of interactive dependencies. First, the difference between the influences of the immediate and distant causal histories illustrates the memory dependency of the system (Jiang & Kumar, 2019). Second, the strong self-feedback interaction observed in many systems suggests that its interplay with the dynamics of other variables might be one of the keys to determining the current state of each target variable.

Based on the analyses of the observed stream chemistry dynamics and the synthetic model, we find that the information characterization differs from system to system, reflecting their different behavior. While the future trajectory of a short-memory system is dictated by its recent dynamics, as shown for the logistic model, the dynamics of a long-memory system are mainly sustained by the influences from the self-dependency-dominated immediate causal history and the cross-dependency-dominated distant causal history. In other words, in a long-memory system, the self-feedback interaction in recent dynamics determines the recent trend of a target variable, whereas the influence from the distant causal history guides its long-term evolution. In the analysis of the stream chemistry system, this consistent influence on long-term dynamics is evident from the strong unique information of the cross-dependency in the distant causal history.

Both the structure of interactions between the variables (i.e., the DAG) and the expression of their dependencies through immediate and distant causal history interactions can be influenced by deterministic or regular patterns in the data, such as diurnal or seasonal cycles. The analyses with and without flow rate correction allow us to examine the relative importance of such regular patterns. For the stream chemistry study, comparing the results with and without the streamflow influence removed shows that seasonal flow variability affects the measures, although the analysis with the flow influence removed still captures the long-memory persistence. Our approach, therefore, demonstrates that some care is needed in interpreting results when periodic or regular patterns are present.

Two key issues associated with estimating reliable information measures are the choice of *k* in the *k*NN estimator and the use of MIWTR for dimensionality reduction. In this study, we set *k*=5 throughout the paper to reduce the estimation bias (Frenzel & Pompe, 2007). The sensitivity of the estimates to the choice of *k* has been studied with synthetic models (see Figures 4.2–4.6 in Runge, 2014). These studies show that setting *k* between 5 and 10 yields reliable estimates of causal strength for CMI with dimensions around 10 and time series longer than 1,000. This serves as a basis for using *k*NN with *k*=5 in this paper, where CMI with dimensions of around 10 to 17 is estimated from around 2,000 observational data points. In addition, MIWTR is developed to achieve efficient estimation of the different information measures (in equations (12)–(14)) by reducing the cardinality of the condition set used in the estimation. WTR is better suited for simplifying a DAG than traditional transitive reduction because it takes the weights, or strengths, of the edges into account (Bosnacki et al., 2010). That is, the stronger an edge is, the less likely it is to be excluded. An example in the appendix, which computes the information flow in a quadvariate logistic model with and without this approach, establishes the feasibility of its use. Estimating information-theoretic measures from limited data is a challenge. Through Figure A3, we show that the length of available data is adequate for such estimation and produces estimates consistent with data lengths an order of magnitude larger. However, much more research is required in this field, as the dimensionality grows quickly with the number of variables. We thus anticipate that this approach will be even more effective for more complex DAGs, such as those arising when more variables are observed and included in the graph construction to explore interdependencies.
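The *k*NN estimator of CMI discussed above can be sketched in the Frenzel and Pompe (2007) form, I(X;Y|Z) ≈ ψ(k) − ⟨ψ(n_xz + 1) + ψ(n_yz + 1) − ψ(n_z + 1)⟩. Here ψ is the digamma function, and each n counts the neighbors in the corresponding subspace that lie strictly within the maximum-norm distance to the *k*-th nearest neighbor in the joint space. This is a brute-force O(n²) sketch with hypothetical function names. It is not Tigramite's implementation, and practical implementations typically use KD-trees instead.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at positive integers: psi(n) = -gamma + sum_{i=1}^{n-1} 1/i."""
    n = np.asarray(n, dtype=int)
    harmonic = np.concatenate(([0.0], np.cumsum(1.0 / np.arange(1, n.max() + 1))))
    return -EULER_GAMMA + harmonic[n - 1]


def chebyshev(a):
    """Pairwise maximum-norm distances between the rows of a (n x d) array."""
    return np.max(np.abs(a[:, None, :] - a[None, :, :]), axis=2)


def knn_cmi(x, y, z, k=5):
    """Frenzel-Pompe kNN estimate of I(X;Y|Z) in nats (brute force)."""
    x, y, z = (np.asarray(v, float).reshape(len(v), -1) for v in (x, y, z))
    d_xz = chebyshev(np.hstack([x, z]))
    d_yz = chebyshev(np.hstack([y, z]))
    d_z = chebyshev(z)
    d_all = np.maximum(d_xz, chebyshev(y))     # max-norm in the joint (x, y, z) space
    # distance to the k-th neighbour; column 0 of the sorted row is the point itself
    eps = np.sort(d_all, axis=1)[:, k]
    # neighbour counts strictly inside eps in each subspace, excluding the point itself
    n_xz = (d_xz < eps[:, None]).sum(axis=1) - 1
    n_yz = (d_yz < eps[:, None]).sum(axis=1) - 1
    n_z = (d_z < eps[:, None]).sum(axis=1) - 1
    return digamma_int(k) - np.mean(
        digamma_int(n_xz + 1) + digamma_int(n_yz + 1) - digamma_int(n_z + 1))
```

For jointly Gaussian data with X = Z + e1 and Y = X + e2 (independent unit-variance noise), the true value is 0.5 ln 2 ≈ 0.347 nats. If instead Y depends on Z only, the true value is zero. Together these give a quick check of the estimator.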

The approach presented here is fundamentally different from most existing information-based approaches, which focus either only on pairwise interactions or on interactions in a specific part of the system. By contrast, it sheds light on how complex system dynamics are sustained over time, improving our understanding of the whole system dynamics and the role of individual components. This is especially helpful in the current age of big data: with the increasing availability of observations, such data-driven tools will provide more insights in different scientific domains and open up new avenues for investigating complex system dynamics.

## Acknowledgments

Funding support from the following NSF grants is acknowledged: EAR 1331906, ACI 1261582, and EAR 1417444. We also thank Allison Goodwell for her comments that helped improve the manuscript. The directed acyclic graph for time series of the stream chemistry example is estimated by using the Tigramite package (Runge et al., 2012a; Runge, 2015; Runge et al., 2015; Runge et al., 2017). The codes for conducting momentary information weighted transitive reduction and calculating the information flows in the stream chemistry and logistic examples are available at GitHub (https://github.com/HydroComplexity/CausalHistory).

## Appendix A: Dimensionality Reduction Using MIWTR

In this appendix, MIWTR is developed to reduce the number of nodes in the condition set for the computation of equations (12)–(14). MIWTR builds on WTR by using MIT, defined in equation (6), as the edge weight. Since MIT reflects the strength of the direct coupling between a source and a target, it is a natural choice. We first describe the procedure for implementing MIWTR and then verify its feasibility using a quadvariate logistic model.

### A1. Method

WTR builds on transitive reduction (TR; Aho et al., 1972). For a DAG, TR aims to remove “redundant” edges while preserving the connectivity structure of the graph. It is based on the idea that a transitively reduced graph can be obtained by removing any directed edge between two nodes of the original graph if an indirect path also connects them. However, in a weighted graph, TR may remove “important” edges with large weights. To avoid this, WTR goes a step further by considering the edge weights in the reduction. That is, an edge linking two nodes is removed from the original graph if and only if there exists a stronger indirect path from the source node to the target node. Otherwise, the edge is kept in the graph.

*E*_{1}. We define a path between two nodes as a sequence of directed edges connecting them, where all the nodes and edges of the path belong to the node set and to *E*_{1}, respectively. Note that the corresponding causal path is the union of all such paths between the two nodes. The representative weight of the causal path is defined as the maximal transitive influence (Bosnacki et al., 2010) and is given by

*A*, *B*, and *C*) and three corresponding weighted edges (i.e., *A*→*C*, *A*→*B*, and *B*→*C*). TR removes the edge *A*→*C* because the path *A*→*B*→*C* indirectly connects *A* and *C*. In contrast, WTR keeps *A*→*C* because the corresponding maximal transitive influence is *h*_{AC} = *w*_{AC} = 2. However, if the weight *w*_{AC} is changed to 0.9, WTR removes *A*→*C*, since now *h*_{AC} = 1 > *w*_{AC} = 0.9.
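The WTR rule can be sketched as follows. The sketch rests on an assumption made for illustration: the strength of an indirect path is the weight of its weakest edge, and the maximal transitive influence is the strength of the strongest path, computed here with a max-min Floyd–Warshall pass. Bosnacki et al. (2010) should be consulted for the exact definition. The weights *w*_{AB} = *w*_{BC} = 1 in the usage example are hypothetical. They are chosen so that the indirect path *A*→*B*→*C* has strength 1, matching the example above.

```python
def weighted_transitive_reduction(nodes, weights):
    """Drop edge u->v when some indirect path u->...->v is strictly stronger.

    weights maps (u, v) -> positive edge weight. ASSUMPTION: a path's strength
    is its weakest edge (bottleneck), and the maximal transitive influence
    h[u][v] is the strongest path strength, found with a max-min Floyd-Warshall pass.
    """
    # start from direct edge weights (0 means "no edge")
    h = {u: {v: weights.get((u, v), 0.0) for v in nodes} for u in nodes}
    for k in nodes:
        for i in nodes:
            for j in nodes:
                via_k = min(h[i][k], h[k][j])   # bottleneck strength through k
                if via_k > h[i][j]:
                    h[i][j] = via_k
    # keep an edge only if no path is stronger than the edge itself
    return {(u, v): w for (u, v), w in weights.items() if h[u][v] <= w}


nodes = ["A", "B", "C"]
kept = weighted_transitive_reduction(
    nodes, {("A", "B"): 1.0, ("B", "C"): 1.0, ("A", "C"): 2.0})
reduced = weighted_transitive_reduction(
    nodes, {("A", "B"): 1.0, ("B", "C"): 1.0, ("A", "C"): 0.9})
```

With *w*_{AC} = 2 the direct edge survives, because it is stronger than the indirect path. With *w*_{AC} = 0.9 it is dropped, whereas plain TR would drop it in both cases.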

*Z*_{t} in the DAG for time series as the target and as the time lag for separating into an immediate and a distant causal history. We now define a subgraph of , . The node set includes the union of the immediate causal history and , that is, . The edge set *E*_{s} contains all the edges in . The procedure for reducing the dimensionality of by using MIWTR is as follows.

- Implement WTR to exclude edges in , generating a new graph , where includes the edges remaining after the implementation of WTR on .
- For each node , check whether there is an edge linking to any node in the immediate causal history based on the new graph . If there is no edge , remove from .
- Repeat removal of nodes in the previous step for every node in .
- Return the reduced set .
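Once WTR has been applied, steps 2 to 4 of the procedure above reduce to a simple filter. The sketch below takes the edges remaining after WTR as given (for example, from a weighted transitive reduction with MIT weights). It keeps a condition node only if that node still has a direct edge into some node of the immediate causal history. The node labels and the function name are hypothetical.

```python
def prune_condition_set(condition_set, immediate_history, reduced_edges):
    """Keep a condition node only if, after WTR, it still has a direct edge
    into some node of the immediate causal history (steps 2-4 above)."""
    kept = set()
    for node in condition_set:
        # step 2: look for a surviving edge node -> (any immediate-history node)
        if any((node, target) in reduced_edges for target in immediate_history):
            kept.add(node)
        # nodes without such an edge are dropped (steps 2-3)
    return kept                                  # step 4: the reduced condition set


# hypothetical example: nodes are (variable, lag) pairs
cond = {("X", 3), ("Y", 3), ("Z", 4)}
imm = {("X", 1), ("X", 2), ("Y", 2)}
edges = {(("X", 3), ("X", 2)), (("Z", 4), ("Y", 3))}
reduced_set = prune_condition_set(cond, imm, edges)   # only ("X", 3) survives
```

Here `("Z", 4)` is dropped because its only remaining edge points to another condition node rather than to the immediate history, and `("Y", 3)` is dropped because no edge from it survived WTR.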

Consider the DAG for time series in Figure 2 as an example. The condition set in the orange nodes can be further reduced by excluding two of its nodes if their two corresponding edges are removed by MIWTR. A validity test of the MIWTR-based reduction of the condition set in computing equations (12)–(15) is illustrated through a quadvariate logistic model in section A2. We note that the MIWTR algorithm needs to be implemented for each distant/immediate causal history segmentation, that is, for every partitioning time lag of each target variable.

### A2. Verification of MIWTR: A Test on a Quadvariate Logistic Model

*ϵ* is set to 0.2.

The procedures of computing the information flow are as follows. We first use the Tigramite package to construct the directed acyclic time series graph based on the synthetic data generated from equation (A2). Given the graph describing the causal history, MIWTR is employed to simplify the condition set
according to the procedure in section A1.
and
and their PIDs are then computed, with the partitioning time lag
ranging from 5 to 50, using the *k*-nearest-neighbor method with *k*=5. The Tigramite parameter settings are the same as in the stream chemistry analysis in section 4.1. Further, to analyze how the data length affects the performance of MIWTR, we compute the information flow with time series lengths of 200, 400, 600, 1,000, 5,000, and 10,000.

The cardinalities of and with and without MIWTR are plotted in Figure A2a. It can be observed that the dimensions decrease with increasing length of synthetic data. This is because more training data allow a more reliable estimation of the directed acyclic time series graph. The estimated graph becomes stable when the data length is larger than 1,000, as indicated by the convergence of the decreasing cardinalities. Furthermore, for a given data length, we also observe significant drops in dimension for both and due to the reduction of by using MIWTR. The reduced dimension of is around 10 for data lengths greater than 1,000.

The plots of and with and without MIWTR are shown in Figure A2b. The results with MIWTR (solid lines) and without MIWTR (dashed lines) converge and are very close to each other, especially for data lengths greater than 1,000. Another visualization of the results using MIWTR with different data lengths is plotted in Figure A3. For each partitioning time lag, both measures estimated with 1,000 data points are consistent with estimates obtained from longer data series, with only slight improvements when the data length increases from 1,000 to 10,000. Further, comparing each information measure with and without MIWTR shows that the estimates remain consistent when MIWTR is used. This implies that, in this quadvariate logistic model, MIWTR-based dimension reduction ensures reliable estimation of both measures given enough time series data (>1,000).

The PIDs of the two information measures with MIWTR are plotted in Figures A4a and A5a, respectively. Both contain dominant redundant information, which reflects the symmetric structure of the model in equation (A2). Also, the comparison between the information partitioning with and without MIWTR in Figures A4b and A5b shows that the differences are close to zero when more than 1,000 data points are used. This is consistent with the conclusion that the MIWTR-based cardinality reduction does not significantly affect the estimation of information-theoretic measures when the time series data are sufficient.

In the analysis of stream chemistry data and weather station data in section 4, MIWTR reduces the cardinalities of and of all the variables to about 20 or fewer. Based on the quadvariate logistic model example, the associated estimates of information flows in Figure 7 are reasonable, because the corresponding time series lengths of the data (around 1,000–4,000) are sufficient to achieve reliable estimation.