Associations Between Environmental and Sociodemographic Data and Hepatitis‐A Transmission in Pará State (Brazil)

Abstract Hepatitis‐A is a waterborne infectious disease caused by the hepatitis‐A virus (HAV). Because the disease depends on sociodemographic and environmental conditions, this study applied public census and remote sensing data to assess risk factors for hepatitis‐A transmission. Municipality‐level data were obtained for the state of Pará, Brazil. Generalized linear and nonlinear models were evaluated as alternative predictors for hepatitis‐A transmission in Pará. The Histogram Gradient Boost (HGB) regression model was deemed the best choice (RMSE = 2.36 and a higher R² of 0.95) among the tested models. Partial dependence analysis and permutation feature importance analysis were used to investigate the partial dependence and the relative importance values of the independent variables in the disease transmission prediction model. Results indicated a complex relationship between disease transmission and the sociodemographic and environmental characteristics of the study area. Population size, lack of sanitation, urban clustering, year of notification, insufficient public vaccination programs, household proximity to open‐air dumpsites and storm‐drains, and lack of access to healthcare facilities and hospitals were the sociodemographic parameters related to HAV transmission. Turbidity and precipitation were the environmental parameters most closely related to disease transmission. Based on the HGB model, a hepatitis‐A risk map was built for Pará state. The resulting risk map can serve as an auxiliary tool for public health strategies. This study reinforces the need to incorporate remote sensing data in epidemiological modelling and surveillance plans for the development of early prevention strategies for hepatitis‐A.

political unit of choice, being the smallest political-administrative unit of the Brazilian federative republic (Ramalho, 2020).

Epidemiological Data
Information on hepatitis-A cases was obtained from the Notifiable Diseases Information System (SINAN) of Brazil's Ministry of Health (MS, 2007). Data included individual names and addresses, all of which were omitted to ensure and preserve confidentiality, and comprised Pará's residents confirmed to be infected with the hepatitis-A virus between January 2008 and December 2017. The data was aggregated by municipality and month. The epidemiological data set was geocoded by municipality and consists of 5,500 reported positive new cases (RPC), representing 4.26% of all RPCs in Brazil.

Sociodemographic Data
Annual data on the coverage of the anti-HAV vaccination program and on the number of live births in each municipality were obtained from the SUS's Information Technology Department platform (DATASUS) (MS, 2019), encompassing annual vaccination rates per municipality for the 2014-2017 period. Population coverage of anti-HAV vaccination is the ratio between vaccinated individuals (infants and children under 2) and the total population of a given municipality.
A total of eight variables were obtained from the census data (IBGE, 2011): households with/without sanitation; households near storm drains; households near open-air sewage discharge; households near open-air dumpsites; households with running water; households with a water-wheel; and households with a self-supplied water source. These census data indicate the number of households in each condition. Therefore, each variable was transformed into a relative percentage by dividing the number of households in that condition by the total number of households in each municipality. The annual population estimate per municipality was also obtained from the IBGE (IBGE, 2017). The demographic data were applied to evaluate the incidence of the disease in each municipality. A temporal dependence was also incorporated into the model by adding the year of notification as a covariate.
All geographical and political boundaries and shapes (municipalities and mesoregions) were obtained from the IBGE (IBGE, 2019). The municipalities' centroid coordinates (longitude and latitude) were taken as covariates during the modelling, enabling the integration of spatial dependence into the models. The centroid coordinates were previously reprojected to the SIRGAS 2000 polyconic projection.
Data on surface daytime and nighttime temperatures (SDT and SNT, respectively) were derived from the Moderate Resolution Imaging Spectroradiometer (MODIS), product MOD11A2, with 1 km² spatial resolution (Wan et al., 2015). SDT and SNT are important factors that influence human behavior, as fluctuations in their values indirectly affect human activities such as bathing, hydration and water recreation (Parsons, 2003). Thus, one should expect oscillations in daily temperatures to influence HAV transmission.
The Landsat surface reflectance data set was used to estimate the total suspended matter (TSM) and the turbidity of the waterbodies of each municipality in Pará. These water quality parameters influence water transparency (Stech, 2016b; Ody et al., 2016; Rodrigues et al., 2017), transmittance and, consequently, the amount of solar irradiance available in the system. Since solar irradiance directly influences virus survival in aquatic systems through photodegradation (Bales et al., 1993; Hu et al., 2015; Mavignier & Frischkorn, 1992; Sattar et al., 2000), TSM and turbidity are expected to be indirectly related to HAV survival and, therefore, to viral transmissivity.
Turbidity was estimated using a semi-empirical algorithm previously validated for both estuarine and coastal waters (Dogliotti et al., 2014). The algorithm relates turbidity to the water-leaving reflectance $\rho_w(\lambda)$ at wavelength $\lambda$. $\rho_w$ is defined as the ratio of the water-leaving radiance ($L_w$) to the above-water downwelling irradiance ($E_d$). The resulting turbidity is expressed in Formazin Nephelometric Units (FNU).
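To illustrate how a single-band semi-empirical retrieval of this kind can be evaluated, the sketch below assumes the saturating form T = A·ρ_w / (1 − ρ_w / C). The functional form, the function name `turbidity_fnu` and the default coefficient values are assumptions made for this example, since the algorithm's equations are not reproduced in this text; they should be checked against Dogliotti et al. before any use.

```python
def turbidity_fnu(rho_w, a_t=228.1, c=0.1641):
    """Estimate turbidity (FNU) from the water-leaving reflectance rho_w.

    Assumed model: T = a_t * rho_w / (1 - rho_w / c).
    a_t (FNU) and c (dimensionless) are illustrative calibration
    coefficients, not values confirmed by this text.
    """
    if not 0.0 <= rho_w < c:
        raise ValueError("rho_w must lie in [0, c) for the model to be defined")
    return a_t * rho_w / (1.0 - rho_w / c)


# Hypothetical reflectance values, from clear to turbid water.
for rho in (0.0, 0.01, 0.05, 0.10):
    print(f"rho_w = {rho:.2f} -> turbidity = {turbidity_fnu(rho):.1f} FNU")
```

The denominator makes the estimate grow faster than linearly as the reflectance approaches the saturation value `c`, which is why the sketch rejects reflectances at or above that value.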
The algorithm was validated for independent environments, with stable performance and relative mean error below 13.7%. The algorithm is described in Equations 1, 2 and 3.  (Funk et al., 2015) The data set comprises different platforms, orbiting sensors and in situ meteorological station data.
TSM was estimated using a generalized algorithm validated for continental waters (Alcântara et al., 2016a). The algorithm has been previously validated with performance values for root mean square error (RMSE) equal to 24.62 (Alcântara et al., 2016a). The algorithm defines TSM as a second degree polynomial function of the ratio of two remote sensing reflectances. For this study, the algorithm was applied to the MODIS data set, given its higher temporal resolution (daily) vis-à-vis Landsat's (∼16 days). Therefore, the spectral bands were corrected to the nearest available band from MODIS (see Equation 4 and Equation 5).
Since water body dynamics are mainly influenced by climatic and inter-annual variability (i.e., tides, rain cycles, temperature oscillations) (Simons & Sentürk, 1976), as well as by land use and land cover changes that directly impact the transport of sediments, the deposition of materials and biochemical fluxes (Simons & Sentürk, 1976), EVI and NDVI were also integrated into the model. Both indexes can be related to surface vegetation coverage (da Silva et al., 2019) and both were derived from MODIS product MOD13Q1, with 1 × 1 km spatial resolution (Justice et al., 1998).
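For reference, the two vegetation indexes can be written directly in terms of surface reflectances. The MODIS product delivers both indexes precomputed, so the sketch below only restates their standard definitions; the function names and the sample reflectance values are illustrative.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectances."""
    return (nir - red) / (nir + red)


def evi(nir, red, blue, gain=2.5, c1=6.0, c2=7.5, soil=1.0):
    """Enhanced Vegetation Index with the standard MODIS coefficients."""
    return gain * (nir - red) / (nir + c1 * red - c2 * blue + soil)


# Hypothetical reflectances for a densely vegetated pixel.
nir, red, blue = 0.50, 0.10, 0.05
print("NDVI:", round(ndvi(nir, red), 3))
print("EVI: ", round(evi(nir, red, blue), 3))
```

EVI adds a blue-band correction and a soil adjustment term to the NDVI ratio, which is one reason the two indexes can respond differently over the same surface.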
Data from the Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) (Funk et al., 2014) were applied to assess monthly accumulated precipitation in the municipalities of Pará. CHIRPS data have a spatial resolution of ∼5.6 × 5.6 km and encompass nearly 30 years of quasi-global rainfall data (50°S-50°N). CHIRPS provides satellite precipitation estimates blended with gauge data, with low latency, high resolution, low bias, and a long period of record (Funk et al., 2015).
The digital elevation data set from the Shuttle Radar Topography Mission (SRTM) (SRTM, 2015) and the CHIRPS precipitation data set were used to estimate the Hydrological Mobility Index (HMI). Both data sets were spatially resampled to the same spatial resolution of the CHIRPS data set (which has coarser spatial resolution). The index describes the hydrological flushing potential of a given surface (Fonseca et al., 2007) and, thus, can be associated with pathogen dispersal in the environment, serving both as a flusher and a retainer of the virus, influencing disease transmission (Barbosa et al., 2017;Fonseca et al., 2007).
LEAL ET AL., 10.1029/2020GH000327
Another five environmental variables were later derived from the CHIRPS data set to be incorporated in the hepatitis-A modelling: $PPF_{1.0}$, $PPF_{5.0}$, $PPF_{90.0}$, $PPF_{99.0}$ and $PPF_{99.9}$, where PPF stands for point-probability function. Each PPF represents the cumulative number of monthly precipitation occurrences given an intensity threshold that might be expected from the PPF of a predefined family of probability distribution functions (PDF). The PPF approach was applied to evaluate the potential relationship between disease transmission and extreme precipitation events (Diaz & Murnane, 2008; Gullón et al., 2017; Marcheggiani et al., 2010). Since there is still much to be considered with respect to extreme precipitation events, this statistical approach was based on prior similar epidemiological studies (Curriero et al., 2001; Gullón et al., 2017). In brief, the algorithm for the derivation of these secondary precipitation variables can be described in three steps, as follows:
First, the precipitation time series is linearly decomposed into three components: the trend ($T_t$), the seasonal component ($S_t$) and the residual ($R_t$). This approach assumes that the trend changes linearly over time, implying a linear additive structure (Equation 6). In addition, the decomposition assumes that seasonality has constant frequency (width of cycles) and amplitude (height of cycles) over time.
Second, a Pearson Type III probability distribution family is fitted to $R_t$ by means of Maximum Likelihood Estimation (MLE) (Virtanen et al., 2020). This PDF family is defined in terms of the mean (μ), the standard deviation (σ) and the skewness (skew) of the distribution (Vogel & McMartin, 1991) (Equation 7). The family covers a large number of different distributions, both skewed and symmetrical, and reduces to the normal distribution when the skewness is zero. This type of distribution is widely used by the U.S. Army Corps of Engineers in flood frequency analysis, by the National Oceanic and Atmospheric Administration in precipitation data analysis, and by the U.S. Navy (Federal Aviation Administration (FAA), 2003). In Equation 7, skew and σ are the skewness and the standard deviation of the time series, respectively.
Once the PDF is fitted to $R_t$, its parameters and the selected percentiles (1.0%, 5.0%, 90.0%, 99.0%, and 99.9%) are used to retrieve thresholds for the later classification of $R_t$. The thresholds are obtained by means of the point probability function ($PPF_x$) of the given PDF. PPF is defined as the inverse of the cumulative distribution function (CDF). PPF is also called the quantile function in the statistics literature (Wasserman, 2009), but the PPF nomenclature is used here. The Pearson Type III PPF is defined in Equation 12.
For the third and final step of the algorithm, the thresholds derived from Equation 12 are used to classify $R_t$. The classified $R_t$ is then aggregated monthly for each threshold. These parameters are used as proxies for the evaluation of precipitation disaster events, since they can be highly significant for waterborne diseases such as hepatitis-A (Freitas et al., 2015).
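The three steps can be sketched as follows, assuming NumPy and SciPy. Several choices are assumptions made only for this illustration: the synthetic daily series, the 360-day seasonal period, the 30-day "months", the helper names, and the rule that an occurrence is a residual above the threshold. This is not the authors' code.

```python
import numpy as np
from scipy.stats import pearson3


def decompose_additive(series, period):
    """Step 1: linear trend + periodic seasonal component + residual."""
    series = np.asarray(series, dtype=float)
    t = np.arange(series.size)
    slope, intercept = np.polyfit(t, series, deg=1)      # linear trend
    trend = intercept + slope * t
    phase = t % period
    detrended = series - trend
    profile = np.array([detrended[phase == p].mean() for p in range(period)])
    profile -= profile.mean()                            # centre the seasonal cycle
    seasonal = profile[phase]
    return trend, seasonal, series - trend - seasonal


def pearson3_thresholds(residual, percentiles=(1.0, 5.0, 90.0, 99.0, 99.9)):
    """Step 2: fit Pearson Type III by MLE and evaluate its PPF."""
    skew, loc, scale = pearson3.fit(residual)
    q = np.asarray(percentiles) / 100.0
    return pearson3.ppf(q, skew, loc=loc, scale=scale)


def monthly_exceedances(residual, month_index, thresholds):
    """Step 3: per month and threshold, count residuals above the threshold."""
    n_months = int(month_index.max()) + 1
    counts = np.zeros((n_months, len(thresholds)), dtype=int)
    for j, thr in enumerate(thresholds):
        counts[:, j] = np.bincount(month_index[residual > thr], minlength=n_months)
    return counts


# Synthetic daily precipitation (hypothetical): trend + cycle + skewed noise.
rng = np.random.default_rng(42)
n_days, period = 3 * 360, 360
t = np.arange(n_days)
rain = 5.0 + 0.002 * t + 3.0 * np.sin(2 * np.pi * t / period) + rng.gamma(2.0, 2.0, n_days)
month_index = t // 30                                    # simplified 30-day months

trend, seasonal, residual = decompose_additive(rain, period)
thresholds = pearson3_thresholds(residual)
counts = monthly_exceedances(residual, month_index, thresholds)
print("thresholds:", np.round(thresholds, 2))
print("occurrences per threshold:", counts.sum(axis=0))
```

Each column of `counts` plays the role of one PPF-derived variable: a monthly series of occurrences for a given intensity threshold.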

Data Pre-processing
Prior to analyzing the data, all variables and all hepatitis-A cases were aggregated per municipality and per month. Remote sensing variables were averaged per month and per municipality. Precipitation data were summed monthly and averaged spatially for each municipality. Elevation and declivity data were averaged spatially for each municipality.

Statistical Analyses
Multivariate regression analyses were used to identify the best model for assessing the main factors that affect hepatitis-A transmission. The first of the regression models evaluated was a) the Generalized Linear Model (GLM) of the Poisson family (Equation 13), which represents the number of observed events in a given municipality. The expected value ($\mu_i$) was assumed to be the linear sum of each relative risk coefficient and the respective covariate (Equation 14). In this study, the relative risk coefficient represents the risk in municipality $i$ relative to the null hypothesis. Under this hypothesis, the transmission risk of the disease is constant over the entire study area. The relative risk can take on real values between zero and +∞. If the relative risk is 1, all municipalities have the same average risk of infection in the study area; if it is less than one, the municipality's transmission risk is lower than the average; if it is greater than one, the municipality's transmission risk is higher than the average.
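One common way to read a relative risk against a constant-risk null hypothesis is as the ratio between observed cases and the cases expected if every municipality shared the overall rate. The sketch below shows that reading only; the function name and the numbers are hypothetical, and the GLM formulation of the study is not reproduced here.

```python
def relative_risk(cases, population):
    """Observed/expected case ratio per municipality under a constant-risk null."""
    overall_rate = sum(cases) / sum(population)          # same risk everywhere
    expected = [overall_rate * p for p in population]
    return [c / e for c, e in zip(cases, expected)]


# Hypothetical case counts and populations for three municipalities.
cases = [12, 3, 30]
population = [10_000, 5_000, 15_000]

for rr in relative_risk(cases, population):
    label = "higher" if rr > 1 else "lower" if rr < 1 else "equal"
    print(f"relative risk = {rr:.2f} ({label} than the study-area average)")
```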
As an alternative to the Poisson family, the negative binomial (NB) family is also commonly used to model counting processes, the main difference being that it allows for over-dispersion of the data (Fox, 2008). Unless the dispersion parameter is large, the variance of $Y$ increases more rapidly with the expected value than for a Poisson-distributed variable. By defining the expected value of $Y_i$ as a random variable, it is possible to incorporate additional variability among observed counts. The PDF of an NB variable is described in Equation 15.
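The effect of treating the expected value as a random variable can be checked numerically. The sketch below, assuming NumPy, draws Poisson counts whose means are themselves gamma distributed, which yields negative binomial counts with variance μ + μ²/θ. The symbols `mu` and `theta` and their values are assumptions for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, theta, n = 5.0, 2.0, 200_000       # hypothetical mean, dispersion, sample size

# Plain Poisson counts: variance equals the mean.
poisson_counts = rng.poisson(mu, size=n)

# Gamma-distributed expected values (mean mu, variance mu**2 / theta),
# then Poisson counts around them: a negative binomial mixture.
random_means = rng.gamma(shape=theta, scale=mu / theta, size=n)
nb_counts = rng.poisson(random_means)

print("Poisson mean, variance:", poisson_counts.mean(), poisson_counts.var())
print("NB mean, variance:     ", nb_counts.mean(), nb_counts.var())
print("theoretical NB variance:", mu + mu**2 / theta)
```

As `theta` grows, the gamma distribution concentrates around `mu` and the mixture approaches the Poisson case.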
The multilayer perceptron (MPL) algorithm is a nonlinear model. It assumes that the relationship between the covariates and the dependent variable can be defined by an association of neurons structured in sequential layers (de Wilde, 2013). The MPL algorithm accepts several types of activation functions (logistic, identity, tanh, relu, etc.). In this study, the relu activation function (Equation 16), together with the stochastic gradient-based Adam solver (Kingma & Ba, 2015), was applied to estimate the weights of the neuron matrix.
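A configuration of this kind can be sketched with scikit-learn's `MLPRegressor`, assuming that library. The synthetic data, the layer sizes and the iteration budget are hypothetical choices for this example and are not the tuned settings of the study.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor


def relu(x):
    """Rectified linear unit: max(0, x), applied element-wise."""
    return np.maximum(0.0, x)


# Synthetic regression problem with a nonlinear response (hypothetical).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = relu(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=200)

model = MLPRegressor(
    hidden_layer_sizes=(16, 16),   # two sequential hidden layers
    activation="relu",
    solver="adam",
    max_iter=2000,
    random_state=0,
)
model.fit(X, y)
pred = model.predict(X)
print("training R^2:", round(model.score(X, y), 3))
```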
Gradient Boost (GB), Decision Tree (DT) and Histogram Gradient Boost (HGB) are machine learning (ML) algorithms that can perform both classification and regression tasks. They are capable of fitting complex data sets in an additive model approach (Boehmke & Greenwell, 2019). These ML algorithms can capture nonlinear relationships between the covariates and the dependent variable in a forward stagewise fashion (Petrere & Friedman, 2000) by minimizing the negative gradient of a given loss function (Pedregosa et al., 2011). The performance of ML models is strongly influenced by their hyper-parameter settings, so tuning these hyper-parameters is an essential step in the analysis. For each model, a grid-search technique (Unpingco, 2016) was applied to retrieve the best-fitting hyper-parameters of each model configuration. The RMSE loss function (Equation 17) was applied to fit each model and to select its best hyper-parameters.
After fitting the different ML models, each had its coefficient of determination $R^2$ (Equation 19) evaluated. Only the models with a strictly positive (above zero) coefficient of determination were selected, discarding those with negative $R^2$. This initial model filtering step was required in order to minimize potential overfitting (and therefore bias) in the models' tuning (Boehmke & Greenwell, 2019). After this filtering step, the remaining models were compared with respect to their RMSE, and the model with the lowest RMSE was deemed the best.
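The two-stage selection rule (discard models with non-positive $R^2$, then pick the lowest RMSE) can be sketched in plain Python using the standard definitions of both scores. The model names and predictions are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination; negative when worse than predicting the mean."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def select_model(y_true, predictions):
    """Keep models with R^2 > 0, then return the one with the lowest RMSE."""
    scores = {name: (rmse(y_true, p), r2(y_true, p)) for name, p in predictions.items()}
    kept = {name: s for name, s in scores.items() if s[1] > 0.0}
    best = min(kept, key=lambda name: kept[name][0])
    return best, scores


y_true = [0.0, 1.0, 2.0, 3.0, 4.0]
predictions = {                                   # hypothetical model outputs
    "model_a": [0.1, 0.9, 2.2, 2.8, 4.1],
    "model_b": [4.0, 3.0, 2.0, 1.0, 0.0],        # anti-correlated, negative R^2
    "model_c": [0.5, 0.5, 2.5, 2.5, 4.5],
}
best, scores = select_model(y_true, predictions)
print("selected:", best)
```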
After selecting the best regression model for the number of cases of hepatitis-A (the one with the lowest RMSE), a partial dependence analysis (PDA) and a permutation feature importance (PFI) analysis were carried out.
The PDA can depict the relationship between the dependent and the independent variables of the model (Molnar, 2019). It graphically structures the variables' marginal effects (whether linear, monotonic or more complex) (Petrere & Friedman, 2000). PFI is a model inspection technique especially useful for nonlinear/ complex estimators (Pedregosa et al., 2011) and is defined as the decrease in a model score (e.g., RMSE) when a single covariate is randomly shuffled (Pavlov, 2019). A shuffling effort of 99 shuffles was applied for the PFI analysis.
A spatial analysis was applied to evaluate the best regression model's predictions (and respective residuals) with regard to the reported notification cases of hepatitis-A. These variables were interpolated to a continuous surface covering the study area, and later averaged over time for visual inspection. The kernel density estimate (KDE) interpolation method was applied to generate the respective continuous surfaces. The KDE algorithm of the Seaborn Python package (Waskom & The Seaborn Development Team, 2020) was applied for the interpolations.
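A weighted two-dimensional KDE surface can be sketched with `scipy.stats.gaussian_kde`, used here in place of the Seaborn routine as a swapped-in tool. The centroid coordinates, the case counts and the grid extent are hypothetical values chosen only to make the example run.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical municipality centroids (degrees) and case counts.
rng = np.random.default_rng(0)
lon = rng.uniform(-58.0, -46.0, size=50)
lat = rng.uniform(-9.0, 2.0, size=50)
cases = rng.poisson(5.0, size=50) + 1            # strictly positive weights

# Gaussian KDE over the centroids, weighted by the number of cases.
kde = gaussian_kde(np.vstack([lon, lat]), weights=cases)

# Evaluate the density on a regular grid covering the hypothetical extent.
grid_lon, grid_lat = np.meshgrid(np.linspace(-58.0, -46.0, 60),
                                 np.linspace(-9.0, 2.0, 55))
points = np.vstack([grid_lon.ravel(), grid_lat.ravel()])
density = kde(points).reshape(grid_lon.shape)

print("surface shape:", density.shape)
print("peak density:", density.max())
```

The same procedure can be applied to predictions or residuals by replacing the weights, which allows the surfaces to be compared visually.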

Results
A set of six different techniques was applied to model hepatitis-A transmission. Of all models tested (Table 2), HGB regression proved to be the best in terms of the RMSE and $R^2$ criteria. GB obtained the lowest RMSE of all models, despite its low non-biased $R^2$. GLM-Poisson, MPL, and DT returned negative $R^2$ scores, indicating biased estimates. The set of optimized hyper-parameters derived from the grid-search analysis can be found in Table 3.
After selecting the HGB model, a partial dependence analysis (PDA) was applied to indicate the relative dependence of each variable. The results of the PDA reflected how each variable related to hepatitis-A transmission. PDA values varied between −2.4 and 0 (Figure 2). Positive relations were observed for population size, households near open-air sewage discharge, households near open-air dumpsites, and latitude. Negative relations were observed for vaccination coverage, households with public water supply, households with waterwheels, and the municipalities' centroid longitude. A constant relation was observed for the variable households with sanitation. More complex (nonlinear) relations were observed for the variables households near storm-drains, households with local water supply and year of notification. The dependences of these three variables presented a bell-shaped pattern, indicating that they varied depending on the municipality and/or period studied.
The environmental variables with positive relations were turbidity, precipitation and NDVI (the latter to a lesser degree) (Figure 2). HMI and EVI were negatively related. A constant relation was observed for SDT, SNT, TSM, and all PPF-derived variables. For normalized turbidity values below 1.8, the partial dependence plot of turbidity indicated no clear relationship with disease transmission, but for higher values partial dependence was positively related to disease transmission. The partial dependences of precipitation and NDVI were nonlinearly associated with disease transmission, although they showed an average positive trend. With respect to precipitation, partial dependence was nearly constant for below-average values; for near-average values, precipitation had a negative dependence effect; and for above-average values, precipitation had a positive dependence. With respect to NDVI, for normalized values below zero, NDVI showed constant dependence with disease transmission; for higher values, the dependence was positive. EVI showed the inverse pattern with respect to NDVI.
PFI analysis depicted the relative importance of each environmental and sociodemographic parameter in the HGB model (Figure 3). In decreasing order of importance, the leading variables were population size, NDVI, latitude and year of notification.
The spatial distribution of the notification cases of hepatitis-A (Figure 4a) indicated two major hotspots, each indicating a high-risk region for disease transmission: one in the northwest and another in the northeast of the study area. HGB predictions also showed these same hotspots (Figure 4b).
The residuals from the HGB model (Figure 4c) were also more densely concentrated in the northwest and northeast of the study area, potentially reflecting a spatial structure in the model's residuals (Anselin et al., 2006; Ywata & Albuquerque, 2011).

Discussion
This study evaluated hepatitis-A transmission by means of sociodemographic and environmental parameters from the state of Pará, Brazil, for the period between January 2008 and December 2017. The observed relations were mostly complex, indicating that multiple interaction effects control disease transmission. The sociodemographic variables most closely related to hepatitis-A were population size, national public vaccination coverage, longitude of the municipality centroids, year of notification and location of households near open-air dumpsites and near storm-drains. The environmental variables most closely related to hepatitis-A were turbidity and precipitation.
Given the importance of public vaccination in mitigating hepatitis-A transmission (Fiore et al., 2006; WHO, 2011), the relative dependence values for vaccination were expected to be higher, if not the highest of all variables in the model. The observed low relative dependence was associated with the insufficient coverage rate of the public vaccination program (MS, 2014), as well as with problems arising from the lack of sanitation, sewage disposal and drinking water in the study area (Freitas et al., 2015; IBGE, 2011; UN, 2007). The longitude of the municipality centroids showed that disease transmission is spatially dependent. Westerly municipalities (west of 50°W) had a higher risk of hepatitis-A transmission than easterly municipalities (east of 50°W), a spatial pattern that reflects the sociodemographic characteristics of the study area, where relatively richer and more developed municipalities tend to be located in the eastern part of the state (GOVERNO DO PARÁ, 2010). These findings reinforce the importance of clean drinking water and proper sociodemographic conditions for controlling hepatitis-A transmission (Jacobsen & Koopman, 2005).
Households near storm-drains were both negatively and positively related to hepatitis-A incidence. For municipalities with a low percentage of households near storm-drains, the relationship was negative, whereas positive dependence was observed for municipalities with a high percentage of households near storm-drains. This pattern was associated with population density, storm-drain clogging and the contact rate of the population with contaminated waterbodies. A similar dual pattern was observed in a previous study, in which the authors suggested that a variable's dependence duality reflects internal spatial variations of disease transmission in the study area (Rogers, 2000). This reinforces the notion that epidemiological programs, policy-making and strategy planning must be specific to each area/community (WHO, 2014, 2017). Only then is it possible to properly consider the unique epidemiological factors associated with a disease's transmission.
Turbidity had a complex relationship with disease transmission, and its dependence pattern had a peaked, Gaussian-like shape. Lower values of turbidity did not influence disease transmission; for average values, turbidity was positively associated with transmission; and for higher values, turbidity was negatively related to disease transmission. This peaked shape was attributed to different characteristics of the limnological environment, for example, increased untreated sewage discharge into the environment (Guimaraens & Codeço, 2005), contamination of nearby waterbodies, and particle sedimentation (James et al., 2013; UNESCO, 1982). Aside from the fact that untreated sewage is directly linked to virus dispersion and propagation of the disease (Guimaraens & Codeço, 2005), wastewater also influences the attenuation of light in the water column, increasing the turbidity of the water body (de Oliveira et al., 2018). The higher the turbidity, the more suspended particles there are in the water column (Ellison et al., 2014; Jafar-Sidik et al., 2017; Pereira Filho et al., 2013). Also, more suspended particles in the water column mean a greater adherence rate of other materials (organic and inorganic), leading to an increase in sedimentation rates (Galvez & Niell, 1993; Thornton, 1990; de Wilde, 2013). As a consequence, the suspended particles may act as binding agents in the limnological environment; in sufficiently large numbers, these particles can more efficiently bind particles such as the hepatitis-A virus (Kendall et al., 2012), increasing its deposition rate. If less HAV is available in the system, the chances of infection are reduced, directly diminishing disease transmission. In some cases, increases in turbidity can also be related to increases in water turbulence (Knoblauch, 1999). As turbulence increases, higher dispersion forces act on the HAV present in the water column (Simons & Sentürk, 1976). As a consequence, turbulence acts as a cleaning agent that diminishes the virus pool available for potential infection (Gurjão, 2015; Simons & Sentürk, 1976).
Precipitation also showed a nonlinear association with disease transmission. For below-average precipitation, the effect was nearly constant; for near-average values, precipitation had a negative effect on disease transmission, while above-average precipitation had a positive effect. Lower precipitation events induce less turbulent behavior in water bodies, and consequently a higher deposition rate (Bittencourt-Oliveira et al., 2012; Pereira Filho et al., 2013). Under this scenario, HAV is expected to be less present in water systems. The opposite is also true: under higher precipitation events, the deposition rate is reduced as water turbulence increases (Bittencourt-Oliveira et al., 2012; Pereira Filho et al., 2013). Under intense precipitation events, public water supply systems are contaminated by increased run-off from surrounding areas, by inundation processes and/or by the flushing of streets, ponds and other potential water sources (Cann et al., 2013). Contaminated water serves as a source for the spread of hepatitis-A.
Previous studies have related hepatitis-A transmission to extreme precipitation and flooding events (Gullón et al., 2017; Marcheggiani et al., 2010). This study, however, by applying the PPF methodology to the study area, found no statistical evidence supporting such a relationship. For all the PPFs tested, the relative dependence on disease transmission was constant. Several aspects intrinsic to the study area can account for this poor association. Pará is characterized by an equatorial climate, with daily precipitation (FAPESPA, 2018) and a mean annual accumulated precipitation potentially reaching 13.2 m, depending on the subregion (Lima et al., 2010). Pará has no public management directives regarding its waterbodies, nor state planning or a public billing policy for water usage (ANA, 2013a). Only 25% of Pará's municipalities have 55% or more of their sewage collected and treated (ANA, 2013b). Furthermore, a large share of Pará's population has an intrinsic relationship with the local water resources, whether for personal consumption or for public transport (transportation by water) (Menezes et al., 2015). This is even more pronounced for riverine communities, whose residents live mostly in palafita households (Menezes et al., 2015), that is, residences mostly built of wood (when on land) or over floating devices (Gama et al., 2018). Given these intrinsic characteristics of the study area, intense precipitation events can have a positive relationship with disease transmission, as previously observed for the Amazon region (de Paula et al., 2007) and abroad (Gullón et al., 2017; Marcheggiani et al., 2010), but this relationship may be masked by the intrinsic characteristics of Pará's environment and its local communities.
As disaster events may impact public health over different time frames, from short-lasting impacts (hours) to long-lasting ones (years) (Freitas et al., 2015), a time lag effect can impede a direct assessment of disease transmission. Thus, future studies are required to investigate this temporal dependence. Other methodological approaches, such as the Auto Regressive Integrated Moving Average (ARIMA) and artificial neural network models, might be possible alternatives (Chadsuthi et al., 2012; Guan et al., 2004; Luz et al., 2008; Ture & Kurt, 2006). Furthermore, given the variability and lack of consensus on how to measure and depict extreme precipitation events (Gullón et al., 2017), other methodological approaches to detect extreme events are required to properly assess a potential relationship with hepatitis-A transmission.
Regarding the spatial analyses, the estimated transmission values derived from the HGB model were in agreement with the observed notification cases of hepatitis-A. The model's results reinforce the notion that hepatitis-A transmission is spatially and temporally dependent in the study area. The observed hotspots for disease transmission followed the spatial distribution of population density (IBGE, 2017), implying that regions of higher density have more notification cases for the disease. This relationship is mainly caused by uneven access to healthcare centers and uneven public vaccination coverage throughout the study area (Affonso et al., 2016; Fernandes & Fernandenos, 2013; Fernandenos & Fernandes, 2016). Given the precarious coverage rate of the national vaccination program in Pará state (Brito & Souto, 2020), one can expect an increase in notification cases for hepatitis-A in the study area in the coming years. Given the spatial complexity inherent to disease transmission modelling in population dynamics (Diez-Roux, 2000; WHO, 2014), especially regarding waterborne diseases such as hepatitis-A, more studies are required to evaluate these spatial structures and their spatial dependency for more robust disease risk assessments.
The present study reiterates how important it is for public health practitioners and water companies to be aware of the risks related to waterborne disease outbreaks. It is important to stress that the methods applied here can also be extended to other waterborne diseases, reinforcing the applicability of this work. Furthermore, future studies may also apply the current methods to different time periods of the same study area (e.g., before and after national public vaccination programs). Through this temporal segmentation approach, these studies may reveal potential temporal variations in the effects of sociodemographic and environmental factors on hepatitis-A transmission. Specifically regarding Brazil, there may be at least three major time periods that could be further analyzed: before the public vaccination program (before 2014); between 2014 and 2016, the period prior to Bill N° 204-2016; and after 2016, the period in which Bill N° 204-2016 was already operational, potentially resulting in a significant improvement in the compulsory notification system (virtually increasing the hepatitis-A notification cases).
Given the impacts of extreme weather events on waterborne diseases, especially under a scenario of climate change, health disparities are likely to occur in the near future. A population's ability to adapt to and limit the effects of such events is likely dependent on socioeconomic and environmental circumstances, as well as on the information and technology available (Gullón et al., 2017). Since waterborne diseases are expected to have higher incidence, and even wider geographical coverage, due to climate change (Ahern et al., 2005; Davies et al., 2015; UN, 2007), and given the increase in population density and the lack of proper sanitation and vaccination in developing countries such as Brazil (IBGE, 2016, 2018; Paungartten et al., 2015), this study may be of interest for early warning planning in the public health sector (Ford et al., 2009).

Conclusions
This study assessed the relationship between hepatitis-A transmission and environmental and sociodemographic variables in the state of Pará, Brazil. Generalized linear and nonlinear models were examined as alternative predictors for hepatitis-A. The best-suited model was the HGB. Population size, lack of sanitation and of proper public vaccination, households' proximity to open-air dumpsites and storm-drains, and insufficient access to healthcare facilities and hospitals were the sociodemographic parameters most closely related to HAV transmission. Turbidity and precipitation were the environmental parameters most closely related to disease transmission, and it was found that hepatitis-A transmission was positively associated with periods of average turbidity and more intense precipitation.
Despite enhancements in the public healthcare sector, Pará state still lacks the sociodemographic conditions (sanitation, sewage disposal, access to potable water, public education, public awareness, etc.) needed to effectively control hepatitis-A without the constant support of public vaccination programs.
Proper mitigation will only be possible if investments are made in alternative strategies for sustained disease control and relief, which are essential for public health policymakers, vaccine developers and disease control specialists to make robust estimates of the current and future distribution of disease transmission around the world. Since remote sensing can be of great importance in assessing disease-related environmental factors, providing meaningful insights for controlling disease transmission, this study stresses the need to incorporate remote sensing data into epidemiological modelling and surveillance plans in order to develop early prevention strategies for waterborne diseases.
This work emphasizes the importance of incorporating different methodological approaches in epidemiological studies in order to assess the factors most closely related to the transmission of waterborne diseases. The present study can contribute significantly to preventive strategies aimed at mitigating disease transmission in municipalities under higher risk. Here, we reiterate that the applied methods can be extended to other waterborne infectious diseases (e.g., leishmaniosis, infections related to harmful algal blooms, diarrhea, and many others). Hepatitis-A was used as a test case due to its importance to the study area and to its standardized database. Future studies can also apply these same methods to different time periods in order to assess temporal variations in the regulatory factors of hepatitis-A transmission.