Near real-time monitoring of algal blooms
Worldwide, increases in nutrient- and pollutant-laden runoff from urban population growth and agricultural production are contributing to the degradation and eutrophication (excess algal growth and oxygen depletion) of surface waters. The ecological impact of these stressors is clearly evident in western Lake Erie, where expansive algal blooms have re-emerged intensively since the mid-1990s, coinciding with influxes of dissolved reactive phosphorus into the lake. Algal blooms containing Microcystis, a cyanobacterium that produces the hepatotoxin microcystin, threaten the stability of the ecosystem, contaminate fish catches, and require tap water treatment utilities to filter out the toxin.
To safeguard the environment and human health, as well as support remediation plans, it is essential to develop a reliable method for the daily monitoring of harmful algal bloom (HAB) formation, movement, and differentiation from nontoxic algal blooms. Manually assessing toxic zones through sampling is time-consuming and expensive. We have therefore developed a near-real-time early warning system based on integrated data fusion and mining (IDFM) techniques that provides a detailed toxicity map as soon as daily satellite imagery becomes available.
Our system predicts microcystin distributions from the water's surface reflectance, measured using fused multispectral satellite imagery and bolstered by a suite of wireless ground sensor networks. The fundamental advantage of IDFM is its ability to aggregate the spatial, temporal, and spectral properties of multiple satellite sensors into a single synthetic image that possesses the most useful characteristics of the input images, thereby enhancing the reliability of the data for postprocessing and data mining.1, 2


Each step of IDFM is designed to overcome physical and technical limitations (see the flow chart in Figure 1). In steps 1 and 2, we acquire the data streams from two satellites with similar spectral band ranges. To facilitate comparison and data fusion, we process the satellite images to have the same projection, spatial resolution, radiometric processing, and bit depth. Fusing the data in step 3 overcomes the limitations of relying on a single satellite sensor. For example, the Moderate Resolution Imaging Spectroradiometer (MODIS) captures global images on a daily basis, which is ideal for an early warning system. However, the 1000 m resolution of MODIS's ocean color bands is coarser than that of other satellite sensors. The Medium Resolution Imaging Spectrometer (MERIS) sensor on the environmental monitoring satellite Envisat offers 300 m resolution, yet its three-day revisit time cannot accurately capture the rapid growth and dynamic movement patterns of algal blooms in Lake Erie. Our solution is the pairwise fusion of similar spectral bands of MERIS and MODIS to yield a single synthetic data product that combines the 300 m resolution of MERIS with the daily revisit time of MODIS.
We fused the satellite images using the spatial and temporal adaptive reflectance fusion model (STAR-FM). Comparing an RGB (i.e., based on red, green, and blue light) synthetic image made by fusing MERIS and MODIS with the actual satellite image of Lake Erie for that day gave a coefficient of determination of 0.8278.5 (Commonly referred to as the R² value, the coefficient of determination indicates on a scale of 0 to 1, weak to strong, how well predictions made by the model match observed outcomes.)
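The idea behind step 3 and the way a fused image can be scored can be sketched in a few lines of Python. This is not STAR-FM itself: it keeps only the temporal-difference assumption that a change seen by the coarse sensor between two dates can be added to an earlier fine-resolution image, and it omits the neighborhood weighting that the full model applies. The function names, the integer resampling factor, and all reflectance values are hypothetical and chosen only for illustration (the real 1000 m to 300 m ratio is not an integer).

```python
def upsample(coarse, factor):
    """Nearest-neighbour resample: repeat each coarse pixel factor x factor times."""
    fine = []
    for row in coarse:
        expanded = [value for value in row for _ in range(factor)]
        fine.extend([list(expanded) for _ in range(factor)])
    return fine


def fuse(fine_t1, coarse_t1, coarse_t2, factor):
    """Predict the fine image at t2 as fine(t1) + [coarse(t2) - coarse(t1)]."""
    c1 = upsample(coarse_t1, factor)
    c2 = upsample(coarse_t2, factor)
    return [
        [f + (b - a) for f, a, b in zip(f_row, a_row, b_row)]
        for f_row, a_row, b_row in zip(fine_t1, c1, c2)
    ]


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot, over flattened images."""
    obs = [v for row in observed for v in row]
    pred = [v for row in predicted for v in row]
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


# Hypothetical single-band reflectances (not real MERIS or MODIS data).
fine_day1 = [[0.10, 0.12], [0.08, 0.10]]    # fine-resolution image, day 1
coarse_day1 = [[0.10]]                      # coarse-resolution image, day 1
coarse_day2 = [[0.13]]                      # coarse-resolution image, day 2
actual_day2 = [[0.14, 0.15], [0.10, 0.13]]  # fine image used only for scoring

synthetic_day2 = fuse(fine_day1, coarse_day1, coarse_day2, factor=2)
print(r_squared(actual_day2, synthetic_day2))
```

In this form the score reaches 1 only when the synthetic image reproduces the reference image exactly, and it can fall below 0 for a prediction that is worse than simply using the mean of the reference image.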
In step 4, we use data mining algorithms to develop a model relating the fused surface reflectance data to in situ microcystin data. The relationship between reflectance and microcystin concentration is highly nonlinear and difficult to decompose. Consequently, data mining techniques have higher explanatory power than traditional two-band models and regression techniques. This is especially true in optically complex coastal waters that contain a mixture of chemicals, which convolute the spectral signature of the water body. Another advantage of using empirical models in place of analytically based techniques for inversion modeling is that the unique water quality characteristics of the water body are integrated into the resulting model during training and validation. However, this also limits transferability, as the model would exhibit a marked drop in accuracy when applied to a water body with different water-quality conditions.6, 7 We applied the genetic programming (GP) data mining technique (which is based on evolutionary processes)8, 9 to build the empirical model, yielding a coefficient of determination of 0.9269, compared with 0.2710 and 0.7062 for a traditional two-band model and a two-band spectral slope model, respectively.10, 11 Finally, in step 5, the GP model uses surface reflectance data to generate a concentration map of the spatiotemporal distribution of microcystin (see Figure 2).
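For context, the kind of two-band baseline that the GP model is compared against can be sketched as a least-squares fit of microcystin concentration to the ratio of two reflectance bands. The band choice, the linear form, and every number below are assumptions made for illustration; the article does not specify which bands or which functional form the baseline models used, and the GP model itself is not reproduced here.

```python
def two_band_ratio(band_a, band_b):
    """Ratio of two reflectance bands, sample by sample."""
    return [a / b for a, b in zip(band_a, band_b)]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared_flat(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot, for flat lists."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical matched samples: two reflectance bands and in situ microcystin.
band_a = [0.020, 0.025, 0.031, 0.040, 0.046]
band_b = [0.020, 0.021, 0.022, 0.024, 0.025]
microcystin = [0.3, 0.6, 1.1, 1.4, 2.0]  # illustrative concentrations only

ratio = two_band_ratio(band_a, band_b)
intercept, slope = fit_line(ratio, microcystin)
predicted = [intercept + slope * r for r in ratio]
print(r_squared_flat(microcystin, predicted))
```

A model of this shape has only two free parameters, which is one reason it struggles with a highly nonlinear reflectance-to-toxin relationship. A fair comparison between models would also score them on samples held out from training, which this sketch does not do.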
A limiting factor of empirical HAB prediction algorithms is their reliance on ground truth data for initially training the model, as well as for recalibrating the model to reflect changes in water quality over time. This issue can be addressed by integrating ground-based spectroradiometric sensors in the surface water to detect the algal pigments chlorophyll-a and phycocyanin. The temporal resolution of ground sensors is significantly higher than the one-day revisit time of MODIS, and these readings could be used as inputs to the empirical model for predicting microcystin changes throughout the day. More importantly, on days when the space-based sensors are unable to provide imagery due to cloud cover, approximations for HAB locations and toxicity would still be feasible. Thus, the ground and space-based sensor network would act in concert to yield real-time updates for monitoring HABs across an area and thereby correlate these dynamics to changes in the ecosystem and climate, as well as anthropogenic influences. This would, in turn, add to the body of knowledge on bloom development, which would ultimately lead to improved action plans for HAB remediation.
A significant limitation of spaceborne optical sensors such as MERIS, MODIS, and those on the land-observing Landsat satellites is their inability to observe the study site when there is significant cloud cover. To overcome this, we are now including band data from cloud-penetrating sensors (i.e., synthetic aperture radar) in the IDFM methodology.