As you watch the soundless video above, you may find this real-time flow of water mesmerizing. Joan Didion would. It’s her 1979 essay on water come to life. In her essay on water, she described her fascination with the journey water makes, something that most people don’t even think about. She guided her readers through the water path, tracing where it goes and the impressiveness of its journey. What you see in the video above—the National Water Model—wouldn’t be possible without big data. Because of the power to capture and wrangle so much data, we can now actually see how water flows in the continental United States (CONUS) in real time increments.
If you’re a data scientist, you can appreciate the complexity of the National Oceanic and Atmospheric Administration (NOAA) team’s project to produce an interactive map that simulates the water cycle in the CONUS. Currently, the visual representation shows an hourly river flow on an interactive map for 2.7 million US rivers and their reaches. At any one time, 6,000 to 8,000 streamgages are sending data to inform this map, producing 15 to 25 million individual data points per month. Because of this information, NOAA can offer short-term, medium-term, and long-term water prediction.
Image Source: https://water.usgs.gov/nrp/topical-research/impact-extended-drought-stream-hydrology-fisheries/
This amalgamation of streamgage data is just one of many data sources NOAA team members wrangle in this product. Add to the data mix soil saturation, water runoff, snow water equivalents, and various other layers of disparate data, like wind speed, pressure, temperature, short- and long-wave radiation, precipitation, and humidity. Then add this parameter: “get it done in nine months.” Clearly production, not perfection, was their initial aim. They’ll be using subsequent iterations to perfect the product.
Their end goal? To help households on a neighborhood level properly prepare for drastic water situations, like floods and droughts, through fine-tuned local weather predictions.
So, considering their timeframe, what’d they decide to include, cut, and nudge in their product? We’ll explain the basic ins and outs.
Where does their data come from?
While this massive water-data conglomeration, aka National Water Model (NWM), could be more finely tuned (you probably won’t find your favorite reservoir on the map), its reach is still impressive.
It pulls in United States Geological Survey (USGS) streamflow data; Global Forecast System (GFS) data; Climate Forecast System (CFS) Numerical Weather Prediction (NWP) forecast data; NCEP Weather Prediction Models (WPM) data; Multi-Radar/Multi-Sensor System (MRMS) radar data (free only to the government); Integrated Water Resources Science and Services (IWRSS) data; and other non-specified data sources. It then overlays this information onto a 1-kilometer land surface model informed by USGS’s 2011 National Land Cover Database (NLCD).
Another interesting side question is, who informed all these data sources? The list is very long. Here are just a few to whet your appetite: National Severe Storms Laboratory; Federal Aviation Administration; the University of Oklahoma Cooperative Institute for Mesoscale Meteorological Studies; US Army Corps of Engineers; USGS; NOAA; and we’ll stop there because we could go on for a bit.
How they combine and analyze sources
What worked most of this magic? Their algorithm. With the tight project timeframe and several constraints, they chose a fast and stable methodology to route streamflow data into their system. While the methodology they chose lacked the dynamic capabilities another methodology offered, it reduced computational time by 60 percent, so they went for it. After deciding on this approach, they built a nudging algorithm from scratch that helped with their calibration challenges, model bias, improper error covariances, and computational management. In the future, it’s possible they may go for a hybrid model, but for now, this does the job.
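If you haven’t met nudging before, the core idea is easy to sketch: pull the modeled value part of the way toward what a gage actually observed. The snippet below is only a toy illustration of that idea, not NOAA’s algorithm. The function names, the reach IDs, and the fixed weight of 0.5 are all our own assumptions.

```python
def nudge(modeled_flow, observed_flow, weight=0.5):
    """Pull a modeled streamflow value part of the way toward an observation.

    weight=0 keeps the model value; weight=1 replaces it with the observation.
    The weight is a hypothetical tuning knob, not a value from the NWM.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return modeled_flow + weight * (observed_flow - modeled_flow)


def nudge_reaches(modeled, observations, weight=0.5):
    """Nudge only the reaches that have a gage observation; leave the rest alone."""
    return {
        reach_id: nudge(flow, observations[reach_id], weight)
        if reach_id in observations
        else flow
        for reach_id, flow in modeled.items()
    }


# Made-up flows (cubic meters per second) for three hypothetical reaches.
modeled = {"reach_a": 10.0, "reach_b": 4.0, "reach_c": 7.5}
observations = {"reach_a": 14.0}  # only one reach has a gage

nudged = nudge_reaches(modeled, observations, weight=0.5)
print(nudged)
```

A real scheme would also have to decide how far upstream and downstream a gage’s influence should spread and how quickly it should fade, which is where much of the calibration effort goes.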
But good enough gets a little strange in some areas. Obviously the NOAA team had a lot of data to work with, but even with all this information, they left out information you’d think they would have included. For instance, they didn’t include Great Lakes information or many US reservoirs and dams. They also didn’t use one of the best snow data sets, SNODAS, to inform the product.
When you look more closely, though, there are reasons for lack of inclusion. For instance, the network they were using, NHDPlusv2, had reservoir interfacing issues, in particular around how reservoirs are classified. So, out of all the reservoirs in the United States, this map shows only about 1,615. As for the Great Lakes, it sounds like the United States needs to collaborate more with Canada to clean up the data details before NWM is ready to include them.
Besides deciding on what information to include, the NOAA team had to deal with data configuration issues. It was difficult to configure this many disparate data sources. Not only did they need a common set of tools to retrieve the different sources and start unifying them, they also had to regrid data onto a common 1-kilometer grid. With a base grid of only 1 kilometer, datasets that had coarser resolutions, like temperature, humidity, surface pressure, and incoming short-wave radiation, had to go through downscaling routines to map properly onto the base grid. But even though they got these models to work, some other information spliced in from Mexico and Canada (important because tributaries there feed US waterways) just didn’t map well. You can see it in certain segments. Even with that, the sheer fact that this model is up and running is impressive.
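To get a feel for what regridding involves, here is a minimal sketch that interpolates a coarse grid onto a finer one using plain bilinear interpolation. This is our own simplification: the team’s actual downscaling routines aren’t described here and are surely more involved. The function name and the sample temperatures are hypothetical.

```python
def regrid_bilinear(coarse, factor):
    """Interpolate a 2D grid onto one `factor` times finer.

    Values at the original grid points are preserved; points in between
    are filled in by bilinear interpolation.
    """
    rows, cols = len(coarse), len(coarse[0])
    if rows < 2 or cols < 2 or factor < 1:
        raise ValueError("need at least a 2x2 grid and factor >= 1")

    fine = []
    for i in range((rows - 1) * factor + 1):
        y = i / factor
        r0 = min(int(y), rows - 2)   # upper row of the enclosing coarse cell
        ty = y - r0                  # fractional distance down the cell
        row = []
        for j in range((cols - 1) * factor + 1):
            x = j / factor
            c0 = min(int(x), cols - 2)   # left column of the enclosing cell
            tx = x - c0                  # fractional distance across the cell
            top = coarse[r0][c0] * (1 - tx) + coarse[r0][c0 + 1] * tx
            bottom = coarse[r0 + 1][c0] * (1 - tx) + coarse[r0 + 1][c0 + 1] * tx
            row.append(top * (1 - ty) + bottom * ty)
        fine.append(row)
    return fine


# Made-up temperatures on a coarse 2x2 grid, refined to 3x3.
coarse_temps = [[10.0, 14.0],
                [12.0, 16.0]]
fine_temps = regrid_bilinear(coarse_temps, factor=2)
for line in fine_temps:
    print(line)
```

Interpolation like this only smooths values between known points. It can’t add detail the coarse data never had, which is one reason purpose-built downscaling routines exist.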
Overall, what was their biggest issue? Quality control (QC) and keeping the data clean. For instance, while the USGS has 7,941 streamgages, as of October 2015, the team could only use 6,554 of these because they couldn’t verify the reliability of the data. And while there were already USGS QC metrics, the team needed to make sure they weren’t assimilating unreliable streamflow data—since real-time data can have inaccurate data creep into it without being properly flagged.
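To put those numbers in perspective, 6,554 of 7,941 gages is roughly 83 percent, so about one gage in six was set aside. The sketch below shows that arithmetic alongside the kind of simple screen you might run before assimilating readings. The record layout, the gage IDs, and the screening rules are our own assumptions, not the team’s actual QC procedure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GageReading:
    gage_id: str                # hypothetical identifier
    flow_cms: Optional[float]   # streamflow in cubic meters per second
    flagged: bool               # True if an upstream QC step marked it suspect


def is_usable(reading):
    """Keep a reading only if it is unflagged, present, and non-negative."""
    return (
        not reading.flagged
        and reading.flow_cms is not None
        and reading.flow_cms >= 0.0
    )


# Made-up readings for illustration.
readings = [
    GageReading("gage_001", 12.4, False),
    GageReading("gage_002", None, False),   # missing value
    GageReading("gage_003", -3.0, False),   # physically implausible
    GageReading("gage_004", 8.1, True),     # flagged by an earlier check
    GageReading("gage_005", 0.6, False),
]
usable = [r for r in readings if is_usable(r)]
print([r.gage_id for r in usable])

# Share of USGS streamgages the team could use as of October 2015.
total_gages, used_gages = 7941, 6554
usable_share = round(100 * used_gages / total_gages, 1)
print(f"{used_gages} of {total_gages} gages used ({usable_share}%)")
```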
And all of this is done, as you can imagine, on a supercomputer.
How they communicate the data
The NWM displays on an interactive webpage with four configurations: analysis and assimilation, short-range, medium-range, and long-range. The map includes several layers for you to interact with: you can see all the recording stations in use, the snow depth, the snow-water equivalent, stream anomalies, and soil saturation, with many more to come.
There’s also what they call an Interactive Forecast Chart. After zooming in, or before if you don’t care about being precise, select any point on the map, and an interactive chart pops up that forecasts the particular waterway that you selected.
We think so, but as we’ve made clear, NWM version 1 is only a start. It’s an evolving model with a lot to look forward to. What it helps with immediately is flash flood forecasting for areas where tributaries converge. It also provides 700 times the forecast locations that were available before, increasing coverage and adding areas that weren’t yet covered. Beyond helping improve general flood warnings on a neighborhood level, NOAA hopes to even show things like local water quality, all in due time.