Tim Smith sent me the link to this publication from the Renewable Natural Resources Foundation: Congress on Harnessing Big Data for the Environment. It is actually Volume 30, No. 4 of the Renewable Natural Resources Journal.
Observations for the Future
Our objective in conducting this congress was to explore the challenges and opportunities in harnessing big data for the environment. Big data has three primary components: 1) data generated in massive quantities by electronic sensors, 2) computational capacity that permits the manipulation of huge datasets at unprecedented speeds, and 3) technology that provides inexpensive storage of more data than ever before. Together, these three advances have created a new technological capability of unprecedented power and speed.
Big data has already seen extensive use in finance, market research, social media, manufacturing, healthcare (records and technology management), math-intensive science, and applications heavily dependent upon binary measurement. Most big data used for environmental assessment and monitoring are collected by satellites that record surface and atmospheric conditions (such as Landsat and NOAA weather satellites) and by NOAA's 32,000 data collection stations. Satellite images provide inventory and condition data, covering both current status and trends over time. However, harnessing big data for environmental decision-making presents difficult challenges.
Big Data versus Big Judgment
Typical decisions about the use and conservation of natural resources, including land-use and environmental standards decisions, must consider social, economic and political factors. As has been the case since people first began debating the value of a duck or open space or how much pollution is permissible, social factors are considered in addition to physical assessment data.
However, social data are not available as big data, and concerted efforts to apply big data processes to environmental decision-making are nearly non-existent. Thus, most environmental decisions will continue to be made through human integration of social and physical data, which is big judgment.
Most Historical Environmental Data is Big Data Incompatible
Because it was collected before big data technology was developed, most environmental data gathered in the past is not big data compatible. During the RNRF Congress, the representative from the U.S. Geological Survey observed that none of the water research data the agency has collected is big data compatible.
A major impediment to using previously gathered data in big data analyses is the need to integrate disparate datasets. With many scientists in many fields collecting and analyzing data, comprehensive data integration is problematic. Datasets must be managed from the time of origination so that they can be integrated into larger or more complete datasets. Interoperability of datasets requires similar units, similar collection methods, and open access. As future data are made interoperable, more groups will be able to contribute their data to projects and collaborate on them. Much historical environmental data will likely not be available for use in big data processes. The science community will need to devote resources to promoting interoperability.
Has Big Data Redefined the Value of Data?
The generation of data by electronic monitors and sensors has changed the nature of data. Historically, data were gathered by investigators for a specific purpose and to answer specific questions, and scientists considered data uniquely valuable. Data are now being generated in torrents by machines. More than 90 percent of existing data has been generated in the past two years. There is so much data that only 0.5 percent is currently being used, and that percentage is destined to drop. The gap between the generation and use of data is growing. So much data is being generated that not all of it can be stored. This characteristic of big data will change notions of the value and necessity of storing and maintaining datasets. The science community will need to come to terms with what data should be preserved.
Defining the Public Sector – Private Sector Collaboration
Both the public and private sectors collect and use big data. The potential benefits of public-private partnerships that promote the use of big data for the environment are intuitive; however, the history of such partnerships is relatively brief. Delegates at the RNRF Congress recognized a need and an opportunity for conversations among representatives of the public and private sectors to develop ideas and approaches for advancing big data for the environment.
There was also a strong consensus that publicly financed big data for the environment is a significant public good, and that the need for robust advocacy for such data has never been greater. A meeting of the interested parties should be convened.
Enjoy! Lots of good stuff!
"Big data is not about the data." - Gary King, Harvard University