By: Rebecca Wenker, NOAA Coral Reef Conservation Program Data Manager
Takeaway: Coral reef data go through a whole lifecycle including planning, collecting and organizing, preservation, access and sharing, and re-use. The goal of this lifecycle is to make the data FAIR, or findable, accessible, interoperable, and re-usable, which ultimately benefits both scientists and communities.
When coral reef data come to mind, many people think first of the collection process. This can include scientists going out on multi-leg research cruises, SCUBA divers surveying reefs of various depths and types, and underwater instruments monitoring changes in ocean chemistry. After data collection, analysis can begin: gathering all of those data together to discover trends. In reality, there is a whole lifecycle that coral reef data go through before and after these steps!
The data lifecycle can be divided into the following generalized sections:
Planning
Collecting and organizing
Preservation
Access and sharing
Re-use
A common thread through each section is an emphasis on proper data management and stewardship, which essentially means ensuring that data are organized, accessible, and usable. This is especially important as the size and diversity of data increase. For example, Structure-from-Motion (SfM) photogrammetry is an imaging technique that is increasingly used in coral reef monitoring methodologies, and it generates 3D models of reefs with file sizes as large as 15 terabytes! In comparison, typical coral reef monitoring data in the form of spreadsheet files range from about 2 to 500 megabytes in size.
The ultimate goal of the lifecycle is to ensure that data are FAIR, or Findable, Accessible, Interoperable, and Reusable. Every step of the lifecycle has actions you can take to make sure this is the case!
Planning
Ideally, all parts of the coral reef data lifecycle should be addressed in a data management plan (DMP) before data collection even begins. A DMP is a formal document that outlines what you will do with your data during and after you complete your research project, and ensures that your data are safe in the present and future. These plans are extremely important, because if your data collection, organization, storage, and preservation methods are outlined in advance, you are less likely to encounter unanticipated roadblocks or make errors that reduce the quality of your data and research results. DMPs also help you plan how you will share your data, allowing it to be discovered and used by others. However, DMPs are not set in stone, and can be edited as needed during the lifecycle.
The five main components of a DMP generally include:
Documentation (data description, data collection and analyses, metadata, timeline),
Roles and responsibilities,
Budget (equipment, collection/execution, data storage, data publishing),
Storage and preservation (short-term and long-term), and
Access, sharing, use plans and policies.
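One way to keep track of these five components while drafting is a simple checklist. The sketch below is only an illustration, not an official DMP template: the class name, field names, and example text are all made up for this post, and a real plan should follow whatever format your funder or institution requires.

```python
from dataclasses import dataclass, fields


@dataclass
class DataManagementPlan:
    """Hypothetical container for the five DMP components listed above."""
    documentation: str = ""
    roles_and_responsibilities: str = ""
    budget: str = ""
    storage_and_preservation: str = ""
    access_and_sharing: str = ""


def missing_components(plan):
    """Return the names of components that have not been filled in yet."""
    return [f.name for f in fields(plan) if not getattr(plan, f.name).strip()]


# Placeholder draft with only two of the five components written.
plan = DataManagementPlan(
    documentation="Benthic cover surveys; metadata written at collection time.",
    roles_and_responsibilities="Data manager performs quality control.",
)
print(missing_components(plan))
# ['budget', 'storage_and_preservation', 'access_and_sharing']
```

Because a DMP can be edited as the project moves along, a check like this can be re-run whenever the plan changes.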
Funding organizations are increasingly requiring the inclusion of a DMP as part of the application process, and often have particular requirements regarding data format, storage, products, and access that need to be met. Therefore, incorporating those expectations into your DMP is critical, and can often help shape the rest of the data lifecycle.
Collection and Organization
Moving into data collection should be less intimidating now that you’ve planned the process out in a DMP, but there are still important things to remember! The major things you should incorporate during data collection include:
Data Organization - Use a standardized file format or database for data entry, follow spreadsheet best practices, and use logical file naming conventions.
Quality Control - Define and enforce the standards used during collection (data formats, terminology and codes, measurement units, metadata), document changes to the data, and perform quality assurance checks and statistical summaries on the data after analysis. Assign someone responsibility for data quality.
Data Storage - Store your data in a way that prevents accidental loss. Routinely back up your data, and keep multiple copies of these backups in different locations.
Data Documentation - Metadata, metadata, metadata! It is crucial that you capture the data that describe your data: who created them, what they contain, when they were created, where they were collected geographically, how they were collected and processed, and why they were developed. You need metadata to remember how you collected and processed your own data, so that information won’t be lost when a student or employee leaves, and so that the data can be used again in the future. I recommend looking up best practices for writing quality metadata.
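To make the who, what, when, where, how, and why of metadata concrete, here is a minimal sketch of a metadata record saved alongside a dataset. It is not a formal metadata standard: the field names simply mirror the six questions above, and every value is a placeholder invented for illustration. The archive you submit to will likely have its own required format.

```python
import json

# The six questions from the list above, used here as hypothetical field names.
REQUIRED_FIELDS = ("who", "what", "when", "where", "how", "why")


def missing_metadata(record):
    """Return the required fields that are absent or blank in the record."""
    return [key for key in REQUIRED_FIELDS if not str(record.get(key, "")).strip()]


# Placeholder values only; none of this describes a real dataset.
record = {
    "who": "Example Reef Monitoring Team",
    "what": "Percent coral cover by transect",
    "when": "2023-06-14",
    "where": "Example site, latitude 21.30, longitude -157.80",
    "how": "Diver belt-transect surveys, entered into a shared spreadsheet",
    "why": "",
}

print(missing_metadata(record))  # ['why']

# A plain-text version that can be stored next to the data file.
sidecar = json.dumps(record, indent=2, sort_keys=True)
print(sidecar)
```

Writing the record at collection time, rather than months later, is what keeps the information from being lost when someone leaves the project.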
If you take all of these into account, your data analysis should now be more streamlined and the quality of the data improved.
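The organization and quality control steps above can be partly automated. The following is a rough sketch under several assumptions: the species codes, the 0 to 40 meter depth range, the column names, and the `project_site_YYYYMMDD.csv` naming pattern are all hypothetical stand-ins for whatever standards your own project defines.

```python
import csv
import io
import re

# Hypothetical standards a project might define in its DMP.
ALLOWED_SPECIES_CODES = {"PLOB", "MCAP", "PMEA"}
DEPTH_RANGE_M = (0.0, 40.0)
# Assumed naming convention: project_site_YYYYMMDD.csv, lowercase, no spaces.
FILENAME_PATTERN = re.compile(r"[a-z0-9]+_[a-z0-9]+_\d{8}\.csv")


def filename_ok(name):
    """True if the file name follows the assumed naming convention."""
    return FILENAME_PATTERN.fullmatch(name) is not None


def check_rows(rows):
    """Return (line number, message) pairs for rows that break the standards."""
    problems = []
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        if row["species_code"] not in ALLOWED_SPECIES_CODES:
            problems.append((line_no, "unknown species code"))
        try:
            depth = float(row["depth_m"])
        except ValueError:
            problems.append((line_no, "depth is not a number"))
            continue
        low, high = DEPTH_RANGE_M
        if not low <= depth <= high:
            problems.append((line_no, "depth outside expected range"))
    return problems


# Made-up survey rows, three of which contain deliberate errors.
SAMPLE = (
    "site,species_code,depth_m\n"
    "A1,PLOB,12.5\n"
    "A1,XXXX,9.0\n"
    "A2,MCAP,-3\n"
    "A2,PMEA,deep\n"
)

problems = check_rows(csv.DictReader(io.StringIO(SAMPLE)))
for line_no, message in problems:
    print(f"line {line_no}: {message}")

print(filename_ok("reefsurvey_sitea_20230614.csv"))  # True
print(filename_ok("Final data v2 (1).csv"))          # False
```

Checks like these do not replace a person who is responsible for data quality, but they catch typos and unit mix-ups early, while the field notes are still fresh.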
Preservation
Now that your data have been organized and analyzed and are sitting on your hard drive, what do you do with them? Archive them!
Data repositories and archives are physical or digital information infrastructures which provide long-term storage, preservation, and access to metadata and data. In addition, they can include other benefits such as metadata generation, citation and DOI issuance (digital object identifier used to uniquely identify and access your dataset or document), dataset embargoing, and data curation! Consequently, submitting your data to an archive is one of the best things you can do in terms of data access and sharing.
As an example, most of the NOAA Coral Reef Conservation Program’s data are submitted to NOAA’s National Centers for Environmental Information, which maintains one of the world’s most significant environmental data archives. It guarantees that your data will be preserved for decades; accepts a wide range of environmental data types and formats; and provides data curation and stewardship to ensure that the data being archived are of the best quality possible.
Archives vary in their data focus, metadata standards and documentation requirements, file formats and data size accepted, costs, and levels of curation. Also, when you submit your data to an archive they should be in their final form and understandable for re-use without having to reach out to you as the scientist. That is why it is important to plan ahead in your DMP to make this process easier, generate quality metadata, and know what is expected of you from the archive you intend to submit your data to.
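Whether files are headed to a backup location or to an archive, it helps to confirm that each copy is identical to the original. One common approach is comparing checksums. The sketch below is an assumption on my part rather than a requirement of any particular archive: it uses Python's standard `hashlib` module, and the folder and file names are temporary placeholders created just for the demonstration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute a SHA-256 checksum, reading in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file's path (relative to the folder) to its checksum."""
    folder = Path(folder)
    return {
        str(path.relative_to(folder)): sha256_of(path)
        for path in sorted(folder.rglob("*"))
        if path.is_file()
    }


def changed_files(original, copy):
    """List files that are missing from the copy or differ from the original."""
    return sorted(name for name in original if copy.get(name) != original[name])


# Demonstration with throwaway folders; one backup file is deliberately altered.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source"
    backup = Path(tmp) / "backup"
    source.mkdir()
    backup.mkdir()
    (source / "survey.csv").write_text("site,depth_m\nA1,12.5\n")
    (backup / "survey.csv").write_text("site,depth_m\nA1,12.5\n")
    (source / "notes.txt").write_text("calibrated 2023-06-14\n")
    (backup / "notes.txt").write_text("calibrated 2023-06-15\n")
    mismatches = changed_files(build_manifest(source), build_manifest(backup))

print(mismatches)  # ['notes.txt']
```

Reading in chunks matters for very large files such as SfM models, which would not fit in memory all at once.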
Access and Sharing
After you take the step to preserve your data, it is now time to share it.
The positive impacts of data accessibility and sharing cannot be overstated, and they are critical for progress in both the scientific and public realms. In the scientific community, data access increases the impact and visibility of research, promotes innovation and collaboration, and encourages data reproducibility. This in turn helps to increase data transparency and verification. Data access also helps the public foster trust in scientific research, and enables them to make informed decisions with regard to policy development, voting, education, and personal lifestyle choices. For example, scientific research has shown that several common chemicals in sunscreen are harmful to corals. In response, many people who recreate on or near coral reefs use reef-safe (mineral) sunscreens instead. In fact, the state of Hawai’i has banned sunscreens containing the chemical active ingredients oxybenzone and octinoxate, which are the two most harmful sunscreen chemicals to corals.
As described above, data preservation via repositories and archives is one method of data access and sharing. However, there are many others, including:
Conference or public presentations, and
Project/Program websites.
Re-use
With all of these data out there, it is time to re-use them! Data re-use can have tremendous impacts, especially in the scientific community. It enables researchers to build upon the work of others, improve methodologies, and perform meta-analyses. Data can also be re-used to educate new researchers about the most current and noteworthy findings in the field. This all helps to maintain research continuity, as well as save time and money. Comparing, reproducing, and verifying results also authenticates data as accurate and trustworthy. Finally, data re-use can promote resource/data exchange, interdisciplinary research, and general feedback, which leads to more comprehensive and wider-reaching results.
All of this improves the quality of the data, research, and future results - which ultimately benefits both scientists and the public!
To sum up, the data lifecycle can be long and complex. However, it’s worth pursuing so that quality data are generated and can be accessed, understood, and used by others for many years to come.
The NOAA Coral Reef Conservation Program was established in 2000 by the Coral Reef Conservation Act. Headquartered in Silver Spring, Maryland, the program is part of NOAA's Office for Coastal Management.