From Observation to Insight: Publishing a Quarter-Century of Migration Data
by Nancy Sheehan, Program Coordinator
Preserving a Legacy
For 30 years, Journey North volunteers have heralded the arrivals and departures of migratory wildlife. Capturing the movement of butterflies and birds across a large geographical range would not have been possible without the thousands of people who looked to the sky with wonder and curiosity, and then took the time to report their observations to Journey North. Now, this treasure trove of migratory data is more accessible to scientists than ever before. Journey North has published nine data packages with the Environmental Data Initiative (EDI), a central repository for environmental data.
From Vision to Action
Journey North owes its existence to the vision and dedication of Elizabeth Howard, who founded the project in 1994. A biologist and advocate for engaging the public in tracking migration patterns, Howard has significantly influenced public involvement in migratory science and wildlife conservation through her outreach. She directed the program from 1994 to 2018.
Data Governance: Advancing Data Stewardship
In 2018, Nancy Sheehan became the Journey North program coordinator and recognized the immense potential of the project's long-term observational data. With extensive experience in participatory science, research, and community engagement, Sheehan saw how valuable a resource Journey North’s data was. Over the course of the program's history, volunteers have submitted close to 800,000 observational reports, making Journey North a data collection powerhouse for migratory species. Volunteers have tracked the movement of migratory species such as monarch butterflies, hummingbirds, swallows, blackbirds, robins, orioles, loons, and even whales!
“It was like finding a treasure. The dedication and efforts of volunteer observers over the years have created an invaluable resource with immense potential to advance scientific research on migration and seasonal change.” – Nancy Sheehan, Program Coordinator (2018-2024)
In 2019, Sheehan spearheaded efforts to unlock the full potential of the data by adopting data governance practices. To make Journey North’s data accessible, reliable, and usable, Sheehan needed to align the project's data management practices with data governance standards and F.A.I.R. principles.
What Is Data Governance?
Sheehan articulated a data governance framework that establishes policies and standards for how data is managed, used, and protected, along with data quality standards and comprehensive data documentation. Her central goal was to ensure that Journey North data was Findable, Accessible, Interoperable, and Reusable (F.A.I.R.), thereby maximizing its use in scientific research on migration and biodiversity.
“Basically, we needed to be better data stewards of the incredible legacy passed onto us from dedicated Journey North volunteers through the years. We needed to ensure that we were meeting FAIR Guiding Principles as data managers -- that is, we need to ensure that this data is findable, accessible, interoperable, and reusable.” – Nancy Sheehan, Program Coordinator
What Is a Central Data Repository?
Central environmental repositories are specialized databases that store and organize ecological and environmental data, making it easily accessible to researchers and the public. Meeting the F.A.I.R. principles (findable, accessible, interoperable, and reusable) is crucial in migration and biodiversity science and conservation because it ensures that data can be easily located, accessed, and integrated across platforms and studies. This facilitates collaborative research, enhances the quality and reproducibility of scientific studies, and supports informed decision-making in conservation efforts.
The Environmental Data Initiative (EDI) is an organization dedicated to facilitating data management, sharing, and synthesis for ecological and environmental research. EDI provides infrastructure, tools, and support services to help researchers effectively manage and disseminate their data. It serves as a central data repository for environmental data, aiming to make data more accessible, discoverable, and reusable for scientific research and decision-making.
Researchers can deposit their datasets with EDI, where they are curated, archived, and made available to the broader scientific community. EDI promotes the use of standardized metadata and data formats to enhance interoperability and usability across different research projects and disciplines.
What Is Metadata Documentation and Why Is It Important?
Metadata documentation refers to detailed information that describes the data, including how it was collected, the methods used, the structure of the data, and any processing steps it underwent. This "data about the data" provides context, helping researchers understand the origins, accuracy, and limitations of the dataset.
The lack of metadata documentation significantly impacts how researchers might use data contributed by volunteers. Without clear metadata, researchers may struggle to assess the reliability and validity of the data, leading to skepticism about its quality. Additionally, they may find it challenging to determine whether the data "fits" their specific research questions and requirements. Understanding the context and methodology behind the data is crucial for researchers to evaluate its relevance and applicability to their studies. Without this information, valuable data might be underutilized or overlooked, limiting its potential contributions to scientific research, replicability of studies, integration with other datasets, and conservation efforts.
The Work
With limited resources, Sheehan secured two grants to publish data and develop metadata documentation. These grants allowed Sheehan to hire two summer fellows in 2021 and 2022, Luis Weber-Grullón and Maricela Abarca, who played key roles in the work.
"Without the financial and technical assistance of EDI and the invaluable skills of Luis Weber-Grullón and Maricela Abarca, publishing these datasets to a central repository would not have been possible. This effort honors the dedication of thousands of volunteers who have collected observational data over the past 25 years. On behalf of Journey North volunteers (past and present), I encourage researchers to utilize this data in exploring pressing questions in wildlife migration." – Nancy Sheehan, Program Coordinator
Before publishing, Sheehan worked with Weber-Grullón and Abarca to address errors in the historical datasets, including missing values, incorrect formatting (such as dates or numerical values), outliers, duplicate entries, spelling and typographical errors, inconsistent naming conventions, data entry mistakes, integrity issues, invalid data, and inconsistencies within the dataset. By addressing these issues, Sheehan ensured that the Journey North data was suitable for publication and analysis, enabling researchers to draw accurate conclusions and insights from the dataset.
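To make a few of these error categories concrete, here is a minimal sketch of how such checks might look in code. It is not the procedure the Journey North team used. The field names, sample records, accepted date formats, and rejection reasons are all hypothetical, chosen only to illustrate whitespace and naming cleanup, date normalization, missing or out-of-range values, and duplicate detection.

```python
from datetime import datetime

# Hypothetical raw records; field names and values are illustrative only.
RAW_REPORTS = [
    {"species": "Monarch Butterfly ", "date": "2019-04-03", "latitude": "43.07", "count": "2"},
    {"species": "monarch butterfly", "date": "04/03/2019", "latitude": "43.07", "count": "2"},
    {"species": "Robin", "date": "2019-13-40", "latitude": "44.9", "count": "1"},
    {"species": "Loon", "date": "2019-05-01", "latitude": "", "count": "3"},
    {"species": "Oriole", "date": "2019-05-02", "latitude": "120.5", "count": "1"},
]

# Assumed input formats: ISO dates and US-style month/day/year.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(text):
    """Return an ISO 8601 date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_reports(rows):
    """Split rows into (cleaned, rejected); each rejected row carries a reason."""
    cleaned, rejected, seen = [], [], set()
    for row in rows:
        # Collapse stray whitespace and use one capitalization convention.
        species = " ".join(row["species"].split()).title()
        date = parse_date(row["date"])
        if date is None:
            rejected.append((row, "unparseable date"))
            continue
        try:
            latitude = float(row["latitude"])
        except ValueError:
            rejected.append((row, "missing or non-numeric latitude"))
            continue
        if not -90.0 <= latitude <= 90.0:
            rejected.append((row, "latitude out of range"))
            continue
        key = (species, date, latitude)
        if key in seen:
            rejected.append((row, "duplicate entry"))
            continue
        seen.add(key)
        cleaned.append(
            {"species": species, "date": date, "latitude": latitude, "count": int(row["count"])}
        )
    return cleaned, rejected


cleaned, rejected = clean_reports(RAW_REPORTS)
for row, reason in rejected:
    print(f"rejected ({reason}): {row}")
```

Keeping the rejected rows together with a reason, rather than silently dropping them, lets a person review each flagged record before anything is removed from a historical dataset.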
Together with Weber-Grullón and Abarca, Sheehan also developed metadata documentation describing the data fields, verification methods, the protocols and methods followed by observers, and the geographic and temporal context of the data.
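One common form of field-level metadata is a data dictionary. The sketch below is an illustration under assumed names, not the actual Journey North documentation: the fields, units, and descriptions are hypothetical. It shows how a data dictionary can be checked against a record to spot undocumented or mistyped fields.

```python
# Hypothetical data dictionary: each entry documents one field of a dataset.
DATA_DICTIONARY = {
    "species": {"type": str, "unit": None, "description": "Common name of the observed species"},
    "date": {"type": str, "unit": "ISO 8601 date", "description": "Date of the observation"},
    "latitude": {"type": float, "unit": "decimal degrees", "description": "Latitude of the observation"},
}


def check_record(record, dictionary):
    """Return a list of mismatches between a record and its data dictionary."""
    problems = []
    for field, value in record.items():
        if field not in dictionary:
            problems.append(f"{field}: not documented")
        elif not isinstance(value, dictionary[field]["type"]):
            expected = dictionary[field]["type"].__name__
            problems.append(f"{field}: expected {expected}")
    for field in dictionary:
        if field not in record:
            problems.append(f"{field}: missing from record")
    return problems


# Illustrative record: latitude is stored as text and "observer" is undocumented.
sample = {"species": "Robin", "date": "2019-03-01", "latitude": "44.9", "observer": "volunteer"}
print(check_record(sample, DATA_DICTIONARY))
```

A check like this catches drift between the data and its documentation, which is one reason documented fields make a dataset easier for outside researchers to trust.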
Publishing the Journey North data offers several advantages. One is that each data package receives a Digital Object Identifier (DOI), a unique reference identifier. DOIs make these datasets accessible and discoverable without relying on separate web searches or word of mouth. Researchers cite DOIs in their references, so Journey North datasets become discoverable through the articles that use them. Additionally, DOIs are machine-readable, supported by online discovery systems, and can be indexed in online libraries.
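As a small illustration of what machine-readable means here, a DOI can be turned into a resolvable link by appending it to https://doi.org/. The sketch below normalizes a few common ways of writing a DOI. The DOI shown is a made-up placeholder, not a Journey North identifier, and the validation pattern is a deliberate simplification of real DOI syntax.

```python
import re

# Simplified pattern (an assumption): "10.", a 4-9 digit registrant code, "/", a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi):
    """Normalize a DOI written in a few common styles into a resolver URL."""
    text = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if text.lower().startswith(prefix):
            text = text[len(prefix):]
            break
    if not DOI_PATTERN.match(text):
        raise ValueError(f"not a recognizable DOI: {doi!r}")
    return "https://doi.org/" + text


# Placeholder DOI for illustration only.
print(doi_to_url("doi:10.1234/example.5678"))
```

Because the same identifier resolves no matter where the citation appears, a reference list entry is enough for a reader or an indexing service to reach the dataset.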
Continuing the Commitment to Data Governance
Journey North launched a new mobile-friendly data entry and registration system in the fall of 2022. The existing data packages will need to be updated to include data submitted from 2020 to 2024.
Submitted June 2024, this article updates our December 2021 piece on the efforts to publish Journey North data following F.A.I.R. data management principles.