7 Open Science and Open Data

7.1. What Are Open Science and Open Data?

Although data science is beyond the scope of this guide to open education, a brief introduction to open science and open data is included because of their direct connection to open access, open licenses, and the open education movement more generally.

Attribution: “Intersections of Openness: Open Access, Science, and Education” by Abbey Elder is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).

The sharing of research results, methods, and data is central to scientific practice, and stimulates new forms of communication and collaboration. Today many funding agencies are mandating open access to the outputs and processes of research, accelerating research and learning, and increasing the impact universities can have beyond their walls. Open science and open data work together to further the reach of scientific research outputs. Open science supports the scientific community and the advancement of scientific discovery by promoting transparency and in turn supporting reproducibility and credibility. Open data focuses on making machine-readable datasets from a primary source accessible within the confines of privacy and cultural protocols.

Transparency is a core value in both open science and open data and refers to open sharing of data, research design, and materials, making it easier to reproduce the evidence derived from research (McKiernan et al., 2016). Openness in the sciences also seeks to make research outputs comprehensible to a broader public, thereby providing the opportunity for scientific research to have applications in non-academic environments (Bartling & Friesike, 2014). The citizen science movement creates opportunities for public participation in research and for practical applications, extending the reach of scientific outputs.

Attribution: “Open Science: What, How, and Why?” by SHB Online is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).

With the amount of scientific data increasing at least twofold every year, issues surrounding the access, use, and curation of data sets are vital to consider. The data-rich, researcher-driven environment that is evolving poses new challenges and provides new opportunities in the sharing, review, and publication of research results. Ensuring access to primary research data will play a key role in seeing that the scholarly communication system evolves in a way that supports the needs of scholars and the academic enterprise as a whole.

Like open access articles or open educational resources, open data is freely available online and ideally openly licensed, allowing anyone to download, copy, and analyze the data without externally imposed financial, legal, or technical barriers. The term “open data” typically applies to a range of non-textual materials, including datasets, statistics, transcripts, survey results, and the metadata associated with these objects. Much open data is, in essence, the factual information that is necessary to replicate and verify research results. Other open data is gathered and made available by national, state, regional, or local governments or other agencies or organizations for public use and has not necessarily been used in research. Census data is an example of this kind of open data. Open data policies usually encompass the notion that machine extraction, manipulation, and meta-analysis of data should be permissible.

According to the Open Knowledge Foundation, an open data advocacy organization, the key features of openness in data are:

  • Availability and access. The data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading online. The data must also be available in a convenient and modifiable form.
  • Reuse and redistribution. The data must be provided under terms that permit reuse and redistribution including the intermixing with other datasets. The data must be machine-readable.
  • Universal participation. Everyone must be able to use, reuse, and redistribute the data—there should be no discrimination against fields of endeavor or against persons or groups. For example, “non-commercial” restrictions that would prevent “commercial” use, or restrictions of use for certain purposes (e.g., only in education), are not allowed. This rules out Creative Commons licenses with NC stipulations, for example.
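The licensing point in the last bullet can be illustrated with a short check. The following is a minimal sketch, not an official tool: the function name and the license-code spelling (for example, "CC-BY-NC-4.0") are assumptions made for illustration. The text above names only NC restrictions; the sketch also treats ND (no derivatives) as non-open, on the assumption that a ban on modification conflicts with the reuse and intermixing requirement.

```python
# Hypothetical helper for illustration only; not part of any official tool.
# NC comes from the text above. ND is an added assumption: a license that
# forbids derivatives would block reuse and intermixing with other datasets.
RESTRICTIVE_ELEMENTS = {"NC", "ND"}


def is_open_cc_license(code: str) -> bool:
    """Return True if a Creative Commons code has no NC or ND element.

    Accepts spellings such as 'CC-BY-4.0' or 'CC BY-NC 4.0'.
    """
    parts = code.upper().replace(" ", "-").split("-")
    if parts[0] not in {"CC", "CC0"}:
        raise ValueError(f"not a Creative Commons license code: {code!r}")
    # Open only if none of the license elements is restrictive.
    return not (RESTRICTIVE_ELEMENTS & set(parts[1:]))


for code in ["CC0-1.0", "CC-BY-4.0", "CC-BY-SA-4.0", "CC-BY-NC-4.0", "CC-BY-ND-4.0"]:
    print(code, is_open_cc_license(code))
```

A check like this only screens license names. It does not replace reading the license itself or consulting your library about which license suits your data.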

One key point is that when opening up data, the focus is on non-personal data, that is, data that does not contain information about specific individuals. If you are sharing research data, it is essential to de-identify any data that is made open. Similarly, for some kinds of government data, national security restrictions or other security or privacy concerns may apply and require limitations on access and distribution of that data.
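As a rough illustration of a first de-identification step, the sketch below removes columns that directly identify people and replaces them with an arbitrary record code. The column names, sample rows, and function name are all hypothetical. Removing direct identifiers is usually not sufficient on its own, because combinations of other fields can sometimes re-identify individuals, so treat this as a starting point and follow your institution's research ethics guidance.

```python
import csv
import io

# Hypothetical column names; adjust these to match your own dataset.
DIRECT_IDENTIFIERS = {"name", "email", "student_id"}


def deidentify(rows, identifiers=DIRECT_IDENTIFIERS):
    """Drop direct-identifier columns and add an arbitrary record code."""
    cleaned = []
    for number, row in enumerate(rows, start=1):
        kept = {key: value for key, value in row.items() if key not in identifiers}
        cleaned.append({"record": f"R{number:03d}", **kept})
    return cleaned


# Made-up sample data standing in for a real survey export.
raw_csv = """name,email,student_id,program,score
Ada Example,ada@example.edu,1001,Biology,88
Ben Example,ben@example.edu,1002,History,92
"""

rows = list(csv.DictReader(io.StringIO(raw_csv)))
open_rows = deidentify(rows)

# Write the cleaned rows back out as CSV text.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(open_rows[0].keys()), lineterminator="\n")
writer.writeheader()
writer.writerows(open_rows)
print(out.getvalue())
```

The record codes here simply follow the original row order. If that order could itself reveal who is who, shuffle the rows before assigning codes.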

Broadly communicating results and making research data accessible and fully available for reuse encourages new research through the reanalysis of existing data, further leveraging the value of a research investment. Providing data in formats and under terms that enable full reuse promotes interoperability. It also allows the data to be mined with computational tools across very large collections, revealing connections, trends, and patterns that cannot be uncovered when data is closed or siloed.

7.2. Open Data Mandates for Federally Funded Research

In 2013, in the United States, the White House Office of Science and Technology Policy (OSTP) issued a policy memorandum requesting that research and data produced from federally funded research initiatives be made freely available to the public. That memorandum allowed for up to a 12-month embargo of research from most funding agencies. In August 2022, however, the OSTP issued a second policy memorandum calling for all federally funded research (including the underlying data) to be publicly accessible without embargo by December 31, 2025. The new guidance calls on all federal agencies with research and development expenditures to implement a policy advancing open access of publications and underlying data of research funded by the agency immediately upon publication. This policy document supersedes the initial one and applies more broadly to federal agencies beyond those engaged just with scientific research outputs. According to the memorandum:

A federal public access policy consistent with our values of equal opportunity must allow for broad and expeditious sharing of federally funded research—and must allow all Americans to benefit from the returns on our research and development investments without delay. Upholding these core U.S. principles in our public access policy also strengthens our ability to be a critical leader and partner on issues of open science around the world. The U.S. is committed to the ideas that openness in science is fundamental, security is essential, and freedom and integrity are crucial. (OSTP, 2022)

Under the new requirements, the supporting data for peer-reviewed scholarly publications resulting from federally funded research will need to be freely available and publicly accessible in a data repository by the time of publication, unless certain exceptions apply. Researchers will be responsible for creating data management and sharing plans that meet OSTP guidelines. The memorandum also specifies that other federally funded data not associated with peer-reviewed, scholarly publications should be shared freely. The OSTP offers further information online at Frequently Asked Questions: 2022 Public Access Policy Guidance.

“This updated guidance is probably the most important event for open science in the United States to date,” said Brian Nosek, Executive Director of the Center for Open Science. “This policy directive moves the thirty years of advocacy for open access within reach of the goal line for a complete transformation to open by default. Moreover, by also mandating sharing the data underlying reported results, this directive is a major leap forward for the open data movement.”

In addition to enabling verification of published findings, open data allows data to be combined across multiple studies and used to ask novel questions that the original researchers did not consider. Open data also fosters innovation and the creation of new services. Finally, transparency and sharing of data make fraud and malpractice less convenient and easier to detect.

Other major funders of research in the United States, such as the Bill & Melinda Gates Foundation, have established open access publication policies in the last few years, including requirements that underlying data sets be made open.

7.3. Making Your Data Open

The process of making data truly open can seem overwhelming, but it doesn’t have to be. To simplify the process, it may be helpful to think about enabling open data through two basic routes:

  • Making data technically open. Ensuring that your data are made available as a complete set in a machine-readable format on an easily accessible platform, such as an online data repository, is key to enabling open data.
  • Making data legally open. Ensuring that your data are made available under legal terms that allow users to redistribute and fully reuse the data is the second key step to ensuring open data. The only way to be sure that your data are adequately covered is to apply a license to them that conforms to the Open Definition. Many options for such licenses are available, such as those produced by Creative Commons (see Chapter 3).

Attribution: “The Case for Open Data” by Bernard Becker Medical Library is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).

The Open Knowledge Foundation’s Open Data Handbook provides easy-to-follow guidance on how to make your data open.
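The two routes described above can be sketched together in a few lines of code. This is only an illustration under stated assumptions: the function name, file names, sample rows, and metadata field names are hypothetical and do not follow any formal metadata standard. The sketch saves a dataset as a complete, machine-readable CSV file (technically open) and writes a small companion file stating the license (legally open). A real repository will have its own upload and metadata requirements.

```python
import csv
import json
from pathlib import Path
from tempfile import TemporaryDirectory


def publish_open_dataset(folder, rows, title, license_name, license_url):
    """Write rows to data.csv plus a metadata.json file naming the license."""
    folder = Path(folder)

    # Technically open: the complete dataset in a machine-readable format.
    data_path = folder / "data.csv"
    with data_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)

    # Legally open: state the license alongside the data.
    # These field names are illustrative, not a formal standard.
    metadata = {
        "title": title,
        "file": data_path.name,
        "format": "csv",
        "columns": list(rows[0].keys()),
        "license": {"name": license_name, "url": license_url},
    }
    meta_path = folder / "metadata.json"
    meta_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return data_path, meta_path


# Made-up sample data.
rows = [
    {"site": "A", "year": 2022, "count": 14},
    {"site": "B", "year": 2022, "count": 9},
]

# A temporary folder stands in for wherever you prepare files for upload.
with TemporaryDirectory() as tmp:
    data_path, meta_path = publish_open_dataset(
        tmp,
        rows,
        "Example survey counts",
        "CC BY 4.0",
        "https://creativecommons.org/licenses/by/4.0/",
    )
    saved = json.loads(meta_path.read_text(encoding="utf-8"))
    csv_text = data_path.read_text(encoding="utf-8")
    print(saved["license"]["name"])
    print(csv_text)
```

Keeping the license statement in a file that travels with the data means anyone who downloads the dataset can see the reuse terms without returning to the repository page.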

Many data repositories are fee-based, but as a graduate student, you likely have access to a repository via your university. There are both discipline-specific and general repositories. Some commonly used general repositories include Dryad, Figshare, and GitHub. The journal Nature maintains a useful list of both specialist and generalist repositories, mainly for the sciences. If you’re not sure whether you have access to a repository, ask your advisor or check with the university library or IT support.

References

Bartling, S., & Friesike, S. (2014). Opening science: The evolving guide on how the internet is changing research, collaboration and scholarly publishing. Springer Open. https://doi.org/10.1007/978-3-319-00026-8

McKiernan, E. C., et al. (2016). Point of view: How open science helps researchers succeed. eLife, 5, e16800. https://doi.org/10.7554/eLife.16800

OSTP (Office of Science and Technology Policy). (2022, August 25). Memorandum for the heads of executive departments and agencies: Ensuring free, immediate, and equitable access to federally funded research. https://www.whitehouse.gov/wp-content/uploads/2022/08/08-2022-OSTP-Public-Access-Memo.pdf

Attributions

This chapter was adapted from the following openly licensed sources:

Center for Open Science, “A Win for Open Science: White House OSTP’s Updated Guidance Advances Open Access and Data Sharing Across Federal Agencies” (September 1, 2022), licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).

Jill Emery, “Overview of Open Access Scholarly Publishing,” in Karen Bjork and Jill Emery, Portland State University Library Open Access Guidebook (n.d.), licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).

Open Knowledge Foundation, Open Data Handbook, licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).

SPARC, “Open Data” (n.d.), licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).

SPARC, “Open Data Factsheet” (n.d.), licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).

The University of British Columbia, Program for Open Scholarship and Education (2023), licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).

I am grateful to these creators for making this source material open.

License

A Graduate Student's Guide to Open Education and Scholarship Copyright © 2023 by Andrea Kingston is licensed under a Creative Commons Attribution 4.0 International License, except where otherwise noted.