Manage Your Research Data: Overview
What is Data Management?
Data Management is the process by which researchers plan how they will collect, store, archive, and ultimately share their research data. Researchers have long been trained to address many of the questions involved in Data Management in the course of their work:
What data are collected and created?
How are the data created or collected?
What supplemental documentation is needed to understand the data?
Are there privacy issues associated with your data collection?
Are there legal issues associated with your data?
How will your data be stored and backed up during the project?
How will you ensure data security?
How much of the collected data will be retained and shared when the project is concluded?
What is the long-term preservation plan?
How will the final data be shared?
Answers to these questions will vary by discipline and project, though there are general best practices.
The most useful resource for answering these questions as they relate to individual projects is the DMPTool.
In addition to this guide, the Library also provides DMP workshops.
What is a Data Management Plan?
A Data Management Plan (DMP) is a document, typically one page long, that explicitly addresses the questions above as they relate to a specific research project. The content of a DMP will vary by discipline, funding agency, and project. Some DMPs are no more than a paragraph explaining that no data were collected or shared, while a few may run to two pages to explain why specific data are not being shared (usually for legal or privacy reasons). At a minimum, a successful DMP must meet the basic requirements of the grant; these requirements are spelled out in the grant itself or by the funding agency.
Data Management requirements are constantly changing, making it extremely difficult to provide a definitive "how to" guide for drafting a DMP. For up-to-date guidance relating to your specific topic, consult:
- The instructions in your grant application.
- The standards suggested by your professional association (e.g., Data Management recommendations from the American Psychological Association)
- The standards suggested by the federal agency funding projects in your field (e.g., the NSF Data Sharing Policy)
Data Management requirements and individual DMPs vary, making it difficult to provide sample DMPs that are both current and representative. The University of Arizona and Stanford University, however, provide examples (below).
Please note that agency requirements are constantly changing, so a DMP from even a few years ago may not be an ideal template.
The best resources for making sure your DMP is accurate are the guidelines in your grant or those posted on the funding agency's website.
The DMPTool is also constantly updated and is your best resource for drafting a Data Management Plan.
Data Management guidelines were written to apply across all disciplines. As a result, the definitions and terminology they use are often extremely broad. While potentially frustrating, this ambiguity is meant to give researchers flexibility as they write a Data Management Plan. Below are some common terms and definitions:
Data refers to any information created during the course of a research project that is needed to validate or recreate the final results of the study. This can include, but is not limited to: test results, statistics, code, images, computer files, survey responses, transcripts, recordings, laboratory logs, or algorithms.
Live data is data currently being created, manipulated, or used for an ongoing research project.
Archived data is data that is no longer being altered or manipulated, or that has served as final research data for a grant or published study.
Final Research Data
This is data generated during a project that is needed to validate or recreate the results and conclusions of the completed study. The scope of “Final Research Data” may vary between projects, and it is the responsibility of the Principal Investigator to determine and justify the scope of their Final Research Data within their Data Management Plan.
Principal Investigator
The primary researcher responsible for managing the research data. The Principal Investigator may also be the researcher who oversees a laboratory, or the lead team member responsible for overseeing data collection and creation.
Data Management Plan
The document drafted by the Principal Investigator that outlines data creation, preservation, and sharing policies. Each plan must address these fundamental issues: data collection methods, documentation and metadata, ethics and legal compliance, storage and backup policies, data sharing, and data management responsibilities.
Open Data vs Public Data
Public data is publicly available upon request, while open data is immediately and freely available without an intermediary. Data produced by the National Center for Education Statistics (NCES) that must be requested is "public" while data that can be immediately downloaded from the NCES website is "open." Similarly, if researchers need to directly contact a PI for access to research data, this does not meet open data sharing requirements; conversely, depositing data in an open repository like PDXScholar, which provides 24/7 immediate access to content, does count as open data.