Manage Your Research Data: Data Management Plans
Introduction
Data management plans (DMPs) are documents that outline what will be done with the research data during and after the research project. Many federal agencies and some private funding agencies require a DMP be submitted as part of your grant proposal. Whether required by funders or not, writing a DMP is a very helpful exercise that will improve the usefulness of your research data for yourself and others over the long term.
Data Management practices vary between disciplines and projects, making it difficult to provide a definitive set of best practice guidelines.
This is further complicated by the fact that data management standards change constantly as best practices evolve to better serve the current data environment.
For specific guidance on best practice, consult:
- The instructions in your grant application.
- The standards suggested by your professional association (ex. Data Management recommendations via the American Psychological Association)
- The standards suggested by the federal agency funding projects in your field (ex. NSF Data Sharing Policy)
The best tool for drafting a Data Management Plan is the DMPTool, which provides step-by-step assistance in drafting each section of the DMP.
Examples
Data Management requirements and individual DMPs vary, making it difficult to provide sample DMPs that are both current and representative. The University of Arizona and Stanford University, however, do provide examples (below).
Please note that agency requirements are constantly changing, so a DMP from even a few years ago may not be an ideal template.
The best resources to ensure your DMP is accurate are guidelines within your grant, or those posted on the funding agency's website.
The DMPTool is also constantly updated, and is your best resource for drafting a Data Management Plan.
Issues to Consider
Data Collection
What data is being collected or created?
Questions to consider: type, format, volume, possible software needs.
General suggestions: if using digital data or documents, use formats that are non-proprietary or do not require specialized software when possible.
How is the data created or collected?
Questions to consider: What standards and methodologies will be used to collect data? How will you plan file and folder naming, version tracking, and storage?
General suggestions: Establish standards, methods, and organizational principles early in the project; determine short and long term storage solutions before data collection begins.
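One way to make a naming and version-tracking convention stick is to generate file names with a small script instead of typing them by hand. The sketch below is only an illustration: the pattern (project, description, date, zero-padded version) and the function name `build_filename` are assumptions, not a standard required by any funder, so adapt them to your discipline's conventions.

```python
import re
from datetime import date


def build_filename(project, description, when, version, ext):
    """Return a name such as 'project_description_YYYYMMDD_v01.ext'.

    The pattern is a hypothetical convention chosen for illustration.
    """
    def slug(text):
        # Lowercase the text and replace runs of other characters with a hyphen
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    extension = ext.lstrip(".").lower()
    return f"{slug(project)}_{slug(description)}_{when:%Y%m%d}_v{version:02d}.{extension}"


# Example with made-up project details
print(build_filename("Stream Survey", "Water Temp (raw)", date(2024, 1, 15), 2, ".CSV"))
# prints: stream-survey_water-temp-raw_20240115_v02.csv
```

Putting the date in year-month-day order and zero-padding the version number keeps files in chronological order when sorted by name.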
Documentation and Metadata
What supplemental documentation is needed to understand the data?
Question to consider: What documentation is needed for future researchers to understand the data?
Suggestion: Include "Read me" documents in .txt format to explain definitions, variables, units of measurement, vocabulary, software needs, format. If including codebooks, use a file format that is non-proprietary when possible. Documentation should include at least basic metadata:
Author, contributors, date of publication, access restrictions
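A "Read me" file with this basic metadata can be written by hand, but generating it from a script helps keep it consistent across datasets. The following is a minimal sketch under stated assumptions: the function name `write_readme`, the layout of the file, and every value shown are placeholders for illustration, not a required template.

```python
import tempfile
from pathlib import Path


def write_readme(folder, metadata, variables):
    """Write a plain-text README.txt describing a dataset and return its path."""
    lines = ["README", ""]
    # Basic metadata, one "Field: value" pair per line
    for field, value in metadata.items():
        lines.append(f"{field}: {value}")
    lines += ["", "Variables (definition and unit of measurement)"]
    for name, description in variables.items():
        lines.append(f"- {name}: {description}")
    path = Path(folder) / "README.txt"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


# Placeholder values for illustration only; replace with your project's details.
example_metadata = {
    "Author": "PLACEHOLDER NAME",
    "Contributors": "PLACEHOLDER NAMES",
    "Date of publication": "YYYY-MM-DD",
    "Access restrictions": "none",
}
example_variables = {"temp_c": "water temperature, degrees Celsius"}

# Written to a temporary folder here; point it at your data folder in practice.
with tempfile.TemporaryDirectory() as folder:
    readme = write_readme(folder, example_metadata, example_variables)
    print(readme.read_text(encoding="utf-8"))
```

Plain UTF-8 text is used so the documentation can be opened without specialized software.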
Compliance and Ethics
Are there privacy issues associated with your data collection?
Questions to consider: Does the data require consent? Is personal information being shared? How will sensitive data be protected? Are you legally allowed to store/share the collected data?
Suggestions: If data relates to human subjects, data collection should be approved by the IRB. If the data includes protected health information, data storage and sharing will need to meet HIPAA regulations.
Are there legal issues associated with your data?
Questions to consider: Does this data collection violate copyright or intellectual property laws? Will there be restrictions on access to your data once it is shared?
Suggestion: Work with the PSU Innovation and Intellectual Property Office if there are questions of copyright or ownership regarding collected data.
Active Data Storage and Backup
How will your data be stored and backed up during the project?
Questions to consider: Where will the data be housed during the project (ex. personal computer, university network, department hard drive)?
Will the data be backed up? Who is responsible for ensuring the data is saved and can be recovered?
Suggestions: For questions on live data, contact the Office of Information Technology and its office of Research Computing. For long-term archiving, see the data archives and data sharing tab.
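Whoever is responsible for backups may also want a way to confirm that a backup copy actually matches the working data. The sketch below compares SHA-256 checksums of every file in a source folder against a backup folder. It is an illustrative assumption about how such a check could be done, not a procedure prescribed by any office named here; the function names `sha256_of` and `find_backup_problems` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_backup_problems(source_dir, backup_dir):
    """List files that are missing from the backup or whose contents differ."""
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        relative = src.relative_to(source_dir)
        copy = backup_dir / relative
        if not copy.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif sha256_of(src) != sha256_of(copy):
            problems.append(f"differs: {relative.as_posix()}")
    return problems
```

Calling `find_backup_problems("project_data", "backup/project_data")` (both paths are placeholders) returns an empty list when every source file has an identical copy in the backup. The check only looks from source to backup, so extra files that exist only in the backup are not reported.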
How will you ensure data security?
Questions to Consider: Are there security risks? How will colleagues gain access to the data?
Suggestion: Contact the Office of Information Technology regarding IT Security to discuss unique security aspects of your project. Possible security issues should also be addressed in your IRB proposal.
Long-term Preservation and Sharing
How much of the collected data will be retained and shared when the project is concluded?
Questions to Consider: Is all the data collected useful for future research? What is the minimal amount of data required to reproduce/prove the study? Will you share all "collected data" (all data produced during the project), "selected data" (data that may be of use in the future to other researchers), or just the "research data" (data used to reproduce the final results/thesis of the project)?
Suggestion: Most grant proposals will stipulate a minimum amount of data to be shared when the project concludes. Many data depositories do not have a cap on the amount of data that can be archived, so this should not be a factor in determining how much data to share; the key aspect is whether the data might be useful to future researchers.
What is the long-term preservation plan?
Questions to consider: Where will the data be archived and shared? Are there costs involved, such as a hosting cost? Who will provide access, and will the data be findable for future researchers?
Suggestions: Approved data depositories, such as PDXScholar and ICPSR, will provide long-term access to datasets while also making this content findable through databases and search engines. Many data depositories will also assign a DOI to the dataset, making it easier to find in the future. For more detail, see the data archives and data sharing tab.
How will the final data be shared?
Questions to consider: How will future researchers gain access to this data?
Suggestion: The fewer barriers the better. In the past, it was not uncommon for researchers to contact the PI directly to request research data. This approach introduces innumerable difficulties and is not accepted as an appropriate data sharing plan by grant funding agencies. Ideally, researchers should have immediate access to research data without needing to contact the original PI. The best way to provide this access is through a data depository (see the data archives and data sharing tab).
Will there be restrictions on your data sharing?
Questions to consider: Will there be an embargo period before researchers may access your shared data? Is there data that will not be shared for privacy reasons? Are there ethics, privacy, copyright, or intellectual property issues that prevent data sharing?
Suggestion: If you are applying for a grant, be sure to at least meet the minimum data sharing requirements in your proposal. If you are restricting access to the data, explain why in the Data Management Plan.
Responsibilities
Who is responsible for data management?
Questions to consider: Who is responsible for implementing the data management plan, reviewing its content, and overseeing metadata, data quality, storage, backup, and archiving?
Suggestion: Data collection, accuracy, quality, and grant compliance are the responsibility of the PI, though there are offices at PSU to assist researchers. The Library provides DMP workshops and trainings in data management, the Office of Information Technology assists with live data storage and intellectual property, and the office of Research and Graduate Studies assists with the grant process. Finally, if researchers deposit their data in PDXScholar, the Institutional Repository of Portland State University, the library will ensure long-term access that is in compliance with federal grant requirements.
Recommended Resource: DMPTool
- DMPTool: The Data Management Planning Tool (DMPTool) helps you create, review, and share data management plans that meet institutional and funder requirements. Developed by the University of California Curation Center of the California Digital Library.
Additional Web Resources
- Portland State University Research Data Guidebook: This guidebook, compiled by Prof. Kimberly Pendell, assists Portland State faculty and students with the proper care and management of their research data by gathering together the University’s infrastructure, training, and recommended best practices.