Manage Your Research Data: General Best Practices

This guide provides a primer on the fundamentals of data management.

On This Page

Overview of Data Management
Data management practices vary between disciplines, and this section points researchers to potentially useful external resources.

Issues to Consider
Addresses major data management questions relating to data collection, documentation and metadata, compliance and ethics, active data storage and backup, long-term preservation and sharing, and responsibilities.

Additional Web Resources
Links to some other excellent resources.

Overview

Data management practices vary between disciplines and projects, making it difficult to provide a definitive set of best practice guidelines.
This is further complicated by the fact that data management standards are constantly evolving to better serve the current data environment.
For specific guidance on best practices, consult:
- The instructions in your grant application.
- The standards suggested by your professional association (ex. data management recommendations from the American Psychological Association)
- The standards suggested by the federal agency funding projects in your field (ex. the NSF Data Sharing Policy)

The best tool for drafting a Data Management Plan (DMP) is the DMPTool, which provides step-by-step assistance in drafting each section of the DMP.

Issues to Consider

Data Collection

What data is being collected or created?
Questions to consider: type, format, volume, possible software needs.
General suggestion: If using digital data or documents, use formats that are non-proprietary or do not require specialized software whenever possible (ex. CSV or plain text rather than a proprietary spreadsheet format).

How is the data created or collected?
Questions to consider: What standards and methodologies will be used to collect the data? How will you plan file and folder naming, version tracking, and storage?
General suggestions: Establish standards, methods, and organizational principles early in the project; determine short- and long-term storage solutions before data collection begins.
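This guide does not prescribe a particular naming scheme. As one illustration, assuming a hypothetical convention of collection date, project, short description, and zero-padded version number, a small Python helper can apply the same pattern to every file. The function names `slug` and `make_filename` and the convention itself are assumptions, not a PSU or funder standard.

```python
import re
from datetime import date


def slug(text: str) -> str:
    """Lowercase text and replace spaces/special characters with single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def make_filename(project: str, description: str, collected: date, version: int, ext: str) -> str:
    """Build a name of the form YYYY-MM-DD_project_description_vNN.ext (hypothetical convention)."""
    if version < 1:
        raise ValueError("version numbers start at 1")
    # ISO 8601 dates sort chronologically when a folder is listed alphabetically,
    # and zero-padded versions (v02, v10) sort in numeric order.
    return f"{collected.isoformat()}_{slug(project)}_{slug(description)}_v{version:02d}.{ext.lstrip('.').lower()}"


print(make_filename("Soil Survey", "Site A readings", date(2024, 3, 15), 2, "CSV"))
# -> 2024-03-15_soil-survey_site-a-readings_v02.csv
```

Because `slug` turns underscores into hyphens, underscores are reserved as separators between fields. Two-digit padding keeps versions in order up to v99; a project expecting more revisions would need a wider field.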

Documentation and Metadata

What supplemental documentation is needed to understand the data?
Question to consider: What documentation is needed for future researchers to understand the data?
Suggestion: Include "Read me" documents in .txt format that explain definitions, variables, units of measurement, vocabulary, software needs, and file formats. If including codebooks, use a non-proprietary file format when possible. Documentation should include at least basic metadata: author, contributors, date of publication, and access restrictions.
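As a minimal sketch, assuming the documentation is generated with Python, the basic metadata fields listed above and a simple variable table (name, definition, units) can be written to a plain-text README.txt. The function name `build_readme`, the layout, and the example values are hypothetical.

```python
from pathlib import Path

# Basic metadata fields named in this guide; add others your discipline expects.
REQUIRED_FIELDS = ["Author", "Contributors", "Date of publication", "Access restrictions"]


def build_readme(metadata: dict[str, str], variables: list[tuple[str, str, str]]) -> str:
    """Return plain-text README content.

    metadata:  values for each entry in REQUIRED_FIELDS.
    variables: (name, definition, units) for each variable in the dataset.
    """
    missing = [field for field in REQUIRED_FIELDS if not metadata.get(field)]
    if missing:
        # Refuse to produce a README that omits the basic metadata.
        raise ValueError("missing metadata: " + ", ".join(missing))

    lines = [f"{field}: {metadata[field]}" for field in REQUIRED_FIELDS]
    lines += ["", "VARIABLES", "name | definition | units"]
    lines += [f"{name} | {definition} | {units}" for name, definition, units in variables]
    return "\n".join(lines) + "\n"


# Hypothetical example values.
readme = build_readme(
    {
        "Author": "A. Researcher",
        "Contributors": "B. Assistant",
        "Date of publication": "2024-06-01",
        "Access restrictions": "None",
    },
    [("temp_c", "Water temperature at 0.5 m depth", "degrees Celsius")],
)
Path("README.txt").write_text(readme, encoding="utf-8")
```

Keeping the README as plain UTF-8 text means it can be read without any specialized software, in line with the format suggestions above.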

Compliance and Ethics

Are there privacy issues associated with your data collection?
Questions to consider: Does the data collection require participant consent? Is personal information being shared? How will sensitive data be protected? Are you legally allowed to store and share the collected data?
Suggestions: If data relates to human subjects, data collection should be approved by the Institutional Review Board (IRB). If the data includes protected health information, storage and sharing will also need to meet HIPAA regulations.

Are there legal issues associated with your data?
Questions to consider: Does this data collection violate copyright or intellectual property laws? Will there be restrictions on access to your data once it is shared?
Suggestion: Work with the PSU Innovation and Intellectual Property Office if there are questions of copyright or ownership regarding collected data.

Active Data Storage and Backup

How will your data be stored and backed up during the project?
Questions to consider: Where will the data be housed during the project (ex. personal computer, university network, department hard drive)?
Will the data be backed up? Who is responsible for ensuring the data is saved and can be recovered?
Suggestions: For questions about live (active) data, contact the Office of Information Technology and its Research Computing office. For long-term archiving, see the Data Archives and Data Sharing tab.
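One common way to check that a backup could actually be used for recovery is to compare checksums of the original files and their backup copies. The sketch below assumes both copies are reachable as local folders; the function names `sha256_of` and `verify_backup` and the example paths are hypothetical, and this is a spot check rather than a backup system.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(original: Path, backup: Path) -> list[str]:
    """List files under `original` that are missing from `backup` or whose contents differ."""
    problems = []
    for src in sorted(p for p in original.rglob("*") if p.is_file()):
        rel = src.relative_to(original)
        copy = backup / rel
        if not copy.is_file():
            problems.append(f"missing: {rel.as_posix()}")
        elif sha256_of(src) != sha256_of(copy):
            problems.append(f"differs: {rel.as_posix()}")
    return problems


# Example call (hypothetical paths):
# problems = verify_backup(Path("project_data"), Path("/mnt/backup/project_data"))
```

An empty result means every original file has a byte-identical copy in the backup; files that exist only in the backup are not reported.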

How will you ensure data security?
Questions to consider: Are there security risks? How will colleagues gain access to the data?
Suggestion: Contact the Office of Information Technology regarding IT Security to discuss unique security aspects of your project. Possible security issues should also be addressed in your IRB proposal.

Long-term Preservation and Sharing

How much of the collected data will be retained and shared when the project is concluded?
Questions to consider: Is all the data collected useful for future research? What is the minimal amount of data required to reproduce or verify the study? Will you share all "collected data" (all data produced during the project), "selected data" (data that may be of use in the future to other researchers), or just the "research data" (data used to reproduce the final results/thesis of the project)?
Suggestion: Most grant proposals will stipulate a minimum amount of data to be shared when the project concludes. Many data repositories do not cap the amount of data that can be archived, so storage space should not be the deciding factor in how much data to share; the key question is whether the data might be useful to future researchers.

What is the long-term preservation plan?
Questions to consider: Where will the data be archived and shared? Who will provide access, and will the data be findable by future researchers? Are there costs involved, such as hosting fees?
Suggestions: Approved data repositories, such as PDXScholar and ICPSR, provide long-term access to datasets while also making this content findable through databases and search engines. Many data repositories also assign a DOI (Digital Object Identifier) to the dataset, making it easier to find and cite in the future. For more detail, see the Data Archives and Data Sharing tab.

How will the final data be shared?
Questions to consider: How will future researchers gain access to this data? 
Suggestion: The fewer barriers, the better. In the past, it was not uncommon for researchers to contact the PI directly to request research data. This approach introduces many difficulties and is not accepted by grant funding agencies as an appropriate data sharing plan. Ideally, researchers should have immediate access to research data without needing to contact the original PI. The best way to provide this access is through a data repository (see the Data Archives and Data Sharing tab).

Will there be restrictions on your data sharing?
Questions to consider: Will there be an embargo period before researchers may access your shared data? Is there data that will not be shared for privacy reasons? Are there ethics, privacy, copyright, or intellectual property issues that prevent data sharing? 
Suggestion: If you are applying for a grant, be sure to at least meet the minimum data sharing requirements in your proposal. If you are restricting access to the data, explain why in the Data Management Plan.

Responsibilities 

Who is responsible for data management?
Questions to consider: Who is responsible for implementing the data management plan, reviewing its content, and overseeing metadata, data quality, storage, backup, and archiving?
Suggestion: Data collection, accuracy, quality, and grant compliance are the responsibility of the PI, though there are offices at PSU to assist researchers. The Library provides DMP workshops and training in data management, the Office of Information Technology assists with live data storage and security, the PSU Innovation and Intellectual Property Office assists with intellectual property questions, and the Office of Research and Graduate Studies assists with the grant process. Finally, if researchers deposit their data in PDXScholar, the Institutional Repository of Portland State University, the Library will ensure long-term access that is in compliance with federal grant requirements.