Manage Your Research Data: Overview
What is Data Management?
Data Management is the process by which researchers plan to collect, store, archive, and ultimately share their research data. Many questions related to Data Management have long been issues researchers are trained to address through the course of their work:
- What data is collected or created?
- How is the data created or collected?
- What supplemental documentation is needed to understand the data?
- Are there privacy issues associated with your data collection?
- Are there legal issues associated with your data?
- How will your data be stored and backed up during the project?
- How will you ensure data security?
- How much of the collected data will be retained and shared when the project is concluded?
- What is the long-term preservation plan?
- How will the final data be shared?
Answers to these questions will vary by discipline and project, though there are general best practices.
The most useful resource for answering these questions as they relate to individual projects is the DMPTool.
Data Management guidelines were written to apply across all disciplines. As a result, the definitions and terminology used are often extremely broad. While potentially frustrating, this ambiguity is meant to allow flexibility for researchers as they write a Data Management Plan. Below are some common terms and definitions:
Data refers to any information created during the course of a research project that is needed to validate or recreate the final results of the study. This can include, but is not limited to: test results, statistics, code, images, computer files, survey responses, transcripts, recordings, laboratory logs, and algorithms.
Live data is data currently being created, manipulated, or used for an ongoing research project.
Storage & Stored Data
Short- or long-term storage of active data. This may be on OIT Research Computing data storage devices, on local hard drives, or in the cloud.
Archived Data is data that is no longer being altered or manipulated, or that has served as final research data for a grant or published study. Archived data is stored in a secure and permanent system and remains accessible to researchers.
Shared Data
Data that is made publicly accessible through data repositories like PDXScholar.
Final Research Data
This is data generated during a project that is needed to validate or recreate the results and conclusions of the completed study. The scope of “Final Research Data” may vary between projects, and it is the responsibility of the Principal Investigator to determine and justify the scope of their Final Research Data within their Data Management Plan.
Principal Investigator
The primary researcher responsible for managing the research data. The Principal Investigator may also be the researcher tasked with overseeing a laboratory, or the lead team member responsible for overseeing data collection and creation.
Data Management Plan
The document, drafted by the Principal Investigator, that outlines data creation, preservation, and sharing policies. Each plan must address: data collection methods, documentation and metadata, ethics and legal compliance, storage and backup policies, data sharing, and data management responsibilities.
Open Data vs Public Data
Public data is publicly available upon request, while open data is immediately and freely available without an intermediary. Data produced by the National Center for Education Statistics (NCES) that must be requested is "public" while data that can be immediately downloaded from the NCES website is "open." Similarly, if researchers need to directly contact a PI for access to research data, this does not meet open data sharing requirements; conversely, depositing data in an open repository like PDXScholar, which provides 24/7 immediate access to content, does count as open data.