Research Data Management
Document Your Data
Data sets should be accompanied by information that explains the data.
Understanding the data is an essential part of being able to use it. Where the data came from, how they were recorded, and which metrics or processes were used are just a few of the aspects of working with data, and documentation can answer each of them.
Documentation helps organize data, facilitates data sharing, discovery, and curation, and is essential for re-use and preservation. Whether you want to keep track of your data from the beginning of a project to its end, help your future self make sense of the data, or enable sharing with others, taking even a small amount of time to document as you go can save hours in the long run and may even keep your data from becoming unusable.
Documentation can be created formally or informally.
Formats such as discipline-specific templates and metadata schemas, or simple tools such as text editors and lab notebooks, can each be used to document data sets. A number of tools and methods can help you create consistent documentation efficiently; some may be standards in your discipline, while others can be applied as needed for your individual situation.
Some examples of informal metadata that can be created during the research project:
- Readme - a simple text document describing the contents of a directory
- Templates - standardized documents that help ensure that all relevant details are captured
- Your notes
- Lab notebooks (paper or electronic)
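A README can also be started from a short script so that every data folder gets the same fields. This is a minimal sketch, assuming a hypothetical helper `write_readme` and an illustrative field list; neither comes from any standard, so adapt the fields to your project or your discipline's template. Fields you have not filled in are written as "TODO" so the gaps stay visible.

```python
from pathlib import Path

# Illustrative (hypothetical) field list; replace with your own template's fields.
FIELDS = ["Title", "Creator", "Contact", "Date created", "Description", "Methods", "License"]


def write_readme(directory, info):
    """Write README.txt into `directory` with the fields above and a list of its files.

    Any field missing from `info` is written as "TODO" so it is easy to spot later.
    """
    folder = Path(directory)
    lines = [f"{field}: {info.get(field, 'TODO')}" for field in FIELDS]
    lines.append("")
    lines.append("Files in this directory:")
    # List every regular file (except the README itself), in alphabetical order.
    for path in sorted(folder.iterdir()):
        if path.is_file() and path.name != "README.txt":
            lines.append(f"- {path.name}: TODO describe contents")
    readme = folder / "README.txt"
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme
```

For example, `write_readme("survey_data", {"Title": "Stream temperature survey", "Creator": "A. Researcher"})` (a hypothetical folder and values) would produce a README whose remaining fields and file descriptions you then complete by hand.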
The choice of a formal metadata scheme is often dictated by the discipline in which the data originate. Committing to a formal scheme requires knowledge of the scheme and of the tools that support its creation and use. Common schemes include:
- DDI (Data Documentation Initiative)
An international, XML-based metadata specification for social and behavioral sciences data.
More info: An introduction from the Data Ab Initio blog (K. Briney)
- Dublin Core
A general specification for describing many kinds of resources
- EML (Ecological Metadata Language)
A modular, XML-based metadata specification designed by and for the ecology community
- ISO 191xx series (International Organization for Standardization)
An international family of standards for documenting geospatial data, managed by ISO Technical Committee 211 (ISO/TC 211)
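To make the idea of a formal scheme concrete, here is a minimal sketch that builds a simple Dublin Core record with Python's standard library. It uses the fifteen elements of the Dublin Core Metadata Element Set, version 1.1, and that set's namespace URI. The helper name `dublin_core_record` and the `metadata` wrapper element are assumptions for illustration only; the container a record must use depends on the repository or protocol that will receive it.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
# The fifteen elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher", "contributor",
    "date", "type", "format", "identifier", "source", "language",
    "relation", "coverage", "rights",
}


def dublin_core_record(fields):
    """Return an XML string with one dc:<element> per value.

    `fields` maps element names to a string or a list of strings, because
    Dublin Core elements are optional and may be repeated.
    """
    ET.register_namespace("dc", DC_NS)  # serialize the namespace with the "dc" prefix
    root = ET.Element("metadata")  # hypothetical wrapper; repositories define their own
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"{name!r} is not a Dublin Core element")
        if isinstance(values, str):
            values = [values]
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")
```

Rejecting unknown element names is a design choice: a misspelled field fails loudly instead of silently producing a record that no Dublin Core consumer will recognize.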
Some useful tools for creating metadata for your research data include:
- Colectica for Excel
A free Excel plug-in used to document spreadsheet data using the DDI specification
- Extended Attributes for SAS 9.4 and higher
A SAS Enterprise Guide add-in to describe variable attributes using the DDI specification
A KNB (Knowledge Network for Biocomplexity) application that allows scientists to describe their data sets in the EML specification and share their descriptions and data via KNB Metacat
- Other DDI tools
A list of metadata tools maintained by the DDI Alliance
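Several of the tools above document variables inside a particular program. As a tool-neutral alternative, a plain data dictionary (one row per variable) can be drafted from a CSV file and then completed by hand. This is a minimal sketch under stated assumptions: the function names `infer_type` and `data_dictionary` are hypothetical, the type inference is deliberately crude (integer, decimal, or text), and the description and units columns are left blank for the researcher to fill in.

```python
import csv


def infer_type(values):
    """Guess a simple type label ("integer", "decimal", "text") for non-empty strings."""
    def all_parse(convert):
        try:
            for v in values:
                convert(v)
        except ValueError:
            return False
        return True

    if not values:
        return "empty"
    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "decimal"
    return "text"


def data_dictionary(csv_path):
    """Return one entry per column: name, guessed type, missing count, blanks to complete."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)
    entries = []
    for i, name in enumerate(header):
        # Short rows are treated as having an empty (missing) value in this column.
        column = [row[i].strip() if i < len(row) else "" for row in rows]
        present = [v for v in column if v]
        entries.append({
            "variable": name,
            "type": infer_type(present),
            "missing": len(column) - len(present),
            "description": "",  # to be written by the researcher
            "units": "",        # to be written by the researcher
        })
    return entries
```

The returned entries can be saved next to the data with `csv.DictWriter`, giving collaborators a codebook they can read without any special software.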