Document Your Data
Informal Documentation
Effective data documentation does not require complex tools. A text editor or lab notebook can be used to document data. Tools, formats, and methods for data documentation can vary by discipline and even by lab or research group.
Read Me files
A Read Me is a file created using a plain text editor and saved using the name README.txt. A Read Me file is a very flexible form of documentation. It can describe the context of a project, the contents of a folder, or any other information that you will need later. Create and save a Read Me using plain text (.txt) to ensure the file is readable now and in the future.
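As an illustration of how a Read Me can be started consistently, the following minimal Python sketch builds a plain-text skeleton and saves it as README.txt. The section headings and the function names build_readme and write_readme are hypothetical and not drawn from any standard, so adapt them to your own project.

```python
from pathlib import Path

# Hypothetical section headings; adapt them to your project or discipline.
README_SECTIONS = [
    "Project title",
    "Contact",
    "Date created",
    "Project context",
    "Folder contents",
    "File naming conventions",
    "Notes",
]


def build_readme(values):
    """Return Read Me text with one labelled section per heading."""
    lines = []
    for heading in README_SECTIONS:
        lines.append(heading.upper())
        # Sections without a supplied value get a visible placeholder.
        lines.append(values.get(heading, "[to be completed]"))
        lines.append("")
    return "\n".join(lines)


def write_readme(folder, values):
    """Write README.txt as UTF-8 plain text inside the given folder."""
    path = Path(folder) / "README.txt"
    path.write_text(build_readme(values), encoding="utf-8")
    return path
```

Calling write_readme("my_project", {"Project title": "Example survey"}) would create my_project/README.txt, provided the folder already exists. Sections left unfilled are marked "[to be completed]" so gaps are easy to spot later.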
Further resources for creating Read Me documents:
- Introduction to Read Me files (Data Ab Initio blog by Kristin Briney)
- Guide to writing Read Me metadata (Cornell University Research Data Management Service Group)
- Read Me elements and template for data deposit (Georgia Tech Library)
Data Dictionaries
A data dictionary describes the contents of a structured data set in a file that is separate from the data itself. It records the information needed to interpret the data now and to reuse it in the future, such as variable names, definitions, units, and allowed values. Kristin Briney's Data Ab Initio blog includes an informative post about data dictionaries.
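One way to start a data dictionary is to draft it from the data file itself and then fill in the details by hand. The Python sketch below assumes the data is a CSV file with a header row. The dictionary columns (variable, description, units, allowed_values, example_value) and the function names are hypothetical choices, not a prescribed format.

```python
import csv
import io

# Hypothetical column set for the dictionary; adjust it to your needs.
DICTIONARY_FIELDS = [
    "variable",
    "description",
    "units",
    "allowed_values",
    "example_value",
]


def draft_data_dictionary(data_file):
    """Return one dictionary row per column named in the CSV header.

    Only 'variable' and 'example_value' are filled in automatically;
    the other fields are left blank for a person to complete.
    """
    reader = csv.reader(data_file)
    header = next(reader, [])
    first_row = next(reader, [])
    rows = []
    for i, name in enumerate(header):
        example = first_row[i] if i < len(first_row) else ""
        rows.append({
            "variable": name,
            "description": "",
            "units": "",
            "allowed_values": "",
            "example_value": example,
        })
    return rows


def write_data_dictionary(rows, out_file):
    """Write the drafted rows as CSV, separate from the data itself."""
    writer = csv.DictWriter(out_file, fieldnames=DICTIONARY_FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Usage with a small in-memory sample (placeholder data).
sample = io.StringIO("site_id,temp_c\nA1,12.5\n")
rows = draft_data_dictionary(sample)
```

The script only lists the variables and one example value for each. The descriptions, units, and allowed values still have to come from someone who knows the data.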
Templates
A template is a way to remind yourself and your research partners to record essential information about a project or data. Templates are helpful for ensuring consistency over time. You can create as many templates as you need and use them with paper and pen or with a computer. Kristin Briney's Data Ab Initio blog includes a helpful post about templates.
Formal Metadata
The choice to use a formal metadata scheme is often dictated by the discipline from which the data originates. Committing to a formal scheme requires knowledge of both the scheme itself and the tools used to create and work with metadata that follows it.
Common descriptive metadata standards with wide adoption include:
- Dublin Core
  A general specification for describing many kinds of resources
- DDI (Data Documentation Initiative)
  An international, XML-based metadata specification for social and behavioral sciences data
- EML (Ecological Metadata Language)
  An XML-based modular metadata specification designed for and by the ecology discipline
- ISO 19100 Series (International Organization for Standardization)
  An international set of standards for documenting geospatial data, managed by ISO Technical Committee 211
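To give a sense of what formal metadata looks like in practice, here is a minimal Python sketch that writes a few Dublin Core elements as XML using the standard library. The namespace is the published Dublin Core element set namespace. The wrapper element "metadata", the function name dublin_core_record, and the sample values are assumptions for illustration only. Repositories usually define their own wrapper and required elements.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Return an XML string with one dc: element per (name, value) pair."""
    # "metadata" is a hypothetical wrapper; Dublin Core does not mandate one.
    root = ET.Element("metadata")
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Placeholder values, not a real data set.
record = dublin_core_record([
    ("title", "Stream temperature readings"),
    ("creator", "Example Lab"),
    ("date", "2020-01-01"),
    ("format", "text/csv"),
])
print(record)
```

Discipline-specific schemes such as DDI or EML are considerably more detailed, which is why dedicated tools are usually the more practical route for them.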
Tools for Creating Metadata
- Colectica for Excel
  A free Excel plug-in used to document spreadsheet data using the DDI specification
- Extended Attributes for SAS 9.4 and higher
  A SAS Enterprise Guide add-in to describe variable attributes using the DDI specification
- Morpho
  A KNB (Knowledge Network for Biocomplexity) application that allows scientists to describe their data sets in the EML specification and share their descriptions and data via KNB Metacat
- Other DDI tools
  A list of metadata tools maintained by the DDI Alliance