
Data management for students: Versioning

Versioning

Processing and analyzing data often produces new versions of data files during the research process. Examples of changes to a dataset include cleaning the data, creating new variables, adding or merging data, converting file formats and changing the dataset's structure. However, the original data must always be preserved, and it must always remain possible to revert to earlier versions.

Tips:

  • Always save the raw data file and make sure that no changes can be made to it (e.g. save it as read-only, store it in a separate, secure location, or set access rights).
  • Document which changes were made in each version of the file.
  • Decide how many and which versions of a file you want to keep, for how long, and how you will organize them.
  • Save new versions of your files regularly, so that you can fall back on a recent version if a file becomes corrupt. For example, do this every Friday afternoon, or each time you finish a chapter of your document.
  • Be consistent in naming the different versions, for example by adding the date in YYYYMMDD format to the file name (e.g. 20201001_surveydata) or a version number (e.g. surveydata_v2).
  • Do not use ambiguous version labels such as '_new', '_lastversion' or '_revised'; a newer version can always follow.
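If you manage your files with a script, some of these tips can be partly automated. The Python sketch below is only an illustration under stated assumptions: the file names (surveydata_raw.csv, CHANGELOG.txt) and the helper functions protect_raw_file, next_version_name and log_change are hypothetical, and the sketch combines the date and the version number in one name (e.g. 20201001_surveydata_v1.csv), which is one possible convention rather than a requirement. It removes write permission from the raw file, picks the next free version number so earlier versions are never overwritten, and records each change in a plain-text log.

```python
import datetime as dt
import re
import stat
import tempfile
from pathlib import Path


def protect_raw_file(path: Path) -> None:
    """Clear the write bits so the raw data file cannot be edited by accident."""
    mode = path.stat().st_mode
    path.chmod(mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


def next_version_name(folder: Path, base: str, suffix: str, today: dt.date) -> str:
    """Build a name like '20201001_surveydata_v3.csv' (hypothetical convention).

    The version number is one higher than the highest _vN already present
    for this base name, so earlier versions are never overwritten.
    """
    pattern = re.compile(rf"^\d{{8}}_{re.escape(base)}_v(\d+){re.escape(suffix)}$")
    versions = [
        int(m.group(1))
        for p in folder.iterdir()
        if (m := pattern.match(p.name))
    ]
    return f"{today:%Y%m%d}_{base}_v{max(versions, default=0) + 1}{suffix}"


def log_change(folder: Path, filename: str, description: str) -> None:
    """Append one tab-separated line per version: file name and what changed."""
    with open(folder / "CHANGELOG.txt", "a", encoding="utf-8") as log:
        log.write(f"{filename}\t{description}\n")


if __name__ == "__main__":
    # Demo in a throwaway folder; a real project folder is an assumption left to you.
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        raw = folder / "surveydata_raw.csv"
        raw.write_text("id,age\n1,34\n", encoding="utf-8")
        protect_raw_file(raw)

        # Work on a copy, never on the raw file itself.
        name = next_version_name(folder, "surveydata", ".csv", dt.date(2020, 10, 1))
        (folder / name).write_text(raw.read_text(encoding="utf-8"), encoding="utf-8")
        log_change(folder, name, "copied from raw; no changes yet")
        print(name)  # 20201001_surveydata_v1.csv
```

Clearing the write bits only guards against accidental edits: anyone with access can change the permissions back, so keeping a copy of the raw data in a separate, secure location is still worthwhile.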