
Data management for students: Variable names and labels

Variable names and labels

The organization, names and labels of variables contribute significantly to making your data files understandable. This matters not only when you share your data with others or collaborate on data, but also for yourself: over time, abstract variable names without labels make it difficult to remember what a particular variable stood for.

Variable names

In general, there are three strategies for naming variables: 

  1. Using a numeric code that corresponds to the position of the variables in a system (e.g. V001, V002, V003);
  2. Using codes that refer to the research tool used (e.g. question number in a questionnaire: Q1a, Q1b, Q2);
  3. Using names that refer to the content of the variables (e.g. BIRTH, AGE, GENDER).

Tips:

  • Start with a letter. Do not start with a question mark, exclamation mark or other special character such as #, @ or & (these are often used for specific purposes in software applications).
  • There should be no spaces in variable names.
  • Use short names, preferably no longer than eight characters (some older statistical software cannot handle longer names).
  • Do not use diacritical characters (characters set above, below, or through a letter, such as ä, ç, ø) or national special characters.
  • Use meaningful names that help you find your way around the dataset.
  • In a longitudinal study (or a study with repeated measurements) the consistent use of variable names (such as BIRTH, AGE, GENDER) makes it easier to merge the different data files.
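If you have many variables, it can help to check their names automatically against the tips above. The sketch below uses a hypothetical helper, `check_variable_name`, that flags names that do not start with a letter, contain spaces, are longer than a chosen maximum (eight characters by default, following the tips), contain non-ASCII characters such as diacritics, or contain other special characters. Allowing the underscore is an assumption; the tips do not mention it. Whether a name is meaningful cannot be checked automatically.

```python
import re


def check_variable_name(name: str, max_length: int = 8) -> list[str]:
    """Hypothetical helper: return the problems found in `name` (empty list = none)."""
    problems = []
    # The first character must be an (ASCII) letter.
    if not name or not name[0].isascii() or not name[0].isalpha():
        problems.append("does not start with a letter")
    # No spaces anywhere in the name.
    if any(ch.isspace() for ch in name):
        problems.append("contains spaces")
    # Default of eight characters follows the tips; raise it if your software allows more.
    if len(name) > max_length:
        problems.append(f"longer than {max_length} characters")
    if not name.isascii():
        # Diacritical characters (ä, ç, ø) and other national characters.
        problems.append("contains diacritical or other non-ASCII characters")
    elif re.search(r"[^A-Za-z0-9_\s]", name):
        # Assumption: letters, digits and the underscore are allowed;
        # spaces are already reported above.
        problems.append("contains special characters")
    return problems


for name in ["AGE", "Q1a", "birth date", "#AGE"]:
    print(name, check_variable_name(name))
```

The `max_length` parameter makes the eight-character rule easy to relax if everyone working with the data uses software that accepts longer names.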
 

Variable labels

A variable label gives a short description of the variable. In many cases, a clear label is indispensable for understanding what the variable represents.

Tips:

  • Preferably label your variables in English so that the dataset can be understood by as wide an audience as possible. 
  • Although labels can be much longer than variable names, it is advisable to find a compromise between length and clarity. Variable labels are often included in the output of analyses, where very long labels can be impractical. Sometimes part of a long label is cut off in the output, which can make the label incomprehensible.
    • Example: a label can consist of (part of) the question, such as 'How old are you?', or a description of the question, such as 'Age'.
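One simple way to keep variable labels together with the data is a small codebook that pairs each variable name with its label. The sketch below is an illustration, not a feature of any particular statistics package: the names `VARIABLE_LABELS`, `long_labels` and `codebook_csv` are hypothetical, and the 40-character limit is an arbitrary example of the length/clarity compromise described above. It lists labels that may be too long for analysis output and writes the codebook as CSV text that can be saved next to the data file.

```python
import csv
import io

# Hypothetical codebook: variable name -> variable label (in English).
VARIABLE_LABELS = {
    "AGE": "Age in years",
    "GENDER": "Gender of respondent",
    "Q1a": "How old are you?",
}

# Hypothetical threshold; pick one that suits the output of your software.
MAX_LABEL_LENGTH = 40


def long_labels(labels: dict[str, str], max_length: int = MAX_LABEL_LENGTH) -> list[str]:
    """Return the names of variables whose label is longer than max_length."""
    return [name for name, label in labels.items() if len(label) > max_length]


def codebook_csv(labels: dict[str, str]) -> str:
    """Return the name/label pairs as CSV text, with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["variable", "label"])
    for name, label in labels.items():
        writer.writerow([name, label])
    return buffer.getvalue()


print(long_labels(VARIABLE_LABELS))
print(codebook_csv(VARIABLE_LABELS))
```

Keeping the codebook as plain CSV means that anyone re-using the data can read the labels without the software that was used to create the data file.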
 

Labels of variable values

Sometimes it is necessary to assign labels to the values of variables. This is not necessary for continuous variables such as age, height, or weight, because their values speak for themselves. It is necessary, however, for nominal and ordinal variables. A nominal variable such as gender is often coded with two values, 0 and 1, in the data file. In that case, you need to assign labels to these values ('0 = man; 1 = woman'), so that you and any re-users of the data know which value represents which gender. The same goes for ordinal variables, for example an 'agree/disagree' scale from 1 to 5: by assigning labels to the values, it becomes clear that 1 stands for 'completely disagree' and 5 stands for 'completely agree'.
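Value labels can be stored as a mapping from each coded value to its label, per variable. The sketch below is a minimal illustration using the examples from the text; the variable name `AGREE`, the dictionary `VALUE_LABELS` and the function `label_value` are hypothetical. Only the values the text labels are included (0/1 for gender, and 1 and 5 for the agree/disagree scale), so any unlabelled value, and any value of a continuous variable such as age, is simply shown as a number.

```python
# Hypothetical value labels, following the examples in the text.
VALUE_LABELS = {
    "GENDER": {0: "man", 1: "woman"},
    "AGREE": {1: "completely disagree", 5: "completely agree"},
}


def label_value(variable: str, value: int) -> str:
    """Return the label of a coded value, or the value itself as text if it has no label."""
    return VALUE_LABELS.get(variable, {}).get(value, str(value))


# Two example records with coded values.
records = [
    {"GENDER": 0, "AGREE": 5, "AGE": 34},
    {"GENDER": 1, "AGREE": 3, "AGE": 27},
]
for record in records:
    print({variable: label_value(variable, value) for variable, value in record.items()})
```

Keeping the codes in the data and the labels in a separate mapping means the data file stays compact, while the meaning of every code is still documented.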