The clean experimental coder part 1: Naming variables
Introduction
I recently started reading the excellent book Clean Code: A Handbook of Agile Software Craftsmanship. It offers numerous tips and techniques for writing code that is neat and easy to read. I feel that within the scientific community we do not always give clean code the attention it deserves. The buzzword is reproducibility, but I would like to argue that reproducible code is first and foremost clean code.
Looking back at the code I produced as a PhD student, I have to admit that it usually did not look too great. At that time I was not really concerned with this aspect of coding. In fact, I was usually quite proud of the messy things I created, but several years of experience and eating some humble pie have shifted my perspective. In a sense, the time was exactly right to come across this book.
Clean Code is written by a professional software developer from industry. Therefore, not all chapters are relevant to how programming is used in psychology research. But there is still a tremendous amount of information that is incredibly useful irrespective of the field in which you work. After reading the first chapter, I decided to kick off a series of posts inspired by this book, but applied to code examples that I have encountered in my own work (both my programming course and the online experiments I am involved in).
Avoid mental mapping
Our brains are wired to read. The closer the text we read is to natural language, the easier it is to process that text. Because we want to conserve our brain power for solving actual problems, it is a good idea to write your code as close to natural language as possible. For example, whenever we use abbreviations we have to take an additional mental step to link the abbreviation to the actual concept. In addition, an abbreviation might make sense to you but not necessarily to someone else who reads your code. Consider the variable ins in the following line of code:
ins = "We rest assured that you already know the game, as choosing Rock, Paper or Scissors by choosing respectively 1, 2 or 3. The computer will also choose."
Clearly, the variable holds an instruction for a computer game, but that is not immediately clear from the name alone. And suppose you have not seen the line where the variable is assigned: how would you interpret the following line of code?
print(ins)
Clearly it is printing something, but what exactly is it printing? Storing characters is cheap, so your code will read more fluently if you just write the name in full:
print(instructions)
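The same idea applies to function and argument names. As a small sketch (the function names and the reaction times below are hypothetical, made up for illustration and not taken from a real experiment), compare an abbreviated version with one written in full:

```python
# Hypothetical example: both functions compute exactly the same mean.

def m_rt(rts):
    # Abbreviated: the reader has to decode "m", "rt" and "rts".
    return sum(rts) / len(rts)


def mean_reaction_time(reaction_times_ms):
    # Written in full: the name and the argument explain themselves.
    return sum(reaction_times_ms) / len(reaction_times_ms)


# Made-up reaction times in milliseconds.
reaction_times_ms = [500, 520]
print(mean_reaction_time(reaction_times_ms))
```

Both versions behave identically; only the second one can be read without looking up what the abbreviations stand for.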
Use intention revealing names
The name of a variable should tell you everything you need to know about its purpose within the context of your code. In the following code, a player is asked to select what they will play in a game of rock, paper, scissors:
player = input("Please pick rock(1), paper(2) or scissors(3): ")
Is player really the best choice here? The same name could also be used to hold the name of the player. It seems that this variable will contain an option that the player has selected, so perhaps selected_option or player_action would better capture the content of that variable.
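One way to take this a step further, sketched here under my own assumptions, is to translate the raw input into a named action right away. The ACTIONS dictionary and the to_player_action helper are hypothetical names introduced for illustration; they are not part of the original game code.

```python
# Hypothetical mapping from the key the player types to a readable action.
ACTIONS = {"1": "rock", "2": "paper", "3": "scissors"}


def to_player_action(selected_option):
    """Translate the typed option ("1", "2" or "3") into an action name.

    Returns None when the option is not one of the known keys.
    """
    return ACTIONS.get(selected_option.strip())


# In the game this value would come from input(); it is fixed here
# so that the example runs without user interaction.
selected_option = "2"
player_action = to_player_action(selected_option)
print(player_action)  # paper
```

With this split, selected_option holds what was typed and player_action holds what it means, so neither name has to do double duty.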
Avoid disinformation
The name of a variable should also not suggest the wrong thing. For example, the state of a system describes how that system is currently operating. For a computer game, the state can be "active" or "paused". So when I see the word state in a variable name, I expect that the content of that variable will tell me something about the state of the system it is describing. Now let's look at the following code:
play_state = input("Do you want to play a game of 'rock, paper, scissors' against the computer? [y]es/[n]o:")
Does the play_state variable tell us anything about the state of the game? Unfortunately it does not, so it might confuse someone who reads your code and expects an actual description of the status of a game. Since the line of code gets a response from the user, a more appropriate variable name could be user_response. We can then create an actual state variable (e.g., game_status) by evaluating the content of the user_response variable. Splitting this up has the additional benefit that we can add input validation to the user_response variable (it would make less sense to apply input validation to a play_state variable).
user_response = input("Do you want to play a game of 'rock, paper, scissors' against the computer? [y]es/[n]o:")
game_status = "running" if user_response == "y" else "terminated"
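The input validation mentioned above could look roughly like the sketch below. All helper names (VALID_RESPONSES, is_valid_response, to_game_status, ask_until_valid) are hypothetical, and accepting only "y" and "n" regardless of case is my own assumption about what counts as valid.

```python
VALID_RESPONSES = ("y", "n")  # assumption: only these two answers are accepted


def is_valid_response(user_response):
    """Return True if the response is 'y' or 'n', ignoring case and spaces."""
    return user_response.strip().lower() in VALID_RESPONSES


def to_game_status(user_response):
    """Derive the game status from an already validated response."""
    return "running" if user_response.strip().lower() == "y" else "terminated"


def ask_until_valid(prompt, read_input=input):
    """Keep asking until the user gives a valid response.

    read_input defaults to the built-in input(); passing another callable
    makes it possible to try the function without typing anything.
    """
    user_response = read_input(prompt)
    while not is_valid_response(user_response):
        user_response = read_input("Please answer with y or n: ")
    return user_response


# Scripted answers stand in for a real user: first an invalid one, then "y".
scripted_answers = iter(["maybe", "y"])
user_response = ask_until_valid(
    "Do you want to play? [y]es/[n]o: ",
    read_input=lambda prompt: next(scripted_answers),
)
game_status = to_game_status(user_response)
print(game_status)  # running
```

In the real game you would call ask_until_valid with only the prompt, so that it falls back to input(). The point of the sketch is that validation belongs to the response, while game_status only ever holds a genuine status.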
Using single letter variables
Based on the principles mentioned above, it is obvious that we should avoid single-character variable names such as a, b, c_1, and so on. But there are a few specific cases where this is less frowned upon. In particular, single characters such as i, j and k are occasionally used as counter variables in loops.
for i in range(10):
    for j in range(5):
        print(i, j)
Single characters can also be used when they make sense in the problem domain you are working in. For example, in mathematics we use x and y all the time to represent coordinates and function arguments. So when working out a mathematical problem in code, it is fine to use the same symbols that you would use if you were to write out the mathematics on paper.
x = 10
y = 5
distance = (x**2 + y**2) ** 0.5
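The same reasoning carries over to function arguments. As a minimal sketch (distance_from_origin is a hypothetical name of my own), the calculation can be wrapped in a function that keeps x and y as arguments while giving the result a descriptive name. It uses math.hypot from the standard library, which computes the same Euclidean distance.

```python
import math


def distance_from_origin(x, y):
    """Euclidean distance of the point (x, y) from the origin."""
    return math.hypot(x, y)


print(distance_from_origin(3, 4))  # 5.0
```

Here x and y stay short because that is how the problem is written on paper, while the function name carries the meaning.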