Pure Functions and Referential Transparency


What does it mean when we talk about the purity of a function? And why does this matter? Why should we as programmers care about the purity of the functions we define? What benefits do pure functions have over impure functions? This blog post is an elementary introduction to thinking about functions in terms of purity. Along the way, we’ll look at how side effects in our functions make our code harder to reason about and to reuse.

Side Effects

Imagine you are integrating a third-party library that validates credit card information. Its API documentation says you only need to pass an array of credit card information to a method validate(), which returns a boolean indicating whether the supplied card information is valid. You are pleased to find that the library is simple and does exactly what you were looking for. A few months later, your billing service suddenly stops working. You get reports that customers can’t verify their credit card information, and absolutely no orders are going through. You are confused, since you haven’t altered any of the relevant code recently. After some debugging and digging through the library you adopted, you discover that it is in fact communicating with a third party to verify the credit cards. You check online and learn that the third-party service’s network is down.

This is a perfect example of unrecognized side effects buried within your program, and now your business is paying for it. We can label this third-party function impure because it did more than compute its documented return value. In other words, it did not simply use the data supplied as input to determine the validity of the credit cards. It also communicated with the outside world through a call to a third-party service.
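A minimal Python sketch can make the difference concrete. Everything in it is hypothetical. validate_pure checks only some made-up format rules on its input. validate_impure has the same signature, but it also consults a simulated remote service. The service's status is a module-level flag, SERVICE_UP, standing in for the outside world.

```python
from typing import List

# Hypothetical card record: [card number, expiry date, CVV].
Card = List[str]

# Simulated state of the outside world (stands in for the remote network).
SERVICE_UP = True


def set_service_status(up: bool) -> None:
    """Simulate the outside world changing, e.g. a network outage."""
    global SERVICE_UP
    SERVICE_UP = up


def remote_check(card: Card) -> bool:
    """Stand-in for a network call to a third-party verification service."""
    if not SERVICE_UP:
        raise ConnectionError("card verification service unreachable")
    return True


def validate_pure(card: Card) -> bool:
    """Pure: the answer depends only on the card passed in."""
    number, _expiry, cvv = card
    return number.isdigit() and len(number) == 16 and cvv.isdigit() and len(cvv) == 3


def validate_impure(card: Card) -> bool:
    """Same signature as validate_pure, but it also talks to the outside world."""
    return validate_pure(card) and remote_check(card)
```

With the service up, both functions return True for a well-formed card. After set_service_status(False), validate_pure still returns True for that same card, but validate_impure raises ConnectionError, even though its input never changed.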

How can we learn to better identify side effects that we write into our programs ourselves or adopt into our codebase via third-party libraries? How can we better reason about the costs of side effects, and how can we prevent or at least control them in our code?

Referential Transparency

The previous example highlighted a side effect: an external API call to a third-party credit card validation service. But what exactly is a side effect? And how can we learn to diagnose side effects in our code?

We can formalize our understanding of what constitutes a side effect with a concept called referential transparency. Although the term sounds academic, what it describes is quite easy to understand. In essence:

an expression is said to be referentially transparent if it can be replaced with its value without changing the behavior of a program (in other words, yielding a program that has the same effects and output on the same input).

source: Wikipedia, “Referential transparency” (computer science)

In mathematics, this is an inherent property of functions, as we know from algebra.

For instance, take the function f(x) = x + 3. It is referentially transparent because, for any given input x, you always get the same output: f(3) = 6 and f(10) = 13. If we had another function g(x) = f(x) + f(x), it would be very simple for us to evaluate it for any given input, such as g(3) = f(3) + f(3).

Because f(3) always equals 6, we can replace both f(3) calls with their value and get the correct output for g(3):

g(3) = f(3) + f(3)

g(3) = 6 + 6

g(3) = 12.

Programming Example

Let’s try to translate this to an actual programming example.

Imagine we are building an application that ships orders placed by customers. It has a Shipment module with a ship method.
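Here is a minimal Python sketch of such a module, reconstructed from the description that follows. Order, DeliveryTransport, the transport-selection rule, and the DELIVERY_LOG list are all hypothetical. DELIVERY_LOG stands in for the outside world, such as a delivery database. Shipment is modeled as a class with a static method so that the call reads Shipment.ship(order).

```python
from dataclasses import dataclass
from typing import List, Tuple

# Stand-in for the outside world (a delivery database, a vendor API, ...).
DELIVERY_LOG: List[str] = []


@dataclass(frozen=True)
class Order:
    order_id: int
    destination: str
    items: Tuple[str, ...]


class DeliveryTransport:
    """Hypothetical transport; `mode` is an illustrative detail."""

    def __init__(self, mode: str) -> None:
        self.mode = mode

    def deliver(self, order: Order) -> None:
        # Side effect: mutates state that lives outside ship().
        DELIVERY_LOG.append(f"{self.mode}:{order.order_id}")


@dataclass(frozen=True)
class ShipmentReceipt:
    """Readable details of the transaction for the user."""

    order: Order

    def describe(self) -> str:
        return f"Order {self.order.order_id} shipped to {self.order.destination}"


class Shipment:
    @staticmethod
    def ship(order: Order) -> ShipmentReceipt:
        # 1. Pick the transport (made-up rule, for illustration only).
        mode = "ground" if order.destination == "local" else "air"
        delivery_transport = DeliveryTransport(mode)
        # 2. Actually ship the order: this is the side effect.
        delivery_transport.deliver(order)
        # 3. Build and return the receipt.
        return ShipmentReceipt(order)
```

Calling Shipment.ship(Order(1, "local", ("book",))) returns a receipt whose describe() gives "Order 1 shipped to local". As a by-product, it appends "ground:1" to DELIVERY_LOG.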

The ship method takes a single Order object as an argument. From this order, it instantiates the appropriate delivery transport, which is responsible for delivering the shipment. It then uses the transport to actually ship the order. Finally, it instantiates and returns a ShipmentReceipt object that encapsulates the readable details of the transaction so the user can receive and understand them. Is this ship function pure? To answer this question, we simply need to determine whether it is referentially transparent. Recall the definition from earlier:

an expression is said to be referentially transparent if it can be replaced with its value without changing the behavior of a program (in other words, yielding a program that has the same effects and output on the same input).

source: Wikipedia, “Referential transparency” (computer science)

Suppose we had a program that used the Shipment object to ship an order by calling Shipment.ship(order).

Could we replace the call to Shipment.ship(order) with the method’s return value, new ShipmentReceipt(order)?

If the ship method were referentially transparent, a program that uses new ShipmentReceipt(order) directly would behave exactly like one that calls Shipment.ship(order). Unfortunately, that is not the case, because the method contains side effects.
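A stripped-down Python sketch of the two programs makes the difference visible. The names mirror the prose but are hypothetical. The delivery transport is collapsed into a single append to DELIVERY_LOG, a stand-in for the outside world.

```python
from dataclasses import dataclass
from typing import List

# Stand-in for the outside world: every real delivery is recorded here.
DELIVERY_LOG: List[int] = []


@dataclass(frozen=True)
class Order:
    order_id: int


@dataclass(frozen=True)
class ShipmentReceipt:
    order: Order


class Shipment:
    @staticmethod
    def ship(order: Order) -> ShipmentReceipt:
        DELIVERY_LOG.append(order.order_id)  # simplified delivery: the side effect
        return ShipmentReceipt(order)


def program_with_call(order: Order) -> ShipmentReceipt:
    return Shipment.ship(order)


def program_with_value(order: Order) -> ShipmentReceipt:
    # Same program with the call replaced by its return value.
    return ShipmentReceipt(order)
```

Running program_with_call(Order(42)) and then program_with_value(Order(42)) produces equal receipts, but only the first appends 42 to DELIVERY_LOG. The outputs match while the effects do not, so the substitution changed the program’s behavior.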

Impure Functions and Side Effects

A function is impure when it contains side effects.

We saw that side effects are any behaviors within a function beyond computing its return value from the supplied input. In the Shipment example, the line that triggers the delivery breaks referential transparency. If we used the ship method’s return value instead of calling the method itself, that delivery code would never run, and the behavior of the program would change.

The call to deliveryTransport.deliver(order) is a side effect because it involves interaction with the outside world. Perhaps it adds the order to a delivery database or contacts a third-party vendor to handle the delivery.

Why does this matter?

One of the most immediate advantages of pure functions is that they greatly simplify reasoning about our programs. If a function is referentially transparent, we don’t need to be concerned with its underlying code, because its behavior is fully described by the value it returns for any given input.

When dealing with pure functions, we only need to concern ourselves with the immediate scope when evaluating how a function behaves. This lets us think about our programs algebraically, via substitution and reduction. Remember how we determined the value of g(3) in the mathematical example above? We did not have to step into the function f(x) to reason about the return value of g(3). We simply knew that, given input 3, f(x) would always return 6.

Furthermore, functional purity allows us to think about the functions we define as “black box abstractions”: named, modular units that take a specific input and always return a specific output. Black-box abstractions let us reuse code without worrying about unexpected effects on the rest of the program.

Impure functions, on the other hand, complicate reuse.

For instance, consider batch ordering. If a user orders multiple items, how can we reuse the impure ship method efficiently? If we shipped each of the three items in an order separately, deliveryTransport.deliver(order) would be called three distinct times, which, depending on the implementation, could be wildly inefficient. Here it becomes clear that side effects make it harder to reason about the behavior of our programs. We are forced to consider more than the local state of evaluation to determine exactly how our program will behave.
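A small Python sketch shows the cost. Here ship works per item, and deliver is a hypothetical transport call that records each invocation in DELIVER_CALLS (imagine each one is a network round-trip). ship_batch is one possible restructuring, not a technique prescribed by this post. It reuses a pure make_receipt for every item and triggers the side effect only once.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Records each (simulated) round-trip to the delivery transport.
DELIVER_CALLS: List[Tuple[str, ...]] = []


@dataclass(frozen=True)
class Item:
    name: str


@dataclass(frozen=True)
class ShipmentReceipt:
    item: Item


def deliver(*items: Item) -> None:
    """Hypothetical transport call; each call is one round-trip."""
    DELIVER_CALLS.append(tuple(i.name for i in items))


def ship(item: Item) -> ShipmentReceipt:
    """Impure: every call also triggers a delivery."""
    deliver(item)
    return ShipmentReceipt(item)


def make_receipt(item: Item) -> ShipmentReceipt:
    """Pure: only builds the receipt, so it can be reused freely."""
    return ShipmentReceipt(item)


def ship_batch(items: List[Item]) -> List[ShipmentReceipt]:
    receipts = [make_receipt(i) for i in items]  # pure work, safe to repeat
    deliver(*items)  # a single side effect for the whole batch
    return receipts
```

Shipping three items one by one with [ship(i) for i in items] calls deliver three times. ship_batch(items) returns equal receipts with a single call.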


In this post, we learned that a function is either pure or impure depending on whether it contains side effects. We learned that the concept of referential transparency can help us formalize our diagnosis of side effects within the functions we define. If a function, given a specific input, does anything beyond computing its return value, it is not referentially transparent and therefore not a pure function.

Next, we saw that pure functions simplify our reasoning about how a function behaves. We can evaluate our programs algebraically through substitution and reduction, a well-known model of evaluation called the substitution model. Lastly, we learned that pure functions make our code more modular and promote reuse, because they behave as black-box abstractions.

As a counterpoint, we also learned that side effects complicate our understanding of how functions behave, because we need to keep track of more than the local state of evaluation. The Shipment example also showed that side effects constrain the modularity of our code. If we needed to handle orders and shipments in batches, we could not easily plug our method into a batch process, because doing so could trigger inefficient, redundant calls to the delivery transport mechanism.

What’s Ahead?

You may be asking how our programs can be useful if they never contain side effects. That is a completely valid question, especially for a newcomer to functional programming. The goal of functional programming is not to eliminate side effects entirely, since they are necessary when building complex systems. However, if we become more aware of the functions we define and the effects they generate, we can learn to push communication with the outside world into a single layer of our application. This leaves the rest of our code clean, concise, and modular. In future posts, we will examine functional techniques and patterns that make this possible.

Further Reading and Learning Material

The Substitution Model

SICP Lecture 1B: Procedures and Processes; Substitution Model

Coursera Course: Functional Programming Principles in Scala