Pure Functions and Referential Transparency

Introduction

What does it mean when we talk about the purity of a function? And why does this matter? Why should we as programmers care about the purity of the functions we define? What benefits do pure functions have over impure functions? This blog post is an elementary introduction to thinking about functions in terms of purity. We’ll also look at how side effects in our functions increase the difficulty of reasoning about and reusing our code.

Side Effects

Imagine you are integrating a third-party library that validates credit card information. Its API says you only need to pass an array of credit card information to a validate() method, which returns a boolean indicating whether the supplied card information is valid. You are pleased to find that the library is simple and does exactly what you were looking for. A few months later, your billing service suddenly stops working. You get reports that customers can’t verify their credit card information. Absolutely no orders are going through. You are confused about the cause, since you haven’t altered any of the relevant code recently. After some debugging and digging through the library code, you see that it is in fact communicating with a third party to verify the credit cards. You check the internet and realize that the third-party service’s network is not operating correctly.

This is a perfect example of unnoticed side effects buried within your program, and now your business is suffering. We can label this 3rd party function impure because it contains behavior outside the computation of its documented return value. In other words, it was not simply using the data supplied by the inputs to determine the validity of the credit cards. The function was also communicating with the outside world via an external call to a 3rd party service.

How can we learn to better identify side effects that we ourselves write into our programs or adopt into our codebase via 3rd party libraries? How can we better reason about the costs of side effects and ways to prevent or at least control them in our code?

Referential Transparency

The previous example highlighted a side effect: an external API call to a 3rd party credit card validation service. But what exactly is a side effect? And how can we learn to diagnose them in our code?

We can formalize our understanding of what constitutes a side effect via a term called referential transparency. Although the term sounds academic, what it describes is quite easy to understand. In essence:

an expression is said to be referentially transparent if it can be replaced with its value without changing the behavior of a program (in other words, yielding a program that has the same effects and output on the same input).

source: Wikipedia: Referential_transparency (Computer Science)

Mathematically this is an inherent attribute of functions. We know this from algebra.

For instance, take the function f(x) = x + 3. This is referentially transparent because any given input x always produces the same output: f(3) = 6, f(10) = 13. If we had another function g(x) = f(x) + f(x), it would be very simple for us to evaluate it for any given input.

g(3) = f(3) + f(3)

In algebra, we can replace both f(3) calls with their value, 6, and get the correct output for g(3):

g(3) = f(3) + f(3)

g(3) = 6 + 6

g(3) = 12.
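The same substitution can be checked in code. Here is a minimal Scala sketch of f and g as defined above; the object name Algebra exists only for this illustration.

```scala
object Algebra {
  // f(x) = x + 3: the result depends only on the input.
  def f(x: Int): Int = x + 3

  // g(x) = f(x) + f(x)
  def g(x: Int): Int = f(x) + f(x)

  def main(args: Array[String]): Unit = {
    // Replacing each f(3) with its value, 6, does not change the result.
    println(g(3) == 6 + 6) // prints "true"
  }
}
```

Because f does nothing except compute its result, swapping the call f(3) for the number 6 leaves g(3) unchanged.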

Programming Example

Let’s try to translate this to an actual programming example.

Imagine we are building an application to ship orders placed by customers:
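A minimal Scala sketch of such a module might look like the following. The names Shipment, ship, Order, ShipmentReceipt and deliveryTransport.deliver come from this post; the fields on Order, the DeliveryTransport trait and the ConsoleTransport class are assumptions made only so the example runs.

```scala
// Hypothetical domain type: the fields are assumptions for this sketch.
final case class Order(id: String, method: String)

// Something that hands an order over to the outside world.
trait DeliveryTransport {
  def deliver(order: Order): Unit
}

// Stand-in transport: printing takes the place of a database write or a vendor call.
final class ConsoleTransport(name: String) extends DeliveryTransport {
  def deliver(order: Order): Unit =
    println(s"Delivering order ${order.id} via $name")
}

// Readable summary of the transaction, returned to the caller.
final class ShipmentReceipt(val order: Order) {
  override def toString: String = s"Receipt for order ${order.id}"
}

object Shipment {
  def ship(order: Order): ShipmentReceipt = {
    // Pick the transport responsible for this order.
    val deliveryTransport: DeliveryTransport = new ConsoleTransport(order.method)
    // Actually ship the order.
    deliveryTransport.deliver(order)
    // Describe what happened.
    new ShipmentReceipt(order)
  }
}
```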

The Shipment module’s ship method takes a single Order object as an argument. From this order object, it instantiates the proper delivery transport that will be responsible for delivering the shipment. It then uses the transport to actually ship the order. Finally, it instantiates and returns a ShipmentReceipt object that encapsulates all the readable details of the transaction so it can be received and understood by the user.

Is the ship function pure? To answer this question we simply need to determine if it is referentially transparent. Remember that earlier it was stated that:

an expression is said to be referentially transparent if it can be replaced with its value without changing the behavior of a program (in other words, yielding a program that has the same effects and output on the same input).

source: Wikipedia: Referential_transparency (Computer Science)

Suppose we had a program that used the Shipment object to ship an order by calling Shipment.ship(order). Could we replace that call with the method’s return value, new ShipmentReceipt(order)?

If the ship method were referentially transparent, the version using new ShipmentReceipt(order) would work exactly like the version calling Shipment.ship(order). Unfortunately, that is not true. This is because the method contains side effects.
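The difference between the two versions can be made observable. In this Scala sketch, an in-memory list named delivered stands in for the outside world; that list, the helper names withCall and withValue, and the single-field Order are assumptions for illustration, not part of the original Shipment code.

```scala
import scala.collection.mutable.ListBuffer

object SubstitutionDemo {
  final case class Order(id: String)
  final case class ShipmentReceipt(order: Order)

  // Stand-in for the outside world: every delivery is recorded here.
  val delivered: ListBuffer[Order] = ListBuffer.empty

  object Shipment {
    def ship(order: Order): ShipmentReceipt = {
      delivered += order         // side effect, standing in for deliveryTransport.deliver(order)
      new ShipmentReceipt(order) // return value
    }
  }

  // Version 1: calls the method.
  def withCall(order: Order): ShipmentReceipt = Shipment.ship(order)

  // Version 2: the call replaced by its return value.
  def withValue(order: Order): ShipmentReceipt = new ShipmentReceipt(order)

  def main(args: Array[String]): Unit = {
    val order = Order("A-1")
    println(withCall(order) == withValue(order)) // both versions produce an equal receipt
    println(delivered.size)                      // but only the first one recorded a delivery
  }
}
```

Both versions hand back the same receipt, yet only the one that calls ship changes anything outside the function, so the substitution alters what the program does.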

Impure Functions and Side Effects

A function can be defined as impure when it contains side effects.

We saw that side effects are simply any behaviors within our function outside of computing the function’s return value given the supplied input. In the Shipment example, the line in the ship method that performs the delivery is a side effect because it breaks referential transparency. If we used the ship method’s return value instead of calling the function itself, the code that triggers the delivery would never be called and the behavior of the program would change.

The call to deliveryTransport.deliver(order) is a side effect because it involves interaction with the outside world. Perhaps it adds the order to a delivery database or contacts a 3rd party vendor to handle the delivery.

Why does this matter?

One of the most immediate advantages of pure functions is that they vastly simplify our reasoning about our programs. In other words, if a function is referentially transparent, we don’t need to be concerned with its underlying code, because its behavior is defined strictly by its return value for any given input.

When dealing with pure functions we only need to concern ourselves with the immediate scope when evaluating how the function behaves. It allows us to think about our programs more algebraically via substitution and reduction. Remember how we determined the value of g(3) in the mathematical example above? We did not have to step into the function f(x) to reason about the return value of g(3). We simply knew that given input 3, function f(x) would always return the value 6.

Furthermore, functional purity allows us to think about the functions we define as “black box abstractions,” or, in other words, as named and modular units that take in a specific input and always return a specific output. Black box abstractions inherently allow us to reuse code without worrying about potential behavioral impacts.

Impure functions, on the other hand, complicate reuse.

For instance, what if we had to consider batch ordering? If a user orders multiple items, how can we reuse the impure ship method efficiently? An order that contains three separate items would call deliveryTransport.deliver(order) three distinct times, which, depending on the implementation, could be wildly inefficient. Here it becomes clear that side effects make it more difficult to reason about the behavior of our programs. We are forced to consider more than the local state of evaluation to determine how exactly our program will behave.
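A small Scala sketch of the batch case follows. It assumes each item is modelled as its own Order and uses a counter, deliverCalls, in place of a real transport; both choices are illustrative only.

```scala
object BatchDemo {
  final case class Order(id: String)
  final case class ShipmentReceipt(order: Order)

  // Counts trips to the outside world; a real transport might open a connection each time.
  var deliverCalls: Int = 0

  def ship(order: Order): ShipmentReceipt = {
    deliverCalls += 1 // stand-in for deliveryTransport.deliver(order)
    new ShipmentReceipt(order)
  }

  // Reusing ship for a batch brings one delivery call along for every item.
  def shipAll(orders: Seq[Order]): Seq[ShipmentReceipt] = orders.map(ship)
}
```

Calling BatchDemo.shipAll with three orders returns three receipts and also bumps deliverCalls to three. Nothing in the signature of shipAll reveals that cost, which is why the caller has to look beyond the local code to predict what will happen.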

Conclusion

In this post we learned that a function can be either pure or impure based on whether or not it contains side effects. We learned that the concept of referential transparency can help us formalize our diagnosis of side effects within the functions we define. For instance, if a function given a specific input contains behavior outside the computation of its return value, we learned that the function is not referentially transparent and thus not a pure function.

Next, we saw that pure functions simplify our reasoning about how a function behaves. We can evaluate our programs more algebraically through reduction and substitution. This process is a well-known model of evaluation called the substitution model. Lastly, we learned that pure functions inherently make our code more modular and promote reuse due to the nature of black box abstractions.

As a counterpoint, we also learned that side effects complicate our own understanding of how functions behave because we need to keep track of more than the local state of evaluation. We also realized from the Shipment example that side effects constrain the modularity of our code. If we needed to handle orders and shipments in batches, we could not easily plug our method into a batch process because it would involve potentially inefficient and redundant calls to the delivery transport mechanism.

What’s Ahead?

You may be asking how our programs can be useful if they never contain side effects. That is a completely valid question, especially as a newcomer to functional programming. The goal of functional programming is not to completely eliminate side effects as they will certainly be necessary when building complex systems. However, if we become more aware of the functions we define and the effects they generate, we can learn how to abstract communication to the outside world to a single layer in our application leaving the rest of our code clean, concise, and very modular. In future posts we will examine functional techniques and patterns that make this possible.

Further Reading and Learning Material

The Substitution Model

SICP Lecture 1B: Procedures and Processes; Substitution Model

Coursera Course: Functional Programming Principles in Scala

Scala build.sbt and Dependencies

I’ve been slowly starting to explore web development with Scala via the Play Framework.

Coming from a PHP background, it has been useful for me to make connections between the two development worlds. For instance, how do you manage dependencies in your Scala project? How is it similar to using Composer to manage PHP dependencies? How are they different?

Scala has a very powerful tool called sbt, a name commonly expanded as Scala Build Tool. Part of its functionality is managing your application’s dependencies.

When you create a new Play application via the command line, a build.sbt file is created in the root directory of the application. If you open that file, you’ll see the following code:

libraryDependencies ++= Seq(
  jdbc,
  anorm,
  cache,
  ws
)

Note: sbt allows two conventions for managing builds. The build.sbt approach is the simpler and more concise one. If you need the full expressive power of Scala to manage more complicated builds, you can use .scala files.

It’s important to note here that this is actual Scala code. Unlike Composer, which uses a JSON structure to manage dependencies, the code above uses Scala’s Seq object to define multiple dependencies at once. The ++= operator appends to the existing setting. Strictly speaking, sbt implements ++= as a method on the setting rather than as an assignment, but conceptually it is a shorthand for:

libraryDependencies = libraryDependencies ++ Seq(/* dependencies */)

The ++ operator “returns a new sequence containing the elements from the left hand operand followed by the elements from the right hand operand.”
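The behavior of ++ and ++= can be tried on ordinary Scala collections. In this sketch the strings are stand-ins for dependency entries and the object name AppendDemo is made up; sbt’s own ++= operates on a setting rather than a var, so treat this as an analogy rather than as sbt’s implementation.

```scala
object AppendDemo {
  val base: Seq[String] = Seq("jdbc", "anorm")
  val extra: Seq[String] = Seq("cache", "ws")

  // ++ builds a new sequence: the left operand's elements, then the right operand's.
  val combined: Seq[String] = base ++ extra

  // On a var holding an immutable Seq, `deps ++= extra` expands to `deps = deps ++ extra`.
  var deps: Seq[String] = base
  deps ++= extra
}
```

After this runs, combined and deps hold the same four elements, while base is unchanged because ++ returns a new sequence instead of modifying its operands.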

When you need to include more libraries for your dependencies, you can simply edit this list to include additional libraries. For instance, with the application I am building, I wanted to use an ORM-like library to manage my data model. After doing some research, I found Slick, which is an FRM (Functional Relational Mapping) library that allows access to the data model using functional techniques similar to those Scala provides for its core collection objects.

Slick’s README states that the following must be added to the application’s dependencies:

"com.typesafe.slick" %% "slick" % "2.0.2",
"org.slf4j" % "slf4j-nop" % "1.6.4",
"com.typesafe.play" %% "play-slick" % "0.6.0.1"

The syntax here was a bit confusing at first look, but it follows a standard notation:

groupID % artifactID % revision

groupID is a token that allows artifacts to be grouped together, artifactID is the token that refers to the actual dependency within the group, and revision is the version number you require. This metadata is passed to Apache Ivy, which is the dependency manager that sbt delegates to when resolving your application’s dependencies.
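Put together, the dependency list in build.sbt would then look roughly like this. It is simply the default list from above with the three entries appended; nothing else about the build is assumed.

```scala
libraryDependencies ++= Seq(
  // Defaults generated by Play
  jdbc,
  anorm,
  cache,
  ws,
  // groupID % artifactID % revision
  "com.typesafe.slick" %% "slick" % "2.0.2",
  "org.slf4j" % "slf4j-nop" % "1.6.4",
  "com.typesafe.play" %% "play-slick" % "0.6.0.1"
)
```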

You may have noticed the double percentage (%%) in:

"com.typesafe.slick" %% "slick" % "2.0.2"

and

"com.typesafe.play" %% "play-slick" % "0.6.0.1"

The %% is a shorthand that appends your application’s Scala binary version to the artifact ID (for example, slick becomes slick_2.10), so that the dependency resolves to a build compiled for that version of Scala. This small detail was actually very important when I attempted to include Slick as a dependency within my application. Every time I tried to update my application via sbt’s update command, I kept getting errors stating that the Slick dependencies could not be resolved. Ultimately, the reason was that Typesafe Activator includes Scala 2.11 within its standard download, which Slick currently does not support. To work around this and continue with development, I just altered the scalaVersion in build.sbt to:

scalaVersion := "2.10.4"

and then re-compiled and re-ran the eclipse command (Eclipse is the IDE I’ve chosen to use, but Play has support for others).
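To make the effect of %% concrete, here is a small build.sbt sketch. With a 2.10.x scalaVersion, the %% form asks for the same artifact as writing the _2.10 suffix by hand; the second form is shown commented out purely for comparison.

```scala
scalaVersion := "2.10.4"

// With the scalaVersion above, this line...
libraryDependencies += "com.typesafe.slick" %% "slick" % "2.0.2"

// ...requests the same artifact as spelling the suffix out manually:
// libraryDependencies += "com.typesafe.slick" % "slick_2.10" % "2.0.2"
```

This also explains the resolution errors: under Scala 2.11 the %% form looks for slick_2.11, which only resolves if the library has been published for that Scala version.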

Once you add a new dependency to the build.sbt file, you need to run update within the console or simply compile (which normally runs the update command) to download the actual libraries from their repositories into your application.

It’s also important to note that when resolving dependencies, sbt will by default only look in the standard Maven repositories. If the library your application depends on does not exist in a standard location, you’ll have to add a custom resolver that points to the proper location. I haven’t had the opportunity to deal with this yet.
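For reference, a custom resolver in build.sbt generally takes the form below. The repository name and URL here are placeholders, not a real repository, and I have not tried this against the setup described in this post.

```scala
// Hypothetical repository name and URL: replace both with the location that hosts your library.
resolvers += "Example Repository" at "https://repo.example.com/releases"
```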

Further Information:

https://www.playframework.com/documentation/2.3.x/SBTDependencies

http://www.scala-sbt.org/0.13/tutorial/Library-Dependencies.html

http://maven.apache.org/repository/index.html

http://www.playframework.com/documentation/2.3.x/Home