The rationale behind Java Lambda/Closures

courtesy of blog.takipi.com

In this article I would like to talk about the design choices behind Java’s most awaited language feature: lambdas/closures. It’s no secret that Java 8 supports lambda functions or, more precisely, closures. Interestingly, the implementation and usage of Java closures differ significantly from those of other modern programming languages. A ton has already been written on this subject, so I would like to focus on two questions in particular:

  1. Why can Java’s closures capture only final or effectively final local variables?
  2. Why do we specify a Java lambda function with an interface definition (SAM)?

Before going further we need to understand the basic terminology.

What are final variables?

A final variable is any variable, local or class member, whose value cannot be changed once it has been assigned. In other words, a final variable can be assigned only once. In Java we mark such variables with the final keyword. Note that for a reference type, final fixes the reference itself, not the state of the object it points to.

Why do we need final variables?

Final variables play a crucial role in runtime optimization and in concurrent programming. Because the value never changes after assignment, the compiler, the runtime, and the programmer can all rely on that guarantee.

What are effectively final variables?

These are variables that are not declared final but whose value is never changed after it is first assigned.

What is a lambda function?

A lambda function is a piece of code that can be passed around as data. This essentially blurs the border between data and code. Traditionally, data is something that is passed to a function, and code is something that acts upon the data. The concept of passing code itself as data is a profound idea that originated in lambda calculus and is used predominantly in functional programming languages. Functionally, lambdas are comparable to C function pointers, but with strict, safe compile-time and runtime type checking.

Let’s look at an example of why we need to pass code as data. Consider two functions that each walk a list and keep the elements that pass a check, differing only in the check itself.
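The original listing is not reproduced here, so the following is only a minimal sketch of the kind of duplication being described. The class name and both method names are hypothetical, and the second check (greater than five) is an assumption chosen purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FilterWithoutLambdas {

    // Keeps only the even elements.
    public static List<Integer> filterEven(List<Integer> input) {
        List<Integer> result = new ArrayList<>();
        for (Integer ele : input) {
            if (ele % 2 == 0) {          // the only line that differs
                result.add(ele);
            }
        }
        return result;
    }

    // Keeps only the elements greater than five.
    public static List<Integer> filterGreaterThanFive(List<Integer> input) {
        List<Integer> result = new ArrayList<>();
        for (Integer ele : input) {
            if (ele > 5) {               // the only line that differs
                result.add(ele);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
        System.out.println(filterEven(numbers));            // [2, 4, 6, 8]
        System.out.println(filterGreaterThanFive(numbers)); // [6, 7, 8]
    }
}
```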

Essentially 95% of the code is the same in both functions; only 5%, i.e. the code inside the if statement for the filter check, differs. This pattern recurs in many problems, such as sorting a list in increasing or decreasing order. Instead of hard-coding the filter check in each function, what if the language had some way to pass the code for the check as a parameter to the filter method? Each programming language has its own way to handle this situation. For example, C and C++98 have raw function pointers, C++11 has closures, C# has delegates (strongly typed function pointers), and JavaScript has closures.

Now let’s look at how the above methods can be unified with the help of lambda functions in Java. Since Java 8 has lambdas, we can pass pieces of code as parameters to methods. The way a lambda function’s prototype is specified in Java differs significantly from other languages; we will go over this later in the article. For now, it is enough to know that any interface with a single abstract method can act as the prototype for a lambda function. So in our case, the filter method can accept a lambda that takes an element as input and returns a boolean.
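A minimal sketch of that unification, assuming a hand-written single-method interface named MyFilter and a filterList method as the article calls them. The exact signatures are assumptions, since the original listing is not available.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FilterWithCustomInterface {

    // A single-method interface that serves as the lambda's prototype.
    interface MyFilter {
        boolean test(int ele);
    }

    // One filter method; the check itself arrives as a parameter.
    public static List<Integer> filterList(List<Integer> input, MyFilter filter) {
        List<Integer> result = new ArrayList<>();
        for (Integer ele : input) {
            if (filter.test(ele)) {
                result.add(ele);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
        System.out.println(filterList(numbers, ele -> ele % 2 == 0)); // [2, 4, 6, 8]
        System.out.println(filterList(numbers, ele -> ele > 5));      // [6, 7, 8]
    }
}
```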

Interestingly, instead of creating our own MyFilter interface, we can use the convenient Predicate<T> interface that ships with Java 8 and suits our needs exactly. With Predicate, the filterList prototype takes a Predicate parameter in place of MyFilter.

So finally, in the main function we can pass any lambda whose prototype matches the test method of the Predicate<T> interface. These examples demonstrate the power of using code as data and the need for lambda functions.
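As a sketch of the Predicate-based version (the class name and the element type Integer are assumptions; java.util.function.Predicate and its test method are part of the JDK):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class FilterWithPredicate {

    // Same method as before, but the prototype comes from the JDK.
    public static List<Integer> filterList(List<Integer> input, Predicate<Integer> filter) {
        List<Integer> result = new ArrayList<>();
        for (Integer ele : input) {
            if (filter.test(ele)) {
                result.add(ele);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
        // Any lambda matching boolean test(Integer) is accepted.
        System.out.println(filterList(numbers, ele -> ele % 2 == 0)); // [2, 4, 6, 8]
        System.out.println(filterList(numbers, ele -> ele % 2 != 0)); // [1, 3, 5, 7]
    }
}
```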

What is a closure?

A closure can be thought of as a lambda together with an associated environment.
What do I mean by this? In the above example, the lambda ele -> ele % 2 == 0 could have accessed any local variable of the main method, but it did not use any of them. Now consider a lambda that does read a local variable, div, declared in the enclosing method, for instance ele -> ele % div == 0.
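A small sketch of such a capture follows. The class name and the helper divisibleBy are hypothetical; the helper is included only to show that the lambda still works after the method that declared div has returned.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class ClosureExample {

    // div is never reassigned, so it is effectively final and may be captured.
    public static Predicate<Integer> divisibleBy(int div) {
        return ele -> ele % div == 0;
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9);

        int div = 3; // local variable captured by the lambda below
        List<Integer> multiples = numbers.stream()
                .filter(ele -> ele % div == 0)
                .collect(Collectors.toList());
        System.out.println(multiples); // [3, 6, 9]

        // The returned lambda carries its own copy of the value 4.
        Predicate<Integer> byFour = divisibleBy(4);
        System.out.println(byFour.test(8)); // true
    }
}
```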

Now the lambda has an environment containing the captured div variable. Lambda functions of this kind are called closures. Even though this looks straightforward, there are a couple of important things to consider before enabling this feature in a language.

First, we need to decide what value div has inside the lambda when it is changed outside the lambda. Programming languages primarily handle this in one of two ways:

  1. Bind a reference to the captured local variable inside the lambda. In this case the life of the local variable is no longer limited to the method in which it is declared; it lives on for as long as the lambda does.
  2. Or limit access to final variables only. Since final variables are immutable, the compiler can simply copy the value of the local variable into the lambda.

The C# Way. The first of the two choices requires major changes to language semantics, such as extending the life of local variables, and it also affects runtime optimizations around synchronization on local variables. It is a double-edged sword. In the traditional sense it is the true closure implementation, but because the life of the local variable is extended, it is prone to synchronization bugs in multi-threaded scenarios. For example, what if filterList, called from the main method, spawns a new thread to do its computation and inadvertently modifies div? Any other lambda that depends on div is affected. This eventually leads to hard-to-find bugs involving long-lived local variables and visibility. It also undermines the usual assumption that a method’s local variables are safe in multi-threaded code.

The Java Way. The second option is not as powerful as the first, but it does have some benefits. No change to the language semantics of local variables is needed. Since the captured data is immutable (final), whether it is a primitive type or a reference type, the compiler simply copies the content of the captured variable or reference. This prevents concurrent modification of local variable state among multiple lambdas (in fact, no local variable state is captured by the lambda at all!), which in turn keeps local variable state from getting trapped inside a closure. This is purely a design choice. To make it less frustrating, Java relaxed the capture rule to include effectively final variables as well.

So in Java, code that modifies a captured local variable, whether inside the lambda or anywhere else in the enclosing method, does not compile.
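The sketch below illustrates the restriction. The rejected statements are kept in comments so that the class still compiles, and the class and method names are hypothetical. The stream-based alternative is one possible way to avoid the mutable counter, not necessarily the one the original listing used.

```java
import java.util.Arrays;
import java.util.List;

public class CaptureRestriction {

    public static long countEven(List<Integer> numbers) {
        // Rejected by the compiler, because count is modified inside the lambda:
        //
        //   int count = 0;
        //   numbers.forEach(ele -> { if (ele % 2 == 0) count++; });
        //
        // javac reports that local variables referenced from a lambda
        // expression must be final or effectively final.

        // A version that needs no mutable captured state:
        return numbers.stream().filter(ele -> ele % 2 == 0).count();
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6);

        int div = 3;
        // div = 2; // uncommenting this reassignment means div is no longer
        //          // effectively final, and the lambda below stops compiling
        long multiples = numbers.stream().filter(ele -> ele % div == 0).count();

        System.out.println(countEven(numbers)); // 3
        System.out.println(multiples);          // 2
    }
}
```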

Even though this might seem like a serious limitation at first, it does encourage the other functional programming patterns available in the JDK. It is also an important design choice made by the Java language architects, for both safety and implementation cost.

One important point: neither of the above approaches prevents the class of bugs that arises when a final local reference variable points to a MUTABLE object.
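To make that caveat concrete, here is a small sketch (all names hypothetical) in which the captured reference is final but the list it points to is still modified after the lambda has been created:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class MutableCapture {

    // Returns the filter's answer for 4 before and after the list is mutated.
    public static boolean[] testBeforeAndAfterMutation() {
        // The reference is final; the object behind it is not immutable.
        final List<Integer> divisors = new ArrayList<>(Arrays.asList(2));
        Predicate<Integer> filter = ele -> ele % divisors.get(0) == 0;

        boolean before = filter.test(4); // 4 % 2 == 0
        divisors.set(0, 3);              // legal: the reference itself is unchanged
        boolean after = filter.test(4);  // 4 % 3 != 0

        return new boolean[] { before, after };
    }

    public static void main(String[] args) {
        boolean[] results = testBeforeAndAfterMutation();
        System.out.println(results[0]); // true
        System.out.println(results[1]); // false
    }
}
```

The lambda captured a copy of the reference, so the compiler is satisfied, yet its behaviour changed because the shared object changed.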

Why do I have to create an interface to specify a lambda method prototype?

Earlier we saw that the test method, which takes an element and returns a boolean, is declared in an interface called Predicate<T>. Why do we have to do this? This is again a clever design choice made by the Java language architects: it reuses existing language semantics for specifying lambdas. In C#, we can declare a function prototype much as we would declare a variable of type int. This is possible because the C# type system treats functions as first-class citizens, so with the delegate syntax we can introduce a new function prototype into the type system. For example, the same test method could be declared in C# along the lines of delegate bool Test(int ele);

Since lambdas were introduced very late in the life of the Java language, a lot of care had to be taken to make sure the feature coexists well with the existing libraries. Introducing a new function type into the type system would be a major language change, affecting not only existing libraries but also other language features. To play well with the existing ecosystem, Java leverages Single Abstract Method (SAM) types for specifying lambdas. These are simply interfaces that declare a single abstract method, also called functional interfaces, such as Runnable with its run method. Since the type system already knows how to handle method calls on an interface, this is pretty straightforward.

I have only scratched the surface. The rabbit hole goes much deeper at the JVM level, where the actual implementation leverages the invokedynamic bytecode!
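As a rough illustration of SAM types, the sketch below declares a hypothetical functional interface IntCheck and shows that a lambda and an old-style anonymous class both satisfy it. The @FunctionalInterface annotation and Runnable are standard JDK; every other name is an assumption.

```java
public class SamExample {

    // Optional annotation: the compiler verifies there is exactly one abstract method.
    @FunctionalInterface
    interface IntCheck {
        boolean test(int value);
    }

    // An ordinary interface method call; nothing lambda-specific here.
    public static boolean apply(IntCheck check, int value) {
        return check.test(value);
    }

    public static void main(String[] args) {
        // Lambda form.
        IntCheck isEven = value -> value % 2 == 0;

        // Pre-Java 8 form with the same target type.
        IntCheck isEvenOld = new IntCheck() {
            @Override
            public boolean test(int value) {
                return value % 2 == 0;
            }
        };

        // Runnable has long been a SAM type, so it accepts a lambda too.
        Runnable task = () -> System.out.println("running");
        task.run();

        System.out.println(apply(isEven, 4));    // true
        System.out.println(apply(isEvenOld, 4)); // true
    }
}
```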

 
