What limitations does the JVM impose on tail-call optimization?

Question

Clojure does not perform tail call optimization on its own: when you have a tail recursive function and you want to have it optimized, you have to use the special form recur. Similarly, if you have two mutually recursive functions, you can optimize them only by using trampoline.

The Scala compiler is able to perform TCO for a recursive function, but not for two mutually recursive functions.

Whenever I have read about these limitations, they were always ascribed to some limitation intrinsic to the JVM model. I know pretty much nothing about compilers, but this puzzles me a bit. Let me take the example from Programming Scala. Here the function

def approximate(guess: Double): Double =
  if (isGoodEnough(guess)) guess
  else approximate(improve(guess))

is translated into

0: aload_0
1: astore_3
2: aload_0
3: dload_1
4: invokevirtual #24; //Method isGoodEnough:(D)Z
7: ifeq
10: dload_1
11: dreturn
12: aload_0
13: dload_1
14: invokevirtual #27; //Method improve:(D)D
17: dstore_1
18: goto 2

So, at the bytecode level, one just needs goto. In this case, in fact, the hard work is done by the compiler.
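In Java terms, the rewrite seems to amount to something like the sketch below. The bodies of isGoodEnough and improve are my own assumptions (Newton's method for the square root of 2), since the book leaves them abstract. The point is only that the loop version mirrors the `dstore_1` / `goto 2` pair in the bytecode.

```java
class Approximation {
    // Hypothetical helpers (not in the original): Newton's method for sqrt(2).
    static boolean isGoodEnough(double guess) {
        return Math.abs(guess * guess - 2.0) < 1e-9;
    }

    static double improve(double guess) {
        return (guess + 2.0 / guess) / 2.0;
    }

    // Direct tail recursion: javac compiles this to a real call,
    // so every step pushes a new stack frame.
    static double approximateRecursive(double guess) {
        if (isGoodEnough(guess)) return guess;
        return approximateRecursive(improve(guess));
    }

    // Loop form: overwrite the parameter and jump back to the top,
    // which is what "dstore_1; goto 2" does in the bytecode.
    static double approximateLoop(double guess) {
        while (!isGoodEnough(guess)) {
            guess = improve(guess);
        }
        return guess;
    }

    public static void main(String[] args) {
        System.out.println(approximateRecursive(1.0));
        System.out.println(approximateLoop(1.0));
    }
}
```

Both methods compute the same value; only the second one runs in constant stack space regardless of how many improvement steps are needed.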

What facility of the underlying virtual machine would allow the compiler to handle TCO more easily?

As a side note, I would not expect actual machines to be much smarter than the JVM. Still, many languages that compile to native code, such as Haskell, do not seem to have issues with optimizing tail calls (well, Haskell sometimes does, due to laziness, but that is another issue).

score 26 · Accepted Answer · answered Jul 21 '12 at 13:29

Now, I don't know much about Clojure and little about Scala, but I'll give it a shot.

First off, we need to differentiate between tail-CALLs and tail-RECURSION. Tail recursion is indeed rather easy to transform into a loop. With tail calls in general, it is much harder, and often impossible. To replace the call with a jump, the compiler needs to know what is being called, but with polymorphism and/or first-class functions it rarely knows that. Only at runtime do you know the target code, and only then can you jump there without allocating another stack frame. For instance, the following fragment has a tail call and does not need any stack space when properly optimized (including TCO), yet the call cannot be eliminated when compiling for the JVM:

function forward(obj: Callable<int, int>, arg: int) =
    let arg1 <- arg + 1 in obj.call(arg1)

While it's just a tad inefficient here, there are whole programming styles (such as Continuation Passing Style or CPS) which have tons of tail calls and rarely ever return. Doing that without full TCO means you can only run tiny bits of code before running out of stack space.
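To make the forward fragment above concrete in Java terms, here is a rough sketch. IntCallable, stackDepth and the two depth helpers are names I made up for the illustration; they are not part of any library. The sketch counts the frames visible to the callee, once when called directly and once through forward.

```java
class ForwardDemo {
    // Hypothetical Java counterpart of the Callable<int, int> type in the fragment.
    interface IntCallable {
        int call(int arg);
    }

    // The last action is a call through an interface, so the target code is
    // only known at run time. This compiles to an ordinary invoke followed by a return.
    static int forward(IntCallable obj, int arg) {
        int arg1 = arg + 1;
        return obj.call(arg1);
    }

    // Hypothetical callee: reports how many frames are currently on the stack.
    static int stackDepth(int ignored) {
        return Thread.currentThread().getStackTrace().length;
    }

    static int depthDirect() {
        return stackDepth(0);
    }

    static int depthViaForward() {
        return forward(ForwardDemo::stackDepth, 0);
    }

    public static void main(String[] args) {
        System.out.println("direct call: " + depthDirect());
        System.out.println("via forward: " + depthViaForward());
    }
}
```

On a JVM that does not eliminate the tail call (HotSpot, as far as I know), the second number is larger than the first: the frame of forward is still on the stack while the callee runs, even though forward has nothing left to do.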

What facility of the underlying virtual machine would allow the compiler to handle TCO more easily?

A tail call instruction, such as in the Lua 5.1 VM. Your example does not get much simpler. Mine becomes something like this:

push arg
push 1
add
load obj
tailcall Callable.call
// implicit return; stack frame was recycled

As a sidenote, I would not expect actual machines to be much smarter than the JVM.

You're right, they aren't. In fact, they are less smart and thus don't even know (much) about things like stack frames. That's precisely why one can pull tricks like re-using stack space and jumping to code without pushing a return address.

I see. I did not realize that being less smart could allow an optimization that would be otherwise forbidden. – Andrea Jul 21 '12 at 14:41
+1, a tailcall instruction for the JVM was proposed as early as 2007 in a blog post on sun.com (reachable through the Wayback Machine; after the Oracle takeover, the original link returns a 404). I'd guess it didn't make it into the JVM 7 priority list. – K.Steff Jul 21 '12 at 23:12
A tailcall instruction would only mark a tail call as a tail call. Whether the JVM then actually optimized said tail call is a completely different question. The CLI CIL has a .tail instruction prefix, yet the Microsoft 64-bit CLR for a long time didn't optimize it. OTOH, the IBM J9 JVM does detect tail calls and optimizes them, without needing a special instruction to tell it which calls are tail calls. Annotating tail calls and optimizing tail calls are really orthogonal. (Apart from the fact that statically deducing which call is a tail call may or may not be undecidable. Dunno.) – Jörg W Mittag Feb 04 '15 at 14:57
@JörgWMittag You make a good point, a JVM can easily detect the pattern call something; return. The primary job of a JVM spec update would be not to introduce an explicit tail-call instruction but to mandate that such an instruction is optimized. Such an instruction only makes compiler writers' jobs easier: The JVM author doesn't have to make sure to recognize that instruction sequence before it gets mangled beyond recognition, and the X->bytecode compiler can rest assured that their bytecode is either invalid or actually optimized, never correct-but-stack-overflowing. – Feb 04 '15 at 16:57
@delnan: The sequence call something; return; would only be equivalent to a tail call if the thing that is called never asks for a stack trace; if the method in question is virtual or calls a virtual method, the JVM will have no way of knowing whether it could inquire about the stack. – supercat Feb 04 '15 at 21:03
@supercat D'oh! That neatly dismantles Jörg's objection for the general case. Have the programmer opt into a tail call, accepting that it may alter semantics. – Feb 04 '15 at 21:42
@delnan: If I were designing a framework, there'd be no "may" about it: a framework would be expected to guarantee that an arbitrary number of tail calls may be made consecutively without blowing the stack. To avoid "deceptive" stack traces, I might have the stack show that a method was reached via some form of tail call. – supercat Feb 04 '15 at 21:52
@supercat "May" in that it does not make a difference for a lot of code (that which doesn't inspect the stack and does any of the other things that can detect tail calls). Of course the point of enabling tail calls is having them guaranteed. – Feb 04 '15 at 22:13
@delnan: My point was that stack usage should either grow with recursion depth or be guaranteed not to. Having code which requires 100 bytes of stack on some platforms require 100,000,000 bytes on others is not helpful. – supercat Feb 04 '15 at 22:23
@supercat I'm not sure where we're disagreeing, or whether we're even disagreeing at all, so I'll call it a day here. – Feb 04 '15 at 22:28
@K.Steff That blog post you cited can now be found here: https://blogs.oracle.com/jrose/entry/tail_calls_in_the_vm – Paulo Dec 29 '16 at 19:27
The introduction is just wrong. Tail recursion is a special kind of tail call, and thus would benefit from tail call optimization at run-time as much as any other tail call. The fact that some recursive functions (whether tail calls or not) can automatically be rewritten as loops is a red herring. – jpaugh Jan 18 '18 at 20:14

score 13 · Answer 2 · edited Feb 06 '15 at 15:03

Clojure could perform automatic optimisation of tail recursion into loops: it is certainly possible to do this on the JVM as Scala proves.

It was actually a design decision not to do this: you have to explicitly use the recur special form if you want this feature. See the mail thread "Re: Why no tail call optimization" on the Clojure Google group.

On the current JVM, the only thing that is impossible to do is tail call optimisation between different functions (mutual recursion). This isn't particularly complex to implement (other languages like Scheme have had this feature from the beginning) but it would require changes to the JVM spec. For example, you'd have to change the rules about preserving the complete function call stack.

A future iteration of the JVM is likely to get this capability, though probably as an option so that backwards-compatible behaviour for old code is maintained. For example, the Features Preview at Geeknizer lists this for Java 9:

Adding tail calls and continuations...

(Of course, future roadmaps are always subject to change.)

As it turns out, it's not that big a deal anyway. In over 2 years of coding Clojure, I have never run into a situation where the lack of TCO was an issue. The main reasons for this are:

You can already get fast tail recursion for 99% of common cases with recur or a loop. The mutual tail recursion case is pretty rare in normal code.
Even when you need mutual recursion, the recursion depth is often shallow enough that you can just do it on the stack anyway without TCO. TCO is just an "optimisation" after all.
In the very rare cases where you do need some form of non-stack-consuming mutual recursion, there are plenty of alternatives that can achieve the same goal: lazy sequences, trampolines, etc.
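To illustrate the trampoline option, here is a minimal sketch in Java. Step, done, more and run are names invented for this example, not a library API, and Clojure's own trampoline works on functions rather than on a Step type. The idea is the same, though: each function returns a description of the next call instead of making it, and a driver loop does the bouncing.

```java
import java.util.function.Supplier;

class TrampolineDemo {
    // A step is either a finished result or a thunk producing the next step.
    interface Step<T> {
        boolean isDone();
        T result();
        Step<T> next();
    }

    static <T> Step<T> done(T value) {
        return new Step<T>() {
            public boolean isDone() { return true; }
            public T result() { return value; }
            public Step<T> next() { throw new IllegalStateException("already done"); }
        };
    }

    static <T> Step<T> more(Supplier<Step<T>> thunk) {
        return new Step<T>() {
            public boolean isDone() { return false; }
            public T result() { throw new IllegalStateException("not done yet"); }
            public Step<T> next() { return thunk.get(); }
        };
    }

    // The driver loop: each bounce returns before the next one starts,
    // so the stack depth stays constant.
    static <T> T run(Step<T> step) {
        while (!step.isDone()) {
            step = step.next();
        }
        return step.result();
    }

    // Mutually recursive functions return the next step instead of calling each other.
    static Step<Boolean> isEven(long n) {
        return n == 0 ? done(true) : more(() -> isOdd(n - 1));
    }

    static Step<Boolean> isOdd(long n) {
        return n == 0 ? done(false) : more(() -> isEven(n - 1));
    }

    public static void main(String[] args) {
        System.out.println(run(isEven(1_000_000L)));
    }
}
```

The price is one small heap allocation per bounce, which is the usual trade-off of trampolining compared with a real tail call.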

"future iteration" - Features Preview at Geeknizer says for Java 9: Adding tail calls and continuations - is that it? — gnat, Jul 22 '12 at 16:52
Yep - that's it. Of course, future roadmaps are always subject to change.... — mikera, Jul 22 '12 at 17:02

score 5 · Answer 3 · answered Jul 21 '12 at 22:24

As a sidenote, I would not expect actual machines to be much smarter than the JVM.

It's not about being smarter, but about being different. Until recently, the JVM was designed and optimized exclusively for a single language (Java, obviously), which has very strict memory and calling models.

Not only were there no goto or pointers, there wasn't even any way to call a 'bare' function (one that wasn't a method defined within a class).

Conceptually, when targeting the JVM, a compiler writer has to ask "how can I express this concept in Java terms?". And obviously, there's no way to express TCO in Java.

Note that these aren't seen as failures of the JVM, because they aren't needed for Java. As soon as Java needs a feature like these, it is added to the JVM.

It's only recently that the Java authorities started taking the JVM seriously as a platform for non-Java languages, so it has already gained some support for features that have no Java equivalent. The best known is dynamic typing, which is already in the JVM but not in Java.

score 3 · Answer 4 · answered Jul 23 '12 at 13:07

So, at the bytecode level, one just needs goto. In this case, in fact, the hard work is done by the compiler.

Have you noticed that the method address starts with 0? That all method offsets start with 0? The JVM doesn't allow one to jump outside a method.

I have no idea what would happen if a branch with an offset outside the method were loaded by java: maybe it would be caught by the bytecode verifier, maybe it would generate an exception, and maybe it would actually jump outside the method.

The problem, of course, is that you can't really guarantee where other methods of the same class will be, much less methods of other classes. I doubt the JVM makes any guarantees about where it will load the methods, though I'd be happy to be corrected.

Good point. But to tail-call optimize a self-recursive function, all you need is a GOTO within the same method. So this limitation doesn't rule out TCO of self-recursive methods. — Alex D, Sep 23 '12 at 19:38
