Tail Call Optimization (TCO), dependency, broken debug builds in C and C++ — and gcc 4.8

TCO: Reducing the algorithmic complexity of recursion.
Debug build: Add overhead to a program to trace errors.
Debug without TCO: Obliterate any possibility of fixing recursion bugs.

“Never develop with optimizations which the debug mode of the compiler of the future maintainer of your code does not use.”°

UPDATE: GCC 4.8 gives us -Og -foptimize-sibling-calls which generates nice-backtraces, and I had a few quite embarrassing errors in my C - thanks to AKF for the catch!

1 Intro

Tail Call Optimization (TCO) makes this

def foo(n):
    print(n)
    return foo(n+1)
foo(1)

behave like this

def foo(n):
    print(n)
    return n+1
n = 1 while True: n = foo(n)

Table of Contents

I recently told a colleague how neat tail call optimization in scheme is (along with macros, but that is a topic for another day…).

Then I decided to actually test it (being mainly not a schemer but a pythonista - though very impressed by the possibilities of scheme).

So I implemented a very simple recursive function which I could watch to check the Tail Call behaviour. I tested scheme (via guile), python (obviously) and C++ (which proved to provide a surprise).

2 The tests

2.1 Scheme

(define (foo n)
  (display n)
  (newline)
  (foo (1+ n)))

(foo 1)

2.2 Python

def foo(n):
    print n
    return foo(n+1)

foo(1)

2.3 C++

The C++ code needed a bit more work (thanks to AKF for making it less ugly/horrible!):

#include <stdio.h>

int recurse(int n)
{
  printf("%i\n", n);
  return recurse(n+1);
}

int main()
{
  return recurse(1);
}

Additionally to the code I added 4 different ways to build the code: Standard optimization (-O2), Debug (-g), Optimized Debug (-g -O2), and only slightly optimized (-O1).

all : C2 Cg Cg2 C1

# optimized
C2 : tailcallc.c
    g++ -O2 tailcallc.c -o C2

# debug build
Cg : tailcallc.c
    g++ -g tailcallc.c -o Cg

# optimized debug build
Cg2 : tailcallc.c
    g++ -g -O2 tailcallc.c -o Cg2

# only slightly optimized
C1 : tailcallc.c
    g++ -O1 tailcallc.c -o C1

3 The results

So now, let’s actually check the results. Since I’m interested in tail call optimization, I check the memory consumption of each run. If we have proper tail call optimization, the required memory will stay the same over time, if not, the function stack will get bigger and bigger till the program crashes.

3.1 Scheme

Scheme gives the obvious result. It starts counting numbers and keeps doing so. After 10 seconds it’s at 1.6 million, consuming 1.7 MiB of memory - and never changing the memory consumption.

3.2 Python

Python is no surprise either: it counts to 999 and then dies with the following traceback:

Traceback (most recent call last):
 File "tailcallpython.py", line 6, in <module>
   foo(1)
 File "tailcallpython.py", line 4, in foo
   return foo(n+1)
… repeat about 997 times …
RuntimeError: maximum recursion depth exceeded

Python has an arbitrary limit on recursion which keeps people from using tail calls in algorithms.

3.3 C/C++

C/C++ is a bit trickier.

First let’s see the results for the optimized run:

3.3.1 Optimized

g++ -O2 C.c -o C2
./C2

Interestingly that runs just like the scheme one: After 10s it’s at 800,000 and consumes just 144KiB of memory. And that memory consumption stays stable.

3.3.2 Debug

So, cool! C/C++ has tail call optimization. Let’s write much recursive tail call using code!

Or so I thought. Then I did the debug run.

g++ -g C.c -o Cg
./Cg 

It starts counting just like the optimized version. Then, after about 5 seconds and counting to about 260,000, it dies with a segmentation fault.

And here’s a capture of its memory consumption while it was still running (thanks to KDEs process monitor):

Private

7228 KB   [stack]
56 KB [heap]
40 KB /usr/lib64/gcc/x86_64-pc-linux-gnu/4.7.2/libstdc++.so.6.0.17
24 KB /lib64/libc-2.15.so
12 KB /home/arne/.emacs.d/private/journal/Cg

Shared

352 KB    /usr/lib64/gcc/x86_64-pc-linux-gnu/4.7.2/libstdc++.so.6.0.17
252 KB    /lib64/libc-2.15.so
108 KB    /lib64/ld-2.15.so
60 KB /lib64/libm-2.15.so
16 KB /usr/lib64/gcc/x86_64-pc-linux-gnu/4.7.2/libgcc_s.so.1

That’s 7 MiB after less than 5 seconds runtime - all of it in the stack, since that has to remember all the recursive function calls when there is no tail call optimization.

So we now have a program which runs just fine when optimized but dies almost instantly when run in debug mode.

But at least we have nice gdb traces for the start:
recurse (n=43) at C.c:5
5         printf("%i\n", n);
43
6         return recurse(n+1);

3.4 Optimized debug build

So, is all lost? Luckily not: We can actually specify optimization with debugging information.

g++ -g -O2 C.c -o Cg2
./Cg2

When doing so, the optimized debug build chugs along just like the optimized build without debugging information. At least that’s true for GCC.

But our debug trace now looks like this:
5         printf("%i\n", n);
printf (__fmt=0x40069c "%i\n") at /usr/include/bits/stdio2.h:105
105       return __printf_chk (__USE_FORTIFY_LEVEL - 1, __fmt, __va_arg_pack ());
5
6         return recurse(n+1);
That’s not so nice, but at least we can debug with tail call optimization. We can also improve on this (thanks to AKF for that hint!): We just need to enable tail call optimization separately:
g++ -g -O1 -foptimize-sibling-calls C.c -o Cgtco
./Cg 
But this still gives ugly backtraces (if I leave out -O1, it does not do TCO). So let’s turn to GCC 4.8 and use -Og.
g++ -g -Og -foptimize-sibling-calls C.c -o Cgtco
./Cgtco 
And we have nice backtraces!
recurse (n=n@entry=1) at C.c:4
4       {
5         printf("%i\n", n);
1
6         return recurse(n+1);
5         printf("%i\n", n);
2
6         return recurse(n+1);

3.5 Optimized for size

Can we invert the question? Is all well, now?

Actually not…

If we activate minor optimization, we get the same unoptimized behaviour again.

g++ -O1 C.c -o C1
./C1

It counts to about 260,000 and then dies from a stack overflow. And that is pretty bad™, because it means that a programmer cannot trust his code to work when he does not know all the optimization strategies which will be used with his code.

And he has no way to define in his code, that it requires TCO to work.

4 Summary

Tail Call Optimization (TCO) turns an operation with a memory requirement of O(N)1 into one with a memory requirement of O(1).

It is a nice tool to reduce the complexity of code, but it is only safe in languages which explicitely require tail call optimization - like Scheme.

And from this we can find a conclusion for compilers:

C/C++ compilers should always use tail call optimization, including debug builds, because otherwise C/C++ programmers should never use that feature, because it can make it impossible to use certain optimization settings in any code which includes their code.

And as a finishing note, I’d like to quote (very loosely) what my colleague told me from some of his real-life debugging experience:

“We run our project on an AIX ibm-supercomputer. We had spotted a problem in optimized runs, so we activated the debugger to trace the bug. But when we activated debug flags, a host of new problems appeared which were not present in optimized runs. We tried to isolate the problems, but they only appeared if we ran the full project. When we told the IBM coders about that, they asked us to provide a simple testcase… The problems likely happened due to some crazy optimizations - in our code or in the compiler.”

So the problem of undebuggable code due to a dependency of the program on optimization changes is not limited to tail call optimization. But TCO is a really nice way to show it :)

Let’s use that to make the statement above more general:

C/C++ compilers should always do those kinds of optimizations which lead to changes in the algorithmic cost of programs.

Or from a pessimistic side:

You should only rely on language features, which are also available in debug mode - and you should never develop your program with optimization turned on.

And by that measure, C/C++ does not have Tail Call Optimization - at least until all mainstream compilers include TCO in their default options. Which is a pretty bleak result after the excitement I felt when I realized that optimizations can actually give C/C++ code the behavior of Tail Call Optimization.

Never develop with optimizations which the debug mode of the compiler of the future maintainer of your code does not use.Never develop with optimizations which are not required by the language standard.

Note, though, that GCC 4.8 added the -Og option, which improves the debugging a lot (Phoronix wrote about plans for that last september). It still does not include -foptimize-sibling-calls in -Og, but that might be only a matter of time… I hope it is.

Footnotes:

1 : O(1) and O(N) describe the algorithmic cost of an algorithm. If it is O(N), then the cost rises linearly with the size of the problem (N is the size, for example printing 20,000 consecutive numbers). If it is O(1), the cost is stable regardless of the size of the problem.

> **TCO**: Reducing the algorithmic complexity of recursion. > **Debug build**: Add overhead to a program to trace errors. > **Debug without TCO**: Obliterate any possibility of fixing recursion bugs.> *“Never develop with optimizations which the debug mode of the compiler of the future maintainer of your code does not use.”[°](/light/english/free-software/tco-debug#golden-rule)***UPDATE**: [GCC](http://gnu.org/s/gcc) 4.8 gives us `-Og -foptimize-sibling-calls` which generates [nice-backtraces](/light/english/free-software/tco-debug#nice-backtraces), and I had a few quite embarrassing errors in my C - thanks to AKF for the catch!1 IntroTail Call Optimization (TCO) makes thisdef foo(n): print(n) return foo(n+1)foo(1)behave like thisdef foo(n): print(n) return n+1n = 1while True: n = foo(n)

Kommentare

Darstellungsoptionen

Wählen Sie hier Ihre bevorzugte Anzeigeart für Kommentare und klicken Sie auf „Einstellungen speichern“ um die Änderungen zu übernehmen.

C and C++

Please don't write „C(++)“. C and C++ are very different languages when done right. However your code is really a weird mixture...

To make it plain C just #include <stdio.h> instead of cstdio (unistd.h is not needed as far as I can see) and return from main with return 0;. Then you can compile it with gcc.

The gcc compiler switch for optimizing recursion is actually -foptimize-sibling-calls which is already included with -O2 and higher.

The switch -O0 is not optimizing for size, but to explicitly switch optimizing off. The switch for optimizing for size is -Os you should additionally use the switch -s to strip the binary right away.

identi.ca?

Was ist eigentlich mit deinem identi.ca Account passiert?

Argl, thanks a lot!

Argl, thanks a lot!

I just tried

gcc -g -foptimize-sibling-calls C.c -o Cgtco

but it broke. I found out, that I really need to pass -O1, which makes debugging painful.

Except if you have GCC 4.8: Then you can use -Og -foptimize-sibling-calls

O0 vs. Os

damn… that’s actually an embarrassing error… thanks for catching it!

C(++)

I switched this to C/C++. Is that better - or is it still too far from the truth?