Tutorial: Function Interposition in Linux

Published on 2009-06-30
Tagged: debugging linux

Have you ever wanted to change the way library code works without replacing the whole library or recompiling programs that use it? For instance, what if you wanted to create wrappers for malloc and free that log allocations to find memory leaks? You could rewrite programs that use malloc/free to use your modified versions instead, or you could modify libc; neither option is very attractive.

This tutorial will show you how to replace calls to functions in dynamic libraries with calls to your own wrappers. This is called function interposition, and it can be done in any program without recompiling the program or the library.

First, some background. When a program that uses dynamic libraries is compiled, a list of undefined symbols is included in the binary, along with a list of libraries the program is linked with. There is no correspondence between the symbols and the libraries; the two lists just tell the loader which libraries to load and which symbols need to be resolved. At runtime, each symbol is resolved using the first library that provides it. This means that if we can get a library containing our wrapper functions to load before other libraries, the undefined symbols in the program will be resolved to our wrappers instead of the real functions.
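To see which object currently wins this lookup for a given symbol, one option is to ask the dynamic linker directly. The sketch below assumes glibc, whose dlfcn.h provides dlsym, dladdr and the RTLD_DEFAULT pseudo-handle when _GNU_SOURCE is defined; provider_of is a hypothetical helper name, not something from the original example.

```c
#define _GNU_SOURCE   /* assumption: glibc; exposes RTLD_DEFAULT and dladdr */
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: returns the path of the object that supplies
   `symbol` under the default global lookup order, or NULL if the symbol
   cannot be found or its origin cannot be determined. */
const char *provider_of(const char *symbol)
{
    Dl_info info;

    /* RTLD_DEFAULT searches in the same order the loader uses to
       resolve the program's undefined symbols. */
    void *addr = dlsym(RTLD_DEFAULT, symbol);
    if (addr == NULL || dladdr(addr, &info) == 0)
        return NULL;

    return info.dli_fname;
}
```

In an ordinary process, provider_of("malloc") would be expected to name the C library; once a wrapper library is loaded ahead of it, the same call should name the wrapper library instead.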

How do we get a program to load libraries it wasn't linked with though? Fortunately, this is the easy part. The environment variable LD_PRELOAD gives the loader a list of libraries to load before anything else. Let's suppose we have a shared library named libjmalloc.so that includes replacements for malloc and free. We want to use this with the program foo, so we run it like this:

LD_PRELOAD=/home/jay/libjmalloc.so ./foo

The loader will act as if foo were linked to libjmalloc.so. We give the loader an absolute path so that it loads exactly that file rather than searching the standard locations such as /usr/lib. If you want to preload multiple libraries, separate their names with colons.
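If you do not have a program like foo at hand, a tiny allocation workload is enough to try the preload on. The function below is only a sketch; exercise_allocator is a hypothetical name, and each buffer is printed so that the compiler cannot optimize the malloc/free pairs away.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical workload: allocate `rounds` small buffers, use each one,
   then free it. Returns the number of successful allocations. */
int exercise_allocator(int rounds)
{
    int ok = 0;

    for (int i = 0; i < rounds; i++) {
        size_t size = (size_t)(i + 1) * 16;
        char *buf = malloc(size);
        if (buf == NULL)
            continue;

        /* Write to and print the buffer so the allocation is really used. */
        snprintf(buf, size, "block %d", i);
        puts(buf);

        ok++;
        free(buf);
    }
    return ok;
}
```

Calling exercise_allocator(3) from a main function and building the file with gcc foo.c -o foo gives a program that makes a handful of malloc and free calls for a preloaded library to intercept.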

So far so good. But what if we want to use the original version of malloc in our implementation? In this example, we just want to print a message whenever malloc or free is called. However, we can't directly call the libc version of malloc from our wrapper, since the compiler will interpret it as a recursive call to the wrapper itself. The solution is to dynamically load a pointer to malloc using dlsym:

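#define _GNU_SOURCE   /* must come before any #include to expose RTLD_NEXT */
#include <stdio.h>
#include <stddef.h>   /* size_t */
#include <dlfcn.h>

void* malloc(size_t size)
{
    /* Look up the next definition of malloc (normally libc's) once. */
    static void* (*real_malloc)(size_t) = NULL;
    if (!real_malloc)
        real_malloc = dlsym(RTLD_NEXT, "malloc");

    void *p = real_malloc(size);
    /* %zu is the correct conversion for size_t. */
    fprintf(stderr, "malloc(%zu) = %p\n", size, p);
    return p;
}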
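A matching wrapper for free follows the same pattern. This is a minimal sketch that assumes glibc; the jfree_calls counter is a hypothetical addition for illustration. In the library it would sit in the same source file as the malloc wrapper, but it is shown here as a self-contained unit.

```c
#define _GNU_SOURCE   /* assumption: glibc; exposes RTLD_NEXT */
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical counter, not part of the original example. */
unsigned long jfree_calls = 0;

void free(void *ptr)
{
    /* Resolve the next definition of free (normally libc's) once. */
    static void (*real_free)(void *) = NULL;
    if (!real_free) {
        real_free = dlsym(RTLD_NEXT, "free");
        if (!real_free)
            abort();   /* no underlying free found; nothing sane to do */
    }

    jfree_calls++;
    fprintf(stderr, "free(%p)\n", ptr);
    real_free(ptr);
}
```

The pointer is logged before the real free runs, so the message always refers to memory that is still valid at that point.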

We compile this with:

gcc -shared -fPIC jmalloc.c -o libjmalloc.so -ldl

dlfcn.h declares functions that load symbols at runtime which the program was not linked against. One main use of these functions is for loading plug-ins. In this case, we can think of libc as a plug-in that supplies the function malloc (which we assign to real_malloc). We load this symbol with the function dlsym, which takes two arguments: a library handle and a symbol name. Normally we would use a valid library handle, returned by dlopen; however, since any program this library is used with will already be linked with libc (or some library which supplies malloc), we pass RTLD_NEXT. This tells the dynamic linker "resolve this symbol in the next library which supplies it (not the one that is calling dlsym)". RTLD_NEXT is an extension that glibc only declares when _GNU_SOURCE is defined, so make sure to define it before including any headers.
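For comparison, here is what the ordinary plug-in style lookup with an explicit handle might look like. This is a sketch under stated assumptions: it takes the math library to be available under the glibc soname libm.so.6, and cos_via_dlopen is a hypothetical helper name.

```c
#include <dlfcn.h>
#include <math.h>    /* only for the NAN macro */
#include <stdio.h>

/* Hypothetical helper: loads the math library at runtime, looks up cos
   by name and calls it. Returns NAN if loading or lookup fails. */
double cos_via_dlopen(double x)
{
    /* Assumption: glibc's math library is installed as libm.so.6. */
    void *handle = dlopen("libm.so.6", RTLD_LAZY);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return NAN;
    }

    /* dlsym returns void*; cast it to the function's real type. */
    double (*cos_fn)(double) = (double (*)(double)) dlsym(handle, "cos");
    double result = cos_fn ? cos_fn(x) : NAN;

    /* Call through the pointer before closing the handle, not after. */
    dlclose(handle);
    return result;
}
```

The wrapper library skips the dlopen step because the library it needs is already loaded; RTLD_NEXT takes the place of the handle.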

At this point you are ready to replace most common library functions. However, there are a few functions which cannot be interposed using this method. A wrapper for dlsym itself, for instance, would have no way to look up the real dlsym. Nor can you wrap library functions that dlsym calls internally: the wrapper would call dlsym, which would call the wrapper again.

If you really need to wrap these functions, the GNU linker provides a useful option, --wrap. If you give it a symbol, say dlsym, it will resolve undefined references to dlsym to __wrap_dlsym, and references to __real_dlsym to the original dlsym, in the program being linked. With gcc, the option is passed through to the linker as -Wl,--wrap=dlsym. The disadvantage of this approach is that you need to re-link any program you want to use your wrappers with. The malloc example above could be rewritten like this:

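#include <stddef.h>
#include <stdio.h>

/* With --wrap=malloc, the linker binds __real_malloc to the original malloc. */
void* __real_malloc(size_t);

/* Calls to malloc in the program being linked are redirected here. */
void* __wrap_malloc(size_t size)
{
    void *p = __real_malloc(size);
    fprintf(stderr, "malloc(%zu) = %p\n", size, p);
    return p;
}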

A few caveats before we close. First, for security reasons, LD_PRELOAD is largely ignored for programs with the setuid or setgid permission bits set. Since function interposition lets you make a program do almost anything you want it to, Linux prevents you from modifying the behavior of a program running on behalf of another user or group. Second, you cannot interpose calls that a library makes to its own functions when those calls were bound at link time rather than at runtime. glibc resolves many of its internal calls this way, so they never reach a wrapper function in a different library. (malloc is an exception in glibc, which is designed to let applications replace its allocator.)
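If a preloaded library seems to have no effect, it can help to check whether the process is being treated as security-sensitive. The sketch below assumes Linux with glibc, where getauxval(AT_SECURE) reports this; running_in_secure_mode is a hypothetical helper name.

```c
#include <sys/auxv.h>

/* Hypothetical helper: returns 1 if the kernel flagged this process for
   secure execution (for example a setuid or setgid binary), in which
   case the loader restricts LD_PRELOAD; returns 0 otherwise. */
int running_in_secure_mode(void)
{
    return getauxval(AT_SECURE) != 0;
}
```

For an ordinary program started by its owner, this would be expected to return 0.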

Aside from these limitations, function interposition is a very powerful technique that is useful for monitoring programs or modifying their behavior. Happy interposing!