Tutorial: Function Interposition in Linux
Have you ever wanted to change the way library code works without replacing the whole library or recompiling the programs that use it? For instance, what if you wanted to create wrappers for malloc and free that log allocations so you can find memory leaks? You could rewrite every program that uses malloc/free to call your modified versions instead, or you could modify libc; neither option is very attractive.
This tutorial shows how to replace calls to functions in dynamic libraries with calls to your own wrappers. This is called function interposition, and it works on most dynamically linked programs without recompiling either the program or the library.
First, some background. When a program that uses dynamic libraries is linked, the binary records a list of undefined symbols along with a list of the libraries it depends on. The two lists are independent: the binary does not say which library should supply which symbol; it only tells the loader which libraries to load and which symbols need to be resolved. At runtime, the loader resolves each symbol by searching the loaded objects in order and using the first definition it finds. This means that if we can get a library containing our wrapper functions loaded before the other libraries, the program's undefined symbols will resolve to our wrappers instead of the real functions.
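To see this search order from inside a running process, a sketch like the following asks the dynamic linker where the first definition of a symbol lives. It assumes glibc, where RTLD_DEFAULT and dladdr are GNU extensions enabled by _GNU_SOURCE; which_library is a hypothetical helper name, not part of any library API.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: return the path of the loaded object that supplies
 * the first definition of `symbol` in the global search order, or NULL if
 * no loaded object defines it. */
const char *which_library(const char *symbol)
{
    /* RTLD_DEFAULT searches the global scope in load order: the
     * executable, then any LD_PRELOAD libraries, then the dependencies. */
    void *addr = dlsym(RTLD_DEFAULT, symbol);
    if (addr == NULL)
        return NULL;

    /* dladdr maps an address back to the object that contains it;
     * it returns 0 on failure. */
    Dl_info info;
    if (dladdr(addr, &info) == 0)
        return NULL;
    return info.dli_fname;
}
```

Normally which_library("malloc") names libc; run under LD_PRELOAD with a library that defines malloc, it should name that library instead.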
How do we get a program to load libraries it wasn't linked with, though? Fortunately, this is the easy part. The environment variable LD_PRELOAD gives the loader a list of libraries to load before anything else. Suppose we have a shared library named libjmalloc.so that includes replacements for malloc and free. We want to use it with the program foo, so we run it like this:
LD_PRELOAD=/home/jay/libjmalloc.so ./foo
The loader will then act as if foo were linked against libjmalloc.so. We give it an absolute path to the library so that it doesn't search the usual directories such as /usr/lib. To preload several libraries, separate their paths with colons.
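For launching a program from C rather than from the shell, a rough equivalent of the command above is sketched here, assuming a POSIX system. It joins library paths with colons and sets LD_PRELOAD only in the child process before exec, so the launcher itself is unaffected. join_preload_list and run_with_preload are illustrative names, not an existing API.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Join library paths with ':' as LD_PRELOAD expects.
 * Returns 0 on success, -1 if `out` is too small. */
int join_preload_list(const char *const libs[], size_t n,
                      char *out, size_t outlen)
{
    size_t used = 0;
    if (outlen == 0)
        return -1;
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(libs[i]);
        size_t sep = (i > 0) ? 1 : 0;
        if (used + sep + len + 1 > outlen)   /* +1 for the terminator */
            return -1;
        if (sep)
            out[used++] = ':';
        memcpy(out + used, libs[i], len);
        used += len;
        out[used] = '\0';
    }
    return 0;
}

/* Run argv[0] with LD_PRELOAD set in the child only.
 * Returns the child's exit status, or -1 on failure. */
int run_with_preload(const char *preload, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* The loader reads LD_PRELOAD when the new program starts. */
        if (setenv("LD_PRELOAD", preload, 1) != 0)
            _exit(127);
        execvp(argv[0], argv);
        _exit(127);                           /* exec failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_with_preload("/home/jay/libjmalloc.so", argv) with argv = {"./foo", NULL} should behave like the shell command shown earlier.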
So far so good. But what if we want to use the original version of malloc inside our implementation? In this example, we just want to print a message whenever malloc or free is called and then let the real function do the work. However, we can't reach the libc version of malloc simply by calling malloc from our wrapper: that call would resolve to the wrapper itself and recurse forever. The solution is to look up a pointer to the real malloc at runtime using dlsym:
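```c
#define _GNU_SOURCE          /* must come before any #include */
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

void *malloc(size_t size)
{
    /* Pointer to the next malloc in search order (normally libc's),
       looked up on the first call and then cached. */
    static void *(*real_malloc)(size_t) = NULL;
    if (!real_malloc)
        real_malloc = (void *(*)(size_t))dlsym(RTLD_NEXT, "malloc");

    void *p = real_malloc(size);
    /* %zu is the correct conversion for size_t. */
    fprintf(stderr, "malloc(%zu) = %p\n", size, p);
    return p;
}
```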
We compile this with:
gcc -shared -fPIC jmalloc.c -o libjmalloc.so -ldl
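The listing above only covers malloc, although the goal was to log free as well. A matching free wrapper can follow the same pattern; this is a sketch that could live in the same jmalloc.c (its includes are repeated so it also compiles on its own). Like the malloc wrapper, it assumes glibc and that fprintf on stderr does not itself call free, which would otherwise recurse.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

void free(void *ptr)
{
    /* Pointer to the next free in search order (normally libc's),
       looked up on the first call and then cached. */
    static void (*real_free)(void *) = NULL;
    if (!real_free)
        real_free = (void (*)(void *))dlsym(RTLD_NEXT, "free");

    fprintf(stderr, "free(%p)\n", ptr);
    real_free(ptr);
}
```

Built into libjmalloc.so together with the malloc wrapper, each allocation and release in the preloaded program should then produce a line on stderr.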
The header dlfcn.h declares functions for dynamically loading symbols that aren't linked into the program. One main use of these functions is loading plug-ins. In this case, we can think of libc as a plug-in that supplies the function malloc (which we assign to real_malloc). We look up this symbol with dlsym, which takes two arguments: a library handle and a symbol name. Normally we would pass a handle returned by dlopen; however, since any program this library is used with will already be linked with libc (or some other library that supplies malloc), we pass the special handle RTLD_NEXT instead. This tells the dynamic linker: "find the next occurrence of this symbol in the search order after the library that is calling dlsym." In glibc, RTLD_NEXT is only declared when _GNU_SOURCE is defined, so make sure to define _GNU_SOURCE before including dlfcn.h (in fact, before including any system header, since the feature macros are read by the first one).
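The malloc wrapper never checks whether dlsym succeeded; if no later object defined malloc, real_malloc would stay NULL and the first call through it would crash. A slightly more defensive lookup is sketched below. It uses dlerror, which POSIX specifies as the way to tell a failed lookup apart from a symbol whose value happens to be NULL. lookup_next is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: resolve `name` in the objects loaded after the
 * caller's own object. Returns NULL and reports the reason on failure. */
void *lookup_next(const char *name)
{
    dlerror();                          /* clear any stale error state */
    void *sym = dlsym(RTLD_NEXT, name);
    const char *err = dlerror();        /* non-NULL only if lookup failed */
    if (err != NULL) {
        fprintf(stderr, "lookup_next(%s): %s\n", name, err);
        return NULL;
    }
    return sym;
}
```

A wrapper could call lookup_next("malloc") once and abort with a clear message if it returns NULL, instead of jumping through a null pointer.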
At this point you are ready to replace most common library functions. However, a few functions cannot be interposed this way. For instance, what if you wanted to create a wrapper for dlsym itself? The wrapper would need dlsym to find the real dlsym. For the same reason, you won't be able to wrap any library functions that dlsym calls internally: the wrapper would call dlsym to look up the real function, and dlsym would call the wrapper again.
If you really need to wrap these functions, the GNU linker provides a useful option, --wrap. Given a symbol, say dlsym, it resolves every undefined reference to dlsym in the program being linked to __wrap_dlsym, and every reference to __real_dlsym to the original dlsym. The disadvantage of this approach is that you need to re-link any program you want to use your wrappers with. The malloc example above could be rewritten like this:
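```c
#include <stddef.h>
#include <stdio.h>

/* With -Wl,--wrap=malloc, the linker resolves this to the original malloc. */
void *__real_malloc(size_t size);

/* With -Wl,--wrap=malloc, the program's calls to malloc land here. */
void *__wrap_malloc(size_t size)
{
    void *p = __real_malloc(size);
    fprintf(stderr, "malloc(%zu) = %p\n", size, p);
    return p;
}
```

To use it, pass the option to the linker through gcc when building the program, for example gcc foo.c jmalloc_wrap.c -Wl,--wrap=malloc -o foo (the file names here are illustrative).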
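A few caveats before we close. First, for security reasons, LD_PRELOAD is restricted for set-user-ID and set-group-ID programs. Since function interposition lets you make a program do almost anything you want, the loader prevents you from changing the behavior of a program that runs with another user's or group's privileges: in this secure-execution mode, glibc ignores preload entries that contain a slash and only preloads libraries from the standard search directories that are themselves marked set-user-ID. Second, you generally cannot interpose calls that a library makes to its own functions, since many of these are bound when the library is built rather than at runtime. For instance, when a libc function calls another libc function internally, it often goes straight to libc's own copy and never reaches a wrapper in a different library. The malloc family is a deliberate exception in glibc, which supports replacing malloc through interposition, so glibc's internal allocations do reach a preloaded malloc.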
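For set-user-ID and set-group-ID programs, the kernel marks the process through the AT_SECURE entry of the auxiliary vector, and the loader restricts LD_PRELOAD when that entry is nonzero. A program can read the same flag; the sketch below assumes glibc 2.16 or later for getauxval, and in_secure_mode is a hypothetical helper name.

```c
#include <sys/auxv.h>

/* Returns nonzero if the process was started in secure-execution mode
 * (for example a set-user-ID or set-group-ID program). In that mode the
 * glibc loader restricts which LD_PRELOAD entries it honors. */
int in_secure_mode(void)
{
    return getauxval(AT_SECURE) != 0;
}
```

An ordinary program started by its own user should see 0 here.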
Aside from these limitations, function interposition is a very powerful technique that is useful for monitoring programs or modifying their behavior. Happy interposing!