Machine learning and compiler optimization
This page was translated into Latvian. Thanks to Arija Liepkalnietis!
Inlining is a compiler optimization that replaces a call to a function with the body of that function. Inlining doesn't provide much benefit by itself, but it enables a lot of other important optimizations. For example, if you have a function call inside a loop, the compiler might be able to lift loop-invariant expressions in the function's body out of the loop. You can save a lot of time by moving instructions out of a loop.
Unfortunately, inlining is kind of tricky. It is safe to inline any function call if you know which function is being called (you can't usually inline virtual and dynamic calls). You also need some protection against recursion. However, inlining some calls can make performance worse. For example, inlining a large function that only gets called once will blow up code size and will make compilation more expensive.
When you inline a call, there's no way to guarantee that it will improve performance. You have to guess. Compilers usually have a heuristic function that predicts whether or not inlining will be beneficial at each call site. The quality of the heuristic function has a big impact on the performance of optimized code.
Experimenting with inlining in V8
A couple years ago, I experimented with replacing V8's inlining heuristic function. At the time, V8 had a very basic heuristic. While surprisingly effective, it would sometimes make amusingly bad decisions.
I started by making a build of V8 that recorded a bunch of information about every call site the inlining heuristic was applied to. I ran V8 on the Octane benchmark and dumped the results into a spreadsheet. Here are some of the metrics I recorded:
- Source size and unoptimized code size of caller and callee (under the theory that inlining large functions makes optimization expensive and can blow up code size).
- The number of arguments passed to the callee and the number of arguments that were compile-time constants (more opportunity to save call overhead and find loop invariant expressions).
- The number of loops the call site was nested inside (inlining functions with loop invariant code has significant benefit).
- The number of conditional statements the call site was nested inside (there is less benefit to inlining a function that is not always called).
- The number of call sites for the callee throughout the whole program (this information can be recorded by inline caches).
- Optimization and deoptimization counts for both caller and callee (inlining a callee that has deoptimized could make the caller deoptimize later).
- Amount of type information recorded inside the callee (more type information allows more aggressive optimization).
- The inlining decision that V8 actually made.
Once I had this data, I made another build of V8 that would read in another csv file containing a list of call sites and inlining decisions. V8 would make its decisions from this file instead of using the normal heuristic function.
With these builds in hand, I started experimenting with a new inlining algorithm. I looked through the recorded metrics for inlining decisions that had been made "incorrectly". I then manually changed those decisions and fed them back into V8 to see if performance improved. Once I was satisfied with the decisions in one benchmark, I tried to come up with a simple function that made the same decisions as I did.
Readers with any experience in machine learning are probably rolling their eyes right now. I was basically trying to do linear regression by intuition. I ended up overfitting pretty badly to some of the earlier tests in Octane, and my heuristic performed badly on the later tests.
Unfortunately, I had to switch to some more urgent work before I got a heuristic that I was happy with. A few months later, I changed jobs and stopped working on V8, so I never came back to the inlining problem.
An application for machine learning?
The company I work for now is encouraging all its engineers to learn machine learning. I've taken a couple classes that have been offered internally, so I'm starting to have some basic familiarity with it.
This made me think of the inlining heuristic again. It seems like the kind of problem that machine learning was born to solve. You have a question that can be answered by assigning a simple label (inline or not). You have a bunch of values associated with each example of that question (code size, parameter count, type information, etc.). It's unclear which values are useful though. You need a somewhat automatic way to generate a heuristic function that makes decisions based on those values. It's okay to spend a lot of time building and tuning the heuristic, but it needs to be fast at run-time.
A quick search shows there has already been quite a bit of academic work on applying machine learning to compiler optimization, including inlining. I'm not aware of any widely used compilers that do this. There are probably some out there, but it's not common.
The exciting thing for me is that inlining isn't the only optimization that would benefit from this. Machine learning could be used to create and tune any heuristic used in a compiler or virtual machine. JIT VMs like V8 rely heavily on heuristics for lots of things because a fast decision is frequently better than an accurate decision, and machine learning can help developers build heuristics that are both fast and accurate.