we can build traces by stitching together frequently executed sequences of basic blocks. We can even build traces across indirect branches by adding a check that tests whether the indirect branch target stays on the trace. These are similar to the traces used in the Dynamo system. The superior code layout of traces gives us another performance boost, bringing our system to about a 10% slowdown versus native execution, which is pretty good.
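To make the idea concrete, here is a minimal sketch of trace building in Python. It is only an illustration under simplifying assumptions, not our actual implementation: basic blocks are modeled as functions that return the name of their successor, and every name here (`build_trace`, `run_trace`, `BLOCKS`, the block names) is hypothetical. The `dispatch` block stands in for an indirect branch, and the comparison inside `run_trace` plays the role of the inline check that decides whether the target stays on the trace.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A basic block is modeled as a function that updates the program state and
# returns the name of the block it branches to (None would mean program exit).
BlockFn = Callable[[dict], Optional[str]]
Trace = List[Tuple[str, Optional[str]]]


def build_trace(blocks: Dict[str, BlockFn], head: str, state: dict,
                max_len: int = 8) -> Trace:
    """Execute from `head`, recording each block and the target it took."""
    trace: Trace = []
    current: Optional[str] = head
    while current is not None and len(trace) < max_len:
        target = blocks[current](state)
        trace.append((current, target))
        if target == head:  # the path looped back to the head: end the trace
            break
        current = target
    return trace


def run_trace(blocks: Dict[str, BlockFn], trace: Trace,
              state: dict) -> Optional[str]:
    """Run the stitched blocks; return the block where execution continues."""
    for name, expected in trace:
        actual = blocks[name](state)
        if actual != expected:  # inline check: the target left the trace
            return actual       # trace exit
    return trace[-1][1]         # reached the end of the trace


# Hypothetical example program: a loop containing one indirect branch.
def head(state: dict) -> Optional[str]:
    state["i"] += 1
    return "dispatch"


def dispatch(state: dict) -> Optional[str]:
    # Stand-in for an indirect branch: the target is computed at run time.
    return "rare" if state["i"] % 4 == 0 else "common"


def common(state: dict) -> Optional[str]:
    state["sum"] += 1
    return "head"


def rare(state: dict) -> Optional[str]:
    state["sum"] += 100
    return "head"


BLOCKS: Dict[str, BlockFn] = {
    "head": head, "dispatch": dispatch, "common": common, "rare": rare,
}

state = {"i": 0, "sum": 0}
trace = build_trace(BLOCKS, "head", state)
print([name for name, _ in trace])  # ['head', 'dispatch', 'common']
```

As long as `dispatch` keeps choosing `common`, `run_trace` stays on the trace and returns `"head"`. On the iteration where it chooses `rare`, the check fails and execution leaves the trace at that point. A real system would emit the check as a compare-and-branch in the generated code rather than interpret it like this.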
Bringing the slowdown from 300x down to 10% is not trivial. Although the ideas of caching and linking are not new, one of our contributions is in identifying how to solve the specific architectural challenges of bringing these ideas to the CISC architecture of x86. For example, a naive implementation of preserving the x86 condition codes is 20% to 70% slower than the scheme we've managed to come up with.
Copyright © 2004 Derek Bruening