The generated parse table consumes too much memory #40
For a while now, I've been working on a fix for (1): de-duplicating all of the different combinations of prefixes from the opcode table. I've pushed my current WIP branch here in case you'd like to follow along. That branch is a bit messy at the moment, since I'm adding a separate disassembler, named … First, here is the general approach that I'm taking in this branch:
This approach was enough to get nearly the entire
Let's talk about why this happens. The prefix parsing approach that I took in step (2) makes a key assumption: bytes that can be used as prefixes will never be used as the first byte in an instruction's opcodes. After all, the only way we know to stop disassembling prefixes is to encounter a byte that isn't in the set of known prefix bytes, which we interpret as an instruction's first opcode byte. Unfortunately, this assumption turns out not to be true. In the case of the … (lines 5640 to 5645 at commit 7109bdc)
But the … (`flexdis86/src/Flexdis86/Disassembler.hs`, line 315 at commit 7109bdc)
As a result, we mistakenly parse VEX prefixes (which the … (lines 4256 to 4263 and lines 4274 to 4281 at commit 7109bdc)
What's more, these prefixes are expected to be followed by some number of additional bytes, so not only would we need to "backtrack" over parsing the VEX prefix, we would also need to backtrack over parsing the additional bytes that follow the prefix. Moreover, since … In short, I think we need some way to perform this backtracking. I have some ideas for how to do this, but before I set out on implementing them, I wanted to do a quick sanity check with @travitch to make sure I'm on the right path. Can you see a simpler way to solve these problems?
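To make the failure mode concrete, here is a minimal Haskell sketch of the eager prefix scan. The prefix set is a simplified assumption (the legacy prefixes, the REX range 0x40 through 0x4F, and the two VEX lead bytes 0xC4/0xC5), not flexdis86's actual table, and `isPrefixByte`/`splitPrefixes` are hypothetical names. Outside 64-bit mode, 0xC5 can also be the first byte of `LDS`; and even when it really is a VEX prefix, the split goes wrong. On the `vzeroupper` encoding `C5 F8 77`, the scan stops after 0xC5 and treats the VEX payload byte 0xF8 as the first opcode byte.

```haskell
import Data.Word (Word8)

-- Simplified, hypothetical prefix set: legacy prefixes, REX bytes
-- (0x40-0x4F, 64-bit mode only), and the two VEX lead bytes.
isPrefixByte :: Word8 -> Bool
isPrefixByte b = b `elem` legacy || isRex || b == 0xC4 || b == 0xC5
  where
    legacy = [0xF0, 0xF2, 0xF3, 0x2E, 0x36, 0x3E, 0x26, 0x64, 0x65, 0x66, 0x67]
    isRex = b >= 0x40 && b <= 0x4F

-- Eagerly consume prefix bytes; the first non-prefix byte is taken to be
-- the first opcode byte. This is only sound if no opcode starts with a
-- prefix byte, and it has no notion of bytes that must follow a prefix
-- (such as the VEX payload).
splitPrefixes :: [Word8] -> ([Word8], [Word8])
splitPrefixes = span isPrefixByte
```

For example, `splitPrefixes [0x66, 0x90]` yields `([0x66],[0x90])`, while `splitPrefixes [0xC5, 0xF8, 0x77]` yields `([0xC5],[0xF8,0x77])`, which is the wrong split for a VEX-encoded instruction.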
I wonder if, instead of "backtracking", we could just remember the last byte in the prefixes we parsed and incorporate those "last bytes" into the validity/opcode checks. I also think we might need to be a bit liberal in accepting prefixes that are invalid if the parse is otherwise unambiguous (e.g., see https://repzret.org/p/repzret/).
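One possible reading of the "remember the last prefix byte" idea, sketched in Haskell under stated assumptions: the opcode table is a hypothetical association list from opcode byte sequences to mnemonics, and `lookupOpcode`/`decodeWithLastPrefix` are illustrative names, not flexdis86 functions. If the opcode lookup fails, the last eagerly parsed prefix byte is handed back to the opcode stream and the lookup is retried once, so only a single byte is ever reconsidered. It does not address VEX, where the bytes after the prefix would also need to be reinterpreted.

```haskell
import Data.List (isPrefixOf)
import Data.Word (Word8)

-- Hypothetical opcode table: opcode byte sequences paired with mnemonics.
type OpcodeTable = [([Word8], String)]

-- Return the first mnemonic whose opcode bytes are a prefix of the input.
lookupOpcode :: OpcodeTable -> [Word8] -> Maybe String
lookupOpcode table bytes =
  case [m | (k, m) <- table, k `isPrefixOf` bytes] of
    (m : _) -> Just m
    []      -> Nothing

-- Decode from an eager (prefixes, rest) split. If no opcode matches, move
-- the last parsed prefix byte back onto the opcode stream and retry once.
decodeWithLastPrefix :: OpcodeTable -> ([Word8], [Word8]) -> Maybe ([Word8], String)
decodeWithLastPrefix table (prefixes, rest) =
  case lookupOpcode table rest of
    Just m -> Just (prefixes, m)
    Nothing
      | null prefixes -> Nothing
      | otherwise ->
          fmap (\m -> (init prefixes, m)) (lookupOpcode table (last prefixes : rest))
```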
I considered this, but one complication with this idea is that the opcode table encodes the invariant that every instruction is reachable by a path containing at least one … (`flexdis86/src/Flexdis86/Disassembler.hs`, lines 994 to 996 at commit 7109bdc)
(In my branch, there is a corresponding error case for … For instructions like …
Huh, that's an interesting read. I don't think this particular example poses an issue for my branch, as it can successfully disassemble …
Regarding …
This seems likely, as this prefix was added for... reasons in b625205. Nevertheless, I'm not too bothered by this, as this is a hack that would be needed in both the current and new designs. (If it were a hack that was only needed in one particular design, that would be a bit more eyebrow-raising.)
This might be effectively backtracking, but there could be a …
That might work, although things would get complicated for instructions whose operands are parsed eagerly as prefixes due to VEX. I was thinking of instead having a newtype like this:
```haskell
newtype BacktrackingByteReader m a = BacktrackingByteReader (StateT [Word8] m a)
```
And giving it a …
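A minimal sketch of how the proposed `BacktrackingByteReader` could backtrack, assuming the underlying monad is a `MonadPlus` (here `Maybe`). Because the unconsumed input is the `StateT` state, the `Alternative` instance that `transformers` provides for `StateT` runs both branches of `<|>` from the same starting bytes, so a failed branch is undone for free. `readByte`, `byte`, `runBacktracking`, and `demo` are hypothetical helpers, not flexdis86 functions.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import Control.Applicative (Alternative (..))
import Control.Monad (MonadPlus)
import Control.Monad.Trans.State.Strict (StateT, get, put, runStateT)
import Data.Word (Word8)

-- The remaining input bytes are the state, so StateT's Alternative instance
-- restarts each branch of (<|>) from the same bytes.
newtype BacktrackingByteReader m a = BacktrackingByteReader (StateT [Word8] m a)
  deriving (Functor, Applicative, Monad, Alternative)

-- Consume one byte, failing in the underlying monad at end of input.
readByte :: MonadPlus m => BacktrackingByteReader m Word8
readByte = BacktrackingByteReader $ do
  bytes <- get
  case bytes of
    []       -> empty
    (b : bs) -> put bs >> pure b

-- Consume one byte and fail unless it equals the expected value.
byte :: MonadPlus m => Word8 -> BacktrackingByteReader m ()
byte expected = do
  b <- readByte
  if b == expected then pure () else empty

runBacktracking :: BacktrackingByteReader m a -> [Word8] -> m (a, [Word8])
runBacktracking (BacktrackingByteReader s) = runStateT s

-- If the three-byte sequence does not match, the second branch starts again
-- from the first byte rather than from wherever the first branch stopped.
demo :: BacktrackingByteReader Maybe String
demo =
  (byte 0xC5 *> byte 0xF8 *> byte 0x99 *> pure "three-byte match")
    <|> (readByte *> pure "one-byte fallback")
```

On the input `[0xC5, 0xF8, 0x77]`, the first branch fails at the third byte, and the fallback consumes only `0xC5`, leaving `[0xF8, 0x77]` unconsumed.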
Just to record it, we talked about having a separate set of parse tables for VEX instructions that would be consulted iff the VEX prefix is in the set of parsed prefixes. That would avoid the need for backtracking in those cases. It would be nice to know if there are cases besides … Operationally, that would mean parsing …
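A sketch of the "separate VEX tables" idea under simplifying assumptions: the tables are hypothetical association lists keyed on opcode bytes, and `Prefixes` is a hypothetical summary of what the eager prefix scan found. Because the choice of table depends only on whether a VEX prefix was parsed, the decoder never has to reinterpret bytes after the fact.

```haskell
import Data.Word (Word8)

-- Hypothetical split of the opcode tables, keyed on opcode bytes.
data OpcodeTables = OpcodeTables
  { legacyTable :: [([Word8], String)]
  , vexTable    :: [([Word8], String)]
  }

-- Hypothetical summary of what the eager prefix scan found.
data Prefixes = Prefixes
  { hasVex        :: Bool
  , otherPrefixes :: [Word8]
  }

-- Consult the VEX table iff a VEX prefix was parsed, so a VEX-encoded
-- instruction is never looked up among the legacy opcodes (or vice versa).
lookupInstruction :: OpcodeTables -> Prefixes -> [Word8] -> Maybe String
lookupInstruction tables pfx opcodeBytes = lookup opcodeBytes table
  where
    table
      | hasVex pfx = vexTable tables
      | otherwise  = legacyTable tables
```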
As an experiment, I pushed a branch here that encodes VEX prefix bytes into the opcode table alongside the instruction opcodes to avoid any conflicts with instructions like
For comparison, here is how large it is on the
And here is how large it is after removing VEX prefixes:
That being said, one thing about … Speaking of which, are there other instructions whose opcodes conflict with bytes used for prefixes? Thankfully, …
Segment prefixes
No conflicts in any of …
"Simple" prefixes
…
REX prefixes (64-bit mode only)
The … The …
```xml
<instruction>
  <mnemonic>vpclmulqdq</mnemonic>
  <class>avx</class>
  <def>
    <opc>/vex=NDS.128.66.0F3A.WIG 44</opc>
    <opr>Vdq Hdq Wdq Ib</opr>
  </def>
</instruction>
```
Several of these issues would be avoided by encoding VEX prefixes into the opcode table. Moreover, the …
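Several of the conflicts above depend on the processor mode. The sketch below encodes two facts from the x86 encoding rules: bytes 0x40 through 0x4F are REX prefixes only in 64-bit mode (otherwise they are single-byte `INC`/`DEC` opcodes), and outside 64-bit mode 0xC4/0xC5 start a VEX prefix only when the next byte has its top two bits set (otherwise they are `LES`/`LDS`). `Mode` and `startsWithPrefix` are hypothetical names that do not come from flexdis86, and `Mode32` stands in for any non-64-bit mode.

```haskell
import Data.Bits ((.&.))
import Data.Word (Word8)

data Mode = Mode32 | Mode64
  deriving (Eq, Show)

-- Prefixes in every mode: lock, rep/repne, segment overrides, and the
-- operand-size and address-size overrides.
legacyPrefixes :: [Word8]
legacyPrefixes = [0xF0, 0xF2, 0xF3, 0x2E, 0x36, 0x3E, 0x26, 0x64, 0x65, 0x66, 0x67]

-- Should the first byte of the stream be treated as (the start of) a prefix?
startsWithPrefix :: Mode -> [Word8] -> Bool
startsWithPrefix _ [] = False
startsWithPrefix mode (b : rest)
  | b `elem` legacyPrefixes = True
  -- REX in 64-bit mode; single-byte INC/DEC opcodes otherwise.
  | b >= 0x40 && b <= 0x4F = mode == Mode64
  -- VEX lead bytes. Outside 64-bit mode they are VEX only when the next
  -- byte has its top two bits set; otherwise they are LES/LDS.
  | b == 0xC4 || b == 0xC5 = case (mode, rest) of
      (Mode64, _)        -> True
      (Mode32, next : _) -> next .&. 0xC0 == 0xC0
      (Mode32, [])       -> False
  | otherwise = False
```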
Previously, the opcode lookup table would encode every possible permutation of allowable prefixes for each instruction as a separate path. This is expensive in both space and time, as observed in #40. The new approach taken in this patch, as described in `Note [x86_64 disassembly]` in `Flexdis86.Disassembler`, is to encode only the VEX prefix and opcode bytes in the lookup table, leaving out all other forms of prefix bytes entirely. Instead, disassembly will start by eagerly parsing as many prefix bytes as possible, proceeding to parse opcode bytes after the first non-prefix byte is encountered. After identifying the possible instructions from the opcode, we will then narrow down exactly which instruction it is by validating each candidate against the set of parsed prefixes. As noted in `Note [x86_64 disassembly]`, we had to add some special cases for `nop`-like instructions (namely, `endbr32`, `endbr64`, `pause`, and `xchg`) to avoid some ambiguity related to prefix bytes. The new handling for `xchg` is more accurate than it was before, so this patch fixes #42 as a side effect. This patch also addresses part (1) of #40 in that it should reduce the amount of memory that the lookup table takes, although there is potentially more work to be done (see part (2) of #40).
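The "narrow down by validating against the parsed prefixes" step can be illustrated with a small sketch. The `Candidate` record and its allowed/required prefix lists are hypothetical simplifications, not the representation described in `Note [x86_64 disassembly]`; the example relies only on the encoding fact that `F3 90` is `pause` while a bare `90` is `nop`.

```haskell
import Data.Word (Word8)

-- Hypothetical opcode-table entry: a mnemonic plus the prefix bytes it
-- tolerates and the ones it requires (such as a mandatory 0xF3).
data Candidate = Candidate
  { mnemonic         :: String
  , allowedPrefixes  :: [Word8]
  , requiredPrefixes :: [Word8]
  }

-- Keep only candidates consistent with the eagerly parsed prefixes: every
-- parsed prefix must be permitted, and every required prefix must be present.
validCandidates :: [Word8] -> [Candidate] -> [Candidate]
validCandidates parsed = filter consistent
  where
    consistent c =
      all (`elem` (allowedPrefixes c ++ requiredPrefixes c)) parsed
        && all (`elem` parsed) (requiredPrefixes c)

-- Candidates for the single opcode byte 0x90 in this simplified model.
candidates90 :: [Candidate]
candidates90 =
  [ Candidate { mnemonic = "nop", allowedPrefixes = [0x66], requiredPrefixes = [] }
  , Candidate { mnemonic = "pause", allowedPrefixes = [], requiredPrefixes = [0xF3] }
  ]
```

With these candidates, a parsed `0xF3` selects `pause`, while an empty prefix list selects `nop`.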
At long last, #43 takes care of part (1). Part (2) may also be worth doing, but I imagine (1) alone will be enough to knock out most of the egregious memory usage, especially in light of this comment on #43.
The parse tables occupy about 400 MB in memory after they are constructed, as can be seen in this profile collected by @RyanGlScott: verify-RSA.saw.pdf. There are two factors to this memory consumption:
1. `flexdis86/src/Flexdis86/Disassembler.hs`, lines 415 to 420 (commit c19b55e)
2. `flexdis86/src/Flexdis86/Disassembler.hs`, lines 171 to 180 (commit c19b55e)
Addressing the former is tricky. One could use a simple DFA to parse prefix bytes separately to save an enormous amount of space. However, not all prefixes are valid for all instructions; those restrictions are currently properly encoded in the fully elaborated tables. To separate out prefix parsing, it would be necessary to add a post-parsing check to see if the parse was valid or not.
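A minimal sketch of the "simple DFA for prefix bytes" idea, assuming 64-bit mode and the encoding rule that a REX prefix must come immediately before the opcode bytes. The automaton accepts any run of legacy prefixes followed by at most one REX byte. Rejecting a REX byte that is followed by another prefix is a simplification (real processors ignore such a REX byte instead), and per-instruction validity would still be left to a post-parsing check. `PrefixState` and `scanPrefixes` are hypothetical names.

```haskell
import Data.Word (Word8)

-- States of a tiny prefix-scanning automaton.
data PrefixState = InLegacy | AfterRex
  deriving (Eq, Show)

isLegacyPrefix :: Word8 -> Bool
isLegacyPrefix b = b `elem` [0xF0, 0xF2, 0xF3, 0x2E, 0x36, 0x3E, 0x26, 0x64, 0x65, 0x66, 0x67]

isRex :: Word8 -> Bool
isRex b = b >= 0x40 && b <= 0x4F

-- Accept legacy* rex? and return (legacy prefixes, REX byte, remaining bytes).
scanPrefixes :: [Word8] -> Maybe ([Word8], Maybe Word8, [Word8])
scanPrefixes = go InLegacy [] Nothing
  where
    go :: PrefixState -> [Word8] -> Maybe Word8 -> [Word8] -> Maybe ([Word8], Maybe Word8, [Word8])
    go InLegacy acc _ (b : bs)
      | isLegacyPrefix b = go InLegacy (b : acc) Nothing bs
      | isRex b          = go AfterRex acc (Just b) bs
    go AfterRex _ _ (b : _)
      | isLegacyPrefix b || isRex b = Nothing -- simplification: REX must be last
    -- Any other byte (or end of input) ends the prefix run.
    go _ acc rex bs = Just (reverse acc, rex, bs)
```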
Addressing the latter might be less tricky, as we could change the representation of the tables. Another disassembler, dismantle, uses a mostly unboxed structure: https://github.com/travitch/dismantle/blob/48433e7ccb02924b2f4695c8c9f09fb9cfccdfc4/dismantle-tablegen/src/Dismantle/Tablegen/LinearizedTrie.hs#L34. The x86 case is a bit trickier, as the parser has more states than the parsers generated by dismantle. However, we might be able to take inspiration from the more compact parse table representation and adapt it for flexdis86.
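To give a feel for a more compact layout, here is a sketch of a flat, unboxed transition table in the spirit of dismantle's `LinearizedTrie`; it is not a transcription of that module. Each trie node owns 256 consecutive `Int` slots in a single `UArray`: `0` means "no edge", a positive value is a child node index, and a negative value `-(i + 1)` marks a leaf holding payload index `i`. It assumes a prefix-free set of non-empty keys and ignores the extra parser states (ModR/M, immediates, and so on) that make the x86 case harder; `buildTrie` and `lookupTrie` are hypothetical names.

```haskell
import Data.Array.Unboxed (UArray, accumArray, (!))
import Data.List (inits)
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set
import Data.Word (Word8)

-- Slot (node * 256 + byte) holds 0 for "no edge", n > 0 for child node n,
-- or -(i + 1) for a leaf carrying payload index i (the i-th key).
buildTrie :: [[Word8]] -> UArray Int Int
buildTrie keys = accumArray (\_ new -> new) 0 (0, 256 * Map.size nodeIds - 1) edges
  where
    -- Interior nodes are the proper prefixes of the keys; the root []
    -- sorts first and therefore gets index 0.
    nodeIds :: Map.Map [Word8] Int
    nodeIds = Map.fromList (zip (Set.toAscList (Set.fromList ([] : concatMap (init . inits) keys))) [0 ..])
    slot path = 256 * (nodeIds Map.! init path) + fromIntegral (last path)
    childEdges = [(slot p, i) | (p, i) <- Map.toList nodeIds, not (null p)]
    leafEdges = [(slot k, negate (i + 1)) | (k, i) <- zip keys [0 ..], not (null k)]
    edges = childEdges ++ leafEdges

-- Walk the flat table one byte at a time; on success, return the payload
-- index of the matched key and the unconsumed bytes.
lookupTrie :: UArray Int Int -> [Word8] -> Maybe (Int, [Word8])
lookupTrie table = go 0
  where
    go node (b : bs) =
      case table ! (256 * node + fromIntegral b) of
        0 -> Nothing
        n | n > 0     -> go n bs
          | otherwise -> Just (negate n - 1, bs)
    go _ [] = Nothing
```

For example, `buildTrie [[0x90], [0x0F, 0x05]]` allocates two nodes (512 slots), and looking up `[0x0F, 0x05]` returns payload index 1. All transitions live in one unboxed array instead of a tree of boxed nodes, which is the kind of saving the linearized representation aims for.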