oplines: precise error locations, less useless COPs #333
I'd like to get a little more clarity here. There are two separable ideas.

First, you don't need an opcode store in the op tree, but can get by with just a source-to-tree location mapping (inappropriately called a "line-number table"). That could be stored as a separate structure outside the op tree. With this, the tree is smaller (by 8%, you say?) and execution is a little bit faster. It is only when there is an error that the mapping needs to be consulted at all.

The second idea that seems to be mentioned above is something recently observed: with a bit of work, you can reconstruct equivalent source code from just the op tree. For Perl specifically, it is eerily the same most of the time. And by using the tree you can often get a more precise idea of what's up for execution, based on the position in the op tree. These two things are independent, although the two can be blended and probably should be. (A sketch of the first idea follows after the next paragraph.)

With respect to decompiling in Perl, I'll be giving a talk on this at the upcoming Glasgow YAPC. In preparation I just spent some time looking at both cperl's and Perl's B::Deparse module. It is huge, monolithic, and short on high-level documentation.
To that last point, that's why I wrote B::DeparseTree. In preparation for the talk I looked at adapting it to handle cperl 5.26.2, or getting greater coverage than what I had previously done for vanilla Perl 5.26.2. It is going to be a lot of work. So I'd better get back to it.
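To make the first idea concrete, here is a minimal sketch, in pure Perl with the stock B module, of building a location side table outside the op tree: walk the main program in execution order and record the last COP seen for each op. The table name and shape are illustrative, not a proposed API.

    use strict;
    use warnings;
    use B qw(main_start class);

    my %loc_for;                        # op address -> "file:line"
    my ($file, $line) = ('?', 0);
    for (my $op = main_start(); $$op; $op = $op->next) {
        if (class($op) eq 'COP') {      # nextstate/dbstate ops carry location
            ($file, $line) = ($op->file, $op->line);
        }
        $loc_for{$$op} = "$file:$line";
    }
    # On an error, consult $loc_for{$$op} instead of scanning back for a COP.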
Hi Rocky, the 8% number comes from my implementation that removes unneeded COPs and stores the opline in each op. Then you only need COPs at the beginning of each new file scope, as jump targets, and when introducing or ending a new scope. Currently every new line and semicolon needs a new COP, which is slow. Op-tree introspection and deparsing is already a supported feature. With the optimizations going on (and perl and cperl are adding more and more), the exact source will not be reconstructable, but an equivalent source will be, which is enough. For error reporting the missing bits are in the lexer and parser: there are still wrong lines being reported, even without deparsing involved. This should be pretty easy to fix, but in the last 20 years nobody had time to do it.
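For reference, this is what the per-statement COPs look like today: every statement boundary compiles to its own nextstate op. B::Concise output, heavily abbreviated, and the exact rendering varies by perl version and build:

    $ perl -MO=Concise -e '$a = 1; $b = 2'
    ...
    2     <;> nextstate(main 1 -e:1) v ->3
    5     <2> sassign vKS/2 ->6
    6     <;> nextstate(main 1 -e:1) v ->7
    9     <2> sassign vKS/2 ->a
    ...

Under the proposal, the second nextstate collapses away and each op carries its opline directly.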
Sure, that's fine. However, I'd ask one other thing. It would be nice to have (at least on demand, as an option) a transaction log of what optimizations were performed on the stream, from a reverse-delta standpoint. ("Reverse delta" is the way GNU Emacs AntiNews is described, and the way the old revision-control system RCS improved on the older SCCS: store the newest version whole and reconstruct older ones by applying deltas backwards.) Here is a fictitious program, with a possible log shown after it:

    my ($x, $y) = (2, 3); # 1
    my @z;                # 2
    if ($x + $y == 5) {   # 3
        $z[0] = 2;        # 4
    } else {              # 5
        $z[0] = 3;        # 6
    }                     # 7
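The matching log might read something like the line below. The format is purely illustrative, and there is only one entry because only constant folding fires in the scenario discussed next:

    fold   line 3: "$x + $y" => const 5   (operands known at compile time)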
The line numbers are more to get the idea across than what might really be used, which would be an instruction offset or some way to mark the instruction as its position moves around. The way to think of this is as the compiler writing to a transaction journal as it proceeds. Also, or alternatively (behind a switch), some summary information could be stored in each instruction (a side table is okay too). For example: let's say there is no "if optimization" on line 3, but we have done the constant folding to 5, and there is an instruction to load 5.
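In B::Concise-style rendering, that instruction might look something like this (the shape is illustrative; threaded perls print const[IV 5], unthreaded const(IV 5)):

    5  <$> const[IV 5] s ->6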
There could be a flag bit that indicates this instruction is the result of an constant folded optimization. Other flags might be a flag on an instruction due to getting moved, flag bits on instructions that were added as a result of loop unrolling, and so on. No flag bits for an instruction indicates it was part of the original source stream. A program such as a debugger, or a poor programmer would make use of this to descramble the resulting equivalent program, and warn which parts might be funky due to optimization and which parts aren't. I'd be happy to split the above off and log it as a separate feature request. Obviously this is a low-priority thing. Like this issue and idea, having this is jotted down is there to get the idea out there in the off chance that someone is interested and has the time.
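A minimal sketch of such provenance flags from the Perl side. These constants and the helper are hypothetical, not part of perl core (compare, though, perl's own OPpCONST_FOLDED private flag on folded constants):

    # Hypothetical provenance bits; none of these exist in perl core today.
    use constant {
        PROV_FOLDED   => 0x01,   # op is the result of constant folding
        PROV_MOVED    => 0x02,   # op was hoisted/sunk from another line
        PROV_UNROLLED => 0x04,   # op was duplicated by loop unrolling
    };

    # A debugger could annotate its deparse output from these bits:
    sub provenance_note {
        my ($flags) = @_;
        my @notes;
        push @notes, 'folded'   if $flags & PROV_FOLDED;
        push @notes, 'moved'    if $flags & PROV_MOVED;
        push @notes, 'unrolled' if $flags & PROV_UNROLLED;
        return @notes ? '# optimized: ' . join(', ', @notes) : '';
    }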
In my opinion the problem with working on B::Deparse.pm is its hugeness and monolithic-ness, its lack of high-level documentation on how it works, and its monolithic test cases. 6.5K lines in a single file, really? Yeah, that is slim compared to perl5db.pl, which is 10.5K. Perl is not going to win friends with code like this. It's not the 6.5K or 10.5K lines as such; the problem demands that. It's the fact that it is one file. By comparison, Devel::Trepan is 16.6K lines of just code (or 25K if you count comments and blank lines). However, that's spread over 178 files, which averages about 150 lines per file. So largeness is less of an issue if the code is more modular and broken down. The tests are likewise large and monolithic. Yeah, it is cool that Perl can read data from its own script, but when the tests get into the hundreds, it is best to put them in a separate file or files, segregating simple and complicated test cases and specific bugs, and to add a way to point the tests at your own cases. I will probably do that for DeparseTree soon. By doing this, testing can grow even larger and be more complete. For example, why not deparse Perl's entire test suite and run the result? (That's what I attempt to do in Python.) A round-trip harness along those lines is sketched below.
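A rough round-trip harness, as a sketch only: deparse each .t file under t/, run the deparsed code, and record mismatched exit statuses. It is deliberately crude (real core tests need their harness and working directory), and the /tmp path is an assumption:

    use strict;
    use warnings;
    use File::Find;

    my @failures;
    find({ no_chdir => 1, wanted => sub {
        return unless /\.t\z/;
        my $t = $File::Find::name;
        my $deparsed = `$^X -MO=Deparse "$t" 2>/dev/null`;   # deparsed code on stdout
        return if $? != 0 || !length $deparsed;              # skip if deparse itself failed
        open my $fh, '>', '/tmp/deparsed.t' or die $!;
        print {$fh} $deparsed;
        close $fh;
        system($^X, '/tmp/deparsed.t') == 0
            or push @failures, $t;                           # deparsed code misbehaved
    }}, 't');

    print "round-trip failures:\n", map { "  $_\n" } @failures;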
I've not had much success at having my talks recorded. However, there will definitely be slides and notes for the slides. The most complete jotted-down ideas I have on decompiling are here. It hasn't gotten much review, so I would welcome comments. Perl's decompiler, since it already works off a tree, is a little bit different. I should write a separate thing covering that and how it relates to the pipeline presented in that paper. I guess I'll have that done by the time Glasgow rolls around. There is already very definite material that could be presented; which material exactly will be, alas, gets decided at the last minute.
Looking up COPs to find the error lines is inexact, slow, and ~8% heavier than using fewer COPs and moving the opline into each op. On an average of 4 ops per line, with the need for runtime nextstates reduced by 90% (only lexical-state changes and file beginnings remain) and an overhead of 5 ptrs per COP, we win 4 ptrs per removed COP. On a typical 10k-line source with 40k ops (4 ops per line on average), that is a (5-1) ptrs * 10k memory win: 40kb, plus the runtime win of skipping ~4k ops, 8%. See [cperl #333]
See https://news.ycombinator.com/item?id=15185383
Looking up COPs to find the error lines is inexact, slow, and ~8% heavier than using fewer COPs and moving the opline into each op.
I had an oplines branch a couple of years ago. It just needs the warning and error cases added.