-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integer size and overflow when calculating penalty #247
Comments
Hey @robinkrahl, Uh oh... thanks for checking the boundaries! I did some back of the envelope calculations and figured that
Yes, I think you're right... while developing it, I was playing negative penalties too, just to see what the effect was. I'm afraid there's no further cleverness behind the numbers than that 😄
I am unfortunately not 100% sure if it works to return a large fixed value such as
The cost returned for the pair |
I've been testing this and as I figured, it doesn't work to simply use saturating arithmetic. As an example, I truncated the cost at 15,000 and wrapped
The condition for being totally monotone says that
The anti-diagonal is too small since the bottom-left corner got truncated from 21,013 to 15,000. I'll try and figure out if there is some way of replacing "overflow" in the matrix with a bounded number which still satisfies the condition of monotonicity. It might be that the easiest thing is to say "don't use more than |
Some more observations. I ran wrap("x y z", u32::max_value() as usize) However, I don't see overflows if I use The above is with |
These fuzz tests immediately found the problem reported in #247 and gave a short way of reproducing it: wrap("x y z", usize::max_value()) will currently panic due to an overflow error. The wrap_first_fit function seems to not crash. Run the fuzz tests with: $ cargo fuzz run fill_optimal_fit -- -only_ascii=1 You’ll need to `cargo install cargo-fuzz` first.
These fuzz tests immediately found the problem reported in #247 and gave a short way of reproducing it: wrap("x y", 515566821223) will currently panic due to an overflow error. The wrap_first_fit function seems to not crash. Run the fuzz tests with: $ cargo fuzz run fill_optimal_fit -- -only_ascii=1 You’ll need to `cargo install cargo-fuzz` first.
The `wrap_optimal_fit algorithm` computes the penalty for a gap as `gap * gap`. If a fragment has a size near `usize::max_value()` and if the line width is small, this computation can easily overflow. When this happened, we would previously abort or unwind. Now, we instead do the computations with checked arithmetic and detect the overflow. We then proceed to wrap the half of the fragments by themselves. If this work, we then wrap the second half. This way, we might be able to wrap everything without overflow. Should there be a single fragment which causes the overflow by itself, this fragment is put on a line by itself. When wrapping part of the fragments, we might of course end up with a partial last line. To fix this, we simply pop this line and re-wrap the fragments that were put onto this line. This ensures no “seams” in the wrapping. Fixes #247.
Hi @robinkrahl, I've looked at this a lot and I've tried a few different approaches. I think I've found one that works, albeit with some performance loss. Basically, I switched to checked arithmetic so I can detect when the computations overflow. When that happens, I split the list of fragments into two and wrap each half separately before joining the results. This should give the best possible results in the face of "over-sized" fragments. It is about 15-20% slower than before, though. Some questions come to mind:
In any case, I'll put up the branch now and then I hope you can take a look. |
The `wrap_optimal_fit algorithm` computes the penalty for a gap as `gap * gap`. If a fragment has a size near `usize::max_value()` and if the line width is small, this computation can easily overflow. When this happened, we would previously abort or unwind. Now, we instead do the computations with checked arithmetic and detect the overflow. We then proceed to wrap the half of the fragments by themselves. If this work, we then wrap the second half. This way, we might be able to wrap everything without overflow. Should there be a single fragment which causes the overflow by itself, this fragment is put on a line by itself. When wrapping part of the fragments, we might of course end up with a partial last line. To fix this, we simply pop this line and re-wrap the fragments that were put onto this line. This ensures no “seams” in the wrapping. Fixes #247.
The branch is in #259. |
The `wrap_optimal_fit algorithm` computes the penalty for a gap as `gap * gap`. If a fragment has a size near `usize::max_value()` and if the line width is small, this computation can easily overflow. When this happened, we would previously abort or unwind. Now, we instead do the computations with checked arithmetic and detect the overflow. We then proceed to wrap the half of the fragments by themselves. If this work, we then wrap the second half. This way, we might be able to wrap everything without overflow. Should there be a single fragment which causes the overflow by itself, this fragment is put on a line by itself. When wrapping part of the fragments, we might of course end up with a partial last line. To fix this, we simply pop this line and re-wrap the fragments that were put onto this line. This ensures no “seams” in the wrapping. Fixes #247.
The `wrap_optimal_fit‘ algorithm computes the penalty for a gap as `gap * gap`. If a fragment has a size near `usize::max_value()` and if the line width is large, this computation can easily overflow. When this happened, we would simply crash. Now, we instead do the computations with checked arithmetic and detect the overflow. We then proceed to wrap the first half of the fragments by themselves. If this work, we then wrap the second half. This way, we might be able to wrap everything without overflow. Should there be a single fragment which causes the overflow by itself, this fragment is put on a line by itself. When wrapping part of the fragments, we might of course end up with a partial last line. To fix this, we pop the last line and re-wrap the fragments that were put onto this line. This helps remove some of the “seams” that would otherwise occur. However, this is not perfect if there are many over-sized fragments since they can still cause half-empty lines to appear in the output. Fixes #247.
Sorry, I’ve been working on other projects and didn’t monitor this issue properly.
Everything is better than a panic. Even just returning an error would be an improvement, because in that case I can just fall back to first fit, or I can reduce the scaling factor. |
I’ve run some more experiments and I’m a bit confused now. In your more recent comments, you talk about the word width and issues with overlong fragments. I think the issue that I was running into is an overflow in the penalty calculation for very short lines (here). As you pointed out earlier, this limits the width to approximately the square root of the maximum width. Anyway, I’ve tested your wrap-optimal-fit-checked branch and it seems to work fine for genpdf! :-) |
Hey @robinkrahl, Thanks for looking at this again! I guess it's time to add a checked version as in the branch... I was hoping to find some nice way of computing the penalties which could completely circumvent the issue, but so far I haven't found a great solution. #259 was vielleicht the best attempt so far — but it's also pretty complex with it's recursive nature. Since I appreciate simple code, I guess checked arithmetic is best. @MakotoE just put up #392 which (also) implements it, I'll have to dig out the |
I think #392 tries to fix this using saturating additions, which you investigated here and came to the conclusion that it does not meet the requirements for smawk. What about #289? Wouldn’t it mostly mitigate the problem for realistic use cases? And it should still be possible to detect the overflow case by checking for infinity. |
I think it's even better: there are no infinities because the penalties are scaled up to some maximum value: 93e5114 This approach also has a funny side effect of making the penalties dependent on the line length. That is, a gap of However, since this is different that the current approach, I recall spending a lot of time with a Jupyter Notebook trying to figure out if I can scale the penalties in a nice way to make them independent of the line width — but now I think I might have tried to solve a non-problem. |
Sounds great! Please let me know if you want me to test or review something. |
Thanks! I don't expect I'll get around to looking at this until the weekend, but I'll let you know! |
Playground example for the bug |
More weird test results with the playground example:
|
Overflow checks are done as standard only in debug mode: https://doc.rust-lang.org/rustc/codegen-options/index.html#overflow-checks so hopefully that explains your symptoms? |
Ah, yes. That's it. In release mode it doesn't perform checks. It just produces false results. |
This changes the type used for internal width computations in the wrap algorithms. Before, we used `usize` to represent the fragment widths and for the line widths. This could make the optimal-fit wrapping algorithm overflow when it tries to compute the optimal wrapping cost. The problem is that the algorithm computes a cost using integer values formed by (line_width - target_width)**2 When `line_width` is near `usize::MAX`, this computation can easily overflow. By using an `f64` for the cost computation, we achieve two things: * A much larger range for the cost computation: `f64::MAX` is about 1.8e308 whereas `u64::MAX` is only 1.8e19. Computing the cost with a fragment width in the range of `u64`, will thus not exceed 3e38, something which is easily represented with a `f64`. This means that wrapping fragments derived from a `&str` cannot overflow. Overflows can still be triggered when fragments with extreme proportions are formed directly. The boundary seems to be around 1e170 with fragment widths above this limit triggering overflows. * Applications which wrap text using proportional fonts will already be operating with widths measured in floating point units. Using such units internally makes life easier for such applications, as shown by the changes in the Wasm demo. Fixes #247 Fixes #416
Hi all, I've been testing different ways to fixing this. My favorite is #421, which changes the internal computations to use I also played with changing the type used for all widths to either |
The optimization problem solved by the optimal-fit algorithm is fundamentally a minimization problem. It is therefore not sensible to allow negative penalties since all penalties are there to discourage certain features: * `nline_penalty` discourages breaks with more lines than necessary, * `overflow_penalty` discourages lines longer than the line width, * `short_last_line_penalty` discourages short last lines, * `hyphen_penalty` discourages hyphenation Making this change surfaces the overflow bug behind #247 and #416. This will be fixed next via #421 and this commit can be seen as a way of simplifying that PR.
The optimization problem solved by the optimal-fit algorithm is fundamentally a minimization problem. It is therefore not sensible to allow negative penalties since all penalties are there to discourage certain features: * `nline_penalty` discourages breaks with more lines than necessary, * `overflow_penalty` discourages lines longer than the line width, * `short_last_line_penalty` discourages short last lines, * `hyphen_penalty` discourages hyphenation Making this change surfaces the overflow bug behind #247 and #416. This will be fixed next via #421 and this commit can be seen as a way of simplifying that PR.
The optimization problem solved by the optimal-fit algorithm is fundamentally a minimization problem. It is therefore not sensible to allow negative penalties since all penalties are there to discourage certain features: * `nline_penalty` discourages breaks with more lines than necessary, * `overflow_penalty` discourages lines longer than the line width, * `short_last_line_penalty` discourages short last lines, * `hyphen_penalty` discourages hyphenation Making this change surfaces the overflow bug behind #247 and #416. This will be fixed next via #421 and this commit can be seen as a way of simplifying that PR.
The optimization problem solved by the optimal-fit algorithm is fundamentally a minimization problem. It is therefore not sensible to allow negative penalties since all penalties are there to discourage certain features: * `nline_penalty` discourages breaks with more lines than necessary, * `overflow_penalty` discourages lines longer than the line width, * `short_last_line_penalty` discourages short last lines, * `hyphen_penalty` discourages hyphenation Making this change surfaces the overflow bug behind #247 and #416. This will be fixed next via #421 and this commit can be seen as a way of simplifying that PR.
This changes the type used for internal width computations in the wrap algorithms. Before, we used `usize` to represent the fragment widths and for the line widths. This could make the optimal-fit wrapping algorithm overflow when it tries to compute the optimal wrapping cost. The problem is that the algorithm computes a cost using integer values formed by (line_width - target_width)**2 When `line_width` is near `usize::MAX`, this computation can easily overflow. By using an `f64` for the cost computation, we achieve two things: * A much larger range for the cost computation: `f64::MAX` is about 1.8e308 whereas `u64::MAX` is only 1.8e19. Computing the cost with a fragment width in the range of `u64`, will thus not exceed 3e38, something which is easily represented with a `f64`. This means that wrapping fragments derived from a `&str` cannot overflow. Overflows can still be triggered when fragments with extreme proportions are formed directly. The boundary seems to be around 1e170 with fragment widths above this limit triggering overflows. * Applications which wrap text using proportional fonts will already be operating with widths measured in floating point units. Using such units internally makes life easier for such applications, as shown by the changes in the Wasm demo. Fixes #247 Fixes #416
This changes the type used for internal width computations in the wrap algorithms. Before, we used `usize` to represent the fragment widths and for the line widths. This could make the optimal-fit wrapping algorithm overflow when it tries to compute the optimal wrapping cost. The problem is that the algorithm computes a cost using integer values formed by (line_width - target_width)**2 When `line_width` is near `usize::MAX`, this computation can easily overflow. By using an `f64` for the cost computation, we achieve two things: * A much larger range for the cost computation: `f64::MAX` is about 1.8e308 whereas `u64::MAX` is only 1.8e19. Computing the cost with a fragment width in the range of `u64`, will thus not exceed 3e38, something which is easily represented with a `f64`. This means that wrapping fragments derived from a `&str` cannot overflow. Overflows can still be triggered when fragments with extreme proportions are formed directly. The boundary seems to be around 1e170 with fragment widths above this limit triggering overflows. * Applications which wrap text using proportional fonts will already be operating with widths measured in floating point units. Using such units internally makes life easier for such applications, as shown by the changes in the Wasm demo. Fixes #247 Fixes #416
When testing the optimal-fit algorithm, I encountered a panic during the cost calculations:
I fixed this by multiplying my mm widths with the factor 100 instead of 1000, which should still be a sufficient precision. But looking at the cost calculations, I have two follow-up questions:
i32
instead of ausize
? They are generated by adding positiveusize
values, positive integer constants and other costs, sousize
should work too.i32
type tousize
, causing an integer overflow at the same location for themax_width
test case that sets the line width tousize::MAX
. This is currently working, probably because the huge valuetarget_width - line_width
is cropped when casting toi32
before updating the cost. Shouldn’t the cost calculation use saturating additions and multiplications if large line widths are supported?The text was updated successfully, but these errors were encountered: