Allow page breaks in floats, absolute blocks, table-cells #36
Yes, this is a known limitation: no page breaks are supported inside floats, absolute positioning, or table cells. Unfortunately right now I don’t have a better answer than “avoid using floats that way”. I’d be happy to help anyone who wants to fix this, but this is a non-trivial change in the layout code. Otherwise this is something to be fixed eventually, but I don’t know when I’ll get to it. |
Thanks. From your description I'm not sure whether this is the known limitation or not. In this case I'm not trying to have page breaks inside a floated element, but between floated elements. Each |
As reported in #375, we have the same problem with consecutive absolute/relative blocks. |
I have just started using WeasyPrint, and I'm already a big fan. However, I have also quickly run into the float/break issue -- my users want Bootstrap and floated columns, and don't like what happens in the PDF document! Can @SimonSapin or anyone else comment on the refactoring that would be necessary to fix this wartish problem? I haven't perused your codebase yet, but I know Python very well, so I'm looking for a high-level overview of the current layout model/algorithm, why it gets tripped up trying to put breaks in floats, and what would have to be changed. Thanks, |
301 @liZe |
Let's go! I'll skip some details and lie a little bit to avoid useless complexity.

Web pages have mainly been created to be displayed on rectangles whose width is fixed and whose height is automatically calculated according to the content. That's what a "normal" browser does. But the problem is a bit different when you want to print these web pages: the height is fixed too, and you need to cut the content between different pages.

CSS defines how the layout must be done, how blocks and texts are displayed. "Normal" blocks are put one below the other, and "normal" texts are broken into multiple lines put one below the other. The way the "normal" content is displayed is called the normal flow.

CSS gives the possibility to remove blocks from the normal flow of the page and make them behave in a different way. These blocks sometimes create their own flow, creating nested or parallel flows in the page. That's where it becomes a bit hard.

When CSS 2 was written, floats and absolute/relative blocks (and somehow tables) were (almost) the only blocks creating parallel flows, and no-one really defined how these parallel flows had to be broken between pages. That's why WeasyPrint's layout has only one flow that can be correctly broken, and the blocks that are outside this flow are seen as atomic blocks going below the bottom of the page if needed.

But now, many CSS specifications have added many ways to create strange flows, such as columns, regions, flexbox and grid. It was time to define how parallel and nested flows had to be broken between pages. It's now done in the fragmentation module. It's not clearly defined, but it's much better than what we had in CSS 2.

Bad news: it was not written when we started WeasyPrint. Really bad news: it's really different from what we have in WeasyPrint.

It's probably not that difficult to implement the parts of the fragmentation module that are needed to fix this issue (well, for really simple cases). But it will require slight changes to many functions and modules in a single atomic commit that will be huge. We can imagine that the work needed is something like #291: long, tiring and painful. But not impossible. |
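The "one breakable flow plus atomic blocks" behaviour described above can be illustrated with a toy model. This is only a sketch under strong assumptions, not WeasyPrint's layout code: `Block`, its `atomic` flag and `paginate` are hypothetical names, boxes are reduced to a name and a height, and everything is stacked in a single column.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    height: float
    atomic: bool = False  # hypothetical flag: box that is never split


def paginate(blocks, page_height):
    """Distribute blocks over pages of fixed height.

    Breakable blocks are cut at the bottom of the page and continued on
    the next one. Atomic blocks are placed whole, even when they run past
    the bottom of the page.
    """
    pages = [[]]
    used = 0.0
    for block in blocks:
        remaining = block.height
        while remaining > 0:
            available = page_height - used
            if available <= 0:
                # The current page is full: start a new one.
                pages.append([])
                used = 0.0
                available = page_height
            # An atomic block takes all its height at once.
            piece = remaining if block.atomic else min(remaining, available)
            pages[-1].append((block.name, piece))
            used += piece
            remaining -= piece
    return pages


content = [Block("text", 15), Block("float", 12, atomic=True)]
for number, page in enumerate(paginate(content, page_height=10), start=1):
    print(number, page)
```

In this model the atomic block starts on the second page with only 5 units of room left and still takes its full 12 units, so it overflows the page. Dropping `atomic=True` lets the same block continue on a third page, which is roughly what the issue asks for.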
OK. Where should I be looking in the code to learn about the following (beyond the peephole insight of #291):
Thanks! |
You'll find all the code you need in the
Nested flows (as defined by the fragmentation CSS module) are pretty well supported for block-level and inline-level boxes, using a variable called We need to add the support of parallel flows. Instead of one pointer pointing to one position in the flow, we need multiple pointers pointing to the "current" positions in the parallel flows. I imagine that To fix this issue, we basically need:
That's all 😄! I think that not everything is correctly defined in the spec, so we'll have to make some stupid choices for stupid cases (how do you render floats whose top border is taller than the page?), but the "normal" use cases should be quite well described and easy (and long, and painful) to implement. If you need anything, I'll be really happy to help! |
Thanks. That's just the kind of overview I was looking for. One last thing: Testing driven development: (okay, two last things)
|
FYI, my habit is to do minor refactoring while I'm working to understand existing logic. So you can expect some PRs along those lines. Also, I'm completely new to CSS implementation work ! ;-) However, my career has largely been developing scientific software, so I'm at home with specs and deep algorithms. The code appears to have a good number of pointers to key CSS specs. However, if there are some spec documents that are so basic that you wouldn't mention them in code comments, they might actually be useful for me! So, I would appreciate pointers to key algorithmic starting points for CSS. Thanks. |
<style>
@page {
font-family: monospace;
height: 2.5em;
line-height: 1em;
margin: 0;
width: 10em;
}
body {
margin: 0;
}
div {
background: red;
float: left;
width: 50%;
}
</style>
<body>
<div>
float float float float float
</div>
flow flow flow flow flow
</body>

You need to get something like:
You'll need these skills!
There's a very useful chapter in the documentation. In the CSS spec, the best starting point is probably the presentation of the normal flow and the implementation of 9.4.1 and 9.4.2 in Good luck! |
OK, you threw me in the deep end of CSS spec, and I'm floundering, but progressing, through prose like this:
I don't yet have a solid mental-model of what it takes to do all the layout given multiple flows and page breaks, but I'm working on it, and the WeasyPrint code-base is very approachable and the focus on |
We solved the table split problem by placing |
This problem is really annoying, I found a way to fix this. The key point is split one
will be changed to
css
This is just sample code to explain the main idea; you will need to adapt it to your situation. I hope this helps someone out. |
Any news on this thread? What about support for tables? I'm currently using wkHtmlToPdf and also have the issue with tables: the current behavior is to cut anywhere (which is fine for me), but it also allows cuts in the middle of a line of text, which makes the library unusable for me. Is there a patch that gives this library my desired behavior? |
No, there’s currently no patch. As said earlier, there’s no easy fix, and closing this issue requires a lot of work. |
Many operations, including page breaks, require a pointer to a specific position in the box tree. For example, we used to have this structure to point to the beginning of the first child of the second child:

(1, (2, None))

We now use:

{1: {2: None}}

This change is the first step to handle parallel flows (see #36). It doesn’t change anything in the layout for now, but it allows us to store multiple pointers in the same structure.

The next step is to handle multiple pointers in skip_stack during box layout. It means that most of the *_layout() functions need an extra for-loop to manage multiple skip stacks. We’ll then need to split new types of boxes: table cells, floats, absolutes…
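As a rough illustration of why the dict form can hold several pointers at once, here is a minimal sketch. The helpers `tuple_to_dict` and `iter_paths` are hypothetical names written for this example, not functions from WeasyPrint; only the two pointer shapes come from the commit message above.

```python
def tuple_to_dict(skip_stack):
    """Convert an old nested-tuple pointer to the nested-dict form."""
    if skip_stack is None:
        return None
    index, child = skip_stack
    return {index: tuple_to_dict(child)}


def iter_paths(skip_stack, prefix=()):
    """Yield every position stored in a dict pointer as a tuple of indexes."""
    if skip_stack is None:
        # None marks the end of a pointer: the position is the prefix.
        yield prefix
        return
    for index, child in skip_stack.items():
        yield from iter_paths(child, prefix + (index,))


old_pointer = (1, (2, None))
new_pointer = tuple_to_dict(old_pointer)
print(new_pointer)

# A tuple can only describe one position. A dict can describe several,
# for example one per parallel flow (assumed usage, for illustration only).
two_positions = {1: {2: None}, 3: None}
print(list(iter_paths(two_positions)))
```

A layout function that receives such a structure would then loop over its keys, which matches the "extra for-loop" mentioned in the commit message.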
In CSS Display Module Level 3, the "display" property gets a long representation allowing:
- a clear separation between inner and outer display type,
- new supported types (contents, run-in, flow-root…),
- inline list items.

This commit allows the new (backward-compatible) long syntax for "display". It also supports the "flow-root" value. It doesn’t support values related to ruby, and it doesn’t support the new "contents" and "run-in" display types.

This work makes it possible to simplify the code in the block_*_layout functions, and to improve the overall layout. Related to #36.
We won’t break
|
We now handle parallel flows for floats, absolutes, relatives, and table-cells. This bug is now closed. It required 9 years of hard work 🚀. We’ll release a beta soon, tests and feedback are welcome! |
Please, warn us here to test when available. Thank you |
Hello, a beta has been released. Don’t hesitate to try it 😉 |
So where can I find this new feature? Is this integrated in the newest release? |
Hi! As you can see in the metadata of these issues, it’s available since version 54. |
Thanks liZe, sorry for this stupid question: |
It should work out of the box. If your table row is not split, then there may be another CSS rule avoiding breaks somewhere ( |
Floated elements that don't fit on the current page simply fall off the bottom, rather than being placed on the next page.
Here's a handy long list of floated elements to demonstrate the problem: http://www.stripey.com/demo/weasyprint/float_off_bottom.html
Look at it in Firefox and do ‘Print Preview’. You should see that there's a page break, with the list being continued on page 2. Similarly if you print from Chromium.
But WeasyPrint generates this file, where the elements simply run off the bottom of the first page: http://www.stripey.com/demo/weasyprint/float_off_bottom.pdf