Generate multiple invoices in one docx, one invoice per page #182

Open
aetna-softwares opened this issue Nov 17, 2015 · 30 comments

@aetna-softwares

The current features of this library are great for generating a "single item" output file, for example an invoice.

I am looking for an enhancement that would allow us to work with the same template but give it an array of data and obtain a document containing all items in one file (like a mail merge in Word with an xls file).

The idea is to create each individual document and concatenate them into a final document with a page break between each item.

I tried to get this result by using the loop syntax with a page break inside:

 {#invoices} 
... contents ...
[page break here]
{/invoices} 

but it has 2 drawbacks:
1/ from a user's point of view, the {#invoices} tag is difficult to understand, as they are creating a template for a single invoice
2/ as I force a page break at the end of the content, I get a blank page at the end of the generated document

A workaround for these 2 points would be to automatically add the {#invoices} tags to the user template before starting the merge, and to remove the last page break from the generated file, but I would prefer a clean feature over this kind of quick hack ;)

Do you think that it could be a good feature for your lib?

@Zulus88

Zulus88 commented Nov 18, 2015

Same problem for me - multiplying invoices ):. As far as I understand, your syntax means that '...contents...' consists of all the remaining tags with replacement data, am I correct? Do you put the {#invoices} tag at the top of page 1 and {/invoices} at the top of page 2 (after the page break)? Or do you send some command to Word when generating the output docx?

@aetna-softwares
Author

Hi,

Yes, '...contents...' is my invoice template, with the tags for my invoice data.

The {/invoices} tag is indeed on page 2, after the page break.

In this early test, it is a pure usage of the lib "as is", and it works quite well if you write the templates yourself and don't care about the blank page at the end.

@aetna-softwares
Author

for your information, here is a quick hack to automatically add the "array" tags with the page break :

var docx = new Docxtemplater(contentsDocx);

if (Array.isArray(data)) {
    // Wrap asText() of every file in the zip so the template XML is patched when it is read
    Object.keys(docx.zip.files).forEach(function (f) {
        var asTextOrig = docx.zip.files[f].asText;
        docx.zip.files[f].asText = function () {
            var text = asTextOrig.apply(docx.zip.files[f]);

            // Open the loop at the start of the body
            text = text.replace("<w:body>", "<w:body><w:t>{#pages}</w:t>");
            // Add a page-break paragraph and close the loop at the end of the body
            text = text.replace("</w:body>", '<w:p w:rsidR="00C7053A" w:rsidRDefault="00C7053A"><w:r><w:rPr><w:lang w:val="fr-FR" /></w:rPr><w:br w:type="page" /></w:r></w:p><w:t>{/pages}</w:t></w:body>');

            return text;
        };
    });

    // Expose the array under the key used by the injected loop tags
    data = { pages: data };
}

docx.setData(data);
docx.render();

var buf = docx.getZip().generate({ type: "nodebuffer" });

Please note that I am not an OpenXML guru, so the page-break syntax could probably be improved.

The last part of the trick (removing the last page break in the resulting file) is not done here.

Disclaimer: this is only a quick hack and it uses some undocumented parts of docxtemplater, so it may break at any moment on a version update!
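For the missing last step, here is a minimal sketch, assuming the rendered word/document.xml is available as a string. `removeLastPageBreak` is a hypothetical helper (not part of docxtemplater), and it only looks for the `<w:br w:type="page" />` form used in the hack above.

```javascript
// Sketch: drop the final explicit page break from rendered WordprocessingML.
// `removeLastPageBreak` is a hypothetical helper, not a docxtemplater API.
function removeLastPageBreak(xml) {
  // Matches <w:br w:type="page"/> with or without a space before "/>".
  const pageBreak = /<w:br\s+w:type="page"\s*\/>/g;
  let last = null;
  let match;
  while ((match = pageBreak.exec(xml)) !== null) {
    last = match; // keep overwriting, so we end up with the final match
  }
  if (last === null) {
    return xml; // no explicit page break, nothing to remove
  }
  return xml.slice(0, last.index) + xml.slice(last.index + last[0].length);
}

// Tiny hand-written body standing in for a rendered document.
const rendered =
  "<w:body>" +
  "<w:p><w:r><w:t>Invoice 1</w:t></w:r></w:p>" +
  '<w:p><w:r><w:br w:type="page" /></w:r></w:p>' +
  "<w:p><w:r><w:t>Invoice 2</w:t></w:r></w:p>" +
  '<w:p><w:r><w:br w:type="page" /></w:r></w:p>' +
  "</w:body>";

const cleaned = removeLastPageBreak(rendered);
console.log(cleaned.match(/w:type="page"/g).length); // 1
```

Note that this leaves an empty paragraph where the break used to be, so a trailing blank page may still appear if the last invoice fills its page exactly.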

@edi9999
Member

edi9999 commented Nov 19, 2015

Hi,

I am quite nitpicky about integrating new features to docxtemplater. For example, in the past (v0.x), the image module was integrated inside this repository, but it was making the code base more complex and less lightweight (some people use docxtemplater in the browser). Also, it was not easy to add functionality without integrating your code inside the repository (or maintaining a fork).

That's why I have created the concept of modules. Modules make it possible to hook into events triggered by docxtemplater.

Here's the code of the image-module :

https://github.com/open-xml-templating/docxtemplater-image-module

It would be possible to integrate the feature you want in a new module, for example with the following syntax:

var fs = require('fs')
var DocxGen = require('docxtemplater')
// Proposed module name: this only illustrates what the API could look like
var MultiDocModule = require('docxtemplater-multi-doc-module')

var opts = {}
opts.loopOver = "invoices";
var multiDocModule = new MultiDocModule(opts);

var docx = new DocxGen()
    .attachModule(multiDocModule)
    .load(content)
    .setData({invoices: [  {"customer" : "John Doe", price: "10 $"},  {"customer" : "Jane Doe", price: "20 $"},  {"customer" : "John Doe", price: "10 $"} ] })
    .render()

var buffer = docx
        .getZip()
        .generate({type:"nodebuffer"})

fs.writeFileSync("test.docx", buffer);

You can either write the module yourself (I can help with specific questions if you do it open-source), or we could agree on a contract (in that case, please contact me via email).

@aetna-softwares
Author

This is a good concept !

I will certainly try to write this module when I have some spare time.

@edi9999 edi9999 changed the title Generate multi-documents Generate document in one docx for each page Dec 1, 2015
@edi9999 edi9999 closed this as completed Dec 1, 2015
@edi9999 edi9999 reopened this Dec 1, 2015
@edi9999 edi9999 changed the title Generate document in one docx for each page Generate multiple invoices in one docx, one invoice per page in one docx Dec 1, 2015
@edi9999 edi9999 changed the title Generate multiple invoices in one docx, one invoice per page in one docx Generate multiple invoices in one docx, one invoice per page Dec 1, 2015
@andrest

andrest commented Jul 19, 2016

@aetna-softwares's solution provides what I need; however, it looks like the footer and header are not kept in the same loop scope. Any ideas on how to pass along the scope?

@edi9999
Member

edi9999 commented Jul 19, 2016

I think that this should be done from Word, in your template: as far as I can tell, you should select something like "apply header to whole document". There is also the possibility of using one header for odd page numbers and another for even page numbers.

You could also do it programmatically by copying the content of header.xml and creating the right rel.

@motleydev

@aetna-softwares, did you ever get this turned into a module?

@aetna-softwares
Author

Hi,

In the end I didn't need it anymore, so I didn't take the time to do it.

We ended up generating individual documents, and when we need to aggregate documents we do it on the PDFs, not at docx generation time.

@motleydev

Cool. Not a bad approach.

@andrest

andrest commented Oct 31, 2016

We're also about to move to what @aetna-softwares described. We've found that with larger documents memory leaks exceed our environment constraints. With 100-200 pages it needs more than 1.5GB RAM.

@edi9999
Member

edi9999 commented Dec 11, 2016

I think that with the newest version, 3.0.2, there shouldn't be any more memory leaks.

@awerlang

@aetna-softwares Are you converting from docx to pdf? Are you using LibreOffice or another tool?

@aetna-softwares
Author

@awerlang yes, LibreOffice gives me the best results with the best performance (its performance is not that good, but the others are worse).
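As an illustration of that conversion step, here is a minimal Node sketch that shells out to LibreOffice in headless mode. It assumes the `soffice` binary is on the PATH and that LibreOffice names the output after the input file; the helper names are hypothetical, and merging the resulting PDFs is not covered.

```javascript
const { execFile } = require("child_process");
const path = require("path");

// Build the argument list for a headless LibreOffice docx -> pdf conversion.
function buildConvertArgs(docxPath, outDir) {
  return ["--headless", "--convert-to", "pdf", "--outdir", outDir, docxPath];
}

// Assumption: LibreOffice writes <input basename>.pdf into outDir.
function expectedPdfPath(docxPath, outDir) {
  const base = path.basename(docxPath, path.extname(docxPath));
  return path.join(outDir, base + ".pdf");
}

// Convert one docx; resolves with the path where the PDF is expected.
function convertToPdf(docxPath, outDir, binary = "soffice") {
  return new Promise((resolve, reject) => {
    execFile(binary, buildConvertArgs(docxPath, outDir), (err) => {
      if (err) {
        reject(err);
      } else {
        resolve(expectedPdfPath(docxPath, outDir));
      }
    });
  });
}
```

Calling `convertToPdf("invoice-1.docx", "out")` once per generated document keeps each conversion small, which fits the approach of aggregating at the PDF stage.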

@awerlang

@aetna-softwares Cool! Thanks for sharing! I was working on a Docker image forked from https://hub.docker.com/r/xcgd/py3o.fusion/, hope to add a node server as well.

@edi9999
Member

edi9999 commented Jan 30, 2018

There is also the idea of a join, which would allow adding something between the iterations of a loop rather than after each item.

{:join (users,pagebreak)}
{name}
{/join}

This would give the desired output (each user's information on its own page, without a blank page at the end).
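To make the difference concrete, here is a plain JavaScript sketch of the two behaviours. It does not use docxtemplater at all; `renderLoop` and `renderJoin` are hypothetical names, and the page break is just a marker string.

```javascript
const PAGE_BREAK = "[page break]";

// Loop with the break inside the body: one break after every item,
// including the last one (hence the trailing blank page).
function renderLoop(items, renderItem, separator) {
  return items.map((item) => renderItem(item) + separator).join("");
}

// Join semantics: the separator appears only between items.
function renderJoin(items, renderItem, separator) {
  return items.map((item) => renderItem(item)).join(separator);
}

const users = [{ name: "John" }, { name: "Jane" }];
const renderUser = (user) => user.name;

console.log(renderLoop(users, renderUser, PAGE_BREAK)); // John[page break]Jane[page break]
console.log(renderJoin(users, renderUser, PAGE_BREAK)); // John[page break]Jane
```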

@jdcrecur

jdcrecur commented Feb 3, 2018

Does this actually work already, or is it a proposal for a solution?

@Zulus88

Zulus88 commented Feb 3, 2018 via email

@jdcrecur

jdcrecur commented Feb 3, 2018

Oh wow nice one!

Is there a page for the documentation on this join logic?

@edi9999
Member

edi9999 commented Feb 4, 2018

The join syntax is a proposal; it is not implemented yet. I posted it here for discussion.

@genachka

genachka commented Feb 12, 2018

@edi9999 I'm working with a table loop, that contains details for properties of 5 files that I'm documenting and need to place the filename in the footer, so as the filename changes, the footer of that page needs to show the correct one. If this :join concept will do it, I'm +1 for that! If not, suggestion on how?

@edi9999
Member

edi9999 commented Feb 13, 2018

Hello @genachka, can you open a new issue for this, and also include screenshots or documents showing the data you have and the output you want? It doesn't seem to be solvable with the join module.

@genachka

@edi9999 opened as #378

@jdcrecur

Thought I would give this a go this evening, but I'm not making much headway. I can easily inject a page break symbol, but this still leaves me with a trailing page. I added an index to each page break; now I only need to remove the last one, but I'm not sure how to access the rendered content to do so.

    //Load the docx file as a binary
    let content = fs.readFileSync(path.resolve(__dirname, sourceFile), 'binary')
    let zip = new JSZip(content)
    let doc = new Docxtemplater()

    // Load the zip container and inject the merge variables
    doc.loadZip(zip)
    doc.setOptions({paragraphLoop: true})

    let data = Object.assign({}, jobData, {subject_loop: reports})
    doc.setData(data)

    // Hack to inject page breaks
    Object.keys(doc.zip.files).forEach((f, index) => {
      let asTextOrig = doc.zip.files[f].asText
      doc.zip.files[f].asText = () => {
        let text = asTextOrig.apply(doc.zip.files[f])
        text = text.replace('<w:t index="'+index+'">{pagebreak}</w:t>', `<w:br w:type="page"/>`)
        return text
      }
    })

    // render the document
    doc.render()

    // Now remove the page with the highest index?
    // Object.keys(doc.zip.files).forEach((f) => {
    //   let asTextOrig = doc.zip.files[f].asText
    //   console.log(asTextOrig())
    // })

    let buf = doc.getZip().generate({type: 'nodebuffer'})
    fs.writeFileSync(path.resolve(__dirname, targetFile), buf)

    return targetFile

@edi9999
Member

edi9999 commented Feb 20, 2018

Hello, the different pages of a docx are not stored as separate documents: everything is inside the file /word/document.xml. Pagination is not explicit in the document; it is calculated by the rendering engine.

@jdcrecur

Thanks, I gave this another go this evening.. not too difficult when you know where to look. Here is the code for anyone else stuck in the same boat:

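    // Assumed imports for this snippet (JSZip 2.x API, as used by `new JSZip(content)`)
    const fs = require('fs')
    const path = require('path')
    const JSZip = require('jszip')
    const Docxtemplater = require('docxtemplater')
    const cheerio = require('cheerio')

    //Load the docx file as a binary
    let content = fs.readFileSync(path.resolve(__dirname, sourceFile), 'binary')
    let zip = new JSZip(content)
    let doc = new Docxtemplater()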

    // Load the zip container and inject the merge variables
    doc.loadZip(zip)
    doc.setOptions({paragraphLoop: true})

    let data = Object.assign({}, jobData, {subject_loop: reports})

    // Set the data to use in the replacement
    doc.setData(data)

    // Hack to inject page breaks by replacing custom placeholder
    Object.keys(doc.zip.files).forEach((f, index) => {
      let asTextOrig = doc.zip.files[f].asText
      doc.zip.files[f].asText = () => {
        let text = asTextOrig.apply(doc.zip.files[f])
        text = text.replace('<w:t>{loop_pagebreak}</w:t>', '<w:br loop-pagebreak="true" w:type="page"/>')
        return text
      }
    })
    doc.render()

    // remove the last pagebreak via cheerio
    const $1 = cheerio.load(doc['zip']['files']['word/document.xml']['_data'], {
      xml: {
        withDomLvl1: true,
        normalizeWhitespace: false,
        xmlMode: true,
        decodeEntities: true
      }
    });
    $1("*[loop-pagebreak]").last().remove()
    doc['zip']['files']['word/document.xml']['_data'] = $1.root().html()

    let buf = doc.getZip().generate({type: 'nodebuffer'})
    fs.writeFileSync(path.resolve(__dirname, targetFile), buf)

@dracuten1

{}, jobData, {subject_loop: reports}

Can you show me the structure of jobData and reports?

@Coronelpanter

Thank you @edi9999, this is something that could help me, but not specifically: as I said, I need to repeat the same template.

@collaorodrigo7

I tried the proposed solution from @jdcrecur, but it did not work for me.
It seems like the asText method is not getting executed, so it does not make any changes in my case. (Maybe because it's been more than 3 years since then 😅 )
Anyway, I found a solution and I am posting it in case it helps anyone. (Screenshots below)
In your docx you can do something like this:

{#dataLoop}
{name}
{@raw_loop_pagebreak}
{/dataLoop}

And then in your doc.setData call you can do:

doc.setData({
  raw_loop_pagebreak: `<w:br w:type="page"/>`,
  dataLoop: [
    {
      name: "hello",
    },
    {
      name: "hello2",
      raw_loop_pagebreak: "", //overwrite raw_loop_pagebreak here so that the last element does not add a page break
    },
  ],
});

(screenshot)
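If the list is built at runtime, the override on the last element can be applied programmatically. This is a minimal sketch in plain JavaScript; `withPageBreaks` is a hypothetical helper, and the key name and page-break XML are simply the ones from the snippet above.

```javascript
const PAGE_BREAK_XML = '<w:br w:type="page"/>';

// Returns a copy of `items` where every element except the last one carries
// the page-break XML under `key`; the last element gets an empty string.
function withPageBreaks(items, key = "raw_loop_pagebreak") {
  return items.map((item, i) => ({
    ...item,
    [key]: i === items.length - 1 ? "" : PAGE_BREAK_XML,
  }));
}

const data = {
  dataLoop: withPageBreaks([{ name: "hello" }, { name: "hello2" }]),
};

console.log(JSON.stringify(data, null, 2));
```

The resulting `data` object can then be passed to doc.setData in place of the hand-written one; since every element now carries its own value, the top-level raw_loop_pagebreak entry is no longer needed.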

@edi9999
Member

edi9999 commented Sep 6, 2021

To avoid having to overwrite the raw_loop_pagebreak in your data, you could also use the {$isLast} trick, documented here :

https://docxtemplater.readthedocs.io/en/latest/configuration.html?highlight=isLast#simple-parser-example-for-index-and-islast-inside-loops
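For reference, a sketch of what such a parser can look like, written from memory of the linked page, so please check it against the docs before relying on it. It assumes docxtemplater calls `get(scope, context)` and that `context.scopePathItem` and `context.scopePathLength` hold, for each nested loop, the current index and the loop length.

```javascript
// Custom parser sketch resolving the special tags $index and $isLast inside loops.
// Assumption: `context.scopePathItem` and `context.scopePathLength` are arrays
// with one entry per nested loop (current index and total length respectively).
function parser(tag) {
  return {
    get(scope, context) {
      const depth = context.scopePathItem.length;
      if (tag === "$index") {
        return context.scopePathItem[depth - 1];
      }
      if (tag === "$isLast") {
        const index = context.scopePathItem[depth - 1];
        const total = context.scopePathLength[depth - 1];
        return index === total - 1;
      }
      if (tag === ".") {
        return scope; // "." refers to the current item itself
      }
      return scope[tag]; // default behaviour: plain property lookup
    },
  };
}

// Simulate the second (last) iteration of a two-item loop.
const lastContext = { scopePathItem: [1], scopePathLength: [2] };
console.log(parser("$isLast").get({ name: "hello2" }, lastContext)); // true
```

With the parser passed in the docxtemplater options, the page break can be wrapped in an inverted section such as {^$isLast} ... {/$isLast}, so it is emitted for every item except the last.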
