Export to DOCX #1537

Closed
EugeneUvin opened this issue Oct 18, 2023 · 11 comments · Fixed by #2056
Labels
build tool Component: Exports or the build tool enhancement Request: New feature or improvement next release Note: Features planned for next release

Comments

@EugeneUvin

At least in the Linux version, there is no export to DOCX. In Ukraine, many novel competitions accept novels in this format. Can it be added?

@EugeneUvin EugeneUvin added the enhancement Request: New feature or improvement label Oct 18, 2023
@vkbo
Owner

vkbo commented Oct 18, 2023

There are no plans to add it at this time. The Open Document (odt) format is a well-established open format supported by most office applications, including MS Office. Any major office application can also easily convert the file to docx if needed.

These are two competing open formats. I have no idea why Microsoft had to make their own open format when one existed already, but that's Microsoft in a nutshell I guess. It's a lot of work adding writers for these very complex formats, but I can keep the feature request in the backlog for now. Maybe someone wants to contribute it at some point, or I find the time to write it.

@vkbo vkbo added the potential feature Request: May be considered later label Oct 18, 2023
@EugeneUvin
Author

Yes, both formats exist - the use case is that one exports directly to the format that a competition accepts, makes some formatting adjustments and sends it out. My intention is to share this app with Ukrainian amateur writers who are often not familiar with format alternatives and will be confused by the absent option.

Ideally, a shortcut that formats the output file to match common submission requirements would be helpful for authors.

@vkbo
Owner

vkbo commented Oct 18, 2023

I get that, but this is not a shortcut, it requires a full implementation of the document standard to support a new file format. These XML-based document formats are complex and require a lot of research and trial and error to get to work. They aren't trivial, like HTML and the other formats supported. The Open Document writer is over 1500 lines of fairly complex code that took me weeks of my spare time to make, and then many fixes to get to work right and conform to the standard.

This is a fairly big feature request, and the result is no increase in applications that can open the file, since odt and docx have overlapping support.
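To give a sense of the moving parts, here is a minimal sketch of what a DOCX file is at the package level: a ZIP archive holding a content-types manifest, a root relationships file, and the main document XML. This is an illustration only, not novelWriter's writer; the names `write_minimal_docx` and `build_document_xml` are hypothetical, and the result has not been checked against MS Office here. A manuscript-quality writer would also need styles, page layout, headers, numbering and more, which is where the real effort goes.

```python
import zipfile
from xml.sax.saxutils import escape

# Manifest telling the reader which content type each part of the package has.
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" '
    'ContentType="application/vnd.openxmlformats-officedocument.'
    'wordprocessingml.document.main+xml"/>'
    '</Types>'
)

# Root relationships file pointing at the main document part.
ROOT_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships '
    'xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" '
    'Type="http://schemas.openxmlformats.org/officeDocument/2006/'
    'relationships/officeDocument" Target="word/document.xml"/>'
    '</Relationships>'
)

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def build_document_xml(paragraphs):
    """Build word/document.xml with one unstyled paragraph per input string."""
    body = "".join(
        f'<w:p><w:r><w:t xml:space="preserve">{escape(text)}</w:t></w:r></w:p>'
        for text in paragraphs
    )
    return (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    )


def write_minimal_docx(path, paragraphs):
    """Write the three parts into a ZIP archive (hypothetical helper)."""
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("[Content_Types].xml", CONTENT_TYPES)
        archive.writestr("_rels/.rels", ROOT_RELS)
        archive.writestr("word/document.xml", build_document_xml(paragraphs))
```

Calling `write_minimal_docx("demo.docx", ["Chapter One", "First paragraph."])` produces a bare file with plain paragraphs and no formatting at all.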

@vkbo
Owner

vkbo commented Oct 18, 2023

I'm not saying it won't be added. It's been in my long term plan for a while. I just don't have the capacity to do it in the foreseeable future, and there are a lot of feature requests higher up on the list.

@johnblommers

johnblommers commented Oct 18, 2023 via email

@EugeneUvin
Author

> I'm not saying it won't be added. It's been in my long term plan for a while. I just don't have the capacity to do it in the foreseeable future, and there are a lot of feature requests higher up on the list.

I thought you used format-specific libraries? Pandoc is able to convert from md to docx, so it most probably provides a library for this conversion.

@vkbo
Owner

vkbo commented Oct 19, 2023

No, there are no libraries in use by novelWriter aside from the Qt framework itself, and the optional spell checker library.

I used to have pandoc integration, but removed it because the quality of the result when creating ODT files is poor and not up to manuscript standards. Writing the ODT file directly is the only way to produce a good result. Converting ODT to DOCX seems to preserve the formatting, and there are numerous tools to do it, including pandoc. There are a ton of online converters too, which I do not recommend as they are likely data mining. Every single major office application supports both formats, as I've already mentioned.

In the vast majority of cases a manuscript document needs to be opened in a word processor to add cover page and other formatting required by the various submission standards, so saving the result again as DOCX really shouldn't be an issue. You can also save as other formats if needed.

ODT was chosen because it is an open and well defined standard, with very wide support, and one I can actually test against as it is the native format of Libre Office, which again is cross platform. DOCX is only the native format of MS Office, which I don't even own, so I can't properly test the results. novelWriter is also written for Linux first, and developed on Linux. So ODT was the logical choice.

@vkbo
Owner

vkbo commented Oct 19, 2023

> Pandoc is able to convert from md to docx

Also, novelWriter is Markdown-like, not Markdown, so it has its own parser. Using pandoc directly on the source does not produce an acceptable result. That's why it was abandoned very early on in the dev history of novelWriter. @johnblommers has been around long enough to remember it I suspect.

Here are the components of the parser/writers:

@johnblommers

johnblommers commented Oct 19, 2023

> > Pandoc is able to convert from md to docx
>
> Also, novelWriter is Markdown-like, not Markdown, so it has its own parser. Using pandoc directly on the source does not produce an acceptable result. That's why it was abandoned very early on in the dev history of novelWriter. @johnblommers has been around long enough to remember it I suspect.

Yes indeed I remember it this way too.

BTW my comment about Pandoc was meant in this context:

  1. Write in novelWriter mywork document
  2. Export to mywork.odt
  3. Exit novelWriter
  4. pandoc mywork.odt -o mywork.docx

Admittedly I blinked, as I had not grasped that Veronica had a custom-written ODT exporter. It's definitely better to use Pandoc's power to convert ODT to DOCX. There is even a Pandoc feature to leverage a DOCX template. One has merely to review the excellent Pandoc documentation to learn more about this amazing tool.
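For anyone who wants to script step 4, the following is a minimal sketch in Python, assuming pandoc is installed and on the PATH. The function names are hypothetical; the only pandoc options used are `-o` and `--reference-doc`, the latter being the template feature mentioned above. The output quality depends on pandoc's ODT reader and has not been verified here.

```python
import shutil
import subprocess
from pathlib import Path


def build_pandoc_command(odt_path, reference_doc=None):
    """Return the pandoc argument list for converting an ODT file to DOCX."""
    source = Path(odt_path)
    # The output file sits next to the input, with the extension swapped.
    target = source.with_suffix(".docx")
    command = ["pandoc", str(source), "-o", str(target)]
    if reference_doc is not None:
        # --reference-doc makes pandoc take its styles from an existing DOCX.
        command.append(f"--reference-doc={reference_doc}")
    return command


def convert_odt_to_docx(odt_path, reference_doc=None):
    """Run pandoc if it is available; raise a clear error otherwise."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on PATH")
    subprocess.run(build_pandoc_command(odt_path, reference_doc), check=True)
```

`convert_odt_to_docx("mywork.odt")` runs the same command as step 4; passing `reference_doc="template.docx"` (a hypothetical file name) applies the styles from that template.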

@vkbo
Owner

vkbo commented Oct 19, 2023

Yeah, the original issue with pandoc was that it couldn't convert Markdown to ODT and produce a good enough result for this use case, and neither could a conversion from HTML. Those are limitations of the source formats. Markdown has far too few formatting options, and HTML isn't designed for pages, but for scrollable text.

With the ODT writer, I have full control. Converting the ODT document to other office document formats preserves all that.

@vkbo vkbo mentioned this issue Oct 18, 2024
6 tasks
@vkbo vkbo added build tool Component: Exports or the build tool next release Note: Features planned for next release and removed potential feature Request: May be considered later labels Oct 18, 2024
@vkbo vkbo added this to the Release 2.6 Beta 1 milestone Oct 18, 2024
@vkbo
Owner

vkbo commented Oct 18, 2024

I've started working on this, and I'm making good progress.

Since I keep adding new features to the Manuscript tool, I wanted to get the DocX feature in before I add any more. Otherwise it is a real pain to catch up!
