Always make word counts etc. available in odt output as variables #2033

Closed
awqk opened this issue Sep 29, 2024 · 9 comments · Fixed by #2035
Labels
build tool (Component: Exports or the build tool), enhancement (Request: New feature or improvement), next release (Note: Features planned for next release)

Comments

@awqk

awqk commented Sep 29, 2024

When odt output is created, it would be nice to have the metadata available as variables for use with templates and LibreOffice Writer's 'Insert Field' command: word counts (text/headings), paragraph count, character count, build date, build settings name, author, title, etc.

This feature should always be enabled (without a configuration option in the build settings).

Inspired by #2023, #2024

@awqk awqk added the enhancement label Sep 29, 2024
@vkbo
Owner

vkbo commented Sep 29, 2024

Does the Open Document format support this?

@awqk
Author

awqk commented Sep 29, 2024

Does the Open Document format support this?

Uh... yes, 'User Field' seems to be the right field type for this. It supports string and float values, apparently.
Here's a snippet from an otherwise empty .fodt:

<text:user-field-decls>
 <text:user-field-decl office:value-type="string" office:string-value="qwerty" text:name="TestUserField"/>
</text:user-field-decls>

The definition is saved even if the field is not used in the document.
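To check what ends up in such a declaration, the user fields can be read back with Python's standard `xml.etree.ElementTree`. This is a minimal sketch, assuming a flat .fodt document and handling only the string and float value types mentioned above; the function name `read_user_fields` and the sample document are illustrative, not part of novelWriter or LibreOffice.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Namespace URIs from the OpenDocument specification
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}


def read_user_fields(xml_text: str) -> dict[str, str | float]:
    """Return all declared user fields as {name: value}."""
    root = ET.fromstring(xml_text)
    fields: dict[str, str | float] = {}
    for decl in root.iterfind(".//text:user-field-decl", NS):
        name = decl.get(f"{{{NS['text']}}}name")
        vtype = decl.get(f"{{{NS['office']}}}value-type")
        if vtype == "string":
            # String fields keep their value in office:string-value
            fields[name] = decl.get(f"{{{NS['office']}}}string-value", "")
        elif vtype == "float":
            # Float fields keep their value in office:value
            fields[name] = float(decl.get(f"{{{NS['office']}}}value", "0"))
        # Other value types (date, boolean, ...) are ignored in this sketch
    return fields


SAMPLE = """<office:document
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <office:body><office:text>
    <text:user-field-decls>
      <text:user-field-decl office:value-type="string"
          office:string-value="qwerty" text:name="TestUserField"/>
      <text:user-field-decl office:value-type="float"
          office:value="42" text:name="TestFloatField"/>
    </text:user-field-decls>
  </office:text></office:body>
</office:document>"""

print(read_user_fields(SAMPLE))  # {'TestUserField': 'qwerty', 'TestFloatField': 42.0}
```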

@peter88213

peter88213 commented Sep 29, 2024

word counts (text/headings), paragraph count, character count,

As far as I know, most of this is generated dynamically by LibreOffice. See this Insert > Field dialog:

[Screenshot: LibreOffice's Insert > Field dialog]

build date, build settings name, author, title, etc.

Actually, the right place for this is the meta.xml file in the odt zip archive. There you can create the fields in the <office:meta> section using the standard ODF tags, such as <dc:title> for the title and <meta:initial-creator> for the author. The build date goes in <meta:creation-date>, and there is also a <dc:description> field for a short synopsis or similar.
<meta:generator> could be 'novelWriter' until the file is re-saved by, e.g., LibreOffice.

With Python, I use this template:

    _META_XML = '''<?xml version="1.0" encoding="utf-8"?>
<office:document-meta xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0" xmlns:ooo="http://openoffice.org/2004/office" xmlns:grddl="http://www.w3.org/2003/g/data-view#" office:version="1.2">
  <office:meta>
    <meta:generator>novxlib</meta:generator>
    <dc:title>$Title</dc:title>
    <dc:description>$Summary</dc:description>
    <dc:subject></dc:subject>
    <meta:keyword></meta:keyword>
    <meta:initial-creator>$Author</meta:initial-creator>
    <dc:creator></dc:creator>
    <meta:creation-date>${Datetime}Z</meta:creation-date>
    <dc:date></dc:date>
  </office:meta>
</office:document-meta>
'''
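The `$Title` / `${Datetime}` placeholders in this template match the syntax of Python's `string.Template`, so presumably it is filled with `Template.substitute()`. A minimal sketch under that assumption, using a shortened copy of the template; `render_meta` is a hypothetical helper. Escaping the values with `xml.sax.saxutils.escape` keeps a title such as "Tom & Jerry" from producing malformed XML.

```python
from datetime import datetime, timezone
from string import Template
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# Shortened copy of the template above; placeholders use string.Template syntax
META_XML = Template('''<?xml version="1.0" encoding="utf-8"?>
<office:document-meta
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
    office:version="1.2">
  <office:meta>
    <meta:generator>novxlib</meta:generator>
    <dc:title>$Title</dc:title>
    <dc:description>$Summary</dc:description>
    <meta:initial-creator>$Author</meta:initial-creator>
    <meta:creation-date>${Datetime}Z</meta:creation-date>
  </office:meta>
</office:document-meta>
''')


def render_meta(title: str, summary: str, author: str) -> str:
    """Fill the template; the timestamp is UTC, hence the trailing Z."""
    now = datetime.now(timezone.utc).replace(microsecond=0, tzinfo=None)
    return META_XML.substitute(
        Title=escape(title),      # escapes &, < and >
        Summary=escape(summary),
        Author=escape(author),
        Datetime=now.isoformat(),
    )


meta = render_meta("Tom & Jerry", "A <short> synopsis", "Jane Doe")
root = ET.fromstring(meta.encode("utf-8"))  # raises ParseError if not well-formed
```

`substitute()` raises `KeyError` if a placeholder is left unfilled, which is usually preferable to silently writing `$Title` into the file.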

@awqk
Author

awqk commented Sep 30, 2024

word counts (text/headings), paragraph count, character count,

As far as I know, most of this is generated dynamically by LibreOffice. See this Insert >Field dialog: [...]

Isn't this the document word count, as opposed to the manuscript word count? Maybe I didn't specify the use case well enough, but modifying a generated document by adding or changing a title page shouldn't change the manuscript word count. To split hairs even further, you might also want the word count without headings, etc.

build date, build settings name, author, title, etc.

Actually, the right place for this is the meta.xml file in the odt zip archive.

Yes!

@peter88213

peter88213 commented Sep 30, 2024

Isn't this the document word count, as opposed to the manuscript word count?

As we know from NaNoWriMo, each word processor seems to calculate a different word count for the same text.
For LibreOffice there is at least a definition, see: https://help.libreoffice.org/latest/en-US/text/swriter/guide/words_count.html
If you integrate your exported manuscript in a master document as I suggested on another topic here, you can get separated word counts for the frontmatter/backmatter and the actual manuscript.
Anyway, because of the inaccuracies mentioned above, I think the word count is more about the dimension than the exact value, isn't it?

Edit:
If it is essential to get the word count without headings, you can temporarily hide the headings in LibreOffice:
[Screenshot: hiding headings in LibreOffice]

Incidentally, a static word count property would soon be outdated if the odt document were still being worked on.
It might be more practical to save a snapshot of the metadata of interest via a script at the time of document export, or to log the word counts.
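One way such a snapshot script could look: LibreOffice stores its own statistics in a `<meta:document-statistic>` element in meta.xml (attributes such as `meta:word-count` and `meta:character-count`). This sketch, with hypothetical helpers `read_statistics` and `log_snapshot`, reads those attributes from an .odt archive and appends a timestamped row to a CSV log. It assumes the file was saved by an application that writes this element; the novelWriter meta data listed further down in this thread does not include it, in which case the sketch returns an empty dict.

```python
from __future__ import annotations

import csv
import io
import zipfile
import xml.etree.ElementTree as ET
from datetime import datetime

META_NS = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
PREFIX = f"{{{META_NS}}}"


def read_statistics(odt) -> dict[str, int]:
    """Return the meta:document-statistic attributes of an .odt (path or file object)."""
    with zipfile.ZipFile(odt) as zf:
        root = ET.fromstring(zf.read("meta.xml"))
    stat = root.find(f".//{PREFIX}document-statistic")
    if stat is None:
        return {}
    # Strip the namespace: "{...meta:1.0}word-count" -> "word-count"
    return {k[len(PREFIX):]: int(v) for k, v in stat.attrib.items() if k.startswith(PREFIX)}


def log_snapshot(log_file, name: str, stats: dict[str, int]) -> None:
    """Append one timestamped CSV row with the word and character counts."""
    csv.writer(log_file).writerow([
        datetime.now().isoformat(timespec="seconds"),
        name,
        stats.get("word-count", ""),
        stats.get("character-count", ""),
    ])


# Demo with an in-memory .odt that contains only a meta.xml
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("meta.xml", (
        '<office:document-meta '
        'xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" '
        f'xmlns:meta="{META_NS}"><office:meta>'
        '<meta:document-statistic meta:word-count="1234" meta:character-count="6789"/>'
        '</office:meta></office:document-meta>'
    ))

stats = read_statistics(buf)
log = io.StringIO()  # in practice, e.g. open("wordcounts.csv", "a", newline="")
log_snapshot(log, "manuscript.odt", stats)
```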

@awqk
Author

awqk commented Sep 30, 2024

[...] I think the word count is more about the dimension than the exact value, isn't it?

Agreed!

If you integrate your exported manuscript in a master document as I suggested on another topic here, you can get separated word counts for the frontmatter/backmatter and the actual manuscript.

Agreed!

Incidentally, a static word count property would soon be outdated if the odt document were still being worked on. [...]

The intended workflow here is to edit the manuscript in novelWriter only. Changes to the odt after the build should be kept to the absolute minimum.

@awqk
Author

awqk commented Sep 30, 2024

I'll try to summarize this enhancement request:

  1. In the <office:document> xml element, add and populate an <office:meta> element. This is what it would look like in an .fodt file:
<office:meta>
 <meta:creation-date>2024-09-30T11:14:40.668845707</meta:creation-date>
 <meta:initial-creator>Firstname Lastname</meta:initial-creator>
 <meta:generator>LibreOffice/7.3.7.2$Linux_X86_64 LibreOffice_project/30$Build-2</meta:generator>
 <dc:title>Title</dc:title>
</office:meta>
  2. In <office:text>, add and populate a <text:user-field-decls> element, like this:
<text:user-field-decls>
 <text:user-field-decl office:value-type="float" office:value="1234" text:name="ManuscriptWords"/>
 <text:user-field-decl office:value-type="float" office:value="2" text:name="ManuscriptWordsInHeadings"/>
 <text:user-field-decl office:value-type="float" office:value="1232" text:name="ManuscriptWordsInText"/>
 ...
</text:user-field-decls>

The second part adds some benefits if novelWriter's .odt output is part of a master document, as outlined in #1975. Placing the values in user fields protects them from being clobbered by LibreOffice's own counts.

@vkbo
Owner

vkbo commented Sep 30, 2024

What's the point of the meta values? These are already populated by novelWriter.

The following meta data is set:

        # Office Meta Data
        xMeta = ET.SubElement(self._xMeta, _mkTag("meta", "creation-date"))
        xMeta.text = timeStamp

        xMeta = ET.SubElement(self._xMeta, _mkTag("meta", "generator"))
        xMeta.text = f"novelWriter/{__version__}"

        xMeta = ET.SubElement(self._xMeta, _mkTag("meta", "initial-creator"))
        xMeta.text = self._project.data.author

        xMeta = ET.SubElement(self._xMeta, _mkTag("meta", "editing-cycles"))
        xMeta.text = str(self._project.data.saveCount)

        # Format is: PnYnMnDTnHnMnS
        # https://www.w3.org/TR/2004/REC-xmlschema-2-20041028/#duration
        eT = self._project.data.editTime
        xMeta = ET.SubElement(self._xMeta, _mkTag("meta", "editing-duration"))
        xMeta.text = f"P{eT//86400:d}DT{eT%86400//3600:d}H{eT%3600//60:d}M{eT%60:d}S"

        # Dublin Core Meta Data
        xMeta = ET.SubElement(self._xMeta, _mkTag("dc", "title"))
        xMeta.text = self._project.data.name

        xMeta = ET.SubElement(self._xMeta, _mkTag("dc", "date"))
        xMeta.text = timeStamp

        xMeta = ET.SubElement(self._xMeta, _mkTag("dc", "creator"))
        xMeta.text = self._project.data.author
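The editing-duration expression above splits a number of seconds into days, hours, minutes and seconds. Written with `divmod` as a standalone sketch (`iso_duration` is a hypothetical name, not novelWriter's), the steps become explicit: for example, 90061 s = 1×86400 + 1×3600 + 1×60 + 1, which gives `P1DT1H1M1S`.

```python
def iso_duration(seconds: int) -> str:
    """Format whole seconds as an xsd:duration string of the form PnDTnHnMnS."""
    days, rem = divmod(int(seconds), 86400)  # 86400 s per day
    hours, rem = divmod(rem, 3600)           # 3600 s per hour
    minutes, secs = divmod(rem, 60)
    return f"P{days}DT{hours}H{minutes}M{secs}S"


print(iso_duration(90061))  # P1DT1H1M1S
```

Since 3600 divides 86400, `eT % 86400 % 3600` equals `eT % 3600`, so this produces the same string as the one-line f-string in the code above.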

@vkbo vkbo added this to the Release 2.6 Beta 1 milestone Sep 30, 2024
@vkbo vkbo added the build tool and next release labels Sep 30, 2024
@vkbo
Owner

vkbo commented Sep 30, 2024

Anyway, the fields were trivial to add. I just made all recorded counts available as fields, prefixed with "Manuscript":

[Screenshot: the new Manuscript-prefixed user fields]

It only took a few lines of code.

        if self._counts:
            xFields = ET.Element(_mkTag("text", "user-field-decls"))
            for key, value in self._counts.items():
                ET.SubElement(xFields, _mkTag("text", "user-field-decl"), attrib={
                    _mkTag("office", "value-type"): "float",
                    _mkTag("office", "value"): str(value),
                    _mkTag("text", "name"): f"Manuscript{key[:1].upper()}{key[1:]}",
                })
            self._xText.insert(0, xFields)
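For experimenting outside novelWriter, the snippet above can be made self-contained. In this sketch `mk_tag` is a hypothetical stand-in for novelWriter's `_mkTag` (assumed to return an ElementTree `{namespace}tag` name), and the count keys `words`, `headingWords` and `textWords` are made up for illustration; the real keys come from novelWriter's build statistics.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical namespace table for the mk_tag stand-in
NAMESPACES = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}
for prefix, uri in NAMESPACES.items():
    ET.register_namespace(prefix, uri)  # so output uses office:/text: prefixes


def mk_tag(ns: str, tag: str) -> str:
    """Return a tag name in ElementTree's {uri}tag notation."""
    return f"{{{NAMESPACES[ns]}}}{tag}"


def add_count_fields(x_text: ET.Element, counts: dict[str, int]) -> None:
    """Insert a text:user-field-decls block with one float field per count."""
    if not counts:
        return
    x_fields = ET.Element(mk_tag("text", "user-field-decls"))
    for key, value in counts.items():
        ET.SubElement(x_fields, mk_tag("text", "user-field-decl"), attrib={
            mk_tag("office", "value-type"): "float",
            mk_tag("office", "value"): str(value),
            # "headingWords" -> "ManuscriptHeadingWords"
            mk_tag("text", "name"): f"Manuscript{key[:1].upper()}{key[1:]}",
        })
    # As in the original, place the declarations before the body paragraphs
    x_text.insert(0, x_fields)


x_text = ET.Element(mk_tag("office", "text"))
ET.SubElement(x_text, mk_tag("text", "p")).text = "Chapter One"
add_count_fields(x_text, {"words": 1234, "headingWords": 2, "textWords": 1232})
print(ET.tostring(x_text, encoding="unicode"))
```

Only the first letter of each key is capitalised, so camelCase keys keep their inner capitals in the field name.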

3 participants