From 316bfcc1560aee06b561b88c9f20428746cc6058 Mon Sep 17 00:00:00 2001 From: Andrew Heiss Date: Mon, 3 Jun 2024 12:44:29 -0400 Subject: [PATCH] Allow extension to be used as a filter instead of format --- README.md | 84 ++++++++++++++++++--------- README.qmd | 62 +++++++++++++------- _extensions/wordcount/_extension.yml | 11 ++-- template.qmd | 15 ++++- tests/testthat/test-use-as-filter.R | 15 +++++ tests/testthat/test-use-as-filter.qmd | 38 ++++++++++++ 6 files changed, 170 insertions(+), 55 deletions(-) create mode 100644 tests/testthat/test-use-as-filter.R create mode 100644 tests/testthat/test-use-as-filter.qmd diff --git a/README.md b/README.md index 763a087..dd1ace8 100644 --- a/README.md +++ b/README.md @@ -82,15 +82,20 @@ section](#how-this-all-works) to understand… um… how it works. quarto add andrewheiss/quarto-wordcount ``` -Using {quarto-wordcount} requires Quarto version \>= 1.3.0 +{quarto-wordcount} requires Quarto version \>= 1.4.551 This will install the extension under the `_extensions` subdirectory. If you’re using version control, you will want to check in this directory. ### Usage -You can specify one of three different output formats in your YAML -settings: `wordcount-html`, `wordcount-pdf`, and `wordcount-docx`: +There are two ways to enable the extension: (1) as an output format and +(2) as a filter. 
+ +#### Output format + +You can specify one of four different output formats in your YAML +settings: `wordcount-html`, `wordcount-pdf`, `wordcount-docx`, and `wordcount-markdown`: ``` yaml title: Something @@ -99,8 +104,8 @@ format: ``` The `wordcount-FORMAT` format type is really just a wrapper for each -base format (HTML, PDF, and Word), so all other HTML-, PDF-, and -Word-specific options work like normal: +base format (HTML, PDF, Word, and Markdown), so all other HTML-, PDF-, +Word-, and Markdown-specific options work like normal: ``` yaml title: Something @@ -111,6 +116,46 @@ format: cap-location: margin ``` +#### Filter + +If you’re using a [custom output +format](https://quarto.org/docs/extensions/listing-formats.html) like +[{hikmah-academic-quarto}](https://github.com/andrewheiss/hikmah-academic-quarto) +or a [journal article +format](https://quarto.org/docs/extensions/listing-journals.html) like +[{jss}](https://github.com/quarto-journals/jss), you can’t use the +`wordcount-html` format, since you can’t combine output formats. + +To enable word counting for *any* format, including custom formats, you +can add the extension Lua scripts as filters. You need to specify three +settings: + +1. `citeproc: false` must be set so that Quarto doesn’t try to process + citations +2. The path to `citeproc.lua` so that citations are processed before + counting words—[this must come *before* + `wordcount.lua`](#how-this-all-works) +3. 
The path to `wordcount.lua` so that words are counted + +``` yaml +title: Something +format: + html: # Regular built-in format + citeproc: false + filters: + - at: pre-quarto + path: _extensions/andrewheiss/wordcount/citeproc.lua + - at: pre-quarto + path: _extensions/andrewheiss/wordcount/wordcount.lua + jss-pdf: # Custom third-party format + citeproc: false + filters: + - at: pre-quarto + path: _extensions/andrewheiss/wordcount/citeproc.lua + - at: pre-quarto + path: _extensions/andrewheiss/wordcount/wordcount.lua +``` + ### Terminal output The word count will appear in the terminal output when rendering the @@ -209,8 +254,8 @@ appendix words from the rest of the text. So, as a (temporary?) workaround (until I can figure out how to make this Lua filter run after the creation of the appendix div?), you can -get a separate word count for the appendix by creating your own div with -the id `appendix-count`: +get a separate word count for the appendix by creating your own fenced +div with the id `appendix-count`: ``` markdown # Introduction @@ -328,10 +373,11 @@ end format: html: citeproc: false - filters: - - "/path/to/citeproc.lua" - - "/path/to/wordcount.lua" - - quarto + filters: + - at: pre-quarto + path: "path/to/citeproc.lua" + - at: pre-quarto + path: "path/to/wordcount.lua" ``` This creates a pandoc command that looks something like this, feeding @@ -341,19 +387,3 @@ word count script: ``` sh pandoc whatever.md --output whatever.html --lua-filter citeproc.lua --lua-filter wordcount.lua ``` - -Eventually [the Quarto team is planning on allowing filter options to -get injected at different stages in the rendering -process](https://github.com/quarto-dev/quarto-cli/issues/4113), so -someday we can skip the citeproc wrapper filter and just do something -like this: - -``` yaml -format: - html: - filters: - post: - - '/path/to/wordcount.lua' -``` - -But that doesn’t work yet. 
diff --git a/README.qmd b/README.qmd index 4f0738e..cd25535 100644 --- a/README.qmd +++ b/README.qmd @@ -44,13 +44,17 @@ This extension fixes all three of these issues by relying on a [Lua filter](_ext quarto add andrewheiss/quarto-wordcount ``` -Using {quarto-wordcount} requires Quarto version >= 1.3.0 +{quarto-wordcount} requires Quarto version >= 1.4.551 This will install the extension under the `_extensions` subdirectory. If you're using version control, you will want to check in this directory. ### Usage -You can specify one of three different output formats in your YAML settings: `wordcount-html`, `wordcount-pdf`, and `wordcount-docx`: +There are two ways to enable the extension: (1) as an output format and (2) as a filter. + +#### Output format + +You can specify one of four different output formats in your YAML settings: `wordcount-html`, `wordcount-pdf`, `wordcount-docx`, and `wordcount-markdown`: ```yaml title: Something @@ -58,7 +62,7 @@ format: wordcount-html: default ``` -The `wordcount-FORMAT` format type is really just a wrapper for each base format (HTML, PDF, and Word), so all other HTML-, PDF-, and Word-specific options work like normal: +The `wordcount-FORMAT` format type is really just a wrapper for each base format (HTML, PDF, Word, and Markdown), so all other HTML-, PDF-, Word-, and Markdown-specific options work like normal: ```yaml title: Something @@ -69,6 +73,35 @@ format: cap-location: margin ``` +#### Filter + +If you're using a [custom output format](https://quarto.org/docs/extensions/listing-formats.html) like [{hikmah-academic-quarto}](https://github.com/andrewheiss/hikmah-academic-quarto) or a [journal article format](https://quarto.org/docs/extensions/listing-journals.html) like [{jss}](https://github.com/quarto-journals/jss), you can't use the `wordcount-html` format, since you can't combine output formats. + +To enable word counting for *any* format, including custom formats, you can add the extension Lua scripts as filters. 
You need to specify three settings: + +1. `citeproc: false` must be set so that Quarto doesn't try to process citations +2. The path to `citeproc.lua` so that citations are processed before counting words---[this must come *before* `wordcount.lua`](#how-this-all-works) +3. The path to `wordcount.lua` so that words are counted + +```yaml +title: Something +format: + html: # Regular built-in format + citeproc: false + filters: + - at: pre-quarto + path: _extensions/andrewheiss/wordcount/citeproc.lua + - at: pre-quarto + path: _extensions/andrewheiss/wordcount/wordcount.lua + jss-pdf: # Custom third-party format + citeproc: false + filters: + - at: pre-quarto + path: _extensions/andrewheiss/wordcount/citeproc.lua + - at: pre-quarto + path: _extensions/andrewheiss/wordcount/wordcount.lua +``` + ### Terminal output The word count will appear in the terminal output when rendering the document. It shows multiple values: @@ -136,7 +169,7 @@ In academic writing, it's often helpful to have a separate word count for conten However, Quarto's appendix-generating process comes *after* any custom Lua filters, so even though the final rendered document creates a div with the id "appendix", that div isn't accessible when counting words (since it doesn't exist yet), so there's no easy way to extract the appendix words from the rest of the text. -So, as a (temporary?) workaround (until I can figure out how to make this Lua filter run after the creation of the appendix div?), you can get a separate word count for the appendix by creating your own div with the id `appendix-count`: +So, as a (temporary?) 
workaround (until I can figure out how to make this Lua filter run after the creation of the appendix div?), you can get a separate word count for the appendix by creating your own fenced div with the id `appendix-count`: ````markdown # Introduction @@ -224,10 +257,11 @@ end format: html: citeproc: false - filters: - - "/path/to/citeproc.lua" - - "/path/to/wordcount.lua" - - quarto + filters: + - at: pre-quarto + path: "path/to/citeproc.lua" + - at: pre-quarto + path: "path/to/wordcount.lua" ``` This creates a pandoc command that looks something like this, feeding the document to the citeproc "filter" first, then feeding that to the word count script: @@ -235,15 +269,3 @@ This creates a pandoc command that looks something like this, feeding the docume ```sh pandoc whatever.md --output whatever.html --lua-filter citeproc.lua --lua-filter wordcount.lua ``` - -Eventually [the Quarto team is planning on allowing filter options to get injected at different stages in the rendering process](https://github.com/quarto-dev/quarto-cli/issues/4113), so someday we can skip the citeproc wrapper filter and just do something like this: - -```yaml -format: - html: - filters: - post: - - '/path/to/wordcount.lua' -``` - -But that doesn't work yet. 
diff --git a/_extensions/wordcount/_extension.yml b/_extensions/wordcount/_extension.yml index 1f0c821..72c7b19 100644 --- a/_extensions/wordcount/_extension.yml +++ b/_extensions/wordcount/_extension.yml @@ -10,11 +10,12 @@ contributes: shortcodes: - "words.lua" format: - common: - filters: - - citeproc.lua - - wordcount.lua - - quarto + common: + filters: + - at: pre-quarto + path: citeproc.lua + - at: pre-quarto + path: wordcount.lua citeproc: false html: default pdf: default diff --git a/template.qmd b/template.qmd index 6abd647..19645c3 100644 --- a/template.qmd +++ b/template.qmd @@ -3,9 +3,18 @@ title: Some title author: Some author date: last-modified -format: - wordcount-html: - toc: false +# Use as a custom format: +format: wordcount-html + +# Or use as a set of filters: +# format: +# html: +# filters: +# - at: pre-quarto +# path: path/to/citeproc.lua +# - at: pre-quarto +# path: path/to/wordcount.lua +# citeproc: false references: - id: Lovelace1842 diff --git a/tests/testthat/test-use-as-filter.R b/tests/testthat/test-use-as-filter.R new file mode 100644 index 0000000..cc3e610 --- /dev/null +++ b/tests/testthat/test-use-as-filter.R @@ -0,0 +1,15 @@ +test_that("using the extension as a filter works", { + test_file <- test_file_parts(here::here("tests/testthat/test-use-as-filter.qmd")) + + create_local_quarto_project(test_file = test_file) + + quarto::quarto_render(input = test_file$qmd, quiet = TRUE) + + counts <- get_wordcounts(test_file$md) + + expect_equal(counts$wordcount_appendix_words, 5) + expect_equal(counts$wordcount_body_words, 6) + expect_equal(counts$wordcount_note_words, 2) + expect_equal(counts$wordcount_ref_words, 34) + expect_equal(counts$wordcount_total_words, 47) +}) diff --git a/tests/testthat/test-use-as-filter.qmd b/tests/testthat/test-use-as-filter.qmd new file mode 100644 index 0000000..01a9b3e --- /dev/null +++ b/tests/testthat/test-use-as-filter.qmd @@ -0,0 +1,38 @@ +--- +title: Use as filter instead of output format + +format: + 
markdown: + filters: + - at: pre-quarto + path: _extensions/wordcount/citeproc.lua + - at: pre-quarto + path: _extensions/wordcount/wordcount.lua + citeproc: false + +references: +- id: Lovelace1842 + author: + - family: Lovelace + given: Ada Augusta + citation-key: Lovelace1842 + container-title: Taylor's Scientific Memoirs + issued: + - year: 1842 + language: en-GB + page: 666–731 + title: >- + Sketch of the analytical engine invented by Charles Babbage, by LF Menabrea, + officer of the military engineers, with notes upon the memoir by the + translator + type: article-journal + volume: 3 +--- + +Here are four words [@Lovelace1842].^[A note] + +::: {#appendix-count} + +There are five words here. + +:::