Allow extension to be used as a filter instead of format
andrewheiss committed Jun 3, 2024
1 parent 3236fc7 commit 316bfcc
Showing 6 changed files with 170 additions and 55 deletions.
84 changes: 57 additions & 27 deletions README.md
@@ -82,15 +82,20 @@ section](#how-this-all-works) to understand… um… how it works.
quarto add andrewheiss/quarto-wordcount
```

Using {quarto-wordcount} requires Quarto version \>= 1.3.0
{quarto-wordcount} requires Quarto version \>= 1.4.551

This will install the extension under the `_extensions` subdirectory. If
you’re using version control, you will want to check in this directory.
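
The version requirement above can be checked from a shell before installing. This is only a sketch: `version_ge` is a hypothetical helper, not part of Quarto or the extension, and it assumes a `sort` that supports `-V` (GNU coreutils and recent BSD/macOS do).

```sh
# Hypothetical helper: succeeds if version $1 is >= version $2.
# Relies on `sort -V` (version sort), which is not strictly POSIX.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

required="1.4.551"
# Fall back to "0" if quarto is not on the PATH.
installed="$(quarto --version 2>/dev/null || echo 0)"

if version_ge "$installed" "$required"; then
  echo "Quarto $installed is new enough"
else
  echo "Quarto $installed is older than $required" >&2
fi
```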

### Usage

You can specify one of three different output formats in your YAML
settings: `wordcount-html`, `wordcount-pdf`, and `wordcount-docx`:
There are two ways to enable the extension: (1) as an output format and
(2) as a filter.

#### Output format

You can specify one of four different output formats in your YAML
settings: `wordcount-html`, `wordcount-pdf`, `wordcount-docx`, and
`wordcount-markdown`:

``` yaml
title: Something
@@ -99,8 +104,8 @@ format:
```
The `wordcount-FORMAT` format type is really just a wrapper for each
base format (HTML, PDF, and Word), so all other HTML-, PDF-, and
Word-specific options work like normal:
base format (HTML, PDF, Word, and Markdown), so all other HTML-, PDF-,
Word-, and Markdown-specific options work like normal:

``` yaml
title: Something
@@ -111,6 +116,46 @@ format:
cap-location: margin
```

#### Filter

If you’re using a [custom output
format](https://quarto.org/docs/extensions/listing-formats.html) like
[{hikmah-academic-quarto}](https://github.com/andrewheiss/hikmah-academic-quarto)
or a [journal article
format](https://quarto.org/docs/extensions/listing-journals.html) like
[{jss}](https://github.com/quarto-journals/jss), you can’t use the
`wordcount-html` format, since you can’t combine output formats.

To enable word counting for *any* format, including custom formats, you
can add the extension Lua scripts as filters. You need to specify three
settings:

1. `citeproc: false` must be set so that Quarto doesn’t try to process
citations
2. The path to `citeproc.lua` so that citations are processed before
counting words—[this must come *before*
`wordcount.lua`](#how-this-all-works)
3. The path to `wordcount.lua` so that words are counted

``` yaml
title: Something
format:
  html:  # Regular built-in format
    citeproc: false
    filters:
      - at: pre-quarto
        path: _extensions/andrewheiss/wordcount/citeproc.lua
      - at: pre-quarto
        path: _extensions/andrewheiss/wordcount/wordcount.lua
  jss-pdf:  # Custom third-party format
    citeproc: false
    filters:
      - at: pre-quarto
        path: _extensions/andrewheiss/wordcount/citeproc.lua
      - at: pre-quarto
        path: _extensions/andrewheiss/wordcount/wordcount.lua
```
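
Because the filter paths are resolved relative to the project, a quick check that both scripts exist can save a confusing render failure. The following is a minimal sketch, assuming the extension was installed with `quarto add` into `_extensions/andrewheiss/wordcount`; `check_wordcount_filters` is a hypothetical helper name, not something the extension provides.

```sh
# Hypothetical helper: verify that both filter scripts exist in the
# extension directory (defaults to the path used in the YAML above).
check_wordcount_filters() {
  ext_dir="${1:-_extensions/andrewheiss/wordcount}"
  status=0
  # citeproc.lua must run before wordcount.lua, so both are required.
  for f in citeproc.lua wordcount.lua; do
    if [ ! -f "$ext_dir/$f" ]; then
      echo "missing: $ext_dir/$f" >&2
      status=1
    fi
  done
  return "$status"
}
```

It could be chained in front of a render, for example `check_wordcount_filters && quarto render document.qmd`, where `document.qmd` stands in for your own file.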

### Terminal output

The word count will appear in the terminal output when rendering the
@@ -209,8 +254,8 @@ appendix words from the rest of the text.

So, as a (temporary?) workaround (until I can figure out how to make
this Lua filter run after the creation of the appendix div?), you can
get a separate word count for the appendix by creating your own div with
the id `appendix-count`:
get a separate word count for the appendix by creating your own fenced
div with the id `appendix-count`:

``` markdown
# Introduction
Expand Down Expand Up @@ -328,10 +373,11 @@ end
format:
  html:
    citeproc: false
    filters:
      - "/path/to/citeproc.lua"
      - "/path/to/wordcount.lua"
      - quarto
    filters:
      - at: pre-quarto
        path: "path/to/citeproc.lua"
      - at: pre-quarto
        path: "path/to/wordcount.lua"
```
This creates a pandoc command that looks something like this, feeding
@@ -341,19 +387,3 @@ word count script:
``` sh
pandoc whatever.md --output whatever.html --lua-filter citeproc.lua --lua-filter wordcount.lua
```

Eventually [the Quarto team is planning on allowing filter options to
get injected at different stages in the rendering
process](https://github.com/quarto-dev/quarto-cli/issues/4113), so
someday we can skip the citeproc wrapper filter and just do something
like this:

``` yaml
format:
  html:
    filters:
      post:
        - '/path/to/wordcount.lua'
```
But that doesn’t work yet.
62 changes: 42 additions & 20 deletions README.qmd
@@ -44,21 +44,25 @@ This extension fixes all three of these issues by relying on a [Lua filter](_ext
quarto add andrewheiss/quarto-wordcount
```

Using {quarto-wordcount} requires Quarto version >= 1.3.0
{quarto-wordcount} requires Quarto version >= 1.4.551

This will install the extension under the `_extensions` subdirectory. If you're using version control, you will want to check in this directory.

### Usage

You can specify one of three different output formats in your YAML settings: `wordcount-html`, `wordcount-pdf`, and `wordcount-docx`:
There are two ways to enable the extension: (1) as an output format and (2) as a filter.

#### Output format

You can specify one of four different output formats in your YAML settings: `wordcount-html`, `wordcount-pdf`, `wordcount-docx`, and `wordcount-markdown`:

```yaml
title: Something
format:
  wordcount-html: default
```
The `wordcount-FORMAT` format type is really just a wrapper for each base format (HTML, PDF, and Word), so all other HTML-, PDF-, and Word-specific options work like normal:
The `wordcount-FORMAT` format type is really just a wrapper for each base format (HTML, PDF, Word, and Markdown), so all other HTML-, PDF-, Word-, and Markdown-specific options work like normal:

```yaml
title: Something
@@ -69,6 +73,35 @@ format:
cap-location: margin
```

#### Filter

If you're using a [custom output format](https://quarto.org/docs/extensions/listing-formats.html) like [{hikmah-academic-quarto}](https://github.com/andrewheiss/hikmah-academic-quarto) or a [journal article format](https://quarto.org/docs/extensions/listing-journals.html) like [{jss}](https://github.com/quarto-journals/jss), you can't use the `wordcount-html` format, since you can't combine output formats.

To enable word counting for *any* format, including custom formats, you can add the extension Lua scripts as filters. You need to specify three settings:

1. `citeproc: false` must be set so that Quarto doesn't try to process citations
2. The path to `citeproc.lua` so that citations are processed before counting words---[this must come *before* `wordcount.lua`](#how-this-all-works)
3. The path to `wordcount.lua` so that words are counted

```yaml
title: Something
format:
  html:  # Regular built-in format
    citeproc: false
    filters:
      - at: pre-quarto
        path: _extensions/andrewheiss/wordcount/citeproc.lua
      - at: pre-quarto
        path: _extensions/andrewheiss/wordcount/wordcount.lua
  jss-pdf:  # Custom third-party format
    citeproc: false
    filters:
      - at: pre-quarto
        path: _extensions/andrewheiss/wordcount/citeproc.lua
      - at: pre-quarto
        path: _extensions/andrewheiss/wordcount/wordcount.lua
```

### Terminal output

The word count will appear in the terminal output when rendering the document. It shows multiple values:
@@ -136,7 +169,7 @@ In academic writing, it's often helpful to have a separate word count for conten

However, Quarto's appendix-generating process comes *after* any custom Lua filters, so even though the final rendered document creates a div with the id "appendix", that div isn't accessible when counting words (since it doesn't exist yet), so there's no easy way to extract the appendix words from the rest of the text.

So, as a (temporary?) workaround (until I can figure out how to make this Lua filter run after the creation of the appendix div?), you can get a separate word count for the appendix by creating your own div with the id `appendix-count`:
So, as a (temporary?) workaround (until I can figure out how to make this Lua filter run after the creation of the appendix div?), you can get a separate word count for the appendix by creating your own fenced div with the id `appendix-count`:

````markdown
# Introduction
Expand Down Expand Up @@ -224,26 +257,15 @@ end
format:
  html:
    citeproc: false
    filters:
      - "/path/to/citeproc.lua"
      - "/path/to/wordcount.lua"
      - quarto
    filters:
      - at: pre-quarto
        path: "path/to/citeproc.lua"
      - at: pre-quarto
        path: "path/to/wordcount.lua"
```
This creates a pandoc command that looks something like this, feeding the document to the citeproc "filter" first, then feeding that to the word count script:
```sh
pandoc whatever.md --output whatever.html --lua-filter citeproc.lua --lua-filter wordcount.lua
```

Eventually [the Quarto team is planning on allowing filter options to get injected at different stages in the rendering process](https://github.com/quarto-dev/quarto-cli/issues/4113), so someday we can skip the citeproc wrapper filter and just do something like this:

```yaml
format:
  html:
    filters:
      post:
        - '/path/to/wordcount.lua'
```
But that doesn't work yet.
11 changes: 6 additions & 5 deletions _extensions/wordcount/_extension.yml
@@ -10,11 +10,12 @@ contributes:
  shortcodes:
    - "words.lua"
  format:
    common:
      filters:
        - citeproc.lua
        - wordcount.lua
        - quarto
    common:
      filters:
        - at: pre-quarto
          path: citeproc.lua
        - at: pre-quarto
          path: wordcount.lua
      citeproc: false
    html: default
    pdf: default
15 changes: 12 additions & 3 deletions template.qmd
@@ -3,9 +3,18 @@ title: Some title
author: Some author
date: last-modified

format:
  wordcount-html:
    toc: false
# Use as a custom format:
format: wordcount-html

# Or use as a set of filters:
# format:
#   html:
#     filters:
#       - at: pre-quarto
#         path: path/to/citeproc.lua
#       - at: pre-quarto
#         path: path/to/wordcount.lua
#     citeproc: false

references:
- id: Lovelace1842
15 changes: 15 additions & 0 deletions tests/testthat/test-use-as-filter.R
@@ -0,0 +1,15 @@
test_that("using the extension as a filter works", {
  test_file <- test_file_parts(here::here("tests/testthat/test-use-as-filter.qmd"))

  create_local_quarto_project(test_file = test_file)

  quarto::quarto_render(input = test_file$qmd, quiet = TRUE)

  counts <- get_wordcounts(test_file$md)

  expect_equal(counts$wordcount_appendix_words, 5)
  expect_equal(counts$wordcount_body_words, 6)
  expect_equal(counts$wordcount_note_words, 2)
  expect_equal(counts$wordcount_ref_words, 34)
  expect_equal(counts$wordcount_total_words, 47)
})
38 changes: 38 additions & 0 deletions tests/testthat/test-use-as-filter.qmd
@@ -0,0 +1,38 @@
---
title: Use as filter instead of output format

format:
  markdown:
    filters:
      - at: pre-quarto
        path: _extensions/wordcount/citeproc.lua
      - at: pre-quarto
        path: _extensions/wordcount/wordcount.lua
    citeproc: false

references:
- id: Lovelace1842
  author:
  - family: Lovelace
    given: Ada Augusta
  citation-key: Lovelace1842
  container-title: Taylor's Scientific Memoirs
  issued:
  - year: 1842
  language: en-GB
  page: 666–731
  title: >-
    Sketch of the analytical engine invented by Charles Babbage, by LF Menabrea,
    officer of the military engineers, with notes upon the memoir by the
    translator
  type: article-journal
  volume: 3
---

Here are four words [@Lovelace1842].^[A note]

::: {#appendix-count}

There are five words here.

:::
