word count with books #1
I wrote some code to compute and aggregate the counts per markdown file:
# Computes the word count of each md file and appends results to tibble
# Assumes that the lua filters from https://github.com/andrewheiss/quarto-wordcount
# are in a path that is accessible.
get_word_count <- function() {
  require(purrr)
  md_files <- list.files(pattern = "\\.md$")
  md_stubs <- gsub("\\.md$", "", md_files)
  res <-
    map(md_stubs, file_word_count) %>%
    map2_dfr(md_stubs, parse_output)
  res
}
# ------------------------------------------------------------------------------
# helpers
# Runs a system command to get the outputs
#' @param x the file name stub
#' @param path The location of the lua filters
file_word_count <- function(x, path = "word_counts") {
  require(glue)
  file_path <- tempfile()
  file_path <- paste0(file_path, ".html")
  cmd_text <- glue::glue("pandoc {x}.md --output {file_path} --lua-filter {path}/wordcount.lua --citeproc")
  system(cmd_text, intern = TRUE)
}
# operates on all results
#' @param x the results of the filter
#' @param file_name the file name to add to the results
parse_output <- function(x, file_name) {
  require(purrr)
  has_numbers <- grepl("[[:digit:]]", x)
  x <- x[has_numbers]
  res <- purrr::map_dfr(x, extract_count)
  res$file <- file_name
  res
}
# converts filter results to tibble
#' @param x a single line of filter output (a count followed by its description)
extract_count <- function(x) {
  require(tibble)
  split_up <- strsplit(x, " ")[[1]]
  count <- as.integer(split_up[1])
  desc <- paste(split_up[-1], collapse = " ")
  tibble::tibble(count = count, type = desc)
}
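For anyone who wants per-chapter and whole-book totals from those results, here is a minimal sketch in base R. It assumes a data frame shaped like the output of get_word_count() (columns count, type, file). The counts and the type labels below are made up for illustration and are not real filter output.

```r
# Hypothetical per-file results in the shape returned by get_word_count():
# one row per (file, type) pair. Counts and type labels are invented.
res <- data.frame(
  count = c(14L, 3L, 250L, 12L, 90L, 0L),
  type  = rep(c("words in body", "words in notes"), times = 3),
  file  = rep(c("index", "intro", "summary"), each = 2),
  stringsAsFactors = FALSE
)

# Per-chapter totals: sum every count type within each file.
per_chapter <- aggregate(count ~ file, data = res, FUN = sum)

# Whole-book total: sum across all files.
book_total <- sum(res$count)

print(per_chapter)
print(book_total)
```

Summing across every type may double count if the filter also reports a grand total line, so it is probably worth filtering res on type first.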
Sorry for the delay here! I ended up making a bunch of other major changes to the filter, and I think I can figure out a solution here now.
The filter works on individual documents, since it converts each document to a pandoc AST and finds the word count from that. With things like Quarto books and websites, Quarto renders each of the files separately (and would get a separate word count for each) and then does whatever magic it uses to stitch them all together into a single document.
In the case of HTML output, I'm 99% sure that the separate documents are never combined into a single AST, since both books and websites are, um, websites.
With PDF and Word output, though, there might be one unified AST prior to converting to the final output; I'll need to check that.
In any case, even if PDF/Word have a single combined AST to work with, it still might be easier to do something like the
lol nope. With this content in
…if the word count filter is run on its own just on index.qmd, there are 14 words:
When it's run as a whole book, though, index.qmd suddenly has 32 words.
I don't know where they're coming from either. If I keep the intermediate md files:
project:
  type: book
book:
  title: "blah-book"
  author: "Norah Jones"
  date: "6/3/2024"
  chapters:
    - index.qmd
    - intro.qmd
    - summary.qmd
    - references.qmd
bibliography: references.bib
format:
  html:
    theme: cosmo
    keep-md: true
    citeproc: false
    count-code-blocks: false
    filters:
      - at: pre-quarto
        path: _extensions/andrewheiss/wordcount/citeproc.lua
      - at: pre-quarto
        path: _extensions/andrewheiss/wordcount/wordcount.lua
…the resulting intermediate
Quarto's collection of book-related Lua filters is doing something extra behind the scenes that I can't track down.
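One rough way to hunt for the extra words is to compare the tokens in the standalone source with the tokens in the kept intermediate file. The sketch below is only an assumption about how one might do that. The tokenize() and extra_words() helpers are hypothetical, the tokeniser is much cruder than the filter's own counting rules, and the two strings are placeholders rather than the real file contents.

```r
# Split text into lowercase word tokens (crude; not the filter's own rules).
tokenize <- function(txt) {
  words <- unlist(strsplit(tolower(txt), "[^[:alnum:]']+"))
  words[nzchar(words)]
}

# Count how often each vocabulary word appears in a token vector.
count_tokens <- function(tokens, vocab) {
  vapply(vocab, function(w) sum(tokens == w), integer(1))
}

# Words that occur more often in `after` than in `before`, with the surplus.
extra_words <- function(before, after) {
  b <- tokenize(before)
  a <- tokenize(after)
  vocab <- union(a, b)
  surplus <- count_tokens(a, vocab) - count_tokens(b, vocab)
  surplus[surplus > 0]
}

# Placeholder strings standing in for the standalone and in-book text.
standalone <- "This is a book"
in_book <- "Preface This is a book by Norah Jones"

extra <- extra_words(standalone, in_book)
print(extra)
```

In practice the two inputs would come from something like paste(readLines(...), collapse = " ") on the .qmd source and on the kept intermediate .md file.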
Thanks for making this.
If I want to use it with a book, how would that happen? It looks like the yml that you have goes into the individual documents (I think).
Is there a way to put the options into _quarto.yml and get either a per-chapter count or one for the entire book?