Citations #32
I like the idea of djot having a simple unambiguous syntax for this that is less tricky to parse. It not only makes djot simpler and faster, but it also makes it easier for any future alternative implementations of djot to parse as well. |
@jgm, why do you suggest adding that The
|
|
Ah. I'm not very familiar with citations. Thanks. |
This syntax seems very natural to me. Same goes for using Djot will be a perfect fit for academic writing: a natural continuation of Pandoc Markdown, which many (including me) use in academia today. Thus, having a well-defined citation syntax seems very important to me. What will it take to implement this? I would be happy to help if I can! |
Org-Mode, another markup language, added citation support in version 9.5. In that release they added the following syntax to mark up a citation:
This would render as, for example, (Key123 2000, pp. 13; Key982 2009 chap. 1).
The blog post from a contributor to Org-Mode lays it all out better than I could ever do in a GH issue: Crucially, this kind of syntax would allow people to set a different style on each citation, which does not seem to be (easily) possible with the syntax proposed here. |
That looks similar to what @jgm is proposing and the current syntax used by pandoc-citeproc, just with an English-defined syntax (using the word I'm still in favour of encapsulating a citation fully in square brackets for easy parsing, and I think the choice of |
The org-mode syntax (which draws on and extends the pandoc syntax) gets more flexibility (different styles) at the price of verbosity and English-language keywords. So each has its drawbacks and its advantages. |
From the current proposal, this:

In [+@Smith2014 page 21-23] he talks about...

turns into this:

In Smith (2014, pp. 21--23) he talks about...

Whereas with Org-Mode-style syntax:

In [cite/t:@Smith2014 page 21-23] he talks about...

While Org-Mode's syntax is more long-winded, it is more flexible.

[-@Key] /* nocite (for inclusion in the printed bibliography) */
[+@Key] /* in-text cite (Smith (pp. 21-23)) */
[/@Key] /* author-name citation (Smith) */

Perhaps this makes more sense for djot? |
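A minimal TypeScript sketch of how the single-character markers above might be read, assuming the marker is the first character inside the brackets and applies to the whole citation. `CitationMode`, `MODE_MARKERS`, and `readModeMarker` are hypothetical names, not part of any djot implementation.

```typescript
// Hypothetical citation modes for the single-character markers proposed above.
type CitationMode = "nocite" | "in-text" | "author-only" | "normal";

const MODE_MARKERS = new Map<string, CitationMode>([
  ["-", "nocite"],
  ["+", "in-text"],
  ["/", "author-only"],
]);

// Reads the mode from the text between the brackets, e.g. "+@Key page 3".
// A marker counts only when immediately followed by "@", so ordinary
// bracketed text such as "[- item]" is left alone.
function readModeMarker(inner: string): { mode: CitationMode; rest: string } {
  const mode = MODE_MARKERS.get(inner.charAt(0));
  if (mode !== undefined && inner.charAt(1) === "@") {
    return { mode, rest: inner.slice(1) };
  }
  return { mode: "normal", rest: inner };
}
```

For example, `readModeMarker("+@Smith2014 page 21-23")` yields mode `"in-text"` with rest `"@Smith2014 page 21-23"`. Requiring `@` right after the marker is one way to keep the markers from clashing with ordinary bracketed text.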
@NotAFedoraUser that is very clever! Along with |
@jgm:
I hope both will be avoided! |
Just came across @jgm - I just thought I'd remind you about one wrinkle we stumbled on in E.g. what happens if you have more than one reference in a citation with your proposed examples (the first example being where the author lists differ, and the second where they don't)?
|
@bdarcus the proposal floated above was to use + for author-in-text citations. The thought was that it would go at the beginning of the citation list, thus
which would be equivalent to pandoc's
I hadn't envisioned allowing it to be put on subsequent items, and I'm not sure what sense that would make. |
@jgm - in that case, I think I misunderstood; it's a property of the citation as a whole, which I think is right. |
One other difference between It's useful when you have a multi-cite, and a style may sort the references within the citation.
So presumably in
|
Yes, I think that would be a good approach. However, citeproc doesn't currently support two levels of affixes, so I don't know what we'd do with this. |
Maybe a simple heuristic to flatten them (e.g. merging a citation-level affix into the affix of the nearest reference?), and later add support to You may already have to do something similar when dealing with |
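A minimal sketch of the flattening heuristic suggested above, assuming a citation has optional citation-level affixes plus per-reference affixes: the global prefix is merged into the first reference's prefix and the global suffix into the last reference's suffix. The types and function names are hypothetical. Note that once flattened, a processor that reorders references can still move the merged prefix away from the front.

```typescript
// Hypothetical data model: affixes on each reference plus optional
// citation-level ("global") affixes, as in org-cite.
interface CiteItem {
  key: string;
  prefix?: string | undefined;
  suffix?: string | undefined;
}

interface Citation {
  prefix?: string | undefined; // applies to the citation as a whole
  suffix?: string | undefined;
  items: CiteItem[];
}

// Joins two optional affixes with a space, ignoring missing or empty ones.
function joinAffix(a?: string, b?: string): string | undefined {
  const parts = [a, b].filter((s): s is string => s !== undefined && s !== "");
  return parts.length > 0 ? parts.join(" ") : undefined;
}

// Flattens citation-level affixes for a processor that only understands
// per-reference affixes: the global prefix is prepended to the first
// reference's prefix, the global suffix appended to the last one's suffix.
function flattenAffixes(citation: Citation): CiteItem[] {
  const items = citation.items.map((item) => ({ ...item })); // don't mutate input
  const first = items[0];
  const last = items[items.length - 1];
  if (first === undefined || last === undefined) return items;
  first.prefix = joinAffix(citation.prefix, first.prefix);
  last.suffix = joinAffix(last.suffix, citation.suffix);
  return items;
}
```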
Is this issue pretty much resolved, and does it just need to be implemented? Does it also depend on #35? I've been working on a project (https://github.com/bdarcus/csl-next) that I've planned from the beginning to integrate with this once it's available. At the moment I have my own AST, which is basically the new style input template model enhanced with rendered data (a current example bibliography reference is below), but I'm hoping it will be pretty easy to integrate with djot, both for document processing as a whole and to allow djot markup within field strings.
[
[ { contributors: "author", procValue: "Doe, Jane" } ],
{
date: "issued",
format: "year",
wrap: "parentheses",
procValue: "2023b"
},
[ { title: "title", procValue: "The Title" } ],
undefined,
undefined
] |
I wouldn't call it resolved! There are still a lot of choice points. |
About the citation model/syntax itself, or other related issues?
|
the former |
So what are those outstanding questions? I suppose one, which you may or may not have been thinking about, is locators: a plain string plus string parsing (as with the pandoc syntax and most other current examples) vs. something more structured. For the project I'm working on, I just merged this, which actually isn't too bad in YAML:

suffix: [see, page: 23, section: V]

But I guess pandoc's optional brackets are basically the same thing. Another question, which came up with org-cite, is where to allow markup within the citation. |
There are lots of questions. Do we want to support a huge range of variants like org? If so, how do we do that without English language keywords? How are prefixes and suffixes handled? How are locators handled? Do we use localized locator labels as in pandoc? How are locators distinguished from other suffix content? I don't have a lot of time right now to work on this, but this should give some idea. |
Note: I edited this much later to add something I had missed earlier about affixes. Since I'm thinking about and working on this area at the moment, my thoughts:
This is indeed the big question, since it's hard to reverse later. My impulse is to say no, and just have two styles/commands, which the academic literature on this calls:
These notions are very general, more so than in the TeX world, and for that reason should go fairly far.

EDIT: the caveat is that some of the variants in the LaTeX world exist to handle capitalization, which the above would not.

EDIT: I'm implementing the citation model now; here's how I'm dealing with this for now:

// `#[default]` on a variant requires `#[derive(Default)]` on the enum.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub enum CitationModeType {
    /// Places the author inline in the text; also known as "narrative" or "in text" citations.
    Integral,
    /// Places the author in the citation and/or bibliography or reference entry.
    #[default]
    NonIntegral,
}

But I could also see:
Do something like org-cite, but use single characters. But that has its own trade-offs.
I think you're referring to this above? In any case, yes, this is another decision point: affixes only on individual citation references (as in pandoc), or also on the citation as a whole (as in org-cite and biblatex). Per my comment there, I'd prefer the latter, because the cost is low and the benefit in terms of flexibility for users is high.
In my in-progress project (for which I'm now focusing on a Rust implementation; I just haven't done the citation part yet), here are the TypeScript definitions for locators:

// A locator is either a map of labelled parts (e.g. { page: "23" }) or a plain string.
// Partial<> lets a locator carry only the labels it uses instead of all of them.
export type Locator = Partial<Record<LocatorTerms, string>> | string;

type LocatorTerms =
  | "book"
  | "chapter"
  | "column"
  | "figure"
  | "folio"
  | "number"
  | "line"
  | "note"
  | "opus"
  | "page"
  | "paragraph"
  | "part"
  | "section"
  | "sub-verbo"
  | "verse"
  | "volume";

In YAML:

suffix: [see, page: 23, section: V]

But that's a format for machines more than for humans; e.g., it's what the djot markup might be converted into. This is another tricky area; my impulse is just to do what you've done in pandoc. Do you see any glaring problems with that? |
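To compare the two approaches, here is a rough sketch of turning a pandoc-style suffix string (a label abbreviation followed by a value) into the structured `{ page: "23" }` form. The abbreviation table is a small hypothetical subset, not pandoc's actual localized label list, and `parseLocators` is an illustrative name.

```typescript
// A small, hypothetical subset of locator labels and their abbreviations.
type LocatorLabel = "page" | "chapter" | "section" | "verse";

const LABEL_ABBREVIATIONS: ReadonlyArray<[string, LocatorLabel]> = [
  ["pp.", "page"],
  ["p.", "page"],
  ["chap.", "chapter"],
  ["ch.", "chapter"],
  ["sec.", "section"],
  ["v.", "verse"],
];

type StructuredLocator = Partial<Record<LocatorLabel, string>>;

// Splits a suffix such as "p. 23, sec. V" into labelled locator parts.
// Comma-separated parts without a recognized label are kept as free text.
function parseLocators(suffix: string): { locator: StructuredLocator; rest: string[] } {
  const locator: StructuredLocator = {};
  const rest: string[] = [];
  for (const raw of suffix.split(",")) {
    const part = raw.trim();
    if (part === "") continue;
    const match = LABEL_ABBREVIATIONS.find(([abbr]) => part.startsWith(abbr + " "));
    if (match) {
      const [abbr, label] = match;
      locator[label] = part.slice(abbr.length).trim();
    } else {
      rest.push(part);
    }
  }
  return { locator, rest };
}
```

Splitting on commas is deliberately naive: a page list such as `pp. 21, 23` would be split incorrectly, which illustrates why the string-parsing approach needs some escape hatch.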
The pandoc way has worked pretty well. There are occasional requests for more expressive power, but it seems enough for most users. |
Based on my personal experience of academic writing, I concur. The less complexity, the better; that'll keep it simpler for implementors. |
For my own purposes, I started adding the citation format specified in this issue into my own
I know it's quite possible that something will block this from making it into the djot spec any time soon, but I thought I'd ask given that I am implementing anyway, and maybe that implementation will make it into |
For parsing, we just need to specify the syntax of citations and a corresponding AST element. For rendering: that's a matter of what we do with the citations. Here djot itself could be neutral, but I think the most powerful thing to do would be what pandoc does: use a citeproc processor to create citations and bibliography using a CSL stylesheet and external references. (Here in a Haskell implementation you could simply use my |
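As a rough illustration of separating parsing from rendering, here is a sketch of a citation AST node plus a toy author-date renderer standing in for a real citeproc/CSL processor. The node shape, field names, and `renderCitation` are hypothetical, not djot's AST.

```typescript
// Hypothetical djot AST node for a citation; field names are illustrative only.
interface CitationItem {
  id: string; // citation key, e.g. "Smith2014"
  prefix?: string | undefined;
  locator?: string | undefined; // e.g. "pp. 21-23"
  suffix?: string | undefined;
}

interface CitationNode {
  tag: "citation";
  mode: "integral" | "non-integral";
  items: CitationItem[];
}

// Minimal stand-in for the bibliography data a citeproc processor would supply.
interface Reference {
  author: string;
  year: number;
}

// Toy author-date rendering; a real renderer would hand the node to a CSL
// processor instead of formatting strings by hand.
function renderCitation(node: CitationNode, refs: Map<string, Reference>): string {
  const parts = node.items.map((item) => {
    const ref = refs.get(item.id);
    if (ref === undefined) return `[missing: ${item.id}]`;
    const pre = item.prefix ? `${item.prefix} ` : "";
    const loc = item.locator ? `, ${item.locator}` : "";
    const suf = item.suffix ? ` ${item.suffix}` : "";
    return node.mode === "integral"
      ? `${pre}${ref.author} (${ref.year}${loc})${suf}`
      : `${pre}${ref.author} ${ref.year}${loc}${suf}`;
  });
  return node.mode === "integral" ? parts.join("; ") : `(${parts.join("; ")})`;
}
```

The same node renders as `Smith (2014, pp. 21-23)` in integral mode and `(Smith 2014, pp. 21-23)` otherwise; the parser only has to produce the node, and the choice of renderer stays open.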
Re providing a way to put citations inside the document itself: pandoc does allow this, in a |
Random quick thoughts:
The advantage of that is that, like djot, CSL is agnostic about output format, so it's a good match. I guess the question is how closely and formally they are tied. Someone who primarily targets LaTeX might want to bypass CSL and use BibTeX/biblatex. Also, I do have ambitions of finishing my CSLN project and hooking it up to djot, so hopefully there's room for that sort of alternative. |
I don't think specifying a syntax for citations (and perhaps reference lists) requires tying djot to any particular mode of rendering citations. Pandoc's citations, for example, can be rendered using CSL, natbib, biblatex, or org-cite, depending on command-line options. |
I know this is likely premature at the moment, but since you've been working on it...
Was looking at the test cases, and just wondering about one design question we hadn't settled. From what I can tell, your implementation follows the pandoc way, with no global affixes? Concretely, what happens if you have a citation like this, and the citation processor is using a style that requires reordering the references within the citation by date issued?

[see @doe24; @doe20]

Without global affixes, you either end up with something like this (which is simply wrong; the author intends "see" to cover all the listed references):

(Doe, 2020; see Doe 2024)

... or you require the user to track that order, AND adjust it if the citation style changes. So in org-mode (which is an iteration of the pandoc model and syntax), for example, you would do:

[cite: see; @doe24; @doe20]

My argument has been that it's a niche feature, important in some fields (notably the humanities and social sciences), but that adding it to djot is low-cost for users and developers alike. Regardless, you probably want to include some prefixes in the test cases? |
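A small sketch of the problem described above, assuming a style that sorts references within a citation by issue date: a "see" stored as the prefix of one reference travels with that reference, while a citation-level prefix stays in front. The `Cite` type and the render functions are hypothetical, and the output format is simplified.

```typescript
interface Cite {
  key: string;
  author: string;
  year: number;
  prefix?: string | undefined;
}

// Sorts references by year, as a date-sorting style would (copying first so
// the author's original order is left untouched).
function sortByYear(items: Cite[]): Cite[] {
  return [...items].sort((a, b) => a.year - b.year);
}

function renderItem(c: Cite): string {
  return `${c.prefix ? c.prefix + " " : ""}${c.author} ${c.year}`;
}

// Per-reference prefixes only (the pandoc model): "see" moves with @doe24.
function renderPerItem(items: Cite[]): string {
  return `(${sortByYear(items).map(renderItem).join("; ")})`;
}

// Citation-level prefix (the org-cite model): "see" stays at the front.
function renderWithGlobalPrefix(globalPrefix: string, items: Cite[]): string {
  return `(${globalPrefix} ${sortByYear(items).map(renderItem).join("; ")})`;
}
```

With `see` attached to `@doe24`, `renderPerItem` produces `(Doe 2020; see Doe 2024)`; with `see` as a citation-level prefix, `renderWithGlobalPrefix` produces `(see Doe 2020; Doe 2024)`.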
@bdarcus citeproc doesn't have a notion of global affixes, does it? (By the way, citeproc-hs handles this simply by blocking reordering around an affix; at least that prevents misleading output.) |
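One simple interpretation of "blocking reordering around an affix", sketched under the assumption that any affix on any reference disables sorting for the whole citation. This is an illustration of the idea, not citeproc-hs's actual logic; `Ref` and `sortUnlessAffixed` are hypothetical names.

```typescript
interface Ref {
  key: string;
  year: number;
  prefix?: string | undefined;
  suffix?: string | undefined;
}

// Sorts by year only when no reference carries an affix; otherwise keeps the
// order the author wrote, so an affix like "see" stays where it was placed.
function sortUnlessAffixed(items: Ref[]): Ref[] {
  const hasAffix = items.some((r) => Boolean(r.prefix) || Boolean(r.suffix));
  return hasAffix ? [...items] : [...items].sort((a, b) => a.year - b.year);
}
```

The trade-off is that the style's sort order is silently not applied whenever an affix is present, which avoids misleading output but does not give the author what a global affix would.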
@jgm:
You mean It does not. It's an iteration that made it into org-cite. So it's supported in the org
I hadn't thought of that, but that might be a reasonable alternative. |
Just for the record, I've moved forward to support |
We need a syntax for citations that can be plugged into citeproc-lua or sent to pandoc for processing.
Pandoc's citation syntax seems a good basis. One thing we might change would be the syntax for author-in-text citations, which is currently a bit tricky to parse, because it requires lookahead.
Perhaps instead of
we should have something like
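To illustrate the parsing concern: in pandoc's syntax an author-in-text citation is a bare `@key`, so the parser has to look past it to see whether a bracketed locator follows, as in `@smith04 [p. 33] says blah`. If the mode is instead marked inside the brackets, as with the `+` proposal discussed in this thread, a citation can be recognized from its opening characters. A minimal sketch, assuming a hypothetical single-item syntax `[+@key rest]` / `[@key rest]` and keys made of letters, digits, `_`, `:`, and `-`:

```typescript
interface ParsedCitation {
  inText: boolean; // true for the "[+@key ...]" form
  key: string;
  rest: string; // locator/suffix text, left unparsed here
}

// Characters allowed in a citation key in this sketch (an assumption).
const KEY_CHAR = /[A-Za-z0-9_:-]/;

// Tries to parse a citation that starts at `pos`. Whether it is a citation,
// and whether it is author-in-text, is decided from the opening "[@" or "[+@";
// the parser scans to the closing "]" but never past it.
function parseCitation(
  src: string,
  pos: number,
): { cite: ParsedCitation; end: number } | null {
  if (src.charAt(pos) !== "[") return null;
  let i = pos + 1;
  const inText = src.charAt(i) === "+";
  if (inText) i++;
  if (src.charAt(i) !== "@") return null;
  i++;
  const keyStart = i;
  while (i < src.length && KEY_CHAR.test(src.charAt(i))) i++;
  if (i === keyStart) return null; // "@" with no key
  const close = src.indexOf("]", i);
  if (close === -1) return null;
  return {
    cite: { inText, key: src.slice(keyStart, i), rest: src.slice(i, close).trim() },
    end: close + 1,
  };
}
```

Ordinary bracketed text such as `[link](url)` is rejected at the second character. A real implementation would also need multiple items, nested brackets, and escapes.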