CSL? #32

Closed
bdarcus opened this issue Mar 21, 2023 · 20 comments

Comments

@bdarcus

bdarcus commented Mar 21, 2023

I'm admittedly biased, since I created it, but have you considered also supporting CSL citation styles and the CSL JSON input format?

I'm sure there are performance and other advantages to the Rust-based styles, but there are thousands of CSL styles, as well as an existing Rust processor, citeproc-rs.

@hcsch

hcsch commented Mar 22, 2023

Perhaps TOML (or a similarly simple language, such as JSON as mentioned above) would also be a nice alternative to YAML for this use case, since YAML is known to be unintuitive or counterintuitive in too many cases (see https://noyaml.com/ for example).

@kmaasrud

With a well-defined format like CSL JSON, you could easily create a data structure that can be serialized to and deserialized from any input format. This is already possible with the citeproc-rs crate.

I agree with you, @bdarcus: Hayagriva really should leverage CSL, as the big contenders in bibliography management (notably Zotero) are fully invested in it.

@reknih, you should definitely reach out to @cormacrelf and try to leverage his work wherever possible. The Rust ecosystem would definitely benefit from a proper citation processing library; I know that I, and likely the Tectonic folks too, would be interested!

@bdarcus
Author

bdarcus commented Mar 23, 2023

I dunno; I think TOML would be a poor format for this purpose, and YAML is actually pretty good. Among other things, you can validate its files using JSON Schema.

@clbarnes

citeproc-rs is possibly dead: the README calls it WIP, and although the maintainer seems to have been active in the last couple of months, there have been no commits to master since 2021.

@kmaasrud

> citeproc-rs is possibly dead; readme calls it WIP, maintainer seems to have been active in the last couple of months but no commits to master since 2021.

Not very active, no, but it's still maintained, as evidenced by the PR opened 2 months ago that fixes a bug introduced by Rust 1.67. Also, since it is part of the Zotero organization, one would think it carries some official weight.

That being said, Hayagriva is way more elegant IMO, and it actually exists on crates.io. Adapting it to support CSL JSON shouldn't be that difficult (I am considering opening a PR for it). CSL styles are another beast, though, but one that this library should definitely aim to support!

@bdarcus
Author

bdarcus commented Mar 24, 2023

edited for clarity

> Not very active, no, but it's still maintained, as evidenced by the PR opened 2 months ago that fixes a bug introduced by Rust 1.67. Also, it being part of the Zotero org, one would think it has some official weight.

A group of other CSL developers and contributors and I talked with the Zotero folks about status and strategy in general last summer (summary and further discussion here), and what I gather from that and from other discussions is:

  1. Zotero created and funded the project to build a replacement for citeproc-js that is faster, more flexible, and easier to maintain and extend
  2. it is already included in Zotero, but AFAIK is still an optional engine
  3. still, after all this time, it's not on crates.io.

I don't understand the last point (cc @dstillman) , in part because I don't do Rust, and so can't really assess the codebase.

Cormac did mention during that meeting that perfectionism may have held him back a bit, but I'm not sure whether it's that or some technical issue(s) they've run into.

My impression is that they're also skeptical they'll get many quality PRs (it's not, for example, code that many amateur programmers could contribute to), and that there's likely not much of a market for this among other developers.

I'm more optimistic about the prospects for a community-developed Rust-based open source CSL processor :-)

It would help for the Zotero folks to communicate more clearly about:

  1. what the future of citeproc-rs and Zotero is
  2. whether and how they accept PRs
  3. when (not if, because not being on crates.io isn't really an option for a robust Rust project) they plan to release the crate
  4. bottom line: whether they're committed to it.

Absent answers, or of course if they simply say "sorry, this was an experiment, and it won't work for us", maybe some dedicated Rust developer(s) should just fork it?

> That being said, Hayagriva is way more elegant IMO

In what way(s)?

> ... and it actually exists on crates.io.

Right.

> Adapting to support CSL JSON shouldn't be that difficult (I am considering opening a PR for it.) CSL styles is another beast, though, but one that this library should definitely aim for supporting!

I will say, in general:

  • citation and bibliography formatting is really complex
  • CSL reflects much of that complexity (though not all of it, because you sometimes have to draw a line)
  • so it's not trivial to implement
  • but given the number of solid implementations now (Haskell, Lua, Rust, Elisp; it's even supported in Emacs org-mode!), it's certainly doable! And creating and editing CSL XML styles is a lot easier than writing Rust code!

Final, much more speculative, point:

I created CSL around 2005, while writing my first book.

I think it reflects sound decisions given the technology landscape at that time: the decision to use XML and RELAX NG; to insist on output-format independence and language agnosticism; to make it suitable for hand-editing in schema-aware XML editors; and also subtle things like designing it so that one could switch among radically different citation styles without editing the document source.

Now, close to 20 years later, I am big on the idea of using things like machine learning to create language-independent styles directly from formatted output examples, so users don't have to edit styles at all.

inukshuk/anystyle#146

I could imagine that, if this could be perfected, it would open the door to different sorts of output options: CSL XML initially, certainly, but maybe also formats better optimized for machine processing.

Alas, I have neither the time nor the skill to explore that idea!

@kmaasrud

> In what way(s)?

@bdarcus From what I can glean: a cleaner API and a smaller, easier-to-understand codebase; all in all, it looks more elegant. This makes sense, as Hayagriva has a narrower audience of library consumers (essentially just themselves) and is newer.

@bdarcus
Author

bdarcus commented Mar 25, 2023

News on citeproc-rs.

Basically they're stalled, with labor and technical hurdles, and need help to get the code in shape and released.

Another third-party developer is going to spend some time trying to figure out whether and how to do that.

@reknih
Member

reknih commented Mar 25, 2023

Hi folks! I have already considered adding CSL, it's definitely on the roadmap!

It would be nice if I did not have to reimplement a Rust parser for CSL. Is citeproc-rs up to the task?

@bdarcus
Author

bdarcus commented Mar 25, 2023

> It would be nice if I did not have to reimplement a Rust parser for CSL, is citeproc-rs up to the task?

IDK; "csl" is one of only two crates he has actually released.

Parsing is easy; it's just XML after all.

It's the processing that's difficult.
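To give a flavor of what "processing" means here, even one small CSL rule, et-al truncation via the `et-al-min` and `et-al-use-first` name options, needs logic that parsing alone does not provide. The sketch below is a simplified illustration only: `format_names` is a hypothetical helper, not part of citeproc-rs, the `csl` crate, or Hayagriva, and it ignores the `and` term, `delimiter-precedes-et-al`, locale terms, and the interplay with disambiguation.

```rust
/// Hypothetical helper: a simplified take on CSL's et-al truncation.
/// Real processors also handle "and", locale terms, and disambiguation.
fn format_names(names: &[&str], et_al_min: usize, et_al_use_first: usize) -> String {
    // Truncate only when the list has at least `et_al_min` names.
    if et_al_min > 0 && names.len() >= et_al_min {
        let kept = &names[..et_al_use_first.min(names.len())];
        format!("{} et al.", kept.join(", "))
    } else {
        names.join(", ")
    }
}

fn main() {
    let authors = ["Smith", "Jones", "Lee", "Garcia"];
    // Four names meet et-al-min = 3, so only the first one is kept.
    println!("{}", format_names(&authors, 3, 1));
    // Two names fall below the threshold, so all are rendered.
    println!("{}", format_names(&authors[..2], 3, 1));
}
```

Once disambiguation rules (for example, adding names back to distinguish otherwise identical citations) enter the picture, these options start interacting with each other, which is where most of the difficulty lies.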

@bdarcus
Author

bdarcus commented Mar 26, 2023

Dan posted another more detailed follow-up on the technical status.

There's also the excellent Haskell-based implementation I mentioned, which can effectively act as a JSON server.

https://github.com/jgm/citeproc/blob/master/man/citeproc.1.md

@bdarcus
Author

bdarcus commented Apr 22, 2023

FWIW, I've been working on an experimental evolution of CSL as a TypeScript model; here's a commented YAML file of the current state.

As I say in the README, I have no idea whether this goes anywhere or not.

Late-May update: I've made quite a bit of progress on this, and in the process realized that the TypeScript Style model can be auto-converted to Rust code to serialize and deserialize a style.

Here's a little demo repo that shows this:

https://github.com/bdarcus/csln-rs

EDIT: looking at your YAML format now, I see you're defining authors as a list of people? And assuming string parsing on those to get the components? If so, that seems to leave out organizational authors.
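The concern can be illustrated with a small example. If a name is a plain string that is split on a comma into family and given parts, an organization either has to avoid commas or gets misparsed. CSL JSON sidesteps this by allowing a `literal` name alongside structured `family`/`given` parts. The sketch below is hypothetical: `Name` and `parse_name` are illustrative and not Hayagriva's actual types, and the comma heuristic is an assumption about how such parsing might work.

```rust
/// Hypothetical name type: a structured personal name, or a literal
/// (e.g. an organization), mirroring CSL JSON's family/given vs. literal.
#[derive(Debug, PartialEq)]
enum Name {
    Person { family: String, given: Option<String> },
    Literal(String),
}

/// Naive "Family, Given" heuristic: strings without a comma become literals.
fn parse_name(raw: &str) -> Name {
    match raw.split_once(',') {
        Some((family, given)) => {
            let given = given.trim();
            Name::Person {
                family: family.trim().to_string(),
                given: if given.is_empty() { None } else { Some(given.to_string()) },
            }
        }
        None => Name::Literal(raw.trim().to_string()),
    }
}

fn main() {
    for raw in ["Smith, Sam", "World Health Organization", "Apple, Inc."] {
        // "Apple, Inc." is misparsed as a person, which is the ambiguity
        // an explicit literal/organization field avoids.
        println!("{:?}", parse_name(raw));
    }
}
```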

@bdarcus
Author

bdarcus commented Jun 2, 2023

@reknih when you and/or your other developers have a bit of free time, can you take a look at this?

https://github.com/bdarcus/csln

It's a reimplementation of the csl-next draft model in pure Rust, with very tight coupling (thanks to serde) between the JSON schema input and internal model.

I'm pretty confident in that model, though it would need more review, testing, and iteration for me to be fully happy with it.

I'm much less confident in my programming skills, given that I'm a complete Rust newbie.

But I'm absolutely serious about building this out. I just need some help.

It should compile fine using cargo, and I have licensed it under the Mozilla Public License 2.0, which I think should be compatible with your Apache option, though probably not with MIT. But my view on licenses is that of a practical open source advocate: I chose this license simply because it's the same as citeproc-rs's.

It's not yet quite at parity with the TypeScript processor; here's an example of where I'm at:

target/debug/csln processor/examples/style.csl.yaml processor/examples/ex1.bib.yaml

Example result (excerpt):

{
  "smith1": {
    "disamb-condition": false,
    "group-index": 1,
    "group-length": 1,
    "group-key": "Smith, Sam:2023-10"
  },

So the core of the processor at this point is a sorted bibliography vector and this HashMap.

The next step is a function that iterates through the former and the template, using the latter to generate the pre-rendered AST.

bdarcus/csln#16
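The two structures described above (a sorted bibliography plus a map of per-reference hints) can be sketched roughly as follows. This is a hypothetical reconstruction, not csln's code: `Reference`, `ProcHints`, and `proc_hints` are illustrative names, the `author:date` key format mirrors the `group-key` in the sample output, and treating `disamb_condition` as "more than one reference shares the key" is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical reference record; csln's real types differ.
struct Reference {
    id: &'static str,
    author: &'static str,
    date: &'static str,
}

/// Per-reference hints, loosely modeled on the JSON output above.
#[derive(Debug, PartialEq)]
struct ProcHints {
    disamb_condition: bool,
    group_index: usize,
    group_length: usize,
    group_key: String,
}

/// Builds hints from an already-sorted bibliography, keyed by reference id.
fn proc_hints(sorted: &[Reference]) -> HashMap<String, ProcHints> {
    let key = |r: &Reference| format!("{}:{}", r.author, r.date);
    // First pass: count how many references share each author:date key.
    let mut counts: HashMap<String, usize> = HashMap::new();
    for r in sorted {
        *counts.entry(key(r)).or_insert(0) += 1;
    }
    // Second pass: assign 1-based positions within each group.
    let mut seen: HashMap<String, usize> = HashMap::new();
    let mut hints = HashMap::new();
    for r in sorted {
        let k = key(r);
        let index = seen.entry(k.clone()).or_insert(0);
        *index += 1;
        let length = counts[&k];
        hints.insert(
            r.id.to_string(),
            ProcHints {
                disamb_condition: length > 1, // assumption, see lead-in
                group_index: *index,
                group_length: length,
                group_key: k,
            },
        );
    }
    hints
}

fn main() {
    // Assumed to be sorted already (author, then date).
    let sorted = [
        Reference { id: "doe1", author: "Doe, Jane", date: "2020" },
        Reference { id: "smith1", author: "Smith, Sam", date: "2023-10" },
        Reference { id: "smith2", author: "Smith, Sam", date: "2023-10" },
    ];
    let hints = proc_hints(&sorted);
    println!("{:?}", hints["smith2"]);
}
```

Keeping the hints in a separate map means the rendering step can stay a simple pass over the sorted vector, looking up each reference's group information as it goes.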

@reknih
Member

reknih commented Jun 2, 2023

Hey @bdarcus, I recently started a CSL 1.0.2 XML parser and processor with typst/citationberg. Good to know that you are working on something for the next generation of CSL! What kind of help are you looking for?

@bdarcus
Author

bdarcus commented Jun 2, 2023

@reknih

> I recently started a CSL 1.0.2 XML parser and processor with typst/citationberg.

Oh, cool; didn't know!

How are you finding working with the XML?

> What kind of help are you looking for?

I hadn't yet gotten around (since this is newer) to sketching out milestones, but the ones for the TypeScript project more or less apply.

https://github.com/bdarcus/csl-next/milestones?direction=asc&sort=due_date

There's still a lot of work to do on the processor, for example, and we need to figure out a way to convert 1.0 styles, which may or may not be a big task.

It may be useful for you to review the model now and think about whether there's promise in using it and simply converting 1.0 styles to it, if we can do so fairly losslessly.

I imagine your model would help a lot with that?

And perhaps there's a way to share code between the two projects?

This is admittedly not fully developed at this point, but I think I've thought through enough details that it should work out as I intend.

EDIT: I did try to sketch out where I see this going in some of the crate READMEs (for example, the one for cli).

On a more mundane level, since I'm a mediocre amateur programmer and Rust newbie (though sometimes I think this gives me certain advantages compared to trained programmers), reviews of the existing code and PRs to improve it would be welcome :-)

@fredguth

fredguth commented Oct 4, 2023

Any news on that? Does it still need help? What kind of processing is needed? What makes this task difficult? (Rust newbie)

@reknih
Member

reknih commented Oct 4, 2023

Thank you for asking. The following tasks are still open:

  • I am rewriting the frontend code for Hayagriva as well. Currently, I am moving the serialization and deserialization to serde, which requires some manual trait impls. I aim to complete this today on a private branch and merge it into new-frontend.
  • The new-frontend branch needs to be merged with main and csl
  • On the csl branch, we need to translate CSL variables into Hayagriva field accesses. If need be, we need to introduce new fields.
  • The ChunkedStr needs to be integrated with the case folding in the lang module.
  • We need to fix and apply inheritable name options in both citationberg and Hayagriva, with a new context stack.
  • We need to generate citations and bibliographies with multiple elements, and sort and disambiguate them according to the CSL spec.
  • We need to appropriately handle locales, including suppressing Title and Sentence case folding for non-English locales.
  • page-range-format, citation-specific options, and bibliography-specific options need to be implemented
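As an example of the kind of logic behind one of these items, here is a simplified sketch of three `page-range-format` values from CSL 1.0.2: `expanded` (42–45, 321–328), `minimal` (42–5, 321–8), and `minimal-two` (42–45, 321–28). The enum and function names are hypothetical, not Hayagriva's API; only plain numeric ranges are handled, and the `chicago` variants, roman numerals, and locale-specific delimiters are left out.

```rust
/// Hypothetical subset of CSL's page-range-format values.
#[derive(Clone, Copy)]
enum PageRangeFormat {
    Expanded,
    Minimal,
    MinimalTwo,
}

/// Expands an abbreviated end page using the start page's leading digits,
/// e.g. ("321", "8") -> "328". Assumes both parts are ASCII digits.
fn expand(first: &str, last: &str) -> String {
    if last.len() < first.len() {
        format!("{}{}", &first[..first.len() - last.len()], last)
    } else {
        last.to_string()
    }
}

fn format_page_range(range: &str, style: PageRangeFormat) -> String {
    let (first, last) = match range.split_once('-') {
        Some((f, l)) => (f.trim(), l.trim()),
        None => return range.to_string(),
    };
    // Only plain numeric ranges are handled; anything else passes through.
    let numeric = |s: &str| !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit());
    if !numeric(first) || !numeric(last) {
        return range.to_string();
    }
    let full_last = expand(first, last);
    let end = match style {
        PageRangeFormat::Expanded => full_last,
        PageRangeFormat::Minimal | PageRangeFormat::MinimalTwo => {
            let min_digits = if matches!(style, PageRangeFormat::Minimal) { 1 } else { 2 };
            if full_last.len() != first.len() {
                full_last
            } else {
                // Drop shared leading digits, but keep at least `min_digits`.
                let shared = first
                    .bytes()
                    .zip(full_last.bytes())
                    .take_while(|(a, b)| a == b)
                    .count();
                let keep = (full_last.len() - shared).max(min_digits).min(full_last.len());
                full_last[full_last.len() - keep..].to_string()
            }
        }
    };
    // Join with an en dash.
    format!("{}\u{2013}{}", first, end)
}

fn main() {
    for range in ["42-45", "321-328", "2787-2816", "321-8"] {
        println!(
            "{} | {} | {}",
            format_page_range(range, PageRangeFormat::Expanded),
            format_page_range(range, PageRangeFormat::Minimal),
            format_page_range(range, PageRangeFormat::MinimalTwo),
        );
    }
}
```

Normalizing to the expanded form first keeps the minimal variants simple, since they then only have to trim a shared prefix.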

@bdarcus
Author

bdarcus commented Oct 4, 2023

What will the relationship between Hayagriva and citationberg be?

@reknih
Copy link
Member

reknih commented Oct 4, 2023

Citationberg parses CSL but makes no assumptions about how variables and data types are expressed within the consumer. Hayagriva will have a frontend to enter bibliographic information and be the CSL processor.
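One way to picture that split is as a trait boundary: the CSL side knows only variable names, and the consumer decides how each one maps onto its own data. This is a minimal sketch under stated assumptions: the trait, enum, and `Entry` type below are hypothetical and are not the actual citationberg or Hayagriva API, and storing the container title on a parent entry is an illustrative assumption.

```rust
/// A few CSL standard variable names; the style side only knows these.
#[derive(Clone, Copy, Debug, PartialEq)]
enum StandardVariable {
    Title,
    ContainerTitle,
    Publisher,
}

/// Hypothetical trait: the consumer maps CSL variables onto its own fields.
trait ResolveVariable {
    fn resolve(&self, var: StandardVariable) -> Option<String>;
}

/// Hypothetical consumer-side entry in which the container title lives
/// on a parent entry rather than in a flat field.
struct Entry {
    title: String,
    publisher: Option<String>,
    parent: Option<Box<Entry>>,
}

impl ResolveVariable for Entry {
    fn resolve(&self, var: StandardVariable) -> Option<String> {
        match var {
            StandardVariable::Title => Some(self.title.clone()),
            StandardVariable::ContainerTitle => self.parent.as_ref().map(|p| p.title.clone()),
            StandardVariable::Publisher => self.publisher.clone(),
        }
    }
}

/// Style-side rendering that works for any type implementing the trait,
/// skipping variables the consumer cannot resolve.
fn render_text<T: ResolveVariable>(item: &T, vars: &[StandardVariable]) -> String {
    vars.iter()
        .filter_map(|v| item.resolve(*v))
        .collect::<Vec<_>>()
        .join(". ")
}

fn main() {
    let article = Entry {
        title: "An Article".to_string(),
        publisher: None,
        parent: Some(Box::new(Entry {
            title: "A Journal".to_string(),
            publisher: Some("A Press".to_string()),
            parent: None,
        })),
    };
    let vars = [
        StandardVariable::Title,
        StandardVariable::ContainerTitle,
        StandardVariable::Publisher,
    ];
    println!("{}", render_text(&article, &vars));
}
```

With this shape, translating CSL variables into field accesses (one of the open tasks listed earlier) reduces to implementing the resolver for the consumer's own entry type.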

@reknih
Member

reknih commented Oct 31, 2023

This has shipped with 0.4.0.

6 participants