"Add" CSL YAML #278
Is pandoc's CSL YAML actually identical to CSL JSON? I'm not sure, but don't they differ in at least some aspects? The JSON:
"issued":{"date-parts":[[2015,4,2]]},
Whereas the YAML:
issued:
- year: 2015
  month: 4
  day: 2
(I think I tried to use the JSON schema for autocompletion and validation once, but I wasn't so lucky. Using Atom or VS Code as plain text reference managers could be very nice for some projects...) |
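To make the difference concrete, here is a minimal sketch of mapping the pandoc-style YAML date (as it would look after YAML parsing, i.e. a list of mappings) onto the CSL JSON `date-parts` array. The function name `pandoc_date_to_date_parts` is hypothetical, and this is not pandoc's own code; it only covers the `year`/`month`/`day` keys shown above.

```python
def pandoc_date_to_date_parts(dates):
    """Convert [{'year': 2015, 'month': 4, 'day': 2}] to {'date-parts': [[2015, 4, 2]]}."""
    all_parts = []
    for date in dates:
        parts = []
        for key in ("year", "month", "day"):
            if key not in date:
                # date-parts is positional, so stop at the first missing component
                break
            parts.append(int(date[key]))
        all_parts.append(parts)
    return {"date-parts": all_parts}


# The YAML example above, as a parsed Python structure
yaml_style = [{"year": 2015, "month": 4, "day": 2}]
print(pandoc_date_to_date_parts(yaml_style))
```

A second mapping in the list would become a second inner array, which is how CSL JSON expresses the end of a range.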
I hadn't checked, but wondering if devs like @jgm would find value in this. Advantage is we have one schema, that is always in sync with the CSL spec. |
People definitely seem to like YAML in my research community. Easier to manually edit if needed, like BibTeX or RIS. |
Exactly. YAML is easily hand-editable. JSON isn't. |
And with auto-completion it would be even better! |
@retorquere Do you have any input here? |
|
To the best of my knowledge, this is the only difference. |
No, not exactly. In addition to the difference noted (and maybe others which I've forgotten), the YAML bibliographies read by pandoc can have arbitrary pandoc markdown formatting. (And NOT the CSL HTML-ish formatting.) So it's not just a YAML translation of CSL JSON. As I develop my new citeproc library, I may change things a bit to line things up more, while preserving backwards compatibility. For example, I think a |
When we originally designed the json, focus was on machines.
But given evolution since, now might be a time to rethink some of the
decisions, so we end with a solid representation well-suited to humans?
|
@jgm but the html markup that csl-json supports is also valid markdown, so for export, there'd be no problem. Edit: wait, does pandoc only support markdown tags, and not the html tags? What other tools besides pandoc read csl-yaml? |
I think accepting dates in either format would be fine in either schema. We just need to specify where the order of priority for redundant parts is (cf. there is an issue raised that we should specify in With respect to markdown, I'm leery about making that universal, but I think we could add a flag to the data indicating that the data should be read as markdown. |
@retorquere I think he is saying that pandoc CSL YAML supports markdown syntax in addition to HTML syntax. I'm a little concerned about assuming that, for example, |
And the markdown escapes of those and others of course. Even if you only use html for markup. I hadn't thought of that before and I'll have to think about what to do for BBT's csl-yaml export. I think it'd need to be explicitly marked if you want markdown processing, or markdown would have to be the default for csl-yaml. I'd rather not deal with ambiguity. Are there other csl-yaml processors? If not, then I could have BBT default to markdown. |
As the main consumer of CSL YAML is currently and I suspect will remain pandoc, I think we could make the default markdown with an option to disable? pandoc interprets the HTML markup, so I'd suggest BBT not worry about translating the HTML tags into markdown. |
I'm not worried about those, but about whether to escape if I find |
I read John to say above that he doesn't support the HTML. |
If a user is generating CSL YAML with BBT, I'd expect that not escaping markdown characters (BBT's current behavior) is the better default. |
I checked. The HTML markup is supported in CSL YAML by pandoc (which makes sense because it is valid HTML). |
Did you get YAML auto-completion working with vscode? Might be cool if we could have CSL extensions for both vscode and, per the thing I started, atom, so we could give people easy-to-install auto-completing editors for CSL styles and data. |
Both yes! |
What doesn't work in vs code is style validation... |
Seems vscode doesn't support relaxng validation, and is dependent on this issue to add it. |
Are RNC and XSD feature compatible? There are numerous XSD validators for vscode. It might be possible to automatically generate an unofficial XSD schema from the RNC for use with editors. |
Well, pandoc will pass through raw HTML as "RawInline" elements. And these will be emitted in HTML output. But if you target, say, LaTeX, they'll just be omitted. So it's not really supported. |
That and the fact that if you assume markdown, |
I've added a new linked issue to the documentation repo, specific to the sub-field formatting discussion that has mostly been the focus of this thread.
Hoping to do a PR later today so we can get more concrete. We've discussed all the issues, and now know enough to be specific, I think.
For this issue, whether to add a YAML representation that validates against the JSON schema, I think we should keep this open, and I think we should see if we can get it to work to everyone's satisfaction.
I expect if we do this, it will result in one or more PRs on the JSON schema (say to @jgm's point on date representation), and possibly one on the documentation repo (simply to mention the YAML format, and that one can validate it against the JSON schema).
I've updated the top post to reflect what I think are next steps.
|
@bdarcus -- So I guess this means limiting abstracts to one paragraph without any block-level formatting (no tables, lists, figures, etc.). This seems reasonable but I'm not up on abstract customs in different fields. You're also excluding hyperlinks, which would not be normal in a title but might appear in an abstract. Math is tricky. I see the point of passing it through directly to the output format. However, if you're working with MathJax you often need to escape One approach would be to have a Another approach would be to just insist on MathML. Note that MathML can include an |
Thinking about Zotero–pandoc compatibility as the major concern I have, that usually happens via interface with BBT. If there is a defined set of HTML-like tags that Zotero supports, I think BBT or similar export translators could convert those tags to Markdown syntax fairly easily. For math, I don’t see a word processor plugin directly supporting math input. But, if a user stored it in TeX, they could convert that to a Word OMML equation as a simple post processing step, and it would work out of the box with pandoc. |
I have no firm position against these. I just wanted to keep this moving forward, and wasn't really focused on those cases because they're not the primary requirement for manuscript preparation. But certainly we should consider them. |
The two fields where this might come into play are Abstracts often have subheadings—those are usually set with bold, rather than heading markers. Abstracts probably won’t have lists or tables, but |
This thing I said above didn't really make sense — it would work for a one-off bibliography but not if we were embedding CSL-JSON in a document for future processing.
I don't think passing anything through verbatim (as in, not processed according to the output format) actually works — depending on the output format, it could very well mean invalid/unescaped markup, and if the calling application doesn't know about it and deal with it appropriately, it's potentially a security flaw.
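As a rough illustration of the point about escaping, here is a minimal sketch of treating field text as plain text and escaping it for the target format instead of passing it through verbatim. The function `escape_for_output` and the format names are hypothetical; a real processor would cover more formats and would handle markup separately from escaping.

```python
import html

# Characters with special meaning in LaTeX, mapped to safe replacements
_LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_for_output(text: str, output_format: str) -> str:
    """Escape plain field text so it cannot inject markup into the output."""
    if output_format == "html":
        return html.escape(text)
    if output_format == "latex":
        # Escape character by character so replacements are never re-escaped
        return "".join(_LATEX_ESCAPES.get(ch, ch) for ch in text)
    raise ValueError(f"unsupported output format: {output_format}")


print(escape_for_output("<script>alert(1)</script>", "html"))
print(escape_for_output("50% & more_", "latex"))
```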
I don't think processing MathML should be the citation processor's job, for the reasons I give above: the output format abilities may not be the same as the target application abilities, it would require a duplicate bundled math processor, and it just seems generally unreasonable to ask of a citation processor. But a version of this might work:
|
For the case of Zotero's Word integration, would either of those solutions enable, for example, the title and abstract of this item to appear in Word as a math environment equation? |
I actually kind of doubt it — I suspect you can't embed an equation element in the text of a Word field, which is what we would need to do. So while I don't know for sure, realistically |
It seems like math, tables, lists might be beyond the scope of CSL; these might be things that we recommend applications support (e.g., math everywhere, tables and lists in abstract and note), but that is really up to the application to define? |
@dstillman UnicodeMath would be a good compromise to be able to convert unlinked citations/bibliographies to equations with one click or a macro |
As part of #278, and to harmonize the JSON and YAML representations around a much more concise and expressive date format, this adds an option to use EDTF; either as a preferred string on any date, or as an "edtf" string property on the more verbose alternative object representation. While EDTF was originally an initiative of the US Library of Congress, ISO adopted it as part of 8601-2 in 2019. Note: The current regular expression pattern only checks for valid characters.
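To show what "only checks for valid characters" implies in practice, here is a small sketch. The pattern `EDTF_CHARS` is an assumption made up for illustration, not the pattern in the schema; the point is only that a character-class check accepts well-formed EDTF strings and also some strings that are not valid dates.

```python
import re

# Hypothetical character-only pattern; the schema's actual pattern may differ
EDTF_CHARS = re.compile(r"^[0-9XYTZ:.~?%/+\-]+$")


def looks_like_edtf(value: str) -> bool:
    """Return True if the string contains only characters that can occur in EDTF."""
    return bool(EDTF_CHARS.match(value))


print(looks_like_edtf("2015-04-02"))   # a plain calendar date
print(looks_like_edtf("2000/.."))      # an interval with an open end
print(looks_like_edtf("April 2015"))   # rejected: letters and a space
print(looks_like_edtf("99-99-99"))     # accepted, although it is not a valid date
```

A check like this is cheap enough for editor validation, but a consumer would still need a real EDTF parser to reject the last case.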
Date issue now hopefully solved with the EDTF addition.
Do we want to add the optional property for the markup that @bwiernik
suggested?
If yes, suggested values?
- html subset (default? or maybe this is default on json, and markdown on
yaml?)
- markdown (is this the value for the pandoc syntax too?)
- org
The above make sense because they have citation support in their ecosystems
(and org will be getting native citation support soon).
Not sure if any others would apply?
LaTeX, but that seems a PITA to support, and superfluous given
bibtex/biblatex?
|
Could we just leave that an open field and leave it up to processors to designate the markup they support? |
Not sure, but I suppose.
|
@larsgw suggested in this comment that we consider having two input schemas: one for humans (yaml + edtf), and the other for machines (json + structured data object). I wasn't sure how easy or possible this was in JSON Schema. Edit: no, it's not possible, it seems. In that case, we should probably just continue as planned. |
Also, @jgm, am I correct that your current date model supports ranges? If yes how do you define an open-ended range? |
It seems that this works with pandoc-citeproc to specify an open range:
But I wouldn't worry too much about my data model, since I'm planning to transition eventually to the new citeproc library I'm writing. It already passes more citeproc tests than pandoc-citeproc, and it's much faster and more maintainable. It uses the date-parts model that is part of current CSL. |
Wow, that was fast. Do you already have more concrete plans when we can expect the new library?
|
…language#284) As part of citation-style-language#278, and to harmonize the JSON and YAML representations around a much more concise and expressive date format, this adds an option to use EDTF; either as a preferred string on any date, or as an "edtf" string property on the more verbose alternative object representation. While EDTF was originally an initiative of the US Library of Congress, ISO adopted it as part of 8601-2 in 2019. Note: The current regular expression pattern only checks for valid characters.
Here's what I have in #301 @jgm:
issued:
- date-parts:
- 2000
- {}
So it merges your model and the 1.0 JSON model to match the EDTF model (which is a I believe date parts is better as an object (as you have), but I guess for compatibility we should keep the array. Anyone want to make the argument we should change this too? If yes, please state your case on #301. If not, we'll keep as is. The human-readable preference, of course, would be the preferred EDTF string:
issued: 2000/.. |
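For readers unfamiliar with the `2000/..` form, here is a minimal sketch of splitting a simple EDTF date or interval into start and end parts, with `None` standing for an open end. The function `parse_simple_edtf` is hypothetical and deliberately narrow: it ignores qualifiers such as `~` and `?`, does not handle negative years, and treats an empty endpoint the same as `..`. It does not claim to match the model in #301.

```python
def parse_simple_edtf(value):
    """Return (start, end) as lists of ints; None marks an open or missing endpoint."""

    def to_parts(text):
        if text in ("", ".."):
            return None
        return [int(piece) for piece in text.split("-")]

    if "/" in value:
        start, end = value.split("/", 1)
        return to_parts(start), to_parts(end)
    # A single date: start and end are the same point
    parts = to_parts(value)
    return parts, parts


print(parse_simple_edtf("2000/.."))
print(parse_simple_edtf("2000-01/2001-12"))
print(parse_simple_edtf("2015-04-02"))
```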
This adds a definition for rich-text variables, and title-string definitions that use those variables, and then redefines all title and other fields to use these definitions. Addresses in part #278
This adds an experimental csl-rich-text.yaml schema that defines a structure for rich-text formatting in JSON. Also adds a definition for rich-text variables, and title-string definitions that uses that variable, and then redefines all title and other fields to use these definitions. So it will be easy to merge the rich text support in the future. Addresses in part #278
Now close to two years later, I merged today #420, with examples of validating completion against the current v1.1 branch version of the schema (that allows EDTF for dates). It actually works pretty well for humans and machines, I'd say. Much of this long thread contains very useful thoughts on a more narrow aspect of this; the question of the markup, etc. within the fields. #315 was an experiment for that, though I have no idea if the idea is any good. |
This and this suggests we can use our JSON schemas to validate a YAML alternative.
Pandoc already supports a YAML alternative (cc @jgm).
I suggest we do something with this, since it's zero work for us, and would give more options for users and developers.
Perhaps the most sensible option is just adding a sentence to the spec that mentions this possibility, without requiring implementations to support it?
Edit: actually, we say nothing about input in the spec currently. So we would need to add a section on input data, and say that our schema can validate either json or yaml.
Proposal
Based on this discussion, we should:
markup: org.
Originally posted by @bdarcus in #277 (comment)