Minimal code needed for declarative usage #247
Comments
I wrote |
Hi @fbennett. Thanks for offering to help!
By "in-text" I mean inline citations, what is produced by `Cite.format('citation', ...)` in citation.js. This is probably close to a "citation cluster". For example, *(Loomes, 2017, pp. 23-27)*.
By "full" I mean what would go into a bibliography or references list. I'm not sure if `citeproc.makeBibliography(filter)` returns single full citations or only a full bibliography (the entire library). I don't know what the `filter` variable is.
https://help.quillbot.com/hc/en-us/articles/4408078736023-What-is-the-difference-between-in-text-citations-and-full-citations
I'm wondering if a "citation cluster" is an in-text citation with *one or more references*?
I do like the objects returned by `makeCitationCluster` (here <https://www.fidgetech.org/>) and `makeBibliography` (here <https://www.fidgetech.org/>).
Now that I'm looking over the docs again, I might be grokking it better. This is what I think I'm seeing:
The citeproc instance is initialized with a *library* of sources (CSL-JSON) and *the ability to format citations and references in ANY style or locale specified*. This is mediated by the "sys" function.
I think this creates an enormous overhead for my needs. If I know my style and locale ahead of time and know I'm going to use those, I would like to be able to instantiate a library that can directly create citations without having to know how to go fetch styles and locales.
Maybe we could consider making the code more modular? Something I would be willing to help with. |
Thanks. To answer the initial questions:
* A "citation cluster" is indeed one or more citations. Some inline styles sort a group of inline citations in particular ways, so they must be handled together. If you're rendering individual citations only, that's not relevant, but the same function is used.
* The processor is instantiated for a specific locale and style. The uninstantiated object is general, but an instance is pegged to a particular style.
* If the `sys` helper functions are provided for your environment, `makeCitationCluster`, `processCitationCluster`, and `makeBibliography` should be able to do their business.
Whether to use `makeCitationCluster` or `processCitationCluster` depends on your requirements. The former will work if you have no need for back-references and you are batch-processing the document (i.e. there's no need for dynamic editing as in a word processor).
|
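The reply above can be sketched roughly as follows. This is a minimal sketch, assuming the style and locale are already in memory as XML strings and the items as a CSL-JSON array. `makeSys` and `renderInline` are hypothetical helper names, and `CSL` is passed in as a parameter so the sketch assumes nothing about how citeproc-js is loaded. `retrieveLocale` and `retrieveItem` are the `sys` hooks named in this thread; `CSL.Engine`, `updateItems`, and `makeCitationCluster` are used here as I understand the citeproc-js documentation.

```javascript
// Build the `sys` object from data that is already in memory.
// `localeXml` and `items` are hypothetical inputs for this sketch.
function makeSys(localeXml, items) {
  const byId = new Map(items.map((item) => [String(item.id), item]));
  return {
    // Called by the processor with a language tag such as "en-US".
    // This sketch returns the same preloaded locale for every tag.
    retrieveLocale: (lang) => localeXml,
    // Called by the processor with an item ID; returns the CSL-JSON item.
    retrieveItem: (id) => byId.get(String(id)),
  };
}

// Render one batch-mode citation cluster for the given item IDs.
// `CSL` is injected so the caller decides how citeproc-js is imported.
function renderInline(CSL, styleXml, localeXml, items, ids, lang = "en-US") {
  const engine = new CSL.Engine(makeSys(localeXml, items), styleXml, lang);
  engine.updateItems(ids); // register the items with the processor
  return engine.makeCitationCluster(ids.map((id) => ({ id })));
}
```

Because the engine is created for one style and one locale, a caller who knows both ahead of time only needs to supply the two strings and the item list.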
First let me say that I understand if this all sounds very critical. Just having CSL, Citeproc-JS, and Citation-JS is an amazing thing! I can see an immense amount of work went into creating the specs and writing the code. It's a huge boon to the academic world.
I do find both of the JS libraries to be quite difficult to use, and the code looks like it could possibly be a lot simpler if it were modularized. For example, it would be fantastic if there were a single function that received a single source, style, and locale, all as JSON or JavaScript objects, and returned an inline citation. But that functionality appears to be entangled with other operations. Although I could be wrong about that.
Maybe what I'm actually suggesting is a feature request:
var citeproc = new CSL.DeclarativeEngine(style, lang);
Where `style` and `lang` are the actual CSL style and locale as strings. Or URLs.
And, why not just default to the Citation Style Language style <https://github.com/citation-style-language/styles> and locale <https://github.com/citation-style-language/locales> specs?
Alternatively, a default `sys` function that would work from strings or file URLs.
I guess I just need to write my own like this:
const sys = {
  // Fetch a URL and return its body as text.
  fetchFile: async function(url) {
    try {
      const response = await fetch(url);
      if (!response.ok) {
        throw new Error('Network response was not ok');
      }
      const data = await response.text(); // or response.json(), response.blob() etc.
      return data; // return the fetched data
    } catch (error) {
      console.error('There has been a problem with your fetch operation:', error);
      throw error; // let callers see the failure instead of receiving undefined
    }
  },
  // Load a CSL-JSON library from a URL and keep it on this object.
  loadLibrary: async function(library) {
    // fetchFile is a property of this object, so it is called via `this`.
    const library_data = await this.fetchFile(library);
    this.library = JSON.parse(library_data);
  },
  // Return the library item with the given ID.
  retrieveItem: function(item_id) {
    return this.library.items.find(x => x.id == item_id);
  },
  retrieveStyle: async function(style) {
    // const url = `https://raw.githubusercontent.com/citation-style-language/styles/master/${style.csl_name}.csl`;
    const url = `https://www.zotero.org/styles${style}`;
    return await this.fetchFile(url);
  },
  retrieveLocale: async function(locale) {
    const url = `https://raw.githubusercontent.com/citation-style-language/locales/master/locales-${locale}.xml`;
    return await this.fetchFile(url);
  }
}; |
I'd be happy to advise on a fork that aims to simplify or otherwise improve the code.
|
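One wrinkle with the draft above: as far as I know, citeproc-js calls `retrieveLocale` and `retrieveItem` synchronously, so hooks that return promises would hand the processor a `Promise` rather than a string or an item. A possible workaround, sketched below under that assumption, is to download everything first and only then build the `sys` object. `fetchText`, `preloadSys`, and the three URL parameters are hypothetical names, and the library is assumed to have the `{ items: [...] }` shape used in the draft.

```javascript
// Fetch a URL as text, failing loudly on HTTP errors.
async function fetchText(url) {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Request for ${url} failed with status ${response.status}`);
  }
  return response.text();
}

// Download style, locale, and library up front, then expose them through
// synchronous hooks. `load` can be replaced (for tests, or to read files).
async function preloadSys({ styleUrl, localeUrl, libraryUrl }, load = fetchText) {
  const [styleXml, localeXml, libraryJson] = await Promise.all([
    load(styleUrl),
    load(localeUrl),
    load(libraryUrl),
  ]);
  const library = JSON.parse(libraryJson);
  const byId = new Map(library.items.map((item) => [String(item.id), item]));
  const sys = {
    retrieveLocale: () => localeXml, // synchronous: the data is already here
    retrieveItem: (id) => byId.get(String(id)),
  };
  return { sys, styleXml };
}
```

The returned `sys` and `styleXml` could then be passed to the processor once the promise resolves, keeping all network access outside the formatting step.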
Ok. Before I jump in, just how rugged is it processing
source -----> citation
          ^
         CSL
?
The XML has a lot of logic in it. Does ALL of that need to get parsed and recoded in javascript? |
When the processor ingests a style, and the style is in XML format, it will convert it to JSON on the fly and read that into a JS object for processing. If you plan to convert styles to JSON externally, would you like me to identify the processor function(s) that perform the conversion? (I should know, but it's been a couple of years since I looked at the code, so I'd need to take a peek.)
|
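To make the idea of converting a style from XML into a plain JS object concrete, here is a toy converter. It is only an illustration, not the processor's own conversion code: `xmlToTree` is a hypothetical name, the `{ name, attrs, children }` node shape is an assumption, and the regex-based scan handles only elements with double-quoted attributes (text nodes, comments, CDATA, and entities are ignored).

```javascript
// Convert a fragment of element-only XML into nested plain objects.
function xmlToTree(xml) {
  // Matches opening, closing, and self-closing tags with quoted attributes.
  const tagPattern = /<(\/?)([\w-]+)((?:\s+[\w:-]+="[^"]*")*)\s*(\/?)>/g;
  const root = { name: "#root", attrs: {}, children: [] };
  const stack = [root];
  let match;
  while ((match = tagPattern.exec(xml)) !== null) {
    const [, closing, name, rawAttrs, selfClosing] = match;
    if (closing) {
      stack.pop(); // end of the current element
      continue;
    }
    const attrs = {};
    for (const [, key, value] of rawAttrs.matchAll(/([\w:-]+)="([^"]*)"/g)) {
      attrs[key] = value;
    }
    const node = { name, attrs, children: [] };
    stack[stack.length - 1].children.push(node);
    if (!selfClosing) {
      stack.push(node); // later tags become children of this node
    }
  }
  return root.children[0];
}
```

The result can be serialized with `JSON.stringify`, which is one way a style could be converted ahead of time rather than on the fly.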
Since there is so much logic in the stylesheets, really so much thought and work has gone into those, it seems to me like this is a perfect job for AI. I just handed ChatGPT the "title" macro from *chicago-author-date.csl* and asked it to write it in javascript:
*title macro*
<macro name="title">
  <choose>
    <if variable="title" match="none">
      <choose>
        <if type="personal_communication speech thesis" match="none">
          <text variable="genre" text-case="capitalize-first"/>
        </if>
      </choose>
    </if>
    <else-if type="bill book graphic legislation motion_picture song" match="any">
      <text variable="title" text-case="title" font-style="italic"/>
      <group prefix=" (" suffix=")" delimiter=" ">
        <text term="version"/>
        <text variable="version"/>
      </group>
    </else-if>
    <else-if variable="reviewed-author">
      <choose>
        <if variable="reviewed-title">
          <group delimiter=". ">
            <text variable="title" text-case="title" quotes="true"/>
            <group delimiter=", ">
              <text variable="reviewed-title" text-case="title" font-style="italic" prefix="Review of "/>
              <names variable="reviewed-author">
                <label form="verb-short" text-case="lowercase" suffix=" "/>
                <name and="text" delimiter=", "/>
              </names>
            </group>
          </group>
        </if>
        <else>
          <group delimiter=", ">
            <text variable="title" text-case="title" font-style="italic" prefix="Review of "/>
            <names variable="reviewed-author">
              <label form="verb-short" text-case="lowercase" suffix=" "/>
              <name and="text" delimiter=", "/>
            </names>
          </group>
        </else>
      </choose>
    </else-if>
    <else-if type="legal_case interview patent" match="any">
      <text variable="title"/>
    </else-if>
    <else>
      <text variable="title" text-case="title" quotes="true"/>
    </else>
  </choose>
</macro>
*title.js*
function formatTitle(citation) {
  // Helper functions for text formatting
  const capitalizeFirst = text => text.charAt(0).toUpperCase() + text.slice(1);
  const toTitleCase = text => text.replace(/\w\S*/g, txt => txt.charAt(0).toUpperCase() + txt.substr(1).toLowerCase());
  const italicize = text => `<i>${text}</i>`;
  const quote = text => `"${text}"`;
  let result = "";
  // Handling different conditions
  if (!citation.title) {
    if (!["personal_communication", "speech", "thesis"].includes(citation.type)) {
      result = citation.genre ? capitalizeFirst(citation.genre) : "";
    }
  } else if (["bill", "book", "graphic", "legislation", "motion_picture", "song"].includes(citation.type)) {
    result = italicize(toTitleCase(citation.title));
    if (citation.version) {
      result += ` (${citation.term_version || ''} ${citation.version})`;
    }
  } else if (citation.reviewed_author) {
    if (citation.reviewed_title) {
      result = `${quote(toTitleCase(citation.title))}. Review of ${italicize(toTitleCase(citation.reviewed_title))}, ${citation.reviewed_author.map(author => author.name).join(", ")}`;
    } else {
      result = `Review of ${italicize(toTitleCase(citation.title))}, ${citation.reviewed_author.map(author => `${author.label || ''} ${author.name}`).join(", ")}`;
    }
  } else if (["legal_case", "interview", "patent"].includes(citation.type)) {
    result = citation.title;
  } else {
    result = quote(toTitleCase(citation.title));
  }
  return result;
} |
No. Just no. If you're going to apply an LLM to this task, you're on your own.
|
That's not my goal. I just thought I'd give it a try and see what happens. Not a good approach though, because then each style gets its own javascript which needs to be maintained. I guess the goal is to write javascript that knows how to interpret and act on the logic in the macros. |
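As a rough illustration of JavaScript that interprets the logic in a macro rather than hard-coding one style, the sketch below evaluates the conditions of a `<choose>` element against a CSL-JSON item. It assumes the style has already been parsed into plain objects of the form `{ name, attrs, children }` (an assumed shape), and it covers only the `type` and `variable` tests with the `match` attribute as I read the CSL specification. `conditionHolds` and `pickBranch` are hypothetical names; other conditions such as `is-numeric` or `position` are left out.

```javascript
// Split a space-separated attribute value into its parts.
const words = (value) => (value || "").split(/\s+/).filter(Boolean);

// A variable counts as present when it holds a non-empty value.
function hasValue(value) {
  if (value === undefined || value === null || value === "") return false;
  return !(Array.isArray(value) && value.length === 0);
}

// Evaluate the tests on an <if> or <else-if> node against one item.
function conditionHolds(attrs, item) {
  const tests = [
    ...words(attrs.type).map((type) => item.type === type),
    ...words(attrs.variable).map((name) => hasValue(item[name])),
  ];
  const match = attrs.match || "all"; // "all" is the default in CSL
  if (match === "any") return tests.some(Boolean);
  if (match === "none") return !tests.some(Boolean);
  return tests.every(Boolean);
}

// Return the first branch of a <choose> node that applies, or null.
function pickBranch(chooseNode, item) {
  for (const branch of chooseNode.children) {
    if (branch.name === "else" || conditionHolds(branch.attrs || {}, item)) {
      return branch;
    }
  }
  return null;
}
```

With this approach the per-style logic stays in the stylesheet data, and only the small set of element behaviours has to be maintained in code. Rendering the chosen branch (names, text case, punctuation) is where most of the real complexity lies and is not attempted here.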
Before I say this I just want to denounce LLMs as well. However, I've been thinking about "compiling" CSL into JS or other imperative languages as well, but programmatically of course. You'd need the appropriate helper functions, but it might lend to some interesting optimizations. Is that what you're after @abalter, or do you mean a single function that initializes citeproc to simplify the API? |
I wasn't thinking about using LLMs the way I think you might be, anyway. I use them to help write code, do some of the dirty work. It's actually quite good at that. Of course, it's just a helper, so I double-check everything. That's all. I wasn't thinking: "hand this over to an AI".
I did a little exploring to understand the limits of XML and XSLT. In a perfect world, each bit of logic in the stylesheet should directly translate to a logical statement in another computer language. Thus, something like
My impression of the codebase is that interpreting and applying the logic in the stylesheets was pretty hellish. I see a lot of stuff that looks like trying to handle edge case after edge case. Is that just the way it is? Or would a fresh approach find common patterns and shortcuts? I haven't studied a lot of CSL stylesheets yet to see if there are commonalities. I'm assuming each one has a few macros for handling authors, a few for titles, a few for publishers, etc. Maybe there is an ontology somewhere. |
Sorry for my misinterpretation.
Just my perspective: I tried such a fresh approach a while back to get to know CSL a bit better and found that (1) the specification covers a lot of edge cases, so the actual behavior is sometimes a lot more complex than the XML itself suggests (e.g. handling of names, punctuation, indentation, suppression) and (2) citeproc-js has a lot of heuristics to be able to properly follow the specifications in the first place, and covers plenty more edge cases which didn't make it into the specifications. You can't easily get rid of those and still get good results unless you keep to the most basic references.
The macros can differ between styles, and as far as I know there are no guidelines. |
@abalter: There are a couple of projects that might be of interest, given your objectives (apologies if you already know of these):
|
This is somewhat unfair to ask, but if someone can help me, it would mean a huge amount. The citeproc-js library is pretty complex and using it requires creating other functions (retrieveItem, retrieveLocale) that don't fully make sense to me. The package is designed to be able to do a large number of things across a large number of use cases.
All I want to do is generate formatted citations given a CSL-JSON library, CSL stylesheet, and locale spec like this:
Could someone guide me to the pertinent methods that I could use to build this simple application?
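For the bibliography side of this question, a minimal sketch might look like the following. It assumes the stylesheet and locale are available as XML strings and the library as a CSL-JSON array. `formatBibliography` is a hypothetical helper, `CSL` is passed in rather than imported, and the handling of the `makeBibliography` return value reflects my understanding of the citeproc-js documentation (a two-element array of parameters and entry strings, or a falsy value when the style defines no bibliography).

```javascript
// Format every item in `items` as a bibliography entry.
function formatBibliography(CSL, styleXml, localeXml, items, lang = "en-US") {
  const byId = new Map(items.map((item) => [String(item.id), item]));
  const sys = {
    retrieveLocale: () => localeXml, // one preloaded locale for all requests
    retrieveItem: (id) => byId.get(String(id)),
  };
  const engine = new CSL.Engine(sys, styleXml, lang);
  engine.updateItems(items.map((item) => item.id)); // register the whole library
  const result = engine.makeBibliography();
  // result[0] holds formatting parameters, result[1] the entry strings.
  return result ? result[1] : [];
}
```

The only pieces the caller provides are the three inputs named in the question; the `sys` object is an internal detail of the helper.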