Skip to content

Commit

Permalink
Update to v1.1
Browse files Browse the repository at this point in the history
  • Loading branch information
korintje committed May 30, 2021
1 parent d18f2a1 commit 860eac0
Show file tree
Hide file tree
Showing 3 changed files with 195 additions and 96 deletions.
139 changes: 72 additions & 67 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,67 +1,72 @@
# pandoc-doi2cite
This pandoc lua filiter helps users to insert references in a document with using DOI(Digital Object Identifier) tags.
With this filter, users do not need to make bibtex file by themselves. Instead, the filter automatically generate .bib file from the DOI tags, and convert the DOI tags into citation keys available by `pandoc-crossref`.

<img src="https://user-images.githubusercontent.com/30950088/117561410-87ec5d00-b0d1-11eb-88be-931f3158ec44.png" width="960">

What the filter do are as follows:

1. Search citations with DOI tags in the document
2. Get bibtex data of the DOI from http://api.crossref.org
3. Add reference data to a .bib file
4. Replace DOI tags to the correspoinding citation keys.

# Prerequisites
- Pandoc version 2.0 or newer
- This filter does not need any external dependencies
- This filter must be executed before `pandoc-crossref` or `--citeproc`

# DOI tags
Following DOI tags can be used:
* @https://doi.org/
* @doi.org/
* @DOI:
* @doi:

The first one (@https://doi.org/) may be the most useful because it is same as the accessible URL.

# Specify auto-generated bibliography file path
The path of the auto-generated bibliography file can be designated in the document yaml header.
The yaml key is `bib_from_doi`.
Both of the string and array are acceptable(If it is given as an array, only first item will be used).
Note that users typically should add same file path also in `bibliography`, in order to be recognized by `--citeproc`.

# Example
example paper.md:

<pre>
---
bibliography:
- "doi_refs.bib"
- "my_refs.bib"
bib_from_doi:
- "doi_refs.bib"
---

# Introduction
Electrophoresis is one of the most usable methodologies to separate proteins.[@https://doi.org/10.1038/227680a0]
By the way, Einstein is genius.[@doi.org/10.1002/andp.19053221004; @doi:10.1002/andp.19053220806]


</pre>

Example command 1 (.md -> .md)

```sh
pandoc --lua-filter doi2cite.lua --wrap=preserve -s example.md -o ourpur.md
```

Example command 2 (.md -> .pdf with [JACS](https://pubs.acs.org/journal/jacsat) style):

```sh
pandoc --lua-filter doi2cite.lua --filter=pandoc-crossref --citeproc --csl=journal-of-the-american-chemical-society.csl -s example.md -o example.pdf
```

Example result

![4](https://user-images.githubusercontent.com/30950088/117563789-3947be80-b0e3-11eb-974c-565af3902dca.png)
# pandoc-doi2cite
This pandoc lua filiter helps users to insert references in a document
with using DOI(Digital Object Identifier) tags. With this filter, users
do not need to make bibtex file by themselves. Instead, the filter
automatically generate .bib file from the DOI tags, and convert the DOI
tags into citation keys available by --citeproc.

<img src="https://user-images.githubusercontent.com/30950088/117561410-87ec5d00-b0d1-11eb-88be-931f3158ec44.png" width="960">

What the filter do are as follows:
1. Search citations with DOI tags in the document
2. Search corresponding bibtex data from `__from_DOI.bib` file
3. If not found, get bibtex data of the DOI from
http://api.crossref.org
4. Add reference data to `__from_DOI.bib` file
5. Check duplications of reference keys
6. Replace DOI tags to the correspoinding citation keys

# Prerequisites
- Pandoc version 2.0 or newer
- This filter does not need any external dependencies
- This filter should be executed before `pandoc-crossref` or
`--citeproc`

# DOI tags
Following DOI tags can be used:
- @https://doi.org/
- @doi.org/
- @DOI:
- @doi:

The first one (@https://doi.org/) may be the most useful because it is
same as the accessible URL.

# YAML header
The file **name** of the auto-generated bibliography file **MUST** be
`__from_DOI.bib`, but the **place** of the file can be changed (e.g.
`'./refs/__from_DOI.bib'` or `'refs\\__from_DOI.bib'` for Windows). Yo
u can designate the filepath in the document yaml header. The yaml key
is `bibliography`, which is also used by --citeproc.


# Example

example1.md:

---
bibliography:
- 'my_refs.bib'
- '__from_DOI.bib'
---

# Introduction
The Laemmli system is one of the most widely used gel systems for the separation of proteins.[@LAEMMLI_1970]
By the way, Einstein is genius.[@https://doi.org/10.1002/andp.19053220607; @doi.org/10.1002/andp.19053220806; @doi:10.1002/andp.19053221004]

Example command 1 (.md -\> .md)

``` {.sh}
pandoc --lua-filter=doi2cite.lua --wrap=preserve -s example1.md -o expected1.md
```

Example command 2 (.md -\> .pdf with
[ACS](https://pubs.acs.org/journal/jacsat) style):

``` {.sh}
pandoc --lua-filter=doi2cite.lua --filter=pandoc-crossref --citeproc --csl=sample1.csl -s example1.md -o expected1.pdf
```

Example result

![expected1](https://user-images.githubusercontent.com/30950088/119964566-4d952200-bfe4-11eb-90d9-ed2366c639e8.png)
143 changes: 114 additions & 29 deletions doi2cite.lua
Original file line number Diff line number Diff line change
Expand Up @@ -9,25 +9,28 @@
-- Global variables --
--------------------------------------------------------------------------------
base_url = "http://api.crossref.org"
bibpath = "./bib_from_doi.bib"
mailto = "[email protected]"
bibname = "__from_DOI.bib"
key_list = {};
doi_key_map = {};
doi_entry_map = {};
error_strs = {};
error_strs["Resource not found."] = 404
error_strs["No acceptable resource available."] = 406
error_strs["<html><body><h1>503 Service Unavailable</h1>\n"
.."No server is available to handle this request.\n"
.."</body></html>"] = 503


--------------------------------------------------------------------------------
-- Pandoc Functions --
--------------------------------------------------------------------------------
-- Get bibliography filepath from yaml metadata
function Meta(m)
local bp = m.bib_from_doi
if bp ~= nil then
if bp[1].text ~= nil then
bibpath = bp[1].text
elseif bp[1][1] ~= nil then
bibpath = bp[1][1].text
else end
end
local bib_data = m.bibliography
local bibpaths = get_paths_from(bib_data)
bibpath = find_filepath(bibname, bibpaths)
bibpath = verify_path(bibpath)
local f = io.open(bibpath, "r")
if f then
entries_str = f:read('*all')
Expand All @@ -39,8 +42,9 @@ function Meta(m)
end
end
f:close()
else
make_new_file(bibpath)
end
print(bibpath .. " is created for bibliography from DOI.")
end

-- Get bibtex data of doi-based citation.id and make bibliography.
Expand All @@ -58,18 +62,16 @@ function Cite(c)
doi = nil
end
if doi then
if doi_key_map[doi] ~= nil then
local entry_key = doi_key_map[doi]
citation.id = entry_key
print("Existing DOI: "..doi)
if doi_key_map[doi] then
citation.id = doi_key_map[doi]
else
local entry_str = get_bibentry(doi)
if entry_str == nil or entry_str == "Resource not found." then
if entry_str == nil or error_strs[entry_str] then
print("Failed to get ref from DOI: " .. doi)
else
entry_str = replace_symbols(entry_str)
entry_str = tex2raw(entry_str)
local entry_key = get_entrykey(entry_str)
if key_list[entry_key] ~= nil then
if key_list[entry_key] then
entry_key = entry_key.."_"..doi
entry_str = replace_entrykey(entry_str, entry_key)
end
Expand All @@ -81,7 +83,7 @@ function Cite(c)
f:write(entry_str .. "\n")
f:close()
else
error("Can not open file: "..bibpath)
error("Unable to open file: "..bibpath)
end
end
end
Expand All @@ -99,21 +101,71 @@ function get_bibentry(doi)
local entry_str = doi_entry_map[doi]
if entry_str == nil then
print("Request DOI: " .. doi)
local url = base_url.."/works/"..doi.."/transform/application/x-bibtex"
local url = base_url.."/works/"
..doi.."/transform/application/x-bibtex"
.."?mailto="..mailto
mt, entry_str = pandoc.mediabag.fetch(url)
end
return entry_str
end

-- Replace some symbols inorder to escape (maybe) bugs of pandoc/citeproc
function replace_symbols(string)
local buggystrs = {};
buggystrs["{\textendash}"] = ""
buggystrs["{\textemdash}"] = ""
buggystrs["{\textquoteright}"] = ""
buggystrs["{\textquoteleft}"] = ""
for buggystr, altanative in pairs(buggystrs) do
local string = string:gsub(buggystr, altanative)
-- Extract designated filepaths from 1 or 2 dimensional metadata
function get_paths_from(metadata)
local filepaths = {};
if metadata then
if metadata[1].text then
filepaths[metadata[1].text] = true
elseif type(metadata) == "table" then
for _, datum in pairs(metadata) do
if datum[1] then
if datum[1].text then
filepaths[datum[1].text] = true
end
end
end
end
end
return filepaths
end

-- Extract filename and dirname from a given a path
function split_path(filepath)
local delim = nil
local len = filepath:len()
local reversed = filepath:reverse()
if filepath:find("/") then
delim = "/"
elseif filepath:find([[\]]) then
delim = [[\]]
else
return {filename = filepath, dirname = nil}
end
local pos = reversed:find(delim)
local dirname = filepath:sub(1, len - pos)
local filename = reversed:sub(1, pos - 1):reverse()
return {filename = filename, dirname = dirname}
end

-- Find bibname in a given filepath list and return the filepath if found
function find_filepath(filename, filepaths)
for path, _ in pairs(filepaths) do
local filename = split_path(path)["filename"]
if filename == bibname then
return path
end
end
return nil
end

-- Make some TeX descriptions processable by citeproc
function tex2raw(string)
local symbols = {};
symbols["{\textendash}"] = ""
symbols["{\textemdash}"] = ""
symbols["{\textquoteright}"] = ""
symbols["{\textquoteleft}"] = ""
for tex, raw in pairs(symbols) do
local string = string:gsub(tex, raw)
end
return string
end
Expand Down Expand Up @@ -157,11 +209,44 @@ function get_doi_key_map(bibtex_string)
return keys
end

-- function to make directories and files
function make_new_file(filepath)
if filepath then
print("doi2cite: creating "..filepath)
local dirname = split_path(filepath)["dirname"]
if dirname then
os.execute("mkdir -p "..dirname)
end
f = io.open(filepath, "w")
if f then
f:close()
else
error("Unable to make bibtex file: "..bibpath..".\n"
.."This error may come from the missing directory. \n"
)
end
end
end

-- Verify that the given filepath is correct.
-- Catch common Pandoc user mistakes about Windows-formatted filepath.
function verify_path(bibpath)
if bibpath == nil then
print("[WARNING] doi2cite: "
.."The given file path is incorrect or empty. "
.."In Windows-formatted filepath, Pandoc recognizes "
.."double backslash ("..[[\\]]..") as the delimiters."
)
return "__from_DOI.bib"
else
return bibpath
end
end

--------------------------------------------------------------------------------
-- The main function --
--------------------------------------------------------------------------------
return {
{ Meta = Meta },
{ Cite = Cite }
}
}
9 changes: 9 additions & 0 deletions sample1.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
---
bibliography:
- 'my_refs.bib'
- '__from_DOI.bib'
---

# Introduction
The Laemmli system is one of the most widely used gel systems for the separation of proteins.[@LAEMMLI_1970]
By the way, Einstein is genius.[@https://doi.org/10.1002/andp.19053220607; @doi.org/10.1002/andp.19053220806; @doi:10.1002/andp.19053221004]

0 comments on commit 860eac0

Please sign in to comment.