Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented Getting Zenodo citation #10

Merged
merged 13 commits into from
Jun 21, 2021

Conversation

davibarreira
Copy link
Contributor

@davibarreira davibarreira commented Jun 19, 2021

I've implemented two new functions in citations.jl:
get_zenodo_badge - This function reads the README.md file and extracts the link from the zenodo's badge;
get_citation_zenodo - This functions takes the badge link and obtains the bibtex.

I've modified the get_citations function to search for the badge in case no Citation.bib file is found. I've also added an argument zenodo=false to get_citation, get_tool_citations and collect_citations. The default is false so that your previous tests do not break.

I've also implemented a testset for the zenodo functions. Note that some of your tests were already failing, and they are not related to the new functions.

I hope this PR is helpful.

@codecov-commenter
Copy link

codecov-commenter commented Jun 20, 2021

Codecov Report

Merging #10 (3b6cef2) into master (32fff07) will increase coverage by 1.59%.
The diff coverage is 94.11%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #10      +/-   ##
==========================================
+ Coverage   90.00%   91.59%   +1.59%     
==========================================
  Files           3        3              
  Lines          90      119      +29     
==========================================
+ Hits           81      109      +28     
- Misses          9       10       +1     
Impacted Files Coverage Δ
src/citations.jl 95.58% <93.93%> (-1.92%) ⬇️
src/tool_report.jl 85.41% <100.00%> (+2.08%) ⬆️
src/utils.jl 100.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8256957...3b6cef2. Read the comment docs.

Copy link
Owner

@SebastianM-C SebastianM-C left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution!

I left some comments and suggestions on the specific parts of the code. It looks good, but I think it would be better to refactor the code a bit to return a merged citation dictionary from get_citation and maybe have a function get_badge_citations that would call get_citation_zenodo. This way it will be easier to extend for JOSS (and other types) of badges. I also think that we should default to getting all the citation data.

I'm wondering if we could just setup a regex for DOIs in badges.

Note that some of your tests were already failing, and they are not related to the new functions.

Can you provide some information about the failing tests? The CI doesn't show failing package tests and I also checked locally and I didn't encounter any problems.
The errors in CI like

Error: Can't open display: (null)

are due to the clipboard integration tests because I haven't managed to install xclip properly (#11).

src/citations.jl Outdated
@@ -5,32 +5,43 @@ function citation_path(pkg)
end
end

function get_citation(pkg)

# Added `zenodo` flag to avoid breaking current tests
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to default to including the citations from badges and change / fix the tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but I thought this should be done in another PR, cause it would be too much change.

src/citations.jl Outdated
Comment on lines 24 to 25
elseif !isnothing(urlzenodo)
get_citation_zenodo(urlzenodo)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to merge the citations from all sources. It would also be nice to have a duplication check in case package authors have the citation info also in CITATION.bib and use that instead (I'm not sure if it's worth it to make this configurable. Preferring the CITATION.bib over the badge seems reasonable to me.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way it's now, it just get's the zenodo citation if no CITATION.bib is not present. Isn't this your suggestion?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that there might be packages that have both a badge and a CITATION.bib file, but if we prefer the CITATION.bib anyway, then it would be simpler to go with the current implementation. I was thinking that there might be some edge cases when we need to merge badge citations with CITATION.bib citations, but it would be better to just document that we will prefer the .bib file over badges.

src/citations.jl Outdated Show resolved Hide resolved
src/citations.jl Outdated
readme_path = joinpath(pkg.source, "README.md")
if isfile(readme_path)
readme = open(f->read(f, String), readme_path);
index_init = findlast("https://zenodo.org/badge", readme)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if there are packages with more than one zenodo badge. We can leave it like this and update in the future if it's necessary.

src/PkgCite.jl Outdated
@@ -2,13 +2,15 @@ module PkgCite

using Pkg: include
using DataStructures: getkey, values
using Base: String
using Base: String, indent_width
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is indent_width used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to remove it. It's not used.

src/citations.jl Outdated Show resolved Hide resolved
src/citations.jl Outdated Show resolved Hide resolved
src/citations.jl Outdated Show resolved Hide resolved
src/citations.jl Outdated
function get_citation(pkg)

# Added `zenodo` flag to avoid breaking current tests
function get_citation(pkg; zenodo=false)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that it might be better to have a more general flag like badge (or include_badges or something like that).

@davibarreira
Copy link
Contributor Author

davibarreira commented Jun 21, 2021

Thank you for your contribution!

I left some comments and suggestions on the specific parts of the code. It looks good, but I think it would be better to refactor the code a bit to return a merged citation dictionary from get_citation and maybe have a function get_badge_citations that would call get_citation_zenodo. This way it will be easier to extend for JOSS (and other types) of badges. I also think that we should default to getting all the citation data.

I'm wondering if we could just setup a regex for DOIs in badges.

Note that some of your tests were already failing, and they are not related to the new functions.

Can you provide some information about the failing tests? The CI doesn't show failing package tests and I also checked locally and I didn't encounter any problems.
The errors in CI like

Error: Can't open display: (null)

are due to the clipboard integration tests because I haven't managed to install xclip properly (#11).

What exactly do you mean by using "regex for DOI"? Do you mean, search for the "DOI" term, and then extract the badge link? I guess this could be done, although I'm not very good with regex.

Also, can you give an example of what do you mean by the " but I think it would be better to refactor the code a bit to return a merged citation dictionary from get_citation and maybe have a function get_badge_citations that would call get_citation_zenodo"? I don't fully understand what exactly would be this merged citation dictionary.

davibarreira and others added 5 commits June 21, 2021 08:26
Co-authored-by: Sebastian Micluța-Câmpeanu <[email protected]>
Co-authored-by: Sebastian Micluța-Câmpeanu <[email protected]>
Co-authored-by: Sebastian Micluța-Câmpeanu <[email protected]>
Co-authored-by: Sebastian Micluța-Câmpeanu <[email protected]>
@davibarreira
Copy link
Contributor Author

By the way, here are the test failures I get when I just run the forked repo without my changes:

 [0c5d862f] Symbolics v0.1.27
┌ Warning: There was an error reading the CITATION.bib file for Distributions
│   exception = KeyError: key "july" not found
└ @ PkgCite ~/MEGA/EMAp/PkgCite.jl/src/citations.jl:15
┌ Warning: There was an error reading the CITATION.bib file for Distributions
│   exception = KeyError: key "july" not found
└ @ PkgCite ~/MEGA/EMAp/PkgCite.jl/src/citations.jl:15
┌ Warning: Overwriting julia_citations.bib
└ @ PkgCite ~/MEGA/EMAp/PkgCite.jl/src/citations.jl:53
[ Info: A julia_citations.bib file with the citations for LabelledArrays, QuadGK, Symbolics, RecursiveArrayTools, ArrayInterface and AbstractAlgebra was generated in the current working directory (/home/davibarreira/MEGA/EMAp/PkgCite.jl/test).
┌ Warning: There was an error reading the CITATION.bib file for Distributions
│   exception = KeyError: key "july" not found
└ @ PkgCite ~/MEGA/EMAp/PkgCite.jl/src/citations.jl:15
┌ Warning: There was an error reading the CITATION.bib file for Distributions
│   exception = KeyError: key "july" not found
└ @ PkgCite ~/MEGA/EMAp/PkgCite.jl/src/citations.jl:15
[ Info: The above sentence was copied to your clipboard.
┌ Warning: Overwriting julia_citations.bib
└ @ PkgCite ~/MEGA/EMAp/PkgCite.jl/src/citations.jl:53
[ Info: A julia_citations.bib file with the citations for LabelledArrays, QuadGK, Symbolics, RecursiveArrayTools, Julia, ArrayInterface and AbstractAlgebra was generated in the current working directory (/home/davibarreira/MEGA/EMAp/PkgCite.jl/test).
Clipboard: Error During Test at /home/davibarreira/MEGA/EMAp/PkgCite.jl/test/runtests.jl:81
 Unexpected Pass
 Expression: clipboard() == CITE_STR_JL
 Got correct result, please change to @test if no longer broken.

┌ Warning: There was an error reading the CITATION.bib file for Distributions
│   exception = KeyError: key "july" not found
└ @ PkgCite ~/MEGA/EMAp/PkgCite.jl/src/citations.jl:15
[ Info: The above sentence was copied to your clipboard.
┌ Warning: Overwriting julia_citations.bib
└ @ PkgCite ~/MEGA/EMAp/PkgCite.jl/src/citations.jl:53
[ Info: A julia_citations.bib file with the citations for LabelledArrays, QuadGK, Symbolics, RecursiveArrayTools, Julia, ArrayInterface and AbstractAlgebra was generated in the current working directory (/home/davibarreira/MEGA/EMAp/PkgCite.jl/test).
Clipboard: Error During Test at /home/davibarreira/MEGA/EMAp/PkgCite.jl/test/runtests.jl:89
 Unexpected Pass
 Expression: clipboard() == CITE_STR
 Got correct result, please change to @test if no longer broken.

Test Summary:              | Pass  Error  Broken  Total
PkgCite.jl                 |   24      2       1     27
  Empty env                |    1                     1
  Packages                 |    6              1      7
  Citations                |    4                     4
  Citations in .bib        |    5                     5
  Only direct dependencies |    3                     3
  PkgCite sentence         |    5      2              7
    Clipboard              |    2      2              4
ERROR: LoadError: Some tests did not pass: 24 passed, 0 failed, 2 errored, 1 broken.
in expression starting at /home/davibarreira/MEGA/EMAp/PkgCite.jl/test/runtests.jl:10
ERROR: Package PkgCite errored during testing

Copy link
Owner

@SebastianM-C SebastianM-C left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, can you give an example of what do you mean by the " but I think it would be better to refactor the code a bit to return a merged citation dictionary from get_citation and maybe have a function get_badge_citations that would call get_citation_zenodo"? I don't fully understand what exactly would be this merged citation dictionary.

I was thinking if there would be an edge case where we need to merge citations form the badge with the ones in the CITATION.bib file, but as I've said it the response comment, it would be better to just document that we prefer the CITATION.bib file over the badges and leave the implementation as is.

What exactly do you mean by using "regex for DOI"? Do you mean, search for the "DOI" term, and then extract the badge link? I guess this could be done, although I'm not very good with regex.

I was referring to parsing the README for badges and then look for DOI-like strings (things like "10.5281/ZENODO.4692172") insde them since the badge links contain the DOI information, but I see now that this will not work for Zenodo since there can be links with latestdoi that need to be followed for the DOI. I thought that the DOI info is present in the link like in the badge for Plots or DynamicalSystems.

Looking for the "DOI" string is indeed a better solution since it seems to be used in all the badges I've encountered.

src/citations.jl Outdated
Comment on lines 24 to 25
elseif !isnothing(urlzenodo)
get_citation_zenodo(urlzenodo)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that there might be packages that have both a badge and a CITATION.bib file, but if we prefer the CITATION.bib anyway, then it would be simpler to go with the current implementation. I was thinking that there might be some edge cases when we need to merge badge citations with CITATION.bib citations, but it would be better to just document that we will prefer the .bib file over badges.

@davibarreira
Copy link
Contributor Author

Also, can you give an example of what do you mean by the " but I think it would be better to refactor the code a bit to return a merged citation dictionary from get_citation and maybe have a function get_badge_citations that would call get_citation_zenodo"? I don't fully understand what exactly would be this merged citation dictionary.

I was thinking if there would be an edge case where we need to merge citations form the badge with the ones in the CITATION.bib file, but as I've said it the response comment, it would be better to just document that we prefer the CITATION.bib file over the badges and leave the implementation as is.

What exactly do you mean by using "regex for DOI"? Do you mean, search for the "DOI" term, and then extract the badge link? I guess this could be done, although I'm not very good with regex.

I was referring to parsing the README for badges and then look for DOI-like strings (things like "10.5281/ZENODO.4692172") insde them since the badge links contain the DOI information, but I see now that this will not work for Zenodo since there can be links with latestdoi that need to be followed for the DOI. I thought that the DOI info is present in the link like in the badge for Plots or DynamicalSystems.

Looking for the "DOI" string is indeed a better solution since it seems to be used in all the badges I've encountered.

Yeah, there are some cases where the DOI is in the badge, but for Zenodo, at least, this is not the correct way to setup. The reason is that for every new package version a new DOI is generated. So using "latestdoi" is the correct setup, hence, why I wrote that "warning". Perhaps, one can check if the badge has a DOI, and use that instead.

@SebastianM-C
Copy link
Owner

Regarding the failing tests, I marked the clipboard ones as broken since that functionality is broken on CI (and it didn't work on the local computer where I tested because xclip/xsel wasn't installed), but it worked on your computer. I think I will wrap those tests in a try that checks clipboard() works in a future PR. We can safely ignore this for now.

@SebastianM-C
Copy link
Owner

Yeah, there are some cases where the DOI is in the badge, but for Zenodo, at least, this is not the correct way to setup. The reason is that for every new package version a new DOI is generated. So using "latestdoi" is the correct setup, hence, why I wrote that "warning". Perhaps, one can check if the badge has a DOI, and use that instead.

It looks like looking for a badge that starts with "DOI" will cover all cases (and also work for JOSS badges). We could issue a warning in the case of Zenodo if we detect a badge that doesn't use latestdoi, but using a DOI for all the package versions (like Plots) or the latest one seems like something that not everyone will agree on, so we should test that the code works on both variants.

@davibarreira
Copy link
Contributor Author

I'll do the following. I'll create a function that searchers for a badge with DOI, and get's the link. Now, if the link if from zenodo, then uses my code to get the bibtex. If it's a doi link, then I'll create another function to generate the bibtex from it.

@davibarreira
Copy link
Contributor Author

Made all the suggested changes. The function get_zenodo_badge now is get_badge, and it looks for the DOI badge, and extracts the link. The function get_citation_zenodo is now get_citation_badge and it checks weather the link is already the DOI or if it's the latestdoi (for the Zenodo case). I guess this is all.
As I said in the other comment, I left the default as false and I think a new PR should be opened to turn true into the default, improve the documentation and fix the tests.

Copy link
Owner

@SebastianM-C SebastianM-C left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for your contribution!

test/cite_str.jl Outdated Show resolved Hide resolved
@SebastianM-C SebastianM-C merged commit e6bbf4e into SebastianM-C:master Jun 21, 2021
@SebastianM-C SebastianM-C mentioned this pull request Jun 21, 2021
@SebastianM-C SebastianM-C linked an issue Jun 21, 2021 that may be closed by this pull request
@davibarreira
Copy link
Contributor Author

Thank you very much for your contribution!

Not a problem! Thanks for the pkg by the way :D

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Zenodo tags
3 participants