Cannot import references in Chinese from Google Scholar. #1694

Closed
TLCFEM opened this issue Aug 8, 2016 · 8 comments
Labels: component: fetcher, [outdated] type: bug

Comments

@TLCFEM

TLCFEM commented Aug 8, 2016

JabRef 3.6dev--snapshot--2016-08-06--master--381b569
windows 7 6.1 amd64
Java 1.8.0_101

Function: Web Search.

Problem: Cannot import references in Chinese from Google Scholar.

Test with following string.

微分求积方法及其在力学应用中的若干新进展

Items in the inspection window are all empty.

Is there any relevant code that handles this?

Can anyone check it?

Attached are two pics.

[Image: aaaa]

[Image: wwww]

@Siedlerchr
Member

Confirmed. There seems to be an encoding problem in the fetcher. I'll take a look.

Siedlerchr self-assigned this Aug 8, 2016
Siedlerchr added the [outdated] type: bug label Aug 8, 2016
@Siedlerchr
Member

I dug a bit further. The problem occurs when fetching the last URL in the downloadToString method of the URLDownload class.
I also tested a downloadToFile method, with the same result.
Opening the URL in a browser works fine, so I still need to figure out why JabRef does not get the Chinese characters.
I suspect the encoding of the stream. I will investigate further tomorrow.
http://scholar.googleusercontent.com/scholar.bib?q=info:YViNXU26OC8J:scholar.google.com/&output=citation&scisig=AAGBfm0AAAAAV6jlOozgO5h9-NnX7dATaWPgd498yl5h&scisf=4&ct=citation&cd=0&hl=en
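
For illustration only: one common way non-ASCII text gets lost when a download is turned into a string is that the bytes are decoded with a charset other than the one the server used. The sketch below shows that effect in isolation. It is not JabRef's code; `CharsetDecodeSketch` and `readAll` are hypothetical names, and the assumption that the response is UTF-8 is made up for the example rather than taken from the actual Google Scholar response.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDecodeSketch {

    // Hypothetical helper (not JabRef's URLDownload API): reads the whole
    // stream and decodes the bytes with the charset the caller names.
    static String readAll(InputStream in, Charset charset) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // First four characters of the title from the report ("微分求积"),
        // written as escapes so the source file encoding does not matter.
        String title = "\u5fae\u5206\u6c42\u79ef";

        // Assumption for this example: the server sends UTF-8 bytes.
        byte[] wire = title.getBytes(StandardCharsets.UTF_8);

        String matching = readAll(new ByteArrayInputStream(wire), StandardCharsets.UTF_8);
        String mismatched = readAll(new ByteArrayInputStream(wire), StandardCharsets.ISO_8859_1);

        // Decoding with the matching charset restores the title.
        System.out.println("UTF-8 decode equals title: " + matching.equals(title));
        // Decoding with a different charset yields one char per byte instead.
        System.out.println("ISO-8859-1 decode equals title: " + mismatched.equals(title));
    }
}
```

If the cause is of this kind, the usual remedy is to take the charset from the response (for example from its Content-Type header) instead of relying on a default.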

@TLCFEM
Author

TLCFEM commented Aug 9, 2016

I found another problem.

After searching for roughly 40 items with Web Search, I can no longer fetch anything; the result window stays empty. When I then visit Google Scholar in a browser, the page shows me this:

[Image: aaa]

After completing the verification, I get proper results again.

I do not know the mechanism of the fetcher in JabRef but I think it should be something similar to others.

Is there any possibility that the results are blocked by such a verification?

Any way to tackle this?

@matthiasgeiger
Member

Google Scholar blocks "automated" crawls that generate too much traffic in a short time. To work around this, we already use a two-step approach: we prefetch the result list first and only then crawl the actual BibTeX data.
However, after too many requests JabRef (or, more correctly, your IP address) will be blocked.

Thus, the Google Scholar fetcher is not the best way to obtain lots of entries at once. If you use Mozilla Firefox, the JabRef plugin "JabFox" might be an alternative for downloading the BibTeX data directly from the browser. You can find the plugin here: https://addons.mozilla.org/en-US/firefox/addon/jabfox/?src=external-jabrefSite
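
For illustration only, the two-step idea (list first, BibTeX only for the entries that are actually wanted) can be sketched roughly as follows. This is a simplified sketch, not JabRef's fetcher code: `TwoStepFetchSketch`, `Preview`, `fetchSelected`, and the `example.invalid` URLs are all hypothetical, and the downloader is a stub that makes no network requests.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class TwoStepFetchSketch {

    // Hypothetical preview record from step 1: a title to show in the
    // result list plus the URL where the BibTeX data could be fetched.
    static final class Preview {
        final String title;
        final String bibtexUrl;

        Preview(String title, String bibtexUrl) {
            this.title = title;
            this.bibtexUrl = bibtexUrl;
        }
    }

    // Step 2: issue one download per *selected* preview only, so entries
    // the user does not pick never cost a request.
    static List<String> fetchSelected(List<Preview> previews,
                                      List<Integer> selectedIndexes,
                                      Function<String, String> downloader) {
        List<String> entries = new ArrayList<>();
        for (int index : selectedIndexes) {
            entries.add(downloader.apply(previews.get(index).bibtexUrl));
        }
        return entries;
    }

    public static void main(String[] args) {
        // Step 1 (stubbed): a single search request would yield this list.
        List<Preview> previews = Arrays.asList(
                new Preview("Paper A", "https://example.invalid/a.bib"),
                new Preview("Paper B", "https://example.invalid/b.bib"),
                new Preview("Paper C", "https://example.invalid/c.bib"));

        // Stub downloader that only counts how often it is called.
        AtomicInteger requests = new AtomicInteger();
        Function<String, String> stub = url -> {
            requests.incrementAndGet();
            return "@misc{stub, note = {" + url + "}}";
        };

        List<String> entries = fetchSelected(previews, Arrays.asList(1), stub);
        System.out.println(entries.size() + " entry fetched with "
                + requests.get() + " BibTeX request for "
                + previews.size() + " previews");
    }
}
```

This reduces the number of requests per search but does not remove the limit, which matches the blocking described above.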

@matthiasgeiger
Member

Side note: With JabFox it is also possible to correctly import the entry you mentioned in your initial bug report.

@matthiasgeiger
Member

matthiasgeiger commented Aug 19, 2016

The encoding problem should be fixed with #1785.

You can check out a development build at http://builds.jabref.org/fix-scholar-encoding/ to verify whether searching for Chinese characters works again!

Oh... I just realized that it is still not working... 😞

@matthiasgeiger
Member

Okay... better luck this time ;-)

Should be working now: http://builds.jabref.org/fix-scholar-encoding

@oscargus
Contributor

I merged #1785 and hope that the encoding issue here is fixed. Feel free to reopen if it doesn't work for you, @TLCFEM.

5 participants