Add cloneNode and toxml as some pages fail when extracting RDFa #38

Merged on Mar 30, 2017 (1 commit)

@andrix andrix commented Mar 26, 2017

While trying to extract RDFa from a site, I got an AttributeError because cloneNode and toxml were not implemented.

I've added both methods and now it's working.
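
For readers unfamiliar with these two methods, here is a minimal sketch of what they do, modeled on the `cloneNode(deep)` and `toxml()` methods of `xml.dom.minidom`. It is not the code from this PR: the `DomLikeElement` wrapper and its use of `xml.etree.ElementTree` are assumptions made purely for illustration.

```python
import copy
import xml.etree.ElementTree as ET


class DomLikeElement:
    """Hypothetical wrapper (not from extruct) that gives an ElementTree
    element two minidom-style methods."""

    def __init__(self, element):
        self.element = element

    def cloneNode(self, deep):
        # minidom semantics: deep=True copies the whole subtree,
        # deep=False copies only the node itself (tag and attributes).
        if deep:
            return DomLikeElement(copy.deepcopy(self.element))
        shallow = ET.Element(self.element.tag, dict(self.element.attrib))
        return DomLikeElement(shallow)

    def toxml(self):
        # Serialize the element and its children to a text string.
        return ET.tostring(self.element, encoding="unicode")


node = DomLikeElement(
    ET.fromstring('<div property="content:encoded"><p>Hi</p></div>')
)
clone = node.cloneNode(True)
clone.element.set("class", "copy")  # mutating the clone leaves the original untouched
```

Code that expects a DOM node can then call `clone.toxml()` to serialize the copied subtree without modifying the original tree.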

codecov-io commented Mar 26, 2017

Codecov Report

Merging #38 into master will decrease coverage by 0.55%.
The diff coverage is 66.66%.

@@            Coverage Diff             @@
##           master      #38      +/-   ##
==========================================
- Coverage    86.8%   86.25%   -0.56%     
==========================================
  Files           3        3              
  Lines         235      240       +5     
  Branches       47       47              
==========================================
+ Hits          204      207       +3     
- Misses         29       31       +2     
  Partials        2        2

Impacted Files     Coverage Δ
extruct/rdfa.py    74.79% <66.66%> (-0.63%) ⬇️

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f61cc45...d18a6d1.

@redapple

Good catch, @andrix!
I can reproduce the exceptions with this page, for example: http://nielslubberman.nl/drupal/
It's related to the 'http://purl.org/rss/1.0/modules/content/encoded' property (with RDFa 1.0 at least).

@redapple redapple merged commit 353e2b3 into scrapinghub:master Mar 30, 2017