desy: introduce desy spider #155

Merged
merged 11 commits into from
Aug 24, 2017

@spirosdelviniotis spirosdelviniotis commented Jul 6, 2017

Signed-off-by: Spiros Delviniotis [email protected]

Description

Related Issue

Closes #133

Motivation and Context

Checklist:

  • I have all the information that I need (if not, move to RFC and look for it).
  • I linked the related issue(s) in the corresponding commit logs.
  • I wrote good commit log messages.
  • My code follows the code style of this project.
  • I've added any new docs if API/utils methods were added.
  • I have updated the existing documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@spirosdelviniotis spirosdelviniotis self-assigned this Jul 6, 2017
@spirosdelviniotis spirosdelviniotis force-pushed the hepcrawl_desy_spider branch 2 times, most recently from 07cf5e6 to 54970bf Compare July 12, 2017 16:22
@spirosdelviniotis spirosdelviniotis force-pushed the hepcrawl_desy_spider branch 12 times, most recently from abcb28c to 840f18a Compare August 2, 2017 08:37
@spirosdelviniotis spirosdelviniotis changed the title desy: introduce desy spider -- WIP desy: introduce desy spider Aug 2, 2017
@david-caro
Contributor

This is still missing the part of the media loader that populates the fft_file_paths, no?

@david-caro
Contributor

btw. I want to pair program this with you :)

@spirosdelviniotis
Contributor Author

> This is still missing the part of the media loader that populates the fft_file_paths, no?

@david-caro the FftFilesPipeline populates the file_paths at item_completed (btw I should rename it to fft_file_paths).
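For readers unfamiliar with the hook: Scrapy's files pipeline calls `item_completed(results, item, info)` once all downloads for an item have finished, passing `results` as a list of `(success, file_info_or_failure)` tuples. Below is a minimal, standalone sketch of the kind of logic such a hook could contain. The function name `populate_fft_file_paths` and the sample data are hypothetical and are not taken from this PR; the real `FftFilesPipeline` may well differ.

```python
def populate_fft_file_paths(results, item):
    """Copy the local paths of successfully downloaded files onto the item.

    `results` has the shape Scrapy hands to `item_completed`: a list of
    `(success, file_info_or_failure)` tuples, where `file_info` is a dict
    with at least a `path` key for successful downloads.
    """
    item["fft_file_paths"] = [
        file_info["path"] for success, file_info in results if success
    ]
    return item


# Hypothetical download results: one success, one failure.
results = [
    (True, {"url": "http://example.org/a.pdf",
            "path": "full/a.pdf",
            "checksum": "0" * 32}),
    (False, Exception("download failed")),
]
item = populate_fft_file_paths(results, {"title": "example record"})
```

In an actual pipeline this body would presumably live in the `item_completed` method of a `FilesPipeline` subclass; it is kept as a plain function here so it runs without Scrapy installed.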


> btw. I want to pair program this with you :)

@david-caro Sure! That would be great! 😄

@spirosdelviniotis spirosdelviniotis force-pushed the hepcrawl_desy_spider branch 2 times, most recently from 7baed3d to 9c9bc7d Compare August 7, 2017 11:09
from scrapy.http import TextResponse
def _get_processed_item(record, spider):
    item = pipeline.process_item(record, spider)
    return item
Contributor

s/item/record/

pipeline = InspireCeleryPushPipeline()
pipeline.open_spider(spider)
return [pipeline.process_item(record, spider) for record in records]

return [_get_processed_item(parsed_item, spider) for parsed_item in parsed_items]
Contributor

s/_get_processed_item/_get_processed_record/

def many_results(spider):
    """Return results generator from the arxiv spider. Tricky fields, many
    records.
    """
    from scrapy.http import TextResponse
    def _get_processed_item(record, spider):
Contributor

s/record/item/

return parsed_record

parsed_item = spider.parse_node(response, nodes)
assert parsed_item
Contributor

assert parsed_item.record
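A small sketch of why the stricter assertion can matter, assuming `parse_node` returns a plain wrapper object exposing a `record` attribute. The `ParsedItem` class below is a hypothetical stand-in written for illustration, not the project's actual class.

```python
class ParsedItem(object):
    """Hypothetical stand-in for the wrapper returned by parse_node."""

    def __init__(self, record):
        self.record = record


empty = ParsedItem(record={})
filled = ParsedItem(record={"titles": [{"title": "A title"}]})

# A plain object is truthy by default, so `assert parsed_item` would pass
# even when nothing was parsed into the record.
wrapper_check_passes = bool(empty)

# The empty record itself is falsy, so `assert parsed_item.record` would fail
# and surface the problem.
record_check_passes = bool(empty.record)
```

Under this assumption, asserting on the wrapper alone can never fail, whereas asserting on `.record` checks that the spider actually produced content.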

Signed-off-by: Spyridon Delviniotis <[email protected]>
@spirosdelviniotis
Contributor Author

Build fixed! 😄

LGTM

@david-caro
Contributor

david-caro commented Aug 23, 2017

Looks like cheating to me ;)
it's actually ok... I think I had a mess with my local code...

@david-caro david-caro merged commit f86419a into inspirehep:master Aug 24, 2017
@spirosdelviniotis spirosdelviniotis deleted the hepcrawl_desy_spider branch August 24, 2017 12:38
Development

Successfully merging this pull request may close these issues.

desy spider
2 participants