Add spider for Rr Boa Vista #101

Merged
merged 4 commits into from
Aug 2, 2018

weibemoura
Contributor

No description provided.

  - Updated status Aparecida de Goiânia/GO
  - Updated status Boa Vista/RR
url = response.urljoin(url)

power = "executive_legislature"
items.append(
Collaborator

You don't need to collect all the gazettes in a list before returning them. Remove the items list and yield each item as it is scraped:

               yield Gazette(
                   date=date,
                   file_urls=[url],
                   is_extra_edition=False,
                   territory_id=self.TERRITORY_ID,
                   power=power,
                   scraped_at=dt.datetime.utcnow(),
               )
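
As a rough illustration of the suggestion above, here is a minimal sketch of the same pattern outside Scrapy. The function name `parse_rows`, the plain dicts standing in for `Gazette`, the example URLs, and the placeholder territory id are all assumptions made for this example, not code from the pull request.

```python
import datetime as dt


def parse_rows(rows, territory_id):
    """Yield one gazette record per row instead of building a list first."""
    for date, url in rows:
        # Each record is handed to the caller as soon as it is built,
        # so no intermediate `items` list and no `return items` are needed.
        yield {
            "date": date,
            "file_urls": [url],
            "is_extra_edition": False,
            "territory_id": territory_id,
            "power": "executive_legislature",
            # Timezone-aware timestamp; the spider itself uses utcnow().
            "scraped_at": dt.datetime.now(dt.timezone.utc),
        }


# Placeholder data, not real gazette URLs.
rows = [
    (dt.date(2018, 7, 2), "http://example.com/gazette-1.pdf"),
    (dt.date(2018, 7, 3), "http://example.com/gazette-2.pdf"),
]
items = list(parse_rows(rows, territory_id="0000000"))
```

Scrapy consumes a callback written this way just like one that returns a list, since it only needs an iterable of items and requests.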

scraped_at=dt.datetime.utcnow(),
))

return items
Collaborator

Remove this.

yield scrapy.Request(url, self.parse_period)

def parse_period(self, response):
    items = []
Collaborator

Remove this.

for option in options:
    data = option.extract()

    url = "".join([self.start_urls[0], "?Periodo=", data])
Collaborator

Use w3lib to add this new parameter to the initial URL (see http://w3lib.readthedocs.io/en/latest/w3lib.html#w3lib.url.add_or_replace_parameter):

import w3lib.url
url = w3lib.url.add_or_replace_parameter(response.url, 'Periodo', data)
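
To show why string concatenation is fragile here, the sketch below compares it with a rough standard-library approximation of what `add_or_replace_parameter` does. The helper `set_query_parameter`, the example URL, and the `pagina` parameter are hypothetical. This is not w3lib's implementation and may differ from it in details such as parameter order and encoding.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def set_query_parameter(url, name, value):
    """Hypothetical helper: set `name=value` in the query string of `url`."""
    parts = urlsplit(url)
    # Keep every existing parameter except the one being set.
    params = [
        (key, val)
        for key, val in parse_qsl(parts.query, keep_blank_values=True)
        if key != name
    ]
    params.append((name, value))
    # urlencode percent-encodes the value, so "/" becomes "%2F".
    return urlunsplit(parts._replace(query=urlencode(params)))


# Placeholder URL that already carries a query string.
base = "http://example.com/diario?pagina=2"

# Concatenation adds a second "?" and leaves the value unencoded.
naive = "".join([base, "?Periodo=", "07/2018"])

# The helper merges the parameter into the existing query instead.
safe = set_query_parameter(base, "Periodo", "07/2018")
```

With concatenation, a URL that already has a query string ends up with two `?` characters. Parsing and rebuilding the query avoids that, and it also replaces `Periodo` if it is already present.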

@weibemoura
Contributor Author

Thanks, @rennerocha. I just updated the code.

@cuducos cuducos merged commit 6404a6e into okfn-brasil:master Aug 2, 2018
@trevineju trevineju added this to the Capitais | Capital Cities milestone Oct 10, 2022
@trevineju trevineju added the spider (adds a scraper bot for municipality(ies)) label Oct 24, 2022