First search result for "taxi" (which we encourage the user to try out) is a bill that has been deleted from Legistar #87

Closed
fgregg opened this issue Dec 13, 2017 · 8 comments

@fgregg
Contributor

fgregg commented Dec 13, 2017

https://nyc-council-staging.datamade.us/search/?q=taxi&page=1

https://nyc-council-staging.datamade.us/legislation/t-2017-6878/

@reginafcompton
Contributor

Weird.

So, this bill does not exist on the Legistar UI, but it does exist in the web API:
https://webapi.legistar.com/v1/nyc/matters/58413?token=....

Here's the OCD API for reference.

@hancush - do we want to scrape data not available in the Legistar user interface?

@hancush
Member

hancush commented Dec 14, 2017

I'd like to know why it's not in Legistar. (I searched for the identifier and didn't come up with anything.)

But in the meantime, it looks like this is another instance of Legistar returning a 200 when it shouldn't.

In [1]: import requests

In [2]: r = requests.get('http://legistar.council.nyc.gov/LegislationDetail.aspx?ID=3289669&GUID=718D3F80-59AB-4D69-B3B0-C832B0A506E8')

In [3]: r.status_code
Out[3]: 200

In [4]: r.text
Out[4]: 'Invalid parameters!'

I think we should add a condition to _check_errors in python-legistar that raises a ScrapeError when response.text is "Invalid parameters!" so we can skip these bills.
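A minimal sketch of the check proposed above, assuming a response object with `text` and `url` attributes (as `requests` responses have). The `ScrapeError` class and the function name `check_invalid_parameters` are stand-ins for illustration; the real `_check_errors` in python-legistar may be structured differently.

```python
class ScrapeError(Exception):
    """Hypothetical stand-in for the scraper's own error type."""


def check_invalid_parameters(response):
    """Raise ScrapeError when Legistar returns a 200 with an error body.

    Legistar answers some dead detail URLs with status 200 and the
    literal body 'Invalid parameters!', so the status code alone
    cannot be trusted.
    """
    if response.text.strip() == "Invalid parameters!":
        raise ScrapeError(
            "Legistar returned 'Invalid parameters!' for {}".format(response.url)
        )
    return response
```

A caller could catch `ScrapeError` around the detail-page request and skip the bill rather than aborting the whole scrape.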

@hancush
Member

hancush commented Dec 14, 2017

It looks like a version of that bill does exist in Legistar – do we have this version in the OCD API?

@hancush
Member

hancush commented Dec 15, 2017

So, this actually seems like a case of a duplicate bill. There is an updated version of this bill in Legistar, the OCD API, and Councilmatic.

The bill was inserted again rather than updated because the identifier is slightly different – "T 2017-6878" vs. "Res 1762-2017". Unfortunately, we don't have a mechanism for deleting old information when this happens.

Seems like we want to avoid situations like this. Should we be checking bills for the same API source URL, perhaps? (The matter ID is consistent across versions here.) Alternatively, or additionally, perhaps we should check Legistar source URLs to see if they're active?
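One way the first idea could look, as a rough sketch only: group bills by the matter ID found in their web API source URL and flag any matter that maps to more than one identifier. The bill shape used here (a dict with `identifier` and `sources` keys) and the function name `find_duplicate_bills` are assumptions for illustration, not the OCD or pupa data model.

```python
import re
from collections import defaultdict

# Matches the numeric matter ID in URLs such as
# https://webapi.legistar.com/v1/nyc/matters/58413
MATTER_ID_PATTERN = re.compile(r"/matters/(\d+)")


def find_duplicate_bills(bills):
    """Return {matter_id: [identifiers]} for matters with more than one bill.

    `bills` is an iterable of dicts with an 'identifier' string and a
    'sources' list of URL strings (a hypothetical shape).
    """
    by_matter = defaultdict(list)
    for bill in bills:
        for url in bill["sources"]:
            match = MATTER_ID_PATTERN.search(url)
            if match:
                by_matter[match.group(1)].append(bill["identifier"])
                break  # one API source URL per bill is enough
    return {
        matter_id: identifiers
        for matter_id, identifiers in by_matter.items()
        if len(identifiers) > 1
    }
```

Anything this returns would still need a human or a separate rule to decide which identifier is the current one.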

@reginafcompton
Contributor

reginafcompton commented Jan 19, 2018

Related to: opencivicdata/pupa#295

To close this issue, let's simply suggest another search query in the input bar....

@reginafcompton reginafcompton added this to the Post launch milestone Jan 19, 2018
@jeancochrane jeancochrane self-assigned this Jan 22, 2018
@reginafcompton
Contributor

I removed the duplicate bill from the OCD API and Councilmatic database (i.e., the bill with id "ocd-bill/afef2cb7-2b8d-4ce9-916b-34725ffa47f4", which duplicated this bill).

@jeancochrane
Contributor

Since the missing bill has been removed from the database, I think this issue has been fixed -- is that right @reginafcompton?

@reginafcompton
Contributor

Not yet! We actually need to change this:

SEARCH_PLACEHOLDER_TEXT = "Taxi, Resolution 815-2015, etc."

(The conversation above shows that, in addition to the Taxi bill, the suggested resolution, Resolution 815-2015, is not in Legistar or our databases.) Let's just suggest a bill that people can find, "Introduction 2018-0327". It will also be a nice test of the relevance search.
