Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tweaks to new citation lookup tool #3924

Closed
mlissner opened this issue Mar 27, 2024 · 12 comments · Fixed by #3943
Closed

Tweaks to new citation lookup tool #3924

mlissner opened this issue Mar 27, 2024 · 12 comments · Fixed by #3943
Assignees

Comments

@mlissner
Copy link
Member

First, I just tried clicking here:

https://www.courtlistener.com/api/rest/v3/citation-lookup/

And got a crash: COURTLISTENER-6ZE

@mlissner
Copy link
Member Author

Second, looks like we didn't do any auth. It should use the same auth as other areas of the API at least at first, though we may want to tighten it down the road, if performance is an issue.

@ERosendo ERosendo self-assigned this Mar 27, 2024
@ERosendo ERosendo moved this to ✍🏻In Progress in @erosendo's backlog Mar 27, 2024
Copy link

sentry-io bot commented Mar 28, 2024

I tossed a really big blob of text at it and got this:

Sentry Issue: COURTLISTENER-6ZF

@ERosendo
Copy link
Contributor

ERosendo commented Mar 28, 2024

I tossed a really big blob of text at it and got this:

Sentry Issue: COURTLISTENER-6ZF

@mlissner It seems like Eyecite was able to identify the reporter and page number in a citation but could not extract the volume. The citation object is:

FullCaseCitation('Thompson, 394', groups={'volume': None, 'reporter': 'Thompson', 'page': '394'}, metadata=FullCaseCitation.Metadata(parenthetical=None, pin_cite=None, year='1969', court=None, plaintiff='Shapiro', defendant='', extra='U. S. 618, 634'))

should we ignore this citation in the response? or should we include it with an error message?

@mlissner
Copy link
Member Author

Interesting. This fails too:

s2 = 'Shapiro v. Thompson, 394 U. S. 618'
r = requests.post('https://www.courtlistener.com/api/rest/v3/citation-lookup/', data={"text":s2 })
r.status_code
500

But the front end looks fine:

image

@ERosendo
Copy link
Contributor

I tried "Shapiro v. Thompson, 394 U. S. 618" in the web citation tool and it's also failing. 🤔

@mlissner
Copy link
Member Author

APIs, man, they never fail to find problems in other places! Good news, we get to fix both!

@ERosendo
Copy link
Contributor

COURTLISTENER-6ZE is fixed:

image

@mlissner
Copy link
Member Author

mlissner commented Apr 1, 2024

Let's not forget here that we also need to add auth to this endpoint before anybody can start using it. Want to suggest something, @ERosendo?

@ERosendo
Copy link
Contributor

ERosendo commented Apr 1, 2024

we also need to add auth to this endpoint before anybody can start using it.

@mlissner You're right. We can add auth to this endpoint using two approaches:

Option 1: Simple Authentication

The IsAuthenticated class provides a simple way to restrict access to this endpoint. Users only need to include the authentication token found in their profile's developer tab within their request. This ensures only registered users can access the tool.

Option 2: Granular Permissions

We use a custom class inheriting from DjangoModelPermissions. This approach offers finer control over who can access the endpoint because it allows us to define specific user permissions that grant access. However, users would need to request API access beforehand, and permissions are assigned manually through the admin interface. We're using this approach for some endpoints like the one to retrieve/list docket entries(the RECAPUsersReadOnly class uses the perms_map property to define the custom model permissions)

Let me know what you think

@mlissner
Copy link
Member Author

mlissner commented Apr 1, 2024

Simple auth seems fine. What about throttling?

@ERosendo
Copy link
Contributor

ERosendo commented Apr 1, 2024

What about throttling?

Considering that users can send large payloads, I believe that the ratelimiter_all_2_per_m limiter is appropriate. This would allow users to send up to 2 request per minute. Does this seem reasonable?

@mlissner
Copy link
Member Author

mlissner commented Apr 2, 2024

I was thinking more about a throttle_class: https://www.django-rest-framework.org/api-guide/throttling/.

We'll need more than 2/m though. Maybe what we really need are citations/s. I don't care how much empty text you send at me. I care how many citations I have to look up. It looks like making a custom throttle to do that should be pretty easy using DRF, if you look at the SimpleRateThrottle's thottle_success method.

But assuming we can pull that off, how many do we want people to be able to look up per second? I'm not sure. I think these queries are very well optimized. Maybe 1/s?

@ERosendo ERosendo linked a pull request Apr 9, 2024 that will close this issue
@ERosendo ERosendo moved this from ✍🏻In Progress to 🔎In Review in @erosendo's backlog Apr 9, 2024
@github-project-automation github-project-automation bot moved this from 🔎In Review to Done in @erosendo's backlog Apr 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

2 participants