Access tons of case law data in an easy format with no cap(s)

harvard-lil/nocap

Easy Bulk export, no cap

This repository provides scripts and notebooks that make it easy to export data in bulk from CourtListener's freely available downloads.

  • Create a first version of the notebook suitable for data scientists
    • Define appropriate dtypes to reduce pandas memory usage
    • Select only the necessary columns with usecols; for example, the 'created_by' date field, which only records a database insert, isn't needed
    • Read opinions.csv (190+ GB) from disk one chunk at a time while converting it to JSON
  • Create a standalone script whose output can be piped to other tools
  • Improve speed by using a Dask DataFrame
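
The notebook items above (explicit dtypes, a column subset, and chunked reading with JSON conversion) could be sketched roughly as follows. This is a minimal illustration rather than the repository's actual code: the column names, dtypes, chunk size, and the `csv_to_jsonl` helper are all assumptions, and the real opinions.csv schema should be checked against the CourtListener download before using them.

```python
import pandas as pd

# Hypothetical column subset and dtypes, for illustration only; they are
# not taken from the repository or from the real opinions.csv schema.
USECOLS = ["id", "type", "plain_text"]
DTYPES = {"id": "int64", "type": "category", "plain_text": "string"}


def csv_to_jsonl(source, sink, chunksize=10_000):
    """Stream a CSV to JSON Lines one chunk at a time; return rows written.

    `source` is a path or file-like object accepted by pandas.read_csv and
    `sink` is any object with a write() method (an open file, sys.stdout).
    """
    rows = 0
    # With chunksize set, read_csv returns an iterator of DataFrames, so
    # only one chunk is held in memory at a time.
    reader = pd.read_csv(source, usecols=USECOLS, dtype=DTYPES, chunksize=chunksize)
    for chunk in reader:
        # orient="records" with lines=True emits one JSON object per row.
        text = chunk.to_json(orient="records", lines=True)
        # Older pandas versions omit the trailing newline, so add it if missing.
        sink.write(text if text.endswith("\n") else text + "\n")
        rows += len(chunk)
    return rows
```

Calling `csv_to_jsonl("opinions.csv", sys.stdout)` from a small script (with `import sys`) would write one JSON object per line to standard output, which is one way the output could be piped to other tools. Columns left out of `USECOLS`, such as 'created_by', are never loaded.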