datamule v0.400 update megathread #28
Features in the pipeline:
Hey @john-friedman, hope you have been well. How has (2) been coming along, following up on the previously closed issue #19?
Hey @firmai, should be coming along this month. I pushed parsing back a bit due to setting up my own SEC archive.
Are you under time pressure for one of your projects?
That's helpful to know. I just want to coordinate with you to make sure I am not replicating what you are doing unnecessarily. Currently my time is spent on a patents dataset, but I will be moving to filings in about 3 weeks. If you have something by then, I will run it and offer suggestions on a thread over here.
Gotcha. Btw, for the patents dataset, are you using Google BigQuery? I did some work on patents back in 2022.
Yeah, using the BigQuery dataset, but it has some issues which I have to resolve! And it takes surprisingly long to do.
Downloading the data or cleaning it? BigQuery has some big downsides - I ended up doing a lot of stuff locally.
The cleaning is the tough part! Will talk to you about it on our next call.
Datamule has been updated to v0.400. There will be bugs; please report them.
The good stuff:
The bad stuff:
I'm going to be moving away from using the sec.gov API towards my own infrastructure for several reasons. While the SEC API is great, it has become frustrating for me to use: the rate limits are low, historical files are missing, it goes down sometimes, etc. Hosting my own archive lets me expand the historical timeline, optimize download speed, and iterate faster.
I'm open to making my SEC archive a public utility. If you are someone who can make that happen, feel free to reach out. I will also be releasing a guide on how to host your own SEC archive. The code is open-source.
Benchmarks
Note: soon after release I will be updating the Premium Downloader to be 10x as fast.