Async Support? #209
I'm totally down! You caught me at the perfect time, I was literally considering releasing. Couple of follow-up questions:
Let me see if I can answer these coherently:

1 & 4.) Totally! Really all I did in my implementation was slightly modify a PyCap Project object so that the `_call_api()` method returns the payloads instead of actually running the request. Then I extend the payloads by records and chunk them into groups. Each chunk is added to a list of tasks, and I use asyncio and aiohttp to submit all of the requests instead of using the requests module. My goal in building this out was to avoid reinventing the wheel and lean on PyCap, so I do think this is very possible!

2.) Potentially, though it would probably mean a few extra dependencies (aiohttp, aiofiles, aiopath, maybe a few more or a few less). I do think a better implementation is to hide the async logic under the hood of synchronous functions and leave the PyCap function signatures relatively unchanged. I'm pretty sure this is how REDCapR works.

3.) Instead of my method, it could be possible to just add an "async" variable that defaults to None but can be given a dictionary to act as kwargs for any extra info needed to run an async request. Like I said, I think the functions and their signatures could be left relatively unchanged.

Right now, running my version looks something like the sketch below (gc is an instance of my wrapper object), though I think a better implementation would be to contain that logic within the object itself (in this case, gc would be the new and improved PyCap Project object).

Not sure if I explained all of my thoughts well enough here. Happy to set up a phone call or something over email if that would be helpful!
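Roughly, the chunk-and-gather pattern I described above looks like this (an illustrative sketch only; the URL, the build_payload helper, and the chunk size are made up rather than actual GREENCap or PyCap code):

```python
import asyncio
import aiohttp

API_URL = "https://redcap.example.org/api/"  # placeholder REDCap API endpoint

def build_payload(token, record_ids):
    # One export_records payload restricted to a chunk of record IDs.
    return {
        "token": token,
        "content": "record",
        "format": "json",
        "records": ",".join(record_ids),
    }

async def fetch_chunk(session, payload):
    # Submit a single chunked request with aiohttp instead of requests.
    async with session.post(API_URL, data=payload) as resp:
        return await resp.json()

async def export_records_chunked(token, record_ids, chunk_size=100):
    chunks = [record_ids[i:i + chunk_size] for i in range(0, len(record_ids), chunk_size)]
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_chunk(session, build_payload(token, chunk)) for chunk in chunks]
        results = await asyncio.gather(*tasks)
    # Flatten the per-chunk lists back into one list of records.
    return [record for chunk in results for record in chunk]
```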
This looks sweet 🍰, very cool stuff. I think the API could look something like this:

```python
from redcap import Project

proj = Project(url, token)
records = proj.export_records(async=True)
users = proj.export_users(async=True)
```

This means that my program wouldn't need to wait for `export_records` to finish before kicking off `export_users`.
You know...now that you mention it, I think you're right that it just chunks and runs synchronously. That would make sense, and would explain why I can pull data faster with GREENCap! So, Python relatively recently implemented a task-based asynchronous paradigm, built around the async and await keywords. If you didn't want these calls to be blocking, the code would look more like this (or something close to it):
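A minimal sketch (the return_coroutine flag here is hypothetical and just stands in for whatever opt-in mechanism we land on, since async itself is a reserved keyword in Python and can't be used as an argument name):

```python
import asyncio
from redcap import Project

async def main():
    proj = Project("https://redcap.example.org/api/", "MY_TOKEN")
    # Hypothetical flag: each export_* call returns an awaitable instead of blocking.
    records, users = await asyncio.gather(
        proj.export_records(return_coroutine=True),
        proj.export_users(return_coroutine=True),
    )
    return records, users

# Neither request blocks the other; both run concurrently inside the event loop.
records, users = asyncio.run(main())
```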
What I had in mind was to leave the implementation looking the same as before:
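A minimal sketch of that usage (assumed behavior, not the current implementation): the calls still look synchronous, and any chunking or concurrent submission happens inside the method.

```python
from redcap import Project

proj = Project("https://redcap.example.org/api/", "MY_TOKEN")

# Signatures unchanged: these calls still block, but internally they could fan
# out chunked requests with asyncio/aiohttp and reassemble the results.
records = proj.export_records()
users = proj.export_users()
```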
TL;DR: The goal would be to have stuff like the chunking and async submission handled under the hood while the user-facing signatures stay the same.
Awesome, I like adding extensibility and the possibility for composition for people who want to opt into it. We can write some good docs as well about how to use the async features in tandem with libraries like asyncio. I'd say you're good to go on getting started on the work. Let me know if you have any questions or trouble at all with setting up the development environment or running the tests.
Heard! May need some help with pytest since that'll be new for me, and I'm obviously gonna wanna make sure the methods work as they did before! 😄 Will probably create the fork tomorrow and should hopefully have time Friday to dive into this!
Awesome. I'm on a trip this weekend, so I'll get to any questions you have next week.
Spent some time on this over the weekend getting familiar with the repo and putting together some pseudocode and a plan for the refactor to run asynchronous requests while keeping the "user facing" Project class relatively unchanged. For transparency (and my own sake 😄), here are my notes, organized by module:

- base.py:
- method classes:
- coroutine.py:
- request.py:
- project.py:
Fantastic, this all makes sense.

How do you feel about adding an article to the documentation site as well? mkdocs is what builds the site and it just uses markdown, so it shouldn't be too bad to have an article about how to take advantage of the asynchronous execution.

Also, let me know if you need help updating tests at all. You won't be able to run any of the integration tests locally since you don't have a super user token for the redcapdemo site (which allows you to create projects using the API). And the unit tests have a little bit of architecture behind them using the responses package, which mocks a REDCap server for us to test against. I refactored the unit tests to make them easier to update, but it's a lot to wrap your head around at first.

On a side note, how was the project to wrap your head around? Was the architecture easy to understand? I'm open to any other refactoring during this that would improve the package in general.
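For reference, the basic shape of a responses-based test looks something like this (a simplified sketch against a made-up endpoint, not PyCap's actual fixtures):

```python
import requests
import responses

@responses.activate
def test_export_records_mocked():
    # Register a fake REDCap endpoint; no real server is contacted.
    responses.add(
        responses.POST,
        "https://redcap.example.org/api/",
        json=[{"record_id": "1", "field_1": "value"}],
        status=200,
    )
    resp = requests.post(
        "https://redcap.example.org/api/",
        data={"token": "FAKETOKEN", "content": "record", "format": "json"},
    )
    assert resp.json()[0]["record_id"] == "1"
```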
I think a markdown page on how to use the package asynchronously makes a lot of sense, and yeah, totally makes sense for me to handle that. Ran the test suite locally and noticed the super user token issue. I'm not exactly new to unit testing, but definitely new to pytest, so I'll certainly be bugging you about that! The architecture was pretty easy to follow; it was just some mental gymnastics to figure out how I wanted to refactor this stuff.

Speaking of markdown and understanding the architecture...I am so hyped to learn that mermaid works in comments as well! Here's the current architecture:

```mermaid
classDiagram
    Base <|-- Method
    Method *-- Project
    Project --o Request
```
and I'll probably be updating it to something like this (I include the interface because I would like to add a ProjectStream object at some point for GreenCap, but that's down the road):

```mermaid
classDiagram
    Base <|-- Method
    Method *-- Project
    Project --o Request
    Request --o Coroutine
    Method *-- Coroutine
    IMethod <|-- Method
    <<interface>> IMethod
    IProject <|-- Project
    <<interface>> IProject
```
Not 100% on the Coroutine yet. I'm contemplating having the Method object somehow included in the Coroutine, but I'm not married to the idea. Just trying to think through where it makes the most sense to keep the Request validation logic. A bunch of the Request logic has to be moved into the Coroutine, so that bit is really what I have left to think about before the rubber meets the road on all of this.

I think adding async is enough in general, don't want to delay your release for too long! 😄 Although perhaps loading Project classes and re-used requests from YAML could be easy enough to add, and I think it's super handy: it lets users just update a config file instead of re-writing a new script. I had this set up in GreenCap, where the config stuff was saved under
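As a rough illustration of the YAML idea (the file layout, keys, and helper below are hypothetical, not part of PyCap or GreenCap's actual setup):

```python
import yaml  # PyYAML
from redcap import Project

def project_from_yaml(path):
    """Build a Project from a config file instead of hard-coding credentials."""
    with open(path) as fh:
        cfg = yaml.safe_load(fh)
    return Project(cfg["url"], cfg["token"])

# config.yml might look like:
#   url: https://redcap.example.org/api/
#   token: ABC123DEF456
proj = project_from_yaml("config.yml")
```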
This sounds like useful stuff -- also sounds like it could be included in a 2.1 release, since these are purely new features that wouldn't break the existing architecture.
I'm not terribly familiar with async stuff, but one idea I had while reading this was: what if, when
Your logic for that makes sense. I'm more just trying to wrap my head around chunking the API calls and recombining them, how that affects XML, JSON, and CSV returns along with binary/file returns, and making sure all of the type-checking is right.
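A rough sketch of what recombining per-chunk responses might look like for JSON and CSV (the helper is hypothetical; files/binary returns are the trickier open question):

```python
def combine_chunks(chunks, fmt="json"):
    """Combine the per-chunk API responses into a single result (sketch only)."""
    if fmt == "json":
        # Each chunk is already a list of record dicts; concatenate them.
        return [record for chunk in chunks for record in chunk]
    if fmt == "csv":
        # Keep the header row from the first chunk only.
        first, *rest = chunks
        lines = first.rstrip("\n").splitlines()
        for chunk in rest:
            lines.extend(chunk.rstrip("\n").splitlines()[1:])
        return "\n".join(lines) + "\n"
    raise ValueError(f"unsupported format: {fmt}")
```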
Nitpick: What about

@tshanebuckley Are you still planning on tackling this within the next few weeks? It seems like this won't be a breaking change in terms of the package API, just an additional argument (which is great). If you think it might be a while before you get to it, then I think I'll release 2.0 in the meantime, and this can be a feature of 2.1.

However, I'm willing to hold off if you're planning to work on this, because I know this will change a lot of how the package works internally, and I don't want to add a bunch of new endpoints in the next week just to make you have to convert those over to the new internal architecture as well.
I have been working on this, just haven't pushed anything up to GitHub as of yet. I'm trying to keep each of the methods completely self-contained in the returned coroutine, but I think the original architecture works counter to this. I need the Coroutine to call, for example, export_records to generate the payloads, but export_files runs _call_api to instantiate the Coroutine. This is basically for chunking the payload, since I would rather chunk first and then generate each payload based on those chunks. I definitely over-estimated how long this would take me. 😅

I think I figured out how to implement this without breaking away from the current architecture (just breaking out the code that generates the payload into a method and passing that method to _call_api). I'll push my working repo tonight and let you decide on the 2.0 release. I'm cool either way!
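A rough sketch of that idea, with hypothetical names (_build_export_records_payload, return_coroutine) standing in for the real internals: the payload construction lives in its own method, and _call_api decides whether to run the request now or hand back a coroutine.

```python
import aiohttp
import requests

class Project:
    """Sketch only: a stripped-down stand-in for PyCap's Project class."""

    def __init__(self, url, token):
        self.url = url
        self.token = token

    def _build_export_records_payload(self, records=None, fields=None):
        # Payload construction broken out into its own method.
        payload = {"content": "record", "format": "json", "token": self.token}
        if records:
            payload["records"] = ",".join(records)
        if fields:
            payload["fields"] = ",".join(fields)
        return payload

    def _call_api(self, payload, return_coroutine=False):
        if return_coroutine:
            # Hand the payload back as an awaitable (aiohttp-based in the real plan).
            return self._async_call(payload)
        return requests.post(self.url, data=payload).json()

    async def _async_call(self, payload):
        async with aiohttp.ClientSession() as session:
            async with session.post(self.url, data=payload) as resp:
                return await resp.json()

    def export_records(self, records=None, fields=None, return_coroutine=False):
        payload = self._build_export_records_payload(records, fields)
        return self._call_api(payload, return_coroutine=return_coroutine)
```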
Awesome, looking forward to seeing what you've got 👍🏻
Just pushed. Started some work in the method classes, mainly files. You'll see that I took a decent amount of the code from request.py and have been replacing the requests package with aiohttp, moving that logic into coroutine.py. Basically, I have to get every method call to end with _call_api, where the plan is to have that return either the coroutine or the previously expected output. Still have to figure out what I'm going to do with _return_data, along with identifying errors across multiple requests...so that's still a work in progress.
Hello! Happy to see some significant progress/updates have been made over the past few months on this project! I was curious if there are any plans to add async support to PyCap?
I've recently been working on some code that encapsulates PyCap and uses it to at least fetch records in an asynchronous fashion. Also played around with a FastAPI middleware to allow easier access to data in a more REST-like manner to support application development. (It's not very production-worthy as of yet, though)
If there is no async work in the pipelines yet, I would be happy to help contribute! :-D