Dev/gfql endpoint #615
base: master
Conversation
graphistry/compute/python_remote.py
:param api_token: Optional JWT token. If not provided, refreshes JWT and uses that.
:type api_token: Optional[str]

:param dataset_id: Optional dataset_id. If not provided, will upload current data, store that dataset_id, and run GFQL against that.
comment references gfql here
also could clarify that if a dataset_id exists on the plottable it will be reused, as opposed to reuploading every time python_remote is called with it omitted
so this is a bit problematic in that this would trigger re-upload, and not sure if that's 'right':

```python
g2 = g1....

# auto-upload 1
res1 = g2.python_remote(script1)

# auto-upload 2
res2 = g2.python_remote(script2)
```
A few ideas:

- inplace update of `g2`, breaking purity of `v3 = g2.xyz()`
- Some sort of global memoizable trick, so even if `g2._dataset_id` is not mutated on first upload, we can detect `g2` did have a recent upload? Ex: some sort of global weak reference lookup table of the last n `g` objects?
- Don't worry about it, and encourage users to do an explicit `g2 = g1.upload()...`
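The second idea above could be sketched roughly like this — a module-level weak-keyed cache that remembers which plottable objects were already uploaded, without mutating them. All names here (`Plottable`, `remember_upload`, `lookup_upload`) are illustrative stand-ins, not the actual PyGraphistry API:

```python
import weakref
from typing import Optional

# Weak keys: an entry disappears once the plottable is garbage collected,
# so the cache naturally stays bounded to live objects.
_upload_cache: "weakref.WeakKeyDictionary" = weakref.WeakKeyDictionary()

class Plottable:
    """Stand-in for the real plottable (which holds nodes/edges bindings)."""
    pass

def remember_upload(g: Plottable, dataset_id: str) -> None:
    # record the upload externally, preserving purity of g2 = g1.xyz()
    _upload_cache[g] = dataset_id

def lookup_upload(g: Plottable) -> Optional[str]:
    # reuse a prior upload iff this exact object was uploaded before
    return _upload_cache.get(g)
```

This keeps `g2._dataset_id` untouched while still letting `python_remote` skip the second auto-upload for the same object.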
Didn't consider this... this can't happen in python_remote because that never modifies the plottable, but chain_remote definitely does.

My opinion is that perhaps we should avoid self-mutation of the plottable and instead return a "clone" (with identical metadata etc.) of the original plottable with the updated node and edge lists? This should also better mirror the behaviour of non-remote chain anyway.

I don't have strong opinions on auto-uploading the clone.
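The clone proposal could look something like the following sketch — `chain_remote` returns a copy that carries over metadata (e.g. `dataset_id`) and swaps in the new node/edge frames, leaving the original untouched. `Plottable` and the function signature here are simplified stand-ins, not the real implementation:

```python
from dataclasses import dataclass, replace
from typing import Any, Optional

@dataclass(frozen=True)
class Plottable:
    nodes: Any = None
    edges: Any = None
    dataset_id: Optional[str] = None  # metadata carried over to clones

def chain_remote(g: Plottable, new_nodes: Any, new_edges: Any) -> Plottable:
    # return a clone with updated frames; g itself is never mutated,
    # mirroring the purity of non-remote chain
    return replace(g, nodes=new_nodes, edges=new_edges)
```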
Does the python endpoint return a Plottable, JSON, or any? How would I know ahead of time, or how would the python client sniff?
For now the python endpoint returns either a string or JSON. Which one it returns is completely dependent on the return type of the function defined (by the user) in `execute`. The python endpoint code will detect the type of the returned value and use the appropriate response format.

You can read the MIME type of the response to tell which.
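A minimal sketch of that client-side sniffing — dispatch on the response's Content-Type header to decide whether the body is JSON or a plain string. `parse_remote_result` is a hypothetical helper; the real client would pull the header and body from the HTTP response object:

```python
import json
from typing import Any, Union

def parse_remote_result(content_type: str, body: str) -> Union[Any, str]:
    # headers can carry parameters, e.g. "application/json; charset=utf-8",
    # so compare only the media type portion
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json":
        return json.loads(body)
    # e.g. text/plain: hand the raw string back unchanged
    return body
```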
Hmm, it might not be so bad to extend the python endpoint to reuse the gfql endpoint's machinery here, I'll take a look.
(mimetype is cool, that helps!)
I agree on avoiding mutation on the base obj, maybe we do some variants like:

```python
# generic helper
_ : Any = g1.remote_python("...", output_mode=...)

# explicit mypy-friendly
g2 = g1.remote_python_g("...")
g2, g1_bound = g1.remote_python_g2("...")
o = g1.remote_python_json("...")
o, g1_bound = g1.remote_python_json2("...")
```

I'm tempted to not do the `<xyz>2` variants above, and if someone wants to reuse uploads, they have to be explicit:

```python
g1 = g0.upload(...)
# or g1 = graphistry.bind(dataset_id="...")
g2 = g1.remote_python_g(...)
```
Some very exciting things happening here -- maybe time to call pygraphistry `1.x.x`??

Remote dataset binding
Bind to remote (lazy):
Decouple upload from plot(), and bind the upload:
Remote GFQL
Remote Python too!
Notes
Changes
- Now tracks `dataset_id`, `nodes_file_id`, `edges_file_id`, and `url`, including clearing them out whenever the nodes/edges get re-bound with new data
- `plot(render=)` expanded from `bool` to `Union[bool, Literal["auto", "g", "ipython", "databricks", "browser"]]` to make more predictable
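A rough sketch of how the widened `render=` parameter might be normalized into a concrete mode. The `Union`/`Literal` type comes from the change notes above, but the mapping of `True`/`False`/`"auto"` below is illustrative guesswork, not the actual PyGraphistry behavior:

```python
from typing import Literal, Union

RenderMode = Union[bool, Literal["auto", "g", "ipython", "databricks", "browser"]]

def resolve_render(render: RenderMode, in_ipython: bool = False) -> str:
    # assumption: True and "auto" pick a target based on the environment
    if render is True or render == "auto":
        return "ipython" if in_ipython else "browser"
    # assumption: False means no display, just return the plottable ("g")
    if render is False:
        return "g"
    # explicit literals pass through unchanged
    return render
```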