Fix vfs.ls with access_credentials_name #512
This pull request has been linked to Shortcut Story #35137: 1-line SOMA ingest should only require an access credential name.
@thetorpedodog ping 🙏
@thetorpedodog thank you for punctuational review. What I really need help with is the two big-picture questions I wrote about in the description field. Thanks in advance for any help you can offer. I am in need of expert guidance here.
I guess the question I have at this stage is: why does the setup task run on the client side? It seems like if that were run on the server side, that would solve the problem.
This PR has the setup task run server-side as described in the description section of this PR.
Coming back to this after way too long, I think I have a solution here. The change that needs to happen is in the function shown below. Right now it starts the one-node graph to build the actual ingestion graph, but it returns the ID of the one-node graph. Instead, we need to wait for that one-node graph to complete (i.e., for it to build and launch the actual ingestion graph). Then, the one-node graph will return the ID of the actual ingestion graph, and the function can return that ID:

    def run_ingest_workflow(...) -> Dict[str, str]:
        ...
        grf = build_ingest_workflow_graph(...)
        # Start the one-node graph that builds and launches the real ingestion graph.
        grf.compute()
        # The one-node graph has exactly one node; take it.
        the_node = next(iter(grf.nodes.values()))
        # Its result is the ID of the actual ingestion graph.
        real_graph_uuid = the_node.result()
        return {
            "status": "started",
            "graph_id": str(real_graph_uuid),
        }
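To make the proposed control flow concrete, here is a minimal, self-contained sketch that does not use the TileDB Cloud API at all. `OneNodeGraph`, `launch_real_ingestion_graph`, and `run_ingest_workflow_sketch` are hypothetical stand-ins invented for illustration; the only point is that the caller waits for the launcher to finish and returns the ID the launcher produced, not the launcher's own ID.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, Optional

_POOL = ThreadPoolExecutor(max_workers=2)


class OneNodeGraph:
    """Hypothetical stand-in for the one-node launcher graph (not a real API)."""

    def __init__(self, build_and_launch: Callable[[], uuid.UUID]) -> None:
        self.graph_id = uuid.uuid4()  # ID of the launcher graph itself
        self._build_and_launch = build_and_launch
        self._future: Optional[Future] = None

    def compute(self) -> None:
        # Start the single node in the background and return immediately.
        self._future = _POOL.submit(self._build_and_launch)

    def result(self) -> uuid.UUID:
        # Block until the node finishes, then hand back what it returned.
        if self._future is None:
            raise RuntimeError("call compute() first")
        return self._future.result()


def launch_real_ingestion_graph() -> uuid.UUID:
    """Hypothetical: pretend to build and launch the real graph, return its ID."""
    return uuid.uuid4()


def run_ingest_workflow_sketch(
    launcher: Callable[[], uuid.UUID] = launch_real_ingestion_graph,
) -> Dict[str, str]:
    grf = OneNodeGraph(launcher)
    grf.compute()
    # Wait for the launcher, then report the ID of the graph it launched,
    # rather than grf.graph_id (the launcher's own ID).
    real_graph_uuid = grf.result()
    return {"status": "started", "graph_id": str(real_graph_uuid)}
```

With this shape, the ID handed back to the user points at the graph that actually does the ingestion, so its logs are reachable from the returned value.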
@thetorpedodog ready for next round of review!
A few minor things, otherwise this looks pretty good!
@thetorpedodog ready for the next round of review!
@thetorpedodog ping 🙏
logging.debug("ENUMERATOR ENTER")
logging.debug("ENUMERATOR INPUT_URI %s", input_uri)
logging.debug("ENUMERATOR OUTPUT_URI %s", output_uri)
logging.debug("ENUMERATOR DRY_RUN %s", str(dry_run))
no need for str here; %s transforms everything into a string.
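A small standalone check of that review point, using only the standard library. The logger name `enumerator_demo` and the in-memory stream are assumptions made for the demonstration; they are not part of the PR.

```python
import io
import logging

# Capture log output in memory so the two calls can be compared.
stream = io.StringIO()
logger = logging.getLogger("enumerator_demo")
logger.setLevel(logging.DEBUG)
logger.propagate = False
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(message)s"))
logger.addHandler(handler)

dry_run = True

# %s applies str() to its argument when the message is formatted,
# so passing the bool directly gives the same text as wrapping it in str().
logger.debug("ENUMERATOR DRY_RUN %s", dry_run)
logger.debug("ENUMERATOR DRY_RUN %s", str(dry_run))

lines = stream.getvalue().splitlines()
print(lines)
```

Both calls produce the same message text, so the explicit `str(...)` can be dropped.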
Overview
When attempting to use the run_ingest_workflow function to ingest H5AD files into SOMA, users are expected to provide AWS credentials through the extra_tiledb_config argument. This is a departure from other 1-line ingestors, which allow users to provide only an access_credential_name (or a config) referencing their role-based credential.

The function fails if AWS credentials are not defined in the local environment or passed through the extra_tiledb_config argument, even when access_credential_name is provided, because a tiledb.VFS instance is created locally to determine whether the input_uri points to a file or directory.

Why this is
- run_ingest_workflow runs on the client
- It uses vfs in order to:
  - determine whether the input_uri is a file (.h5ad) or not (a prefix)
  - find the .h5ad leaves at that prefix

Sample script
clingt.py
Expected behavior
Since the prefix provided to clingt.py is s3://tiledb-johnkerl/s/a/stack-small, I expect one enumerator node to vfs.ls that prefix, and four leaves to be launched, one for each .h5ad file at that prefix.

Sample logs
The first one is user-visible, from clingt.py, and the second one is linked to from there.

The next one (the enumerator node) can't be found from the ones above. The user has to discover it by searching compute logs :(. The first one is the enumerator (four-point candelabra); the ones after are the four leaves.
See also
[sc-35137]