Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an 'official' caller for batch UDFs #632

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

spencerseale
Copy link
Collaborator

@spencerseale spencerseale commented Aug 22, 2024

This is more approachable from a docs point of view compared to as_batch and accepts both in-memory and references to registered UDFs.

Because of the interests of users for calling UDFs with custom resource specifications and access credential names, this function acts to satisfy that interest.

For unit tests, didn't want to interfere with unittest based tests, so created a new test module for pytest-based.

@spencerseale spencerseale self-assigned this Aug 22, 2024
@@ -46,6 +49,8 @@
from . import visualization as viz
from .mode import Mode

logger = logging.getLogger(__name__)
Copy link
Contributor

@JohnMoutafis JohnMoutafis Aug 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For uniformity reasons, I prefer that we use tiledb.cloud.utilities.get_logger_wrapper for logging everywhere.

The method also allows for a verbosity level (verbose=True/False) that should be set from the UDF's arguments (as is the case with the as_batch method) so the logger can be declared inside the exec_batch_udf and the verbose should be "grabbed" from the kwargs

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JohnMoutafis I agree, but we have a circular import problem because run_dag is in the utilities module.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@spencerseale @JohnMoutafis are we still stuck here? Do I understand correctly that we don't have a circular import yet, but will when run_dag() calls this new function?

If we refactored and moved get_logger_wrapper() to, for example, tiledb.cloud.logging, that would eliminate the potential circular import, yes? I'm willing to do that work.

@spencerseale spencerseale removed the request for review from thetorpedodog October 7, 2024 23:13
@spencerseale spencerseale requested a review from sgillies October 8, 2024 19:26
@spencerseale
Copy link
Collaborator Author

@gspowley @sgillies got kinda buried, but still would like this function for calling batch UDFs in a more "official" manner. Currently we just have the as_batch decorator and that won't work for registered UDFs that are not in memory.

Copy link
Member

@gspowley gspowley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I support the idea of improving the UX for running tasks and taskgraphs (like this PR, #641, and as_batch). I believe @sgillies has the perspective to organize these ideas in a cohesive way.

@spencerseale
Copy link
Collaborator Author

Up for any ideas, including @taskgraph and @udf decorators that could also just do udf("registered_udf")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants