Migrate component registry to metadata service #2031
Comments
@kiersten-stokes and I met yesterday to talk about some approaches and thought the following might satisfy a number of requirements (although we also realized we might not have a crystal clear understanding of the requirements). (All names can be changed and currently exist to help with the communication)...
Notes: We can also hang location-specific attributes/properties within a location specifier, since they are essentially object-valued relative to the schema. So something like a COS location specifier could include specific attributes like credentials, bucket name, etc. Although location-specifier properties like categories reside on the specifier, what the front-end receives could include those values on the component. That is, location specifier (and even component registry) properties can be distributed onto the component definition upon the component definition's retrieval. @kiersten-stokes - please add additional comments if I missed or misrepresented anything.
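As a rough illustration of the distribution idea in these notes, the sketch below copies specifier-level properties onto each component definition at retrieval time. The dictionary keys (location_type, categories, name) and the helper name distribute_properties are hypothetical and exist only for this example; they are not taken from the Elyra code base.

```python
from typing import Any, Dict, List

# Assumption: only these specifier-level keys are meant to be shared with components.
INHERITED_KEYS = ("categories",)


def distribute_properties(
    specifier: Dict[str, Any], components: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Return copies of the components with specifier-level properties applied.

    Hypothetical helper: a value already set on a component takes precedence
    over the value inherited from the location specifier.
    """
    inherited = {k: v for k, v in specifier.items() if k in INHERITED_KEYS}
    return [{**inherited, **component} for component in components]


specifier = {"location_type": "directory", "categories": ["Preprocessing"]}
components = [
    {"name": "filter_text"},
    {"name": "train", "categories": ["Training"]},
]
enriched = distribute_properties(specifier, components)
```

Whether a component-level value should override the specifier-level one is a design choice this sketch simply assumes; the opposite precedence would work just as well mechanically.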
I have the groundwork for implementation of this issue laid out now on a local branch. Planning on moving forward with the design as laid out in the issue description above. @lresende or anyone else, let me know if you think another design discussion is needed before I dive in.
Is your feature request related to a problem? Please describe.
The component registries are currently stored in etc/config/components. Every time a change is made to the registry, a rebuild is required. Additionally, it is not very user-friendly to add or remove components from the registry JSON files, as seen in discussion #1881.
Describe the solution you'd like
As we look toward adding a GUI for managing custom components (#1880), it could make sense to move the implementation of the registries to the metadata service. This also has the added benefit of improved consistency among stored information.
Design Considerations
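I think this issue can be broken down into the following 3 parts:
1. Restructuring component registry
2. Moving component registry to metadata
3. Handling component categories
Description and design considerations and questions for each to follow.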
1. Restructuring component registry
Motivation
Allows for a user to easily add more components, rather than having to define each individual component one-by-one. This will pair nicely with the ability for users to disable/not display certain components in the palette (issue #2009).
As a standalone feature, this may be of more 'moderate' importance, but if we intend to support users customizing their palette components via moving the component registries to metadata, then it would probably be best to implement this first so as to not cause confusion when we change things down the line.
High-level direction
Currently, the component registries (one for each runtime) contain a list of components and the location of each component spec in an 'unofficial' JSON format like the below.
Instead, we want to move to user-defined component registries, where each registry specifies only one location (a "location specifier") and that specifier points to one or several component spec files to be parsed. To start, a location specifier will be one of three types: filename, url, or directory, as shown below. See comment below for more details.
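The schema example the paragraph refers to is not available in this text, so the following is only a hypothetical sketch of how the three location types could be read. The LocationSpecifier class, its field names, and the read_specs function are assumptions made for illustration, as is the choice to read every regular file in a directory.

```python
import os
import tempfile
import urllib.request
from dataclasses import dataclass
from typing import Dict


@dataclass
class LocationSpecifier:
    """Hypothetical stand-in for one registry entry: a type plus a single location."""

    location_type: str  # one of "filename", "url", "directory"
    location: str


def read_specs(spec: LocationSpecifier) -> Dict[str, str]:
    """Return {spec name: raw spec text}; parsing is left to the parsers."""
    if spec.location_type == "filename":
        with open(spec.location) as f:
            return {os.path.basename(spec.location): f.read()}
    if spec.location_type == "directory":
        specs = {}
        # Assumption: every regular file in the directory is a component spec.
        for entry in sorted(os.listdir(spec.location)):
            path = os.path.join(spec.location, entry)
            if os.path.isfile(path):
                with open(path) as f:
                    specs[entry] = f.read()
        return specs
    if spec.location_type == "url":
        with urllib.request.urlopen(spec.location) as response:
            return {spec.location: response.read().decode("utf-8")}
    raise ValueError(f"Unsupported location type: {spec.location_type}")


# Usage: a directory specifier yields several specs, a filename specifier one.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.yaml", "b.yaml"):
        with open(os.path.join(tmp, name), "w") as f:
            f.write(f"name: {name}")
    directory_specs = read_specs(LocationSpecifier("directory", tmp))
    file_specs = read_specs(LocationSpecifier("filename", os.path.join(tmp, "a.yaml")))
```

Keeping the reading step separate from parsing mirrors the stated goal of letting the parsers do nothing but parse the content they are given.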
Requirements:
... airflow_component_catalog.json and kfp_component_catalog.json) and instead create a component-registry metadata instance using a schema like the above JSON, with a directory location of where the preconfigured component specs reside (probably an ENV_JUPYTER_PATH in our case)
... elyra-preconfigured-kfp and elyra-preconfigured-airflow
... [filename w/o extension]_[component name], which should provide enough of an assurance against repeats
... catalog_entry_id attribute on Component among other minor changes to existing functions, such as the function that retrieves only a singular component (again, due to caching and how get_component currently works, this shouldn't cause a huge latency issue)
... reader (based on registry location type) will need to be pushed into component_registry.py and out of the parser classes (this helps to clear up the function of the parsers as well, which is simply "to parse" given content)
Questions:
2. Moving component registry to metadata
Motivation
Allows a user to add/modify/delete their own component registries, as explained in the above section. This is fairly high importance/impact as it will significantly improve the user experience for customizing their component list.
(If we decide to not implement component registry definitions and instead stick to defining individual components and their locations, most things in this section still apply with minor changes to the schema and other details.)
High-level direction
The potential design raises some questions because components (and the preprocessing required to define them) are quite different from the other metadata namespaces that we currently store and access.
At a high level, I think it might be best to keep much of the component parsing logic as-is. The metadata service has its own fetch endpoints, but we can't really use these due to the difference mentioned above. I think the best way to proceed would be to continue to use the existing palette and properties API endpoints (in pipeline/handlers) that invoke the processors to parse and return their component details. The majority of changes will occur in the component_registry.py _read_component_registry function, which will call MetadataManager.get_all() to retrieve registry details and loop through specs to construct Component objects as it does currently. The Metadata API endpoints to add/modify (PUT method) or delete (DELETE method) can then be used for these additional operations.
Requirements:
... component-registries namespace
... component-registries schema and add to metadata/schemas
... etc/config/metadata/component-registries, as explained in the above section (these will be loaded into the share folder on build as they are currently)
Questions/Considerations:
3. Handling component categories
Motivation
Allows a user to assign multiple categories to a single component or set of components.
Moderate importance. Would allow users to organize components in a way that makes sense to them.
High-level direction
We want to greatly simplify the way that categories are fetched and rendered today, but also give the user the option to enter a list of categories (which will be translated to a list of strings).
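A minimal sketch of this direction, assuming the user enters categories as a comma-separated string (the input format is not specified in this issue). The function names parse_categories and group_by_category and the dictionary shape of a component are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def parse_categories(raw: str) -> List[str]:
    """Split a comma-separated entry into an ordered, de-duplicated list of names."""
    categories: List[str] = []
    for part in raw.split(","):
        name = part.strip()
        if name and name not in categories:
            categories.append(name)
    return categories


def group_by_category(components: List[dict]) -> Dict[str, List[str]]:
    """Map each category to the names of the components assigned to it.

    A component with several categories is listed once under each of them.
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for component in components:
        for category in component["categories"]:
            grouped[category].append(component["name"])
    return dict(grouped)


components = [
    {"name": "Filter text", "categories": parse_categories("KFP, Text")},
    {"name": "Bash operator", "categories": parse_categories("Airflow")},
]
palette = group_by_category(components)
```

With plain strings as categories, the palette grouping reduces to a dictionary keyed by category name, which is the simplification this section is after.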
Currently, we only have two component categories: KFP and Airflow. We can have the preconfigured components assigned to these categories, with the option for the user to edit the registry definition to change the category.
Requirements:
... ComponentCategory class and all functions that fetch or create such an instance
... categories attribute on a Component object to be a list of strings
... to_canvas_palette and in the jinja template to organize components by category (components that have more than one category will therefore be rendered multiple times)
Questions/Considerations:
... ComponentCategory class will result in losing the ability to add a description to the category (viewable when hovering over a component in the palette)
Additional changes required
... ComponentCategory objects (details to be added)
processor_kfp.py may require minor changes to ensure that filename-based components are loaded correctly according to their stored path