-
Notifications
You must be signed in to change notification settings - Fork 113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Shareable Kedro-viz backend implementation #1498
Conversation
Signed-off-by: Rashida Kanchwala <[email protected]>
Signed-off-by: Rashida Kanchwala <[email protected]>
…dro-viz into kedro-viz-save-load Signed-off-by: Rashida Kanchwala <[email protected]>
Signed-off-by: Rashida Kanchwala <[email protected]>
Signed-off-by: Rashida Kanchwala <[email protected]>
Signed-off-by: Rashida Kanchwala <[email protected]>
…dro-viz into shareable-flowchart
…dro-viz into shareable-flowchart
Signed-off-by: ravi-kumar-pilla <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to see all tests passing on Circle CI. Awesome !!
Most of the code looks good to me but I have a suggestion for the feature compatibility API design. Please have a look. Thank you !
package/kedro_viz/api/rest/router.py
Outdated
) | ||
|
||
@router.get( | ||
"/package_compatibilities", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, REST API routes should not contain special characters, the only exception being hyphen. It would be better if we call this as /package-compatibilities.
However, I feel we should tackle this route differently if we check for compatibility in the backend. Something like a [POST] /feature-compatibility route which accepts a json body like
{
"feature": "DEPLOY_VIZ"
}
and have some kind of dictionary in the backend for feature_package_compatibility like -
feature_package_compatibility = {
"DEPLOY_VIZ": {
"fsspec": "2023.9.0"
}
}
This way we can check for feature compatibility and extend this if we need to check for multiple features in future. Since the frontend needs information on why some feature is not compatible, we can send specific message (containing the missing package dependency) or a generic message to the frontend like -
{
"feature": "DEPLOY_VIZ",
"is_compatible": true,
"compatible_packages": {
"fsspec": "2023.9.0"
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, already updated it to package-compatibilities
in the FE PR and will also do the same in this PR.
So for now it's only one package compatibility that we return but yes in future there might be more. In that case I was thinking we would return a List[PackageCompatibilityAPIResponse].
Also, at this point, I am not so sure if we want to make it a POST request having said that I do feel we can better this approach in the future. Let's also hear what others thing about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with both of you: this would be something we can scale going forward, though right now, without a use case to do so, I don't think we need to implement it at this stage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So i read this earlier today, and thought it seemed relevant here. Hence sharing - https://www.techtarget.com/whatis/definition/You-arent-gonna-need-it#:~:text=YAGNI%20principle%20(%22You%20Aren',desired%20increased%20frequency%20of%20releases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done a first review and left some comments. I think my main points are around the "architecture" of the viz backend and where some of this new could should actually be. I'm sure this all works, but I'd be cautious about just breaking the intended architecture design that is in place now.
return version(package_name) # pragma: no cover | ||
|
||
|
||
def get_package_compatibilities_response(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be very fsspec
specific, so I'd include that in the name to avoid confusion. Should this be "public" method? Is anything accessing this directly or will it only be called from within the viz backend?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true, the reason I have kept this generic is incase we have more package compatibilties to test in the future.
) | ||
|
||
|
||
def write_api_response_to_fs(file_path: str, response: Any, _remote_fs: Any): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does _remote_fs
start with an underscore?
self._upload_deploy_viz_metadata_file() | ||
|
||
def get_deployed_url(self): | ||
"""Returns an S3 URL where Kedro viz is deployed""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also calls _deploy()
so that's more than just returning the S3 URL, I'd make that clearer in the function name and doc string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had another look after our discussion about the architecture. I left some minor comments, but I think it's nearly ready 🙂
file.write(encoded_response) | ||
|
||
|
||
def save_api_main_response_to_fs(main_loc: str, remote_fs: Any): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
main_loc
is a slightly vague variable name, I'd name this differently and also add some doc string to describe what this is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps main_path
?
raise exc | ||
|
||
|
||
def save_api_node_response_to_fs(nodes_loc: str, remote_fs: Any): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd change loc
to path
here as well if it makes sense to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we can't change the --save-file
arguments, I think we could still update the help text to clarify it is saving multiple files instead of one.
It's confusing where I should provide a filename or a directory (maybe it should be an optional argument and have a default name?)
In addition, when I run kedro viz --save-file
it try to open a new tab in my chrome browser, I think it's a bug. Should file be save as a .json
suffix as seems like they are all JSON.
|
||
import fsspec | ||
from kedro.io.core import get_protocol_and_path | ||
from semver import VersionInfo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see earlier you are using packaging
, I think we should be consistent using either that of semver
. And if I remember correctly I read it in some other issue that packaging
is a better option?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great point. I have reopened the issue and we should replace it. #1460
encoded_response = EnhancedORJSONResponse.encode_to_human_readable( | ||
jsonable_response | ||
) | ||
|
||
with remote_fs.open(file_path, "wb") as file: | ||
file.write(encoded_response) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was a bit surprised it is using "wb" instead of "w". I expect the result of encode_to_human_readable
should be text that is readable, but seems like it is returning bytes
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, it's currently returning bytes. I will have to create an issue to figure the best way to make this more clearer. But it would be out of scope for the current one.
@noklam -- definitely will fix the text... thanks for flagging! Good question -- I am not sure if it's a bug. @stephkaiser , @tynandebold , @merelcht -- what do you think of the above? To be frank; i was also initially confuse because kedro-viz would open when I did |
Thanks for pointing that out, Nok! The action of Viz starting and opening when you run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is in a good state now. Great work @rashidakanchwala ⭐ !
IMO the Viz backend should be refactored before adding any other new big features, this PR showed some of the strange stuff going on with the pydantic
models and dataclasses. It might be a good time to do that after shareable viz is merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I agree with Merel that it would be great to spend some time after this to clean up the backend.
It's a nice feature to add and I am interested to make a GitHubPageDeployer after this is merged :)
Description
Resolves #1520
This PR involves the backend implementation of Shareable URLs on Kedro-viz. In this PR, we have used fsspec to facilitate the uploading of the Kedro-viz build and API responses (in JSON format) to an S3 storage location. Kedro-viz, when deployed in a hosted environment, functions as a serverless application, serving static HTML content and retrieving Kedro project data from the API JSON files.
This PR solves two many issues
Development notes
Router.py:
This file defines two routers in this PR:
apps.py:
The main change in this file is the
create_api_app_from_file
function, which is invoked when the user runs kedro-viz --load-file xyz. Now, this function loads everything, including metadata for nodes and other registered pipeline information.responses.py:
This file has undergone some refactoring, and one of the main changes is that it contains functions that enable the saving of backend responses to a specified filepath (either local or remote).
integrations/deployment/s3-deployer:
This class, S3Deployer, is the first of many deployer classes. As we expand to more deployer classes, we will introduce a baseDeployer class. The S3Deployer class is self-explanatory and handles the task of uploading content to S3.
server.py:
In this file, we make a change to the kedro-viz --save-file method. It now saves everything, including the flowchart, metadata, and other pipeline information, to the specified filepath.
QA notes
If you have an s3 bucket or a Minio. You can simply do a post request using curl or something else to upload your kedro-viz to s3.
Checklist
RELEASE.md
file