Shareable Kedro-viz backend implementation #1498

Merged (76 commits) on Sep 27, 2023

rashidakanchwala
Contributor

@rashidakanchwala rashidakanchwala commented Aug 21, 2023

Description

Resolves #1520

This PR involves the backend implementation of Shareable URLs on Kedro-viz. In this PR, we have used fsspec to facilitate the uploading of the Kedro-viz build and API responses (in JSON format) to an S3 storage location. Kedro-viz, when deployed in a hosted environment, functions as a serverless application, serving static HTML content and retrieving Kedro project data from the API JSON files.

This PR solves two main issues:

  • Hosting Kedro-viz.
  • Fixing a bug in the kedro viz --save-file and --load-file options. Previously they saved and loaded only the flowchart for the default pipeline; the metadata for each node and the flowcharts for the other registered pipelines were left out.

Development notes

router.py:

This PR adds two routes to this file:

  • api/deploy: A POST endpoint that receives the bucket_name and region from the frontend (FE) and forwards them to S3Deployer.
  • api/package_compatibilities: A GET request sent by the FE to the BE to check if the fsspec version is greater than or equal to 2023.9.0, which is a requirement for Shareable Kedro-viz. If the user has an older version, the FE will prompt the user to update the dependency to the correct version before enabling this feature.
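
The version check behind api/package_compatibilities can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the code from this PR: parse_version, is_package_compatible and installed_version_or_none are hypothetical names, and the tuple comparison only handles purely numeric versions such as 2023.9.0 (the review thread further down discusses packaging and semver for real version parsing).

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum fsspec version stated in the PR description.
MIN_FSSPEC_VERSION = "2023.9.0"


def parse_version(raw: str) -> tuple:
    """Turn a purely numeric version like '2023.9.0' into (2023, 9, 0)."""
    return tuple(int(part) for part in raw.split("."))


def is_package_compatible(installed: str, required: str) -> bool:
    """Return True when the installed version is >= the required one."""
    return parse_version(installed) >= parse_version(required)


def installed_version_or_none(package_name: str):
    """Look up the installed version, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None
```

Comparing integer tuples rather than raw strings matters here: as plain strings, "2023.10.1" would sort before "2023.9.0".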

apps.py:

The main change in this file is the create_api_app_from_file function, which is invoked when the user runs kedro viz --load-file xyz. This function now loads everything, including node metadata and the other registered pipelines.

responses.py:

This file has undergone some refactoring, and one of the main changes is that it contains functions that enable the saving of backend responses to a specified filepath (either local or remote).

integrations/deployment/s3_deployer.py:

This class, S3Deployer, is the first of many deployer classes. As we expand to more deployer classes, we will introduce a baseDeployer class. The S3Deployer class is self-explanatory and handles the task of uploading content to S3.

server.py:

In this file, we change the kedro viz --save-file option. It now saves everything, including the flowchart, metadata, and other pipeline information, to the specified filepath.

QA notes

If you have an S3 bucket or a MinIO instance, you can send a POST request with curl (or any other HTTP client) to upload your Kedro-viz build to S3.

curl -X POST http://localhost:4142/api/deploy \
-H "Content-Type: application/json" \
-d '{"region": "us-east-1", "bucket_name": "s3://kedroviz2"}'
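
For anyone who prefers Python over curl, the same request could be built with the standard library along these lines. This is only a sketch: build_deploy_request is a hypothetical helper, and the host, port, region and bucket are copied from the curl command above. The request is sent only if you uncomment the last lines, which requires a running Kedro-viz server.

```python
import json
import urllib.request


def build_deploy_request(
    base_url: str, region: str, bucket_name: str
) -> urllib.request.Request:
    """Build (but do not send) the POST request for the deploy endpoint."""
    payload = json.dumps({"region": region, "bucket_name": bucket_name}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/deploy",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_deploy_request("http://localhost:4142", "us-east-1", "s3://kedroviz2")
# To actually send it against a running Kedro-viz server:
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
```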

Checklist

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added new entries to the RELEASE.md file
  • Added tests to cover my changes

@rashidakanchwala rashidakanchwala marked this pull request as ready for review September 18, 2023 17:30
@rashidakanchwala rashidakanchwala changed the title from "[Draft] Shareable Kedro-viz backend implementation" to "Shareable Kedro-viz backend implementation" Sep 18, 2023
Contributor

@ravi-kumar-pilla ravi-kumar-pilla left a comment

Good to see all tests passing on CircleCI. Awesome!

Most of the code looks good to me, but I have a suggestion for the feature compatibility API design. Please have a look. Thank you!

)

@router.get(
"/package_compatibilities",
Contributor

In general, REST API routes should not contain special characters, the only exception being the hyphen. It would be better to call this /package-compatibilities.

However, I feel we should tackle this route differently if we check for compatibility in the backend. Something like a [POST] /feature-compatibility route which accepts a JSON body like

   {
       "feature": "DEPLOY_VIZ"
   }

and have some kind of dictionary in the backend for feature_package_compatibility like -

feature_package_compatibility = {
  "DEPLOY_VIZ": {
     "fsspec": "2023.9.0" 
  }
}

This way we can check for feature compatibility and extend this if we need to check multiple features in the future. Since the frontend needs information on why a feature is not compatible, we can send a specific message (containing the missing package dependency) or a generic message to the frontend, like -

{
    "feature": "DEPLOY_VIZ",
    "is_compatible": true,
    "compatible_packages": {
        "fsspec": "2023.9.0"
    }
}
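
A minimal sketch of how such a lookup might work, assuming the dictionary and response shape proposed above. None of this is code from the PR: check_feature_compatibility and _as_tuple are hypothetical names, the installed versions are passed in as a plain dict for illustration, and the version comparison only handles purely numeric versions.

```python
# Hypothetical mapping from feature name to minimum package versions,
# mirroring the dictionary proposed in the comment above.
FEATURE_PACKAGE_COMPATIBILITY = {
    "DEPLOY_VIZ": {"fsspec": "2023.9.0"},
}


def _as_tuple(raw: str) -> tuple:
    """Convert a purely numeric version like '2023.9.0' into (2023, 9, 0)."""
    return tuple(int(part) for part in raw.split("."))


def check_feature_compatibility(feature: str, installed: dict) -> dict:
    """Build a response body for one feature.

    `installed` maps package name -> installed version string; a missing
    package counts as incompatible.
    """
    required = FEATURE_PACKAGE_COMPATIBILITY[feature]
    is_compatible = all(
        name in installed and _as_tuple(installed[name]) >= _as_tuple(minimum)
        for name, minimum in required.items()
    )
    return {
        "feature": feature,
        "is_compatible": is_compatible,
        "compatible_packages": dict(required),
    }
```

Supporting another feature would then only need a new entry in the mapping.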

Contributor Author

@rashidakanchwala rashidakanchwala Sep 20, 2023

yup, already updated it to package-compatibilities in the FE PR and will also do the same in this PR.

So for now it's only one package compatibility that we return but yes in future there might be more. In that case I was thinking we would return a List[PackageCompatibilityAPIResponse].

Also, at this point, I am not sure we want to make it a POST request. Having said that, I do feel we can improve this approach in the future. Let's also hear what others think about this.

Member

Agree with both of you: this would be something we can scale going forward, though right now, without a use case to do so, I don't think we need to implement it at this stage.

Member

@merelcht merelcht left a comment

I've done a first review and left some comments. I think my main points are around the "architecture" of the viz backend and where some of this new code should actually live. I'm sure this all works, but I'd be cautious about breaking the intended architecture design that is in place now.

package/kedro_viz/api/rest/requests.py (outdated, resolved)
package/kedro_viz/api/rest/responses.py (resolved)
package/kedro_viz/api/rest/responses.py (resolved)
return version(package_name) # pragma: no cover


def get_package_compatibilities_response():
Member

This seems to be very fsspec-specific, so I'd include that in the name to avoid confusion. Should this be a "public" method? Is anything accessing this directly, or will it only be called from within the viz backend?

Contributor Author

True, the reason I have kept this generic is in case we have more package compatibilities to test in the future.

)


def write_api_response_to_fs(file_path: str, response: Any, _remote_fs: Any):
Member

Why does _remote_fs start with an underscore?

package/kedro_viz/api/rest/responses.py (outdated, resolved)
self._upload_deploy_viz_metadata_file()

def get_deployed_url(self):
"""Returns an S3 URL where Kedro viz is deployed"""
Member

This also calls _deploy() so that's more than just returning the S3 URL, I'd make that clearer in the function name and doc string.

Member

@merelcht merelcht left a comment

I had another look after our discussion about the architecture. I left some minor comments, but I think it's nearly ready 🙂

file.write(encoded_response)


def save_api_main_response_to_fs(main_loc: str, remote_fs: Any):
Member

main_loc is a slightly vague variable name. I'd name this differently and also add a docstring to describe what it is.

Member

Perhaps main_path?

raise exc


def save_api_node_response_to_fs(nodes_loc: str, remote_fs: Any):
Member

I'd change loc to path here as well if it makes sense to you.

package/kedro_viz/integrations/deployment/s3_deployer.py (outdated, resolved)
package/tests/test_api/test_rest/test_responses.py (outdated, resolved)
package/tests/test_api/test_rest/test_responses.py (outdated, resolved)
Contributor

@noklam noklam left a comment

If we can't change the --save-file arguments, I think we could still update the help text to clarify it is saving multiple files instead of one.

image

It's confusing whether I should provide a filename or a directory (maybe it should be an optional argument and have a default name?)

In addition, when I run kedro viz --save-file it tries to open a new tab in my Chrome browser. I think it's a bug. Also, should the files be saved with a .json suffix, since they all seem to be JSON?

package/kedro_viz/api/apps.py (outdated, resolved)
package/kedro_viz/api/rest/responses.py (outdated, resolved)
package/kedro_viz/api/rest/router.py (resolved)

import fsspec
from kedro.io.core import get_protocol_and_path
from semver import VersionInfo
Contributor

I see earlier you are using packaging. I think we should be consistent and use either that or semver. And if I remember correctly, I read in some other issue that packaging is the better option?

Contributor Author

Great point. I have reopened the issue and we should replace it. #1460

Comment on lines +399 to +405
encoded_response = EnhancedORJSONResponse.encode_to_human_readable(
    jsonable_response
)

with remote_fs.open(file_path, "wb") as file:
    file.write(encoded_response)

Contributor

I was a bit surprised it is using "wb" instead of "w". I expected the result of encode_to_human_readable to be readable text, but it seems like it is returning bytes?

Contributor Author

Yes, it's currently returning bytes. I will create an issue to figure out the best way to make this clearer, but that would be out of scope for the current PR.
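
To illustrate why the file is opened with "wb": an encoder that returns bytes needs a binary-mode file handle. The sketch below is not the PR's code. It uses the standard library's json plus .encode() as a stand-in for EnhancedORJSONResponse.encode_to_human_readable, a hypothetical LocalFS class as a stand-in for the fsspec filesystem object (only its open(path, mode) method is assumed), and a made-up file name and payload.

```python
import json
import os
import tempfile


class LocalFS:
    """Hypothetical stand-in exposing the open(path, mode) method used above."""

    def open(self, path: str, mode: str):
        return open(path, mode)


def encode_to_bytes(response) -> bytes:
    """Stand-in encoder: indented JSON, returned as UTF-8 bytes."""
    return json.dumps(response, indent=2).encode("utf-8")


def write_response(file_path: str, response, fs) -> None:
    """Encode the response and write it through the given filesystem."""
    encoded = encode_to_bytes(response)
    # The payload is bytes, so the file must be opened in binary mode;
    # writing bytes to a text-mode ("w") handle raises TypeError.
    with fs.open(file_path, "wb") as file:
        file.write(encoded)


# Illustrative file name and payload only.
target = os.path.join(tempfile.mkdtemp(), "main")
write_response(target, {"nodes": [], "edges": []}, LocalFS())
```

The written file is still human-readable JSON when opened in a text editor; only the in-memory type handed to write() is bytes.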

@rashidakanchwala
Contributor Author

@noklam -- definitely will fix the text... thanks for flagging!

Good question -- I am not sure if it's a bug.
So when you run kedro viz --save-file it does 2 things -- it opens Kedro-viz and it also saves your Kedro-viz API files in the folder you specify.

@stephkaiser , @tynandebold , @merelcht -- what do you think of the above? To be frank, I was also initially confused because Kedro-viz would open when I ran kedro viz --save-file, but then I assumed this was the intended functionality, and users haven't flagged it.

@tynandebold
Member

Thanks for pointing that out, Nok! The action of Viz starting and opening when you run kedro viz --save-file has always been there and wasn't introduced in this PR. We can decide if we think it's a bug or not, though either way, we don't need to implement a solution in this PR, so it shouldn't block this work.

Member

@merelcht merelcht left a comment

I think this is in a good state now. Great work @rashidakanchwala ⭐ !

IMO the Viz backend should be refactored before adding any other new big features, this PR showed some of the strange stuff going on with the pydantic models and dataclasses. It might be a good time to do that after shareable viz is merged.

@noklam noklam self-requested a review September 26, 2023 11:59
Contributor

@noklam noklam left a comment

Thanks! I agree with Merel that it would be great to spend some time after this to clean up the backend.

It's a nice feature to add, and I am interested in making a GitHubPageDeployer after this is merged :)

@rashidakanchwala rashidakanchwala merged commit 1fa5a02 into main Sep 27, 2023
1 check passed
@rashidakanchwala rashidakanchwala deleted the shareable-flowchart branch September 27, 2023 13:29