Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[KED-2035] Expose parameter metadata #275

Merged
merged 13 commits into from
Oct 14, 2020
Merged

Conversation

921kiyo
Copy link
Contributor

@921kiyo 921kiyo commented Oct 6, 2020

Description

As an extension for metadata side panel, we are exposing parameters metadata to Viz. Here are examples of the endpoints and and outputs.

http://127.0.0.1:4141/api/nodes/d577578a (endpoint for params:xxx)

{
  "parameters": 0.2
}

http://127.0.0.1:4141/api/nodes/f1f1425b (endpoint for parameters)

{
  "parameters": {
    "example_learning_rate": 0.01,
    "example_num_train_iter": 10000,
    "example_test_data_ratio": 0.2
  }
}

Development notes

QA notes

Modified unit tests

Checklist

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added new entries to the RELEASE.md file
  • Added tests to cover my changes

Legal notice

  • I acknowledge and agree that, by checking this box and clicking "Submit Pull Request":

  • I submit this contribution under the Apache 2.0 license and represent that I am entitled to do so on behalf of myself, my employer, or relevant third parties, as applicable.

  • I certify that (a) this contribution is my original creation and / or (b) to the extent it is not my original creation, I am authorised to submit this contribution on behalf of the original creator(s) or their licensees.

  • I certify that the use of this contribution as authorised by the Apache 2.0 license does not violate the intellectual property rights of anyone else.

@921kiyo 921kiyo self-assigned this Oct 6, 2020
@921kiyo 921kiyo requested review from limdauto and removed request for yetudada October 6, 2020 12:02
@richardwestenra
Copy link
Contributor

This is great, thank you! So if I were to click on a single parameter node on the chart, which type of endpoint would I get? I assume it'd be the single params:xxx one, right? Can you please give a bit more info about the difference between these two?

@921kiyo
Copy link
Contributor Author

921kiyo commented Oct 7, 2020

This is great, thank you! So if I were to click on a single parameter node on the chart, which type of endpoint would I get? I assume it'd be the single params:xxx one, right? Can you please give a bit more info about the difference between these two?

Yes! In Kedro side, there are 2 ways to define parameters. Let's say in my parameters.yml in Kedro side, I have the following parameter definition.

example_test_data_ratio: 0.2
example_num_train_iter: 10000
example_learning_rate: 0.01

If I want to access all of them in one of my Kedro nodes, I can use a keyword parameters, which returns all of them as a dictionary like

"parameters": {
    "example_learning_rate": 0.01,
    "example_num_train_iter": 10000,
    "example_test_data_ratio": 0.2
  }

But image if I have 1000 parameters in my parameters.yml, I probably don't want to grab all of them at once. If I only need to access just one of them, I can use params:example_test_data_ratio

From Kedro Viz UI, both params:xxx and parameters show up as parameter nodes

Screenshot 2020-10-07 at 10 46 38

Hope this helps :)

Copy link
Collaborator

@limdauto limdauto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏

@richardwestenra
Copy link
Contributor

richardwestenra commented Oct 7, 2020

ah okay interesting, thanks! Looks like you can access them both from clicking nodes on the graph then. It'd be great for these to have the same structure if possible, but I can't think of a way I'd do it better. I guess when receiving the data, we can just use typeof parameters === 'object', as long as the content of the single-param property parameters will only ever be either an object literal or a number/string/boolean. If it's possible for it to be something else (e.g. an array or object literal) then maybe we should use a different structure.

To explain better what I mean:

// params:xxx
{
  // these are fine:
  "parameters": 0.2
  "parameters": 'this is a string'
  "parameters": true
  // these would cause problems, because I can't distinguish them from the full dictionary:
  "parameters": [0, 1, 2, 3]
  "parameters": { a: 1, b: 2 }
}

The more I think about it, the more I think that this structure is running too many risks in assuming that these are distinguishable. If we want to be more clear about distinguishing them, maybe we should use a different property name, e.g. parameter vs parameters?

However you could also make the case that they just don't need to be distinguished, and then we can just display whatever the content is? Depends whether we should just display multiple parameters as code, or separate them and design them differently

@921kiyo
Copy link
Contributor Author

921kiyo commented Oct 7, 2020

To give you more example, parameters can be nested.
If I have the following paramter definitions in parameters.yml,

example_test_data_ratio:
  nested1: 0.2
  nested2: 0.3
  nested3: 0.4
example_num_train_iter: 10000
example_learning_rate: 0.01

Hitting params:example_test_data_ratio will return

{
  "parameters": {
    "nested1": 0.2,
    "nested2": 0.3,
    "nested3": 0.4
  }
}

and parameters endpoint will return

{
  "parameters": {
    "example_learning_rate": 0.01,
    "example_num_train_iter": 10000,
    "example_test_data_ratio": {
      "nested1": 0.2,
      "nested2": 0.3,
      "nested3": 0.4
    }
  }
}

Viz UI will look the same as above screenshot (just metadata is nested)

@richardwestenra
Copy link
Contributor

@921kiyo cool in that case, as mentioned on Slack, I propose that for single params, you just include a single key/value pair. instead of this:

{
  parameters: 0.01,
}

do this instead:

{
  parameters: {
    "example_learning_rate": 0.01,
  }
}

So that way, the formatting is consistent and we can treat them the same way.

@richardwestenra richardwestenra changed the title Expose parameter metadata [KED-2035] Expose parameter metadata Oct 8, 2020
# In case of 'params:' prefix
parameters_metadata = {
"parameters": {
next(iter(parameters)): next(iter(parameters.values())).load()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The readability is not great here. Can you reduce the nesting by doing

key, value = next(iter(parameters.items()))

# return empty JSON for parameters type
return jsonify({})
parameters = node["obj"]
if isinstance(parameters, dict):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a big fan of such isinstance here. Can we maybe explicitly save parameter name in the dict we put in _JSON_NODES?
Something like:

if "parameter_name" in node:
    # handle "params:..."
else:
    # handle "parameters"

@limdauto wdyt?

Copy link
Contributor Author

@921kiyo 921kiyo Oct 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I could do that but how do you get "parameter_name" in this function? we can only access to parameter_name from next(iter(parameters.items())) correct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant I don't particularly like the fact that instead of the object we put the dictionary with name and object. Parameter name can have its own top-level key rather than being squeezed into the obj key.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DmitriiDeriabinQB Hope I understand you correctly, but it sounds like you're asking for the single-parameter endpoint to have a different structure from the multiple-parameter endpoint, is that right? Kiyo had it like that originally, but I asked him to change it.

If you think about this from the front-end perspective, we might want to separate out different keys to display them differently in the viz meta panel when clicking on a 'Parameters' node - e.g.:
image
but if you click on a single node, we might want to display it the same way, but if it's not in the same format we might not be able to recognise it as a distinct type just from the structure, because a parameter can be one of lots of different types. So we're using the same structure to keep it simple on the front end. Does that help? Hope I haven't misunderstood you

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@richardwestenra I'm still keeping the format you asked, his suggestion was more about how to generate it in code :)

Copy link
Contributor

@DmitriiDeriabinQB DmitriiDeriabinQB Oct 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@richardwestenra yeah, I read through your discussion with Kiyo above and understand the solution. I wasn't talking about the format it is exposed to the frontend but rather the way the data stored in the backend. I'm moderately cautious about the complexity of the data wrangling logic on Python side since the data itself is quite trivial.
TLDR: It's purely the discussion about the Python side of things and should not affect the API contract.

@921kiyo 921kiyo mentioned this pull request Oct 12, 2020
6 tasks
if "parameter_name" in node:
# In case of 'params:' prefix
parameters_metadata = {
"parameters": {node["parameter_name"]: node["obj"].load()}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry could you remind me what does node["obj"].load() do?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

node["obj"] is MemoryDataSet, so it will load the value of params:key parameter (e.g. 0.2)

@@ -539,8 +541,15 @@ def nodes_metadata(node_id):
dataset_metadata = _get_dataset_metadata(node)
return jsonify(dataset_metadata)

# return empty JSON for parameters type
return jsonify({})
if "parameter_name" in node:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is much better (to me personally at least), thanks @921kiyo 👍

@921kiyo 921kiyo merged commit 7073653 into main Oct 14, 2020
@921kiyo 921kiyo deleted the feature/parameter-endpoint branch October 14, 2020 14:06
@richardwestenra richardwestenra mentioned this pull request Oct 20, 2020
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants