add runtime fields/deltas to def, proxy, job data elements #5138

Merged
13 commits merged into cylc:master from the runtime-api-fields branch on Nov 18, 2022

dwsutherland
Member

@dwsutherland dwsutherland commented Sep 16, 2022

closes #5054

Adds a runtime field to the def, proxy, and job data elements, where:

  • def is the original workflow definition of the task/family;
  • proxy is the point-specific version of the task/family, which changes (with deltas available) as soon as you broadcast;
  • jobs hold a live record of what they actually ran with.

example:
image
broadcasting:

cylc broadcast -s '[environment]GREETING = Broadcasting hello' '~sutherlander/linear/run1'

and then running foo again results in:

          {
            "id": "~sutherlander/linear/run1//20220901T00/foo",
            "state": "succeeded",
            "runtime": {
              "platform": "",
              "script": "sleep 2; echo \"$GREETING\"",
              "initScript": "echo 'Me first'",
              "envScript": "echo \"Hi first, I'm second\"",
              "errScript": "echo 'Boo!'",
              "exitScript": "echo 'Yay!'",
              "preScript": "sleep 1",
              "postScript": "sleep 1",
              "workSubDir": "",
              "executionTimeLimit": 0,
              "directives": {},
              "environment": {
                "GREETING": "Broadcasting hello"
              },
              "outputs": {}
            },
            "jobs": [
              {
                "id": "~sutherlander/linear/run1//20220901T00/foo/02",
                "state": "succeeded",
                "jobLogDir": "/home/sutherlander/cylc-run/linear/run1/log/job/20220901T00/foo/02",
                "runtime": {
                  "platform": "localhost",
                  "script": "sleep 2; echo \"$GREETING\"",
                  "initScript": "echo 'Me first'",
                  "envScript": "echo \"Hi first, I'm second\"",
                  "errScript": "echo 'Boo!'",
                  "exitScript": "echo 'Yay!'",
                  "preScript": "sleep 1",
                  "postScript": "sleep 1",
                  "workSubDir": "",
                  "executionTimeLimit": 0,
                  "directives": {},
                  "environment": {
                    "GREETING": "Broadcasting hello"
                  },
                  "outputs": {}
                }
              },
              {
                "id": "~sutherlander/linear/run1//20220901T00/foo/01",
                "state": "succeeded",
                "jobLogDir": "/home/sutherlander/cylc-run/linear/run1/log/job/20220901T00/foo/01",
                "runtime": {
                  "platform": "localhost",
                  "script": "sleep 2; echo \"$GREETING\"",
                  "initScript": "echo 'Me first'",
                  "envScript": "echo \"Hi first, I'm second\"",
                  "errScript": "echo 'Boo!'",
                  "exitScript": "echo 'Yay!'",
                  "preScript": "sleep 1",
                  "postScript": "sleep 1",
                  "workSubDir": "",
                  "executionTimeLimit": 0,
                  "directives": {},
                  "environment": {
                    "GREETING": "Hello from foo!"
                  },
                  "outputs": {}
                }
              }
            ]
          }

Cancelling a broadcast will also create deltas on the affected proxy nodes.
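
For readers skimming the JSON above, here is a minimal sketch of how the three levels relate. It uses plain dicts and hypothetical helper names (`apply_broadcast`, `submit_job`), not Cylc's real classes or data store code, and only assumes the behaviour described in this PR: the definition never changes, the proxy reflects active broadcasts, and each job keeps a snapshot of what it ran with.

```python
from copy import deepcopy

# Hypothetical stand-in for the task definition runtime (not Cylc's real class).
def_runtime = {
    "script": 'sleep 2; echo "$GREETING"',
    "environment": {"GREETING": "Hello from foo!"},
}


def apply_broadcast(runtime, settings):
    """Return a copy of ``runtime`` with broadcast ``settings`` merged in."""
    merged = deepcopy(runtime)
    for key, value in settings.items():
        if isinstance(value, dict):
            # Merge nested sections such as [environment] key by key.
            merged.setdefault(key, {}).update(value)
        else:
            merged[key] = value
    return merged


def submit_job(proxy_runtime):
    """Snapshot the proxy runtime at submission time (assumed behaviour)."""
    return deepcopy(proxy_runtime)


# Job 01 is submitted before any broadcast exists.
proxy_runtime = deepcopy(def_runtime)
job_01 = submit_job(proxy_runtime)

# A broadcast changes the proxy; this is the point where a delta would be sent.
proxy_runtime = apply_broadcast(
    def_runtime, {"environment": {"GREETING": "Broadcasting hello"}}
)
job_02 = submit_job(proxy_runtime)

# Cancelling the broadcast returns the proxy to the definition,
# while both job records keep what they actually ran with.
proxy_runtime = apply_broadcast(def_runtime, {})
```

After this runs, `job_01` still carries the original greeting, `job_02` carries the broadcast one, and the proxy matches the definition again.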

Check List

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Applied any dependency changes to both setup.cfg and conda-environment.yml.
  • Tests are included (or explain why tests are not needed).
  • CHANGES.md entry included if this is a change that can affect users
  • Cylc-Doc pull request opened if required at cylc/cylc-doc/pull/XXXX.
  • If this is a bug fix, PRs raised to both master and the relevant maintenance branch.

@dwsutherland dwsutherland self-assigned this Sep 16, 2022
@dwsutherland dwsutherland changed the title from "add runtime fields/deltas to def, proxy, jobs" to "add runtime fields/deltas to def, proxy, job data elements" Sep 16, 2022
@hjoliver
Member

Did you mean to remove cylc/flow/data_messages_pb2.py ?

@dwsutherland
Member Author

dwsutherland commented Sep 16, 2022

> Did you mean to remove cylc/flow/data_messages_pb2.py ?

It's not removed (nothing would work otherwise); I regenerated it with the corresponding protoc:

$ protoc --version
libprotoc 3.19.4

It's just massively reduced, like in #4901

(but not using that new version, so the change must have happened in protobuf==3.19.*).

optional string env_script = 11;
optional string err_script = 12;
optional string exit_script = 13;
optional float execution_time_limit = 14;
optional string platform = 15;
Member Author

Left this in, so current API doesn't break.

Member

Makes sense to keep this, as the job platform is chosen based on the runtime configuration (e.g. platform could be set to a platform group), so these two values can differ.

@@ -617,40 +617,6 @@ def _create_job_log_path(workflow, itask):
exc.filename = target
raise exc

@staticmethod
def _get_job_scripts(itask, rtconfig):
Member Author

@dwsutherland dwsutherland Sep 16, 2022

This seemed to duplicate what's already done in config.py. The only difference is that the environment variable CYLC_TASK_CYCLE_POINT is replaced with str(itask.point) (which isn't available earlier):

            comstr = (
                "cylc workflow-state "
                + " --task=" + itask.tdef.workflow_polling_cfg['task']
                + " --point=" + str(itask.point)
            )

Its removal doesn't appear to have an impact; however, I could be wrong (is the env variable ever not available?)

Member

CYLC_TASK_CYCLE_POINT should always be available in job execution environments so should be good.

Took a look and I think we are safe to remove this, cheers.

@dwsutherland dwsutherland force-pushed the runtime-api-fields branch 3 times, most recently from a18042d to a95d9be Compare September 16, 2022 04:33
@oliver-sanders oliver-sanders added this to the cylc-8.1.0 milestone Sep 21, 2022
@MetRonnie MetRonnie self-requested a review September 27, 2022 15:59
cylc/flow/data_messages.proto (outdated, resolved)
Comment on lines +197 to +200
try:
platform = rtconfig['platform']['name']
except (KeyError, TypeError):
Member

Shouldn't rtconfig['platform'] just be a string of the platform name? In what circumstance would it be a dict containing a name key?

Member Author

The confusing thing is that the platform gets changed before job submission, so both forms exist.

The dict version appears to be more common.

Member

That is confusing. Is it just because one of your calls to runtime_from_config passes in the job config instead of the taskdef.rtconfig?

Member

Ping @dwsutherland - one question to respond to here. As I recall, I didn't think it should be like this, hence my suggested reason for it.

Member Author

@dwsutherland dwsutherland Oct 18, 2022

Yes, the platform gets fully resolved/loaded on job construction (into a parsec dict); however, it's just a string in the workflow config runtime.
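
To make the two shapes concrete, here is a small sketch built around the try/except from the commented lines. The helper name `platform_name` and the fallback in the except branch are assumptions (the diff excerpt stops at the `except` line), and plain dicts stand in for the parsec dict.

```python
def platform_name(rtconfig):
    """Return the platform name from a runtime config mapping.

    Handles both shapes discussed in this thread:
    - resolved job config: rtconfig['platform'] is a mapping with a 'name' key
    - workflow config: rtconfig['platform'] is a plain string (or unset)
    """
    try:
        return rtconfig['platform']['name']
    except (KeyError, TypeError):
        # KeyError: no 'platform' entry, or a mapping without 'name'.
        # TypeError: 'platform' is a str or None, which cannot be indexed
        # with the string 'name'.
        # Assumed fallback: use the raw value, or an empty string if unset.
        return rtconfig.get('platform') or ''
```

With this, `platform_name({'platform': {'name': 'localhost'}})` and `platform_name({'platform': 'localhost'})` both give `'localhost'`, and an unset platform gives `''`, which matches the empty proxy `platform` in the example response.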

cylc/flow/network/schema.py (outdated, resolved)
cylc/flow/scripts/broadcast.py (resolved)
Member

@MetRonnie MetRonnie left a comment

Tested in GraphiQL, looks good. Just noticed one typo.

cylc/flow/network/schema.py (outdated, resolved)
@dwsutherland
Member Author

dwsutherland commented Sep 29, 2022

This will need to be rebased once #4901 goes in (or vice versa)

(and the protobuf module regenerated)

@oliver-sanders

This comment was marked as resolved.

@MetRonnie
Member

Another problem Oliver has just realised: the order of the [environment] variables needs to be preserved, which we don't think it currently is by the GraphQL mechanism if using a dictionary as you've done at present

@oliver-sanders
Member

oliver-sanders commented Oct 3, 2022

I think we'll need a list of environment variables to get around this e.g:

{
  "environment": [
    {"key": "answer", "value": 42}
  ]
}

@dwsutherland

This comment was marked as resolved.

@dwsutherland
Member Author

OK, I've fixed the problem. I was using get() with an ordered default dict; I had changed to that as a workaround for some tests (so they might now fail, and I'll just need to adjust the test(s)).

WRT the environment order:

> Another problem Oliver has just realised: the order of the [environment] variables needs to be preserved, which we don't think it currently is by the GraphQL mechanism if using a dictionary as you've done at present

Dictionaries are ordered, and I dump them as string fields:

PbRuntime(
        platform=platform,
        script=rtconfig['script'],
        init_script=rtconfig['init-script'],
        env_script=rtconfig['env-script'],
        err_script=rtconfig['err-script'],
        exit_script=rtconfig['exit-script'],
        pre_script=rtconfig['pre-script'],
        post_script=rtconfig['post-script'],
        work_sub_dir=rtconfig['work sub-directory'],
        execution_time_limit=rtconfig['execution time limit'],
        directives=json.dumps(directives),
        environment=json.dumps(environment),
    )

The resolver does:

def resolve_json_dump(root, info, **args):
    field = getattr(root, to_snake_case(info.field_name), '{}') or '{}'
    return json.loads(field)

So unless something odd happens between this and firing it off to the client, the fields should be fine..
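
A quick sketch of the Python side of that round trip, using only the standard `json` module. The variable names and values are illustrative. It only checks the dump/load pair shown above and says nothing about what happens once the response reaches a client.

```python
import json

# Insertion order deliberately differs from alphabetical order.
environment = {"ZEBRA": "1", "ALPHA": "2", "MIDDLE": "3"}

# Serialise to a string field, as done when building the runtime message.
dumped = json.dumps(environment)

# Deserialise again, as the resolver does before returning the field.
loaded = json.loads(dumped)

# Python dicts keep insertion order and json.loads fills the dict in
# document order, so the key order survives the round trip.
print(list(loaded))
```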

@MetRonnie
Member

I've had a look at [environment] order and confirmed the response order is identical to that which you get in log/config/NN-start-MM.cylc, including when inheritance is involved.

@MetRonnie

This comment was marked as resolved.

@dwsutherland
Member Author

Getting a KeyError on tests/functional/retries/01-submission-retry.t

Sorted now.. It's because some submission failures happen before full job construction.
(this is why I had changed it before, causing the field deletion due to parsec dictionary behavior with get())

@oliver-sanders
Member

oliver-sanders commented Oct 18, 2022

> WRT the environment order:
>
> > Another problem Oliver has just realised: the order of the [environment] variables needs to be preserved, which we don't think it currently is by the GraphQL mechanism if using a dictionary as you've done at present
>
> Dictionaries are ordered, and I dump them as string fields:

Dictionaries are ordered in Python land; however, they are not in either JSON or JS. Because JSON is the transport format GraphQL uses and the UI code is JS, I would not expect this order to be preserved.

So I think we need a JSON structure like this to preserve order:

[
  {
    "key": "FOO",
    "value": "42"
  },
  {
    "key": "BAR",
    "value": "answer"
  }
]
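
A minimal sketch of converting between the two shapes, assuming the `key`/`value` field names from the JSON above. The function names are hypothetical and not part of the Cylc schema.

```python
def to_key_value_list(mapping):
    """Turn an ordered mapping into a list of {"key", "value"} records.

    A JSON array has a defined element order, so the order no longer
    depends on how a client treats object keys.
    """
    return [{"key": key, "value": value} for key, value in mapping.items()]


def from_key_value_list(pairs):
    """Rebuild a dict from the list form, keeping the list order."""
    return {pair["key"]: pair["value"] for pair in pairs}


environment = {"FOO": "42", "BAR": "answer"}
as_list = to_key_value_list(environment)
round_tripped = from_key_value_list(as_list)
```

The trade-off is a more verbose response in exchange for an order that every JSON consumer has to respect.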

@dwsutherland dwsutherland force-pushed the runtime-api-fields branch 2 times, most recently from a9b8c07 to 18aa91a Compare November 11, 2022 00:49
@dwsutherland
Member Author

dwsutherland commented Nov 11, 2022

OK, I've put in a workaround that checks and alters 8.0.x broadcast DB loads by:

  • creating a DurationFloat out of the seconds float string;
  • stripping the loaded list-strings of [ ], if both exist on the respective ends (this is narrowed down to only the list settings in the config).

Here's an old and a new entry (via a couple of restarts):

2022-11-11T00:44:45Z INFO - + [*/foo] execution retry delays=[120.0, 180.0]
2022-11-11T00:44:45Z INFO - + [*/foo] submission retry delays=PT2M, PT3M

This workaround can be removed some time in the future...
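
Roughly, the bracket-stripping step could look like the sketch below. The function names are hypothetical, and a plain `float` stands in for Cylc's DurationFloat, so this illustrates the idea rather than the code in this PR.

```python
def strip_list_brackets(value):
    """Strip one pair of enclosing [ ] only if both ends have them."""
    if value.startswith('[') and value.endswith(']'):
        return value[1:-1]
    return value


def parse_legacy_delays(value):
    """Parse an 8.0.x style list-string such as '[120.0, 180.0]'.

    Only meant for the legacy seconds form; a plain float stands in
    for DurationFloat here.
    """
    stripped = strip_list_brackets(value.strip())
    return [float(item) for item in stripped.split(',') if item.strip()]
```

For example, `parse_legacy_delays('[120.0, 180.0]')` gives `[120.0, 180.0]`, while `strip_list_brackets('PT2M, PT3M')` leaves the new-style string untouched.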

@dwsutherland
Member Author

dwsutherland commented Nov 11, 2022

(most of the missing coverage is pre-existing, and was just copied into the workaround)

@MetRonnie
Member

I have opened a PR against this with a functional test for the restart: dwsutherland#10

cylc/flow/parsec/validate.py (outdated, resolved)
Member

@oliver-sanders oliver-sanders left a comment

LGTM, tested runtime fields on the following objects:

  • Task (task config as defined in flow.cylc, no broadcasts applied)
  • TaskProxy (task config + broadcasts)
  • Family (family config as defined in flow.cylc, no broadcasts applied)
  • FamilyProxy (family config + broadcasts)
  • Job (task config + broadcasts at the time of job submission, no backsies)

All worked as expected 🚀

I spotted two minor issues during the course of testing...

1) Duration lists get formatted as integers when cleared.

When I create a broadcast it is presented to me in native-ISO8601 format:

$ cylc broadcast broad -s 'execution retry delays = PT1M'
Broadcast set:
+ [*/root] execution retry delays=PT1M

But when I clear the broadcast it is presented to me in integer format:

$ cylc broadcast --clear broad
Broadcast cancelled:
- [*/root] execution retry delays=[60.0]

2) Updated deltas contain all fields

E.G. for this GraphQL query:

subscription {
  deltas {
    updated(stripNull: true) {
      taskProxies {
        id
        runtime {
          script
          executionRetryDelays
        }
      }
    }
  }
}

And the following broadcast:

$ cylc broadcast broad -s 'execution retry delays = PT10M'
Broadcast set:
+ [*/root] execution retry delays=PT10M

I get an updated delta like this:

{
  "data": {
    "deltas": {
      "updated": {
        "taskProxies": [
          {
            "id": "~osanders/broad//20191209T1200Z/pub",
            "runtime": {
              "script": "sleep 1",
              "executionRetryDelays": "PT10M"
            }
          },
          {
            "id": "~osanders/broad//20191209T1200Z/wipe_bar",
            "runtime": {
              "script": "sleep 1",
              "executionRetryDelays": "PT10M"
            }
          },

Which contains the execution retry delays which have changed but also the script which hasn't.

As long as this doesn't cause broadcasted runtime fields to get re-sent for things like task state changes this is probably harmless.

@dwsutherland
Member Author

dwsutherland commented Nov 16, 2022

> Duration lists get formatted as integers when cleared.

Fixed. The internal objects/format needed to be stringified for the response.
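
As an illustration of the stringification involved, here is a stand-alone sketch that turns seconds into the ISO 8601 form seen in the broadcast output. It is not the Cylc implementation (which has its own duration types); the function names are made up, and fractional seconds are simply truncated.

```python
def seconds_to_iso8601(seconds):
    """Format a non-negative number of seconds as e.g. PT1M or PT1H30M."""
    total = int(seconds)  # fractional seconds are dropped in this sketch
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    parts = []
    if hours:
        parts.append(f"{hours}H")
    if minutes:
        parts.append(f"{minutes}M")
    if secs or not parts:
        # Always emit something, so zero becomes PT0S.
        parts.append(f"{secs}S")
    return "PT" + "".join(parts)


def format_delays(delays):
    """Join a list of second values into a comma-separated duration string."""
    return ", ".join(seconds_to_iso8601(delay) for delay in delays)
```

With this, `format_delays([60.0])` gives `PT1M` rather than `[60.0]`.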

> Updated deltas contain all fields

Yes, it's not fine-grained at the moment. It would be tricky to do, as some settings have been added or cleared (and then there's inheritance). I filter out those runtimes whose serialized form hasn't changed.
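
The filtering described here can be sketched as follows. JSON strings stand in for whatever serialized form the data store really compares, and `changed_runtimes` is a hypothetical name, so treat this as an illustration of the approach only.

```python
import json


def changed_runtimes(previous, current):
    """Return {node_id: runtime} for nodes whose serialised runtime differs.

    The whole runtime is returned for a changed node (not just the fields
    that changed), which is why an unchanged 'script' still shows up in
    the updated delta.
    """
    changed = {}
    for node_id, runtime in current.items():
        old = previous.get(node_id)
        if old is None or json.dumps(old) != json.dumps(runtime):
            changed[node_id] = runtime
    return changed


before = {
    "pub": {"script": "sleep 1", "executionRetryDelays": ""},
    "other": {"script": "sleep 1", "executionRetryDelays": ""},
}
after = {
    "pub": {"script": "sleep 1", "executionRetryDelays": "PT10M"},
    "other": {"script": "sleep 1", "executionRetryDelays": ""},
}
updated = changed_runtimes(before, after)
```

Here only `pub` ends up in `updated`, and its entry carries both `script` and `executionRetryDelays`.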

Member

@oliver-sanders oliver-sanders left a comment

Thanks @dwsutherland 🚀

@oliver-sanders oliver-sanders merged commit 90c3c3c into cylc:master Nov 18, 2022
@MetRonnie MetRonnie added the schema change Change to the Cylc GraphQL schema label Nov 12, 2024
Labels
schema change Change to the Cylc GraphQL schema
Projects
None yet
Development

Successfully merging this pull request may close these issues.

graphql: add [runtime] information to schema
4 participants