API/YAML: Support using env vars to define file_mounts. #2146

concretevitamin · 2023-06-29T03:27:08Z

# Use env vars in the "name" (bucket name) and "source" (local dir/file to
# upload) fields.
#
# Both syntaxes work: ${MY_BUCKET} and $MY_BUCKET.
file_mounts:
  /mydir:
    name: ${MY_BUCKET}  # Name of the bucket.
    store: gcs
    mode: MOUNT

  /another-dir:
    name: ${MY_BUCKET}-2
    source: ["~/${MY_LOCAL_PATH}"]
    store: gcs
    mode: MOUNT

Overriding with sky launch --env works too.

Also fixes sky launch parsing errors:

- Do not load YAML twice (which creates storage twice)
- Properly load empty / name-only YAMLs

Fixes #2093.

Tested (run the relevant ones):

Code formatting: bash format.sh
Any manual or new tests for this PR (please specify below)
- pytest tests/test_smoke.py::test_using_file_mounts_with_env_vars
- pytest tests/test_optimizer_dryruns.py
All smoke tests: pytest tests/test_smoke.py
Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
Backward compatibility tests: bash tests/backward_comaptibility_tests.sh

Michaelvll

Thanks for adding this @concretevitamin! This should make our YAML interface more flexible. Left several questions.

Michaelvll · 2023-06-29T04:22:20Z

sky/task.py

+    def _substitute_env_vars(target: Union[str, List[str]],
+                             envs: Dict[str, str]) -> Union[str, List[str]]:
+        for key, value in envs.items():
+            pattern = r'\$\{?\b' + key + r'\b\}?'


We may need to add a check for the keys of task.envs somewhere in our code, i.e. we need to make sure each key matches '[a-zA-Z][a-zA-Z0-9]*'.

This will get checked by storage.py's validate_name().

Ahh, I was trying to say that if the user provides an env var like the following:

envs:
  hello world: some_value
run: |
  echo ${hello world}

Here the run section will fail, as it is not valid syntax in bash. However, if this env var is used in the file_mounts section, it will still be substituted:

file_mounts:
  /dst:
    source: ${hello world}

Good call! Done.
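
For illustration, a minimal sketch of such a key check is below. The function name `validate_env_keys` and the exact regex (which also allows underscores) are assumptions made for this example, not the code merged in this PR.

```python
import re
from typing import Dict

# Assumed rule: a shell-style identifier (letters, digits, underscores,
# not starting with a digit). The pattern in the thread omits underscores.
_VALID_ENV_KEY = re.compile(r'[a-zA-Z_][a-zA-Z0-9_]*')


def validate_env_keys(envs: Dict[str, str]) -> None:
    """Raise ValueError if any env key is not a valid identifier."""
    for key in envs:
        if not isinstance(key, str) or not _VALID_ENV_KEY.fullmatch(key):
            raise ValueError(f'Invalid env key: {key!r}')
```

With this check in place, a key such as `hello world` is rejected up front instead of failing only when it reaches the run section.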

Michaelvll · 2023-06-29T04:29:00Z

sky/task.py

+                storage_config = _fill_in_env_vars_in_storage_config(
+                    storage[1], task.envs)
+                storage_obj = storage_lib.Storage.from_yaml_config(
+                    storage_config)


Instead of overriding the storage_config, should we just replace all the env vars in the file_mounts section? For example, in L285 we do:

if config.get('file_mounts') is not None:
    file_mounts_str = yaml.dump(config['file_mounts'])
    new_file_mounts_str = _replace_env_var(file_mounts_str, config.get('envs', {}))
    config['file_mounts'] = yaml.safe_load(new_file_mounts_str)

With that, the user can do something like the following:

file_mounts:
  /model_path/llama-${SIZE}b: s3://llama-weights/llama-${SIZE}b

One worry with string replacement is that it'll make some weird things "work":

file_mounts:
  /path:
    mo${empty_envvar}de: xxxx

I think we should make replacement more structured.

That said, supporting env vars in src/dst seems a useful case. How about we support it when users ask?

Sounds good, though I still think replacement in dst and src might be quite important, as that was one thing we wanted for Vicuna: downloading the correct llama weights from the bucket based on the env var.

For the problem you mentioned, we can do the schema validation before the replacement happens to avoid that issue.

Good call! Updated as such.
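
The ordering agreed on above (validate the schema first, then substitute) can be sketched roughly as follows. The helper name `validate_file_mounts_schema` and the allowed-key set are hypothetical, taken only from the fields shown in this PR's example YAML; the real schema may allow more fields.

```python
from typing import Any, Dict

# Assumption: only the storage fields that appear in the PR description.
_ALLOWED_STORAGE_KEYS = {'name', 'source', 'store', 'mode'}


def validate_file_mounts_schema(file_mounts: Dict[str, Any]) -> None:
    """Check field names before any env var substitution happens."""
    for dst, src in file_mounts.items():
        if isinstance(src, str):
            # Plain "dst: src" mount; nothing to check at the field level.
            continue
        if not isinstance(src, dict):
            raise ValueError(f'Invalid mount spec for {dst!r}: {src!r}')
        unknown = set(src) - _ALLOWED_STORAGE_KEYS
        if unknown:
            raise ValueError(
                f'Unknown fields for {dst!r}: {sorted(unknown)}')
```

Because the check runs on the raw YAML, a field like `mo${empty_envvar}de` is rejected as unknown rather than turning into `mode` after substitution.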

tests/test_smoke.py

sky/cli.py

Michaelvll

Thanks for the fix @concretevitamin! The code looks good to me, with one small corner case that we may need to be careful about.

Michaelvll · 2023-06-29T23:21:29Z

sky/task.py

+    # TODO(zongheng): support ${ENV:-default}?
+    file_mounts_str = json.dumps(file_mounts)
+    for key, value in task_envs.items():
+        pattern = r'\$\{?\b' + key + r'\b\}?'


nit: it seems bash expands an env var by first finding the variable name in the string and then looking it up among the existing env vars, whereas here we do it the other way around and search the string for each defined key. That will cause a slight inconsistency in the behavior:

HELLO_WORLD=1
echo $HELLO_WORLD_1

The block above will output an empty string, but in our implementation, we will output 1_1.

Probably, an alternative way to do this is:

# Create a replacement function
def replace_var(match):
    var_name = match.group(1)  # Now directly accessing the variable name
    # If the variable isn't in the dictionary, return it unchanged
    return task_envs.get(var_name, match.group(0))

pattern = r'\$\{?\b([a-zA-Z_][a-zA-Z0-9_]*)\b\}?'
# Use re.sub with the replacement function
result = re.sub(pattern, replace_var, file_mounts_str)

This is very much a corner case; we can also leave it as is and see how people feel about it.

Great catch! Used this & reran the smoke test.
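
For reference, a self-contained sketch of the callback-based substitution suggested above, applied to a whole file_mounts dict through a JSON round trip. The function name `substitute_env_vars` is hypothetical, and the sketch assumes env values contain no characters that need JSON escaping (quotes, backslashes).

```python
import json
import re
from typing import Any, Dict

# Matches $NAME and ${NAME}; the name is captured in group 1.
_ENV_VAR_PATTERN = re.compile(r'\$\{?\b([a-zA-Z_][a-zA-Z0-9_]*)\b\}?')


def substitute_env_vars(file_mounts: Dict[str, Any],
                        task_envs: Dict[str, str]) -> Dict[str, Any]:
    """Return a copy of file_mounts with known env vars substituted."""

    def replace_var(match: 're.Match') -> str:
        # Unknown variables are left in place, unlike bash, which would
        # expand them to an empty string.
        return task_envs.get(match.group(1), match.group(0))

    file_mounts_str = json.dumps(file_mounts)
    # Assumption: substituted values do not break the JSON encoding.
    return json.loads(_ENV_VAR_PATTERN.sub(replace_var, file_mounts_str))
```

Since the name is captured greedily before the lookup, `$HELLO_WORLD_1` is looked up as `HELLO_WORLD_1` and never as `HELLO_WORLD` followed by `_1`.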

Michaelvll · 2023-06-29T23:22:56Z

tests/test_smoke.py

+    name = _get_cluster_name()
+    test_commands = [
+        *storage_setup_commands,
+        (f'sky launch -y -c {name} --cpus 2+ --cloud {generic_cloud} '


Nice, we should make all our tests use --cpus 2+ in the future to save cost. ;)

concretevitamin · 2023-06-30T04:17:08Z

All smoke tests: pytest tests/test_smoke.py

concretevitamin added 3 commits June 28, 2023 08:45

WIP: storage creation is eager

174f26e

Fixes.

1a3d0a4

Fixes

db85fe6

concretevitamin requested a review from Michaelvll June 29, 2023 03:27

Michaelvll reviewed Jun 29, 2023

Updates

2a47cf9

concretevitamin requested a review from Michaelvll June 29, 2023 05:27

concretevitamin added 4 commits June 29, 2023 10:54

Guard against invalid env keys.

8aaafcd

Updates

eae0c0d

cleanup

18e4177

Pylint

2e259d4

Michaelvll approved these changes Jun 29, 2023

Update

460b07b

concretevitamin merged commit 3385975 into master Jun 30, 2023

concretevitamin deleted the env-vars-in-mounts branch June 30, 2023 04:17
