Skip to content

Commit

Permalink
Add design proposal for sync improvements, continued
Browse files Browse the repository at this point in the history
Signed-off-by: Cornelius Weig <[email protected]>
  • Loading branch information
corneliusweig committed Mar 20, 2019
1 parent 268c36e commit b6cb503
Showing 1 changed file with 155 additions and 36 deletions.
191 changes: 155 additions & 36 deletions docs/design_proposals/sync-improvements.md
Original file line number Diff line number Diff line change
@@ -1,8 +1,9 @@
# Title
# Sync improvements

* Author(s): @corneliusweig
* Design Shepherd: As mentioned in the document here.
* Author(s): Cornelius Weig (@corneliusweig)
* Design Shepherd: Techal Desai (@tejal29)
* Date: 03/20/2019
* Status: Draft

## Background

Expand All @@ -25,11 +26,32 @@ This will result in much faster deploy cycles, without users having to bother ab
## Design
### Goals
The sync mechanism in skaffold should be able to infer the destination location of local files.
For example: given the following Dockerfile
```dockerfile
FROM nginx
WORKDIR /var
COPY static *.html www/
ADD nginx.conf .
```

The sync locations should be inferred as
- `nginx.conf` -> `/var/nginx.conf`
- `static/page1.html` -> `/var/www/page1.html` (! no static folder at destination !)
- `static/assets/style.css` -> `/var/www/assets/style.css`
- `index.html` -> `/var/www/index.html`

A fundamental change compared to the current sync mechanism is that one source path may by sync'ed to multiple destinations in the future.

### Assumptions

Builders know where to put static files in the container.
This information needs to be pulled from the builders and made available in Skaffold.
Currently, we know how to do this for Docker artifacts (@r2d4 suggested this in #1166).
Whether jib and bazel builders can provide that information too, is unclear at the moment.

### Interface changes

Expand Down Expand Up @@ -58,89 +80,186 @@ There are at least two relevant changes

DependenciesForArtifact(ctx context.Context, artifact *latest.Artifact) (map[string][]string, error)

// NEW
// ADDED
SyncMap() map[string]string
}
```

### Config changes

The `artifact.sync` needs to support additional requirements:
The `artifact.sync` config needs to support additional requirements:

- Support a backwards compatible configuration
- Make clear if destination inference will be applied or not
- Support special existing use-cases.
- Support existing use-cases for backwards compatibility, i.e.
- Glob patterns on the host path
- Transplant a folder structure, or flatten the folder structure
- Offer a way to use destination inference.
- Avoid ambiguity

#### Problems with existing syntax
- The flattening of the file structure at the destination is surprising. For example
- The flattening of the file structure at the destination is surprising. For example:
```yaml
sync:
'src/**/*.js': dest/
```
will put all `.js` files in the `dest` folder.
- The triple-star syntax is is quite implicit and a surprising extension of the standard globbing syntax
- The triple-star syntax is is quite implicit and a surprising extension of the standard globbing syntax. For example:
```yaml
sync:
'src/b/c/***/*.py': dest/
```
will recreate subfolders _below_ `src/b/c` at `dest/`.
will recreate subfolders _below_ `src/b/c` at `dest/` and sync `.py` files.
The triple-star therefore triggers two actions which should not depend on each other:
- do not flatten the directory structure
- strip everything on the source path up to `***`

These issues can be made less surprising by a more explicit syntax.

#### Suggestion
These above problems can be made less surprising by a more explicit configuration syntax.
The `artifact.sync` will be upgraded to a list of *sync rules*.

```yaml
sync:
# Existing use-case: Copy all js files under 'src' flat into 'app'
# This corresponds to the current `'src/**/*.js': app/`
- from: 'src/**/*.js'
to: app/
flatten: true
flatten: true # default to false

# Existing use-case: Copy all js under 'src' into 'dest', re-creating the directory structure below 'src'
# This corresponds to the current `'src/***/*.js': dest/`
- from: 'src/**/*.js'
to: dest/
strip: 'src'
strip: 'src' # default to ""

# New use-case: Copy all python files and infer their destination.
- from: '**/*.py'
inferTo: true # it should be an error

# ERROR
- from: '**/*.py'
to: dest/
inferTo: true # inferTo and to may not be present in the same rule
inferTo: true # <- tbd, default to false
```
It needs to be decided, if `inferTo` could even be implied by a missing `to` field.
When determining the destination for a given file, all sync rules will be considered.
Thus, a single file may be copied to multiple destinations.
Error cases:
- It should be an error to specify `inferTo=true` together with `to` in the same sync rule.
```yaml
- from: '*.js'
to: dest/
inferTo: true
```
- It should be an error to specify either of `flatten` or `strip` together with `inferTo`.
```yaml
- from: '*.js'
inferTo: true
flatten: true # ERR
- from: 'src/**/*.js'
inferTo: true
strip: src/ # ERR
```
- It should be an error to specify `strip` with a prefix which which does not match the static prefix of the `from` field.
```yaml
- from: 'src/**/*.js'
to: dest/
strip: src/sub # ERR
- from: 'src/**/*.js'
to: dest/
strip: app/ # ERR
```

Open questions about the configuration:

**\<Is the syntax clear enough?\>**
Is the syntax clear?
Is the configuration intuitive to use?
Is there any ambiguity which needs attention?

Resolution: __Not Yet Resolved__

**\<Does it improve clarity to have the `inferTo` field?\>**
A different approach could be to fall back to inference, if the `to` field is left blank.
This would have the advantage that there needs to be no validation about having `to` and `inferTo` in the same rule, which should cause an error.

---
**WIP BELOW**
---
Resolution: __Not Yet Resolved__

**\<Error for `strip` and `flatten`\>**
Given the new sync config, users may try to combine various flags in unintended ways.
For example, should `strip` and `flatten` in the same rule always be an error?
My intuition would say, that `strip+flatten` is the same as `flatten`, but this should to be discussed.

Resolution: __Not Yet Resolved__


#### Migration plan
- make users aware of the new feature

The new configuration supports all existing use-cases and is fully backwards compatible.
An automatic schema upgrade shall be implemented.

### Open Issues/Question

Please list any open questions here in the format.
**\<Overlapping copies\>**
Overlapping copies can be caused by at least two ways:

**\<Question\>** Overlapping copies
1. The `COPY/ADD` instructions in the Dockerfile may copy to the same destination. For example:
```dockerfile
COPY foo /foo
COPY bar/ / # bar contains a file called "foo"
```
2. Two sync rules may copy files to the same destination. For example:
```yaml
sync:
- from: foo
to: /baz
- from: bar
to: /baz
```

Should such cases be treated as errors, warnings, or not be detected at all?

Resolution: Please list the resolution if resolved during the design process or
specify __Not Yet Resolved__
Resolution: __Not Yet Resolved__

**\<Question\>** triple-star syntax
**\<Transitive copies\>**
Multi-stage Dockerfiles can lead to quite contrived scenarios. For example:
```dockerfile
FROM scratch
ADD foo baz
FROM scratch
COPY --from=0 baz bar
```
Ideally, the sync destination should be inferred as `foo -> bar`.

This is quite an edge case and can easily be circumvented.
However, it is quite complex to implement.
Should we bother about this special case?

Resolution: __Not Yet Resolved__

**\<Other builders\>**
We don't know how to obtain a mapping of local paths to remote paths for other builders (jib, bazel).
This may even have to be implemented upstream.
Until we can support those builders, we to handle the case when a user tries to use inference with those builders.

- Option 1: bail out with an error
- Option 2: copy the file into the container with the destination set to the relative path wrt the context directory.

Resolution: __Not Yet Resolved__

**\<Upgrade hint\>**
The path inference allows to simplify the sync specification considerably.
Besides, subdirectory syncing makes the sync functionality even feasible for large projects that need to sync many folders into the container.
Therefore, we could consider to advertise this new functionality when running Skaffold or when an old-style sync map as found.

Resolution: __Not Yet Resolved__

## Implementation plan

1. Add new config key `infer` to `artifact.sync` and test schema validation.
2. Add inference logic for docker and examples.
3. Support both `infer` and user defined map with precedence rules implemented.
1. Upgrade `artifact.sync` to the new sync rule structure and test schema validation.
Should already support multiple destination paths.
2. Add inference logic for docker and examples. (#1812)
3. Support support sync rules with inference. (former #1812, will become separate PR)
4. Finally, support builder plugins to add sync patterns.
5. (future) Add inference logic for jib and examples.
6. (future) Add inference logic for bazel and examples.

## Integration test plan

Please describe what new test cases are you going to consider.
- **step 3** Change one of the existing sync examples so that it uses inference (e.g. #1826).
The test should change a local source file and wait for the change to be reflected in the container.

- **step 4** Add a test case that that features builder plugin sync patterns.

0 comments on commit b6cb503

Please sign in to comment.