Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial auto sync support design doc #2901

Merged
merged 17 commits into from
Jan 28, 2020
Merged
Changes from 12 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
292 changes: 292 additions & 0 deletions docs/design_proposals/sync-auto-support.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,292 @@
# Title
loosebazooka marked this conversation as resolved.
Show resolved Hide resolved

* Author(s): Appu Goundan (@loosebazooka)
* Design Shepherd: Tejal Desai (@tejal29), Priya Wadhwa (@priyawadhwa)
loosebazooka marked this conversation as resolved.
Show resolved Hide resolved
* Date: 09/17/2019
* Status: New
loosebazooka marked this conversation as resolved.
Show resolved Hide resolved

## Background

Currently skaffold does not support `sync` for files that are generated
during a build. For example when syncing java files to a container, one
would normally expect `.class` files to be sync'd, but skaffold is
really only aware of `.java` files in a build.

1. Why is this required?
- We would like to support the following features
- Skaffold can sync files that are not watched (these files are not
considered inputs for the container builder)
loosebazooka marked this conversation as resolved.
Show resolved Hide resolved
- Skaffold can sync files that are generated by a buildscript
loosebazooka marked this conversation as resolved.
Show resolved Hide resolved
- And one or both of the following:
- Skaffold can be configured to run a custom script to when certain files change
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is this different from the Skaffold can sync files that are generated by a buildscript feature?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default build script isn't necessarily the one you want to use to generate intermediate assets, for example

  • jib full container build: ./gradlew jib
  • jib build requirements: ./gradlew classes

- Skaffold can sync an externally generated tar file to a remote container
loosebazooka marked this conversation as resolved.
Show resolved Hide resolved

2. If this is a redesign, what are the drawbacks of the current implementation?
- This is not a redesign, but a new sync mode that may or may not be available
in the skaffold.yaml directly to users. It could simply be an internal API
that is usable by builders like Jib.
- This is similar to `_smart_` described in [sync-improvements](sync-improvements.md)

3. Is there any another workaround, and if so, what are its drawbacks?
- Currently one can create a docker file that only copies specific build
results into the container and relies on a local build to generate those
intermediate artifacts. This requires a user to trigger a first build
manually before docker kicks in to containerize the application. While
it may be possible to automate this (for example: gradle --continuous), it
is not an acceptable solution to require manual external processes for
a build to succeed. Covered in [Hack it](#hack-it)

4. Mention related issues, if there are any.
- This is not trying to solve the problem of dealing with a multistage
dockerbuild. Intermediate build artifacts might still be possible to
determine, however that would require an extra mechanism to do a partial
docker build and sync files from a built container to a running container --
something we do not intend to cover here.

#### Problems with current API/config
The current `sync` system has the following problems:
1. No way to trigger local external processes - for example, in jib, to build a
loosebazooka marked this conversation as resolved.
Show resolved Hide resolved
container, one would run `./gradlew jib`, but to update class files so the
system may sync them, one would only be required to run `./gradlew classes`.
balopat marked this conversation as resolved.
Show resolved Hide resolved
2. No way to tell the system to sync non build inputs. Skaffold is
normally only watching `.java` files for a build, in the sync case, we want
it to watch `.java` files, trigger a partial build, and sync `.class` files
so a remote server can pick up and reload the changes.

## Design

### Hack it
To get close to the functionality we want, without modifying skaffold at all, a
Dockerfile which depends on java build outputs could be used, like:
```
FROM openjdk:8
COPY build/dependencies/ /app/dependencies
COPY build/classes/java/main/ /app/classes

CMD ["java", "-cp", "/app/classes:/app/dependencies/*", "hello.Application"]
```

with a skaffold sync block that looks like:
```
sync:
manual:
- src: "build/classes/java/main/**/*.class"
dest: "/app/classes"
strip: "build/classes/java/main/"
```

A user's devloop then looks like this:

1. run `./gradlew classes copyDependencies`
1. run `skaffold dev`
1. *make changes to some java file*
1. run `./gradlew classes`
1. *skaffold syncs files*

which is far from ideal.

### A new `auto` option

Provide users with an `auto` option, users that use `auto` should expect
the builder-sync to do the right thing and will not be required to do much
configuration.

`auto` will only work with builders that have implemented the `auto` spec. We
expect at least `jib` to do implement the spec.

#### User Configuration

```yaml
build:
artifacts:
- image: ...
context: jib-project
jib: {}
sync:
auto: {}
```


#### Get necessary information from the builder

Skaffold can expose an API that can accept a complex configuration on how
balopat marked this conversation as resolved.
Show resolved Hide resolved
skaffold should be doing synchronization.

The builder will talk to sync component by providing it with the following data
1. A list of `generated` configs, each containing
1. A `command` to generate files to sync
1. A list of inputs to watch as triggers
1. A list of syncs (src, dest) to execute after generation
1. A list of `direct` sync directives (src, dest) to execute without any script
execution

So maybe some datastructures like (I don't really know a lot of go, so assume
this will be written in some consistent way eventually):

```
loosebazooka marked this conversation as resolved.
Show resolved Hide resolved
type AutoSync struct {
generated []Generated
direct []Direct
}

type Generated struct {
command []String
inputs []File
syncables []Syncables
}

type Direct struct {
syncables []Syncables
}

type Syncables struct {
src String
dest String
}
```

#### Jib - Skaffold Sync interface

How a tool like Jib might surface the necessary information to Skaffold

I would expect to add a task like `_jibSkaffoldSyncMap` that will produce
json output for the skaffold-jib intergration to consume and forward to the sync
system. And example output might look like:

```
BEGIN JIB JSON: SYNCMAP/1
{
"generated": [
{
src: “target/classes”
balopat marked this conversation as resolved.
Show resolved Hide resolved
dest: "app/classes",
},
{
src: “target/resources"
dest: "app/resources",
}
]
"direct": [
{
src: "src/main/extra1",
dest: "/"
},
{
src: "src/main/extra2",
dest: "/"
},
{
src: ".m2/some/dep/my-dep-SNAPSHOT.jar",
dest: "app/dependencies/my-dep-SNAPSHOT.jar"
}
}
}
```

Files in the `generated` section will trigger a partial rebuild of the container
(everything before containerization) while files in the `direct` section can
just be synchronized to the running container without a rebuild of anything.

##### Sync or Rebuild?

Each builder implementing an `auto` builder should be able to decide when a sync
should be skipped and a rebuild done instead. In the jib case for instance, a
rebuild will be triggered if:
- a build file has changed (`build.gradle`, `pom.xml`, etc)
- a file is deleted

#### Open Issues/Questions

**What about files that have specific permissions?**

Jib allows users to customize file permissions, for instance a file on the
container can be configured to be 0x456 or something. One option instead of
dealing with this, is to just make all sync'd files 777? Or we allow passthrough
balopat marked this conversation as resolved.
Show resolved Hide resolved
of permissions from the build system to the sync system.

**Should we allow the user to configure the auto block?**
balopat marked this conversation as resolved.
Show resolved Hide resolved

Perhaps the user knows something that jib doesnt, and wants to ignore some files
from synchronization. They might want to do:

```
sync:
tejal29 marked this conversation as resolved.
Show resolved Hide resolved
auto:
ignored:
- "src/main/jib/myExtraFilesThatBreakADeployment"
```

TODO (comment here if you need this be elaborated more on)
loosebazooka marked this conversation as resolved.
Show resolved Hide resolved

## Implementation plan
loosebazooka marked this conversation as resolved.
Show resolved Hide resolved

- [`schemas/<version>.go`] Add `Auto` to the schema under `Sync`
- Before first build, initialize the `sync` state for builders in `auto` mode, this
sync state is saved in the builder's specific implementation of `auto`

- On file change in dev mode:
```
if (files were deleted)
return REBUILD

if (changes were made to build def)
return REBUILD

lastSyncState = syncStates["this project"]

if (if all files changes are in lastSyncState.direct)
return SYNC{list of direct files}

newSyncState = buildAndCalculateNewSyncState("this project")
syncStates["this project"] = newSyncState

diff = diff(lastSyncState, newSyncState)
return SYNC{files in diff}
```

## Integration test plan

*TBD*
loosebazooka marked this conversation as resolved.
Show resolved Hide resolved

## Alternatives Explored

The following implementation had a few problems and that were potential deal
breakers, this implementation is left in here for information purposes, no
attempt was made to implement this. It has the following issues:
1. Depends on `sync` using `tar` which should be an implementation detail
2. Doesn't really provide a good base for exposing the `auto` sync configuration
block. As skaffold takes care of fewer things, the user must be responsible
for implementing it.

#### Delegate generation of sync tar to builder

balopat marked this conversation as resolved.
Show resolved Hide resolved
One option is to complete hand over the detection of container updates to the
builder which would provide skaffold with a tar to synchronize.

A potential runthrough might look like:

1. Skaffold detects changes
1. Skaffold asks build system if sync should happen
1. Build system can reply
1. Yes: and here is the `tar` to sync
1. No: this will tell skaffold it should do a rebuild

This allows the builder implementation to make the sync decisions within it's
own system.

#### Jib - Skaffold Sync interface

For a potential consumer of this mechanism like jib, we would expose a task like
`_jibCreateSyncTar` which would so something like

1. If we should rebuild -> tell skaffold to rebuild, so output something like
```
REBUILD
```
1. If we can sync
1. Compare against last build
2. Tar up files for synchronization
3. Tell skaffold where to find that tar
```
SYNC: /path/to/sync.tar
```