Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial auto sync support design doc #2901

Merged
merged 17 commits into from
Jan 28, 2020
Merged

Initial auto sync support design doc #2901

merged 17 commits into from
Jan 28, 2020

Conversation

loosebazooka
Copy link
Member

@loosebazooka loosebazooka commented Sep 17, 2019

Design doc for how we might try to solve the auto synchronization of files that require a build first.

Useful if want to support java servers that can reload classes and resources -- and perhaps many other user cases!

Staged Doc: https://github.com/GoogleContainerTools/skaffold/blob/sync-auto-proposal/docs/design_proposals/sync-auto-support.md

@codecov
Copy link

codecov bot commented Sep 17, 2019

Codecov Report

Merging #2901 into master will increase coverage by 1.79%.
The diff coverage is n/a.

Impacted Files Coverage Δ
pkg/skaffold/build/cluster/kaniko.go 0% <0%> (-55.13%) ⬇️
pkg/skaffold/runner/build_deploy.go 58.1% <0%> (-5.13%) ⬇️
pkg/skaffold/deploy/kustomize.go 71.62% <0%> (-5%) ⬇️
pkg/skaffold/util/tar.go 52.87% <0%> (-4.6%) ⬇️
pkg/skaffold/sync/sync.go 71.92% <0%> (-3.08%) ⬇️
pkg/skaffold/build/buildpacks/lifecycle.go 81.08% <0%> (-2.92%) ⬇️
pkg/skaffold/generate_pipeline/profile.go 32.87% <0%> (-2.65%) ⬇️
...affold/kubernetes/portforward/kubectl_forwarder.go 65.85% <0%> (-2.05%) ⬇️
pkg/skaffold/deploy/kubectl/visitor.go 93.54% <0%> (-0.21%) ⬇️
pkg/skaffold/docker/image.go 76.36% <0%> (ø) ⬆️
... and 88 more

@loosebazooka
Copy link
Member Author

@corneliusweig FYI since I think you worked on the sync stuff recently?

@corneliusweig
Copy link
Contributor

@loosebazooka hey thanks for looping me in. I'll take a look at it!

docs/design_proposals/sync-auto-support.md Outdated Show resolved Hide resolved
What we might want, is an API that can accept configuration that looks like:

```
sync: auto {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can drop auto and just add another mode generated along with existing manual and infer?

sync: 
  generated:
  - command: "./gradlew jib classes"
    inputs:
      - src/main/java
      - src/main/resources
    files:
      - src: "target/classes"
        dest: "/app/classes"
      - src: "target/resources"
        dest: "/app/resources"
  - command: "compile_c_stuff.sh"
    inputs:
    - asdf.c
    files:
     - src: "asdf.o"
       dest: "/libs/asdf.o"
  manual:
    - src: "local/computer/extrafiles"
      dest: "/extrafiles"
  includeTools: true

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this variant for the yaml syntax. As I understand, the existing sync specs would keep their semantics, but the generated is a new build hook to be triggered before sync. (Maybe buildHooks is also a more natural name than generated). In other words:

  • if generated is present, execute the build hooks when changes matching inputs are detected
  • if generated is absent -> current behavior.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a file is present in both generated and manual we would either have to give one preference or throw an error. Personally I'm leaning towards throwing an error, so that user's don't end up with unexpected sync stuff happening.

should just be considered the full list of inputs that are normally
watched for this build minus anything in `direct` (below).
3. `files` - the built files to sync (and where to sync them)
3. `direct` - same as manual sync, just copy them directly, don't wait on some
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this replace manual? or another definition in auto ?

Currently, we don't support manual and infer sync rules together.
https://skaffold.dev/docs/how-tos/filesync/#limitations
As per this proposal, do we plan to support auto as well manual sync modes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that sync does not currently support multiple sync configurations, this is designed is that this is all one large block that does many things. A builder can have some files that need to be compiled, and others that don't -- so if I was configuring this from jib, I could separate them out.

However this design may be flawed in that it redoes the work that other sync options already do.

3. `files` - the built files to sync (and where to sync them)
3. `direct` - same as manual sync, just copy them directly, don't wait on some
command to transform them
4. `includeTools` - useful for containers that don't contain, for example, `tar`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would this for cases where a generated file is bundled up in the container?
So the sync will now,

  • unzip the tar
  • sync file
  • zip again?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do specify which tools to include? Is there a separate DSL for that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think includeTools maybe shouldn't be part of this design. While it's useful specifically for the default base image jib chooses (distroless/java), it could benefit be it's own design that answers these questions.

But to answer them anyway, since I'm here, I think for now it would include tar -- since that's what we need to do a sync. If the requirements of sync change, it could include whatever else sync needs. (that's why naming would be includeTools instead of includeTar)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having an option to add tar into the target container is a great idea. I also think that this is an idea worth being explored on its own. For example, it could solve #1814. Consequently, I would see this option outside of the auto block and right under sync, so that other sync modes can also benefit from it.

Copy link
Contributor

@tejal29 tejal29 Sep 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm. i am still confused on it makes sense for me as a user writing this config.

sync:
   includeTar: true
   generated:
   ....

If this is true, skaffold would now install tar inside all containers built ?
If this is false, skaffold sync will fail potentially #1814 ?

If above is true; then
As a user, if i have a jib builder, i need to set this true.

I feel the users are might not be able to identify this. Can we instead, add an interface to builder

type Builder interface {
   SyncTools() []tools
}

And then skaffold in dev mode if sync enabled, will install these tools in the built container?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would make tar available to that specific artifacts container. Not all jib users have to do this, but the distroless collection of images do not have tar (or any tools really) included by default.

I would think, instead of asking the user to change their base image to enable sync, we just make it easier for them by making the tools available automatically. I assume it would a mechanism similar to how debugging tools are available to a pod right now using init containers (@briandealwis ?).

Again, perhaps I should take this out of this design doc and move it to a new one?

Copy link
Member

@chanseokoh chanseokoh Sep 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps for simplicity, this doc could assume tar exists in the container or declares that this works only when tar exists. Installing other builds tools could be another doc, independent of this?

The Skaffold doc already states this as a limitation to use the sync mode.

Copy link
Contributor

@tejal29 tejal29 Sep 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah understood, so maybe skaffold can be a little smart about if tar is included in the built image.
Untill then, asking users to specify via config is easy way.
Skaffold can also suggest adding it to their config by parsing the sync error and determine if its tar related.

really only aware of `.java` files in a build.

1. Why is this required?
- Having a sync mode that has knowledge of files that need to be built before
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not your fault, may be we should add it to our design doc proposal,
Can we list some use cases we want to support. On the top of my mind

  1. Users can sync files generated files
  2. Users can sync generated files which are bundled inside a tar and inside a container.
  3. Users can sync files outside of their current workspace? ( for now users can hook up a bash script which can take input ../../)
  4. Users can sync remote files?

With that in mind, what is the contract between Skaffold and command to generate these files.
Would skaffold run the command when it sees a change in the command.inputs files? In which case, inputs should be non-empty.

dest: "/app/classes",
- src: "target/resources.
dest: "/app/resources",
- command: "compile_c_stuff.sh"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you thought about dependency loops here? What if the build hook is triggered by a change in file A, the build hook runs and updates A again. It's a case which can hardly be detected upfront as an error, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have not thought about this, I mostly work in the jib context where I don't expect that to happen. But detection of that should throw an error.

Copy link
Contributor

@corneliusweig corneliusweig Sep 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now the question is: how can we detect it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's a hard problem :\

files:
- src: "asdf.o"
dest: "/libs/asdf.o"
direct:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I haven't thought enough about the edge cases. But is it really necessary to separate out the direct sync cases? What would be wrong about a sync rule that applies to both generated artifacts and direct source files?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the process of doing a build when it shouldn't could be costly?
And if the build system isn't smart, it could trigger sync'ing of files that didn't need to be (and this could cause a server reload when there didn't need to be?)

Copy link
Contributor

@corneliusweig corneliusweig Sep 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding was that the build hook is only triggered, if one of its input files changed (and Skaffold takes care to detect this). The syncing could be considered as a different step, where everything that changed is sync'ed (be it generated or direct). And if the build hook didn't run, none of its outputs would have changed and only direct dependencies are updated. Or is that not how you planned it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see. I planned for it all to exist in the auto block in this example as a single sync rule. Naming is a little weird at the moment.

```
sync: auto {
indirect:
- command: "./gradlew jib classes"
Copy link
Contributor

@corneliusweig corneliusweig Sep 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a usability perspective, it is still quite some intellectual overhead to specify the build commands in the right way and define the proper sync rules. My gut feeling is that this procedure is a lot different from what most developers are used to, especially for java (if you do SpringBoot, you never care about .class files). Do you plan a further simplification so that the sync works with smaller configuration overhead for SpringBoot?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this config is mostly a manifestation of what the API would be for a builder like jib to define all this - not necessarily a user. So jib would just tell the sync system what the config is, and it would go about doing what needs to be done.

I'm not opposed to it, but I agree, it's fairly involved for a user to configure this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still confused. Will the user be able to configure all these in skaffold.yaml, or is it that this doc just explains a proposed internal sync API in the form? In either case, what if the user configures other current sync options in their skaffold.yaml? Will they conflict?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In much the same vein, I'd recommend to list a few other use cases for the proposed yaml changes beside better Jib support. It sounds as if your main goal is to achieve better Jib integration, but that could be done with less user-facing changes. I'm a bit worried that the current proposal is very powerful, but also rather complex (both for users and maintenance).
If the sole intended use for this is Jib, there could be a simpler option without so many user-facing changes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, I just needed a way to describe the API in a way that could be read. There may not be any need at all to expose this beyond an internal API. It also helps explain how this might be extended beyond just jib usage (though that might be the only usecase for now).

@loosebazooka
Copy link
Member Author

It looks like I should rewrite this to reflect that this isn't really exposed to the user.

@loosebazooka
Copy link
Member Author

Updated this to reflect what is internal and external.

1. run `./gradlew classes copyDependencies`
2. run `skaffold dev`
3. *make changes to some java file*
4. run `./gradlew classes`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a similar question was asked, but what happens now when you change some Java source files is that it runs the Jib builder and rebuilds everything including .class files. So, modifying a source file triggers a full rebuild which in turn will trigger the proposed sync. Somehow, we should be able to determine if we should forgo a full build and go for the sync route only.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the assumption is that, at first, we can sync everything. Anything that jib defines as a file in the container will be sync'd when it changes.

docs/design_proposals/sync-auto-support.md Show resolved Hide resolved
Copy link
Contributor

@tejal29 tejal29 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

docs/design_proposals/sync-auto-support.md Outdated Show resolved Hide resolved
docs/design_proposals/sync-auto-support.md Outdated Show resolved Hide resolved
docs/design_proposals/sync-auto-support.md Outdated Show resolved Hide resolved
docs/design_proposals/sync-auto-support.md Show resolved Hide resolved
docs/design_proposals/sync-auto-support.md Show resolved Hide resolved
docs/design_proposals/sync-auto-support.md Outdated Show resolved Hide resolved
auto: {}
```

#### Option 1: Delegate generation of sync tar to builder
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@corneliusweig @tejal29 added in this section about the builder being responsible for the syncable tar...

I can see the problem that it kind of ties down the sync implementation to tar.

Copy link
Contributor

@balopat balopat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, I think we should merge it. Added my comments as I was thinking it through.

docs/design_proposals/sync-auto-support.md Outdated Show resolved Hide resolved
docs/design_proposals/sync-auto-support.md Outdated Show resolved Hide resolved
docs/design_proposals/sync-auto-support.md Outdated Show resolved Hide resolved
docs/design_proposals/sync-auto-support.md Outdated Show resolved Hide resolved
considered inputs for the container builder)
- Skaffold can sync files that are generated by a buildscript
- And one or both of the following:
- Skaffold can be configured to run a custom script to when certain files change
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is this different from the Skaffold can sync files that are generated by a buildscript feature?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default build script isn't necessarily the one you want to use to generate intermediate assets, for example

  • jib full container build: ./gradlew jib
  • jib build requirements: ./gradlew classes

docs/design_proposals/sync-auto-support.md Show resolved Hide resolved
docs/design_proposals/sync-auto-support.md Outdated Show resolved Hide resolved
docs/design_proposals/sync-auto-support.md Outdated Show resolved Hide resolved
docs/design_proposals/sync-auto-support.md Outdated Show resolved Hide resolved
docs/design_proposals/sync-auto-support.md Show resolved Hide resolved
@balopat balopat merged commit 022eb8e into master Jan 28, 2020
@balopat
Copy link
Contributor

balopat commented Jan 28, 2020

Merged with admin rights - CI build is nonsensical here.

@briandealwis briandealwis deleted the sync-auto-proposal branch April 14, 2020 15:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants