docs-v2/design_proposals/sync-improvements.md
Currently, skaffold config supports artifact.sync as a way to sync files
directly to pods. So far, artifact sync requires a specification of sync
patterns like
sync:
'*.js': app/
This is error prone and unnecessarily hard to use because the destination can often be determined by the builder. For example a destination path is already contained in the Dockerfile for docker build. (see #1166, #1581). In addition, the syncing needs to handle special cases for globbing and often requires a long list of sync patterns (#1807).
Furthermore, builders should be able to inform Skaffold about additional sync paths (#1704). This will result in much faster deploy cycles, without users having to bother about these optimizations.
The current sync config syntax has the following problems:
sync:
'src/**/*.js': dest/
.js files in the dest folder.sync:
'src/b/c/***/*.py': dest/
src/b/c at dest/ and sync .py files.
The triple-star therefore triggers two actions which should not depend on each other:
***The syntax should therefore be revised.
Skaffold sync shall have three different modes:
The scope of this design proposal is the direct and inferred mode. The smart sync mode is out of scope for this document but may be mentioned at times.
The sync mechanism in skaffold should be able to infer the destination location of local files, when the user opts-in to do so.
For example: given the following Dockerfile
FROM nginx
WORKDIR /var
COPY static *.html www/
ADD nginx.conf .
ADD nginx.conf /etc/
The sync locations should be inferred as
static/page1.html -> /var/www/page1.html (! no static folder at destination !)static/assets/style.css -> /var/www/assets/style.cssindex.html -> /var/www/index.htmlnginx.conf -> /var/nginx.conf, /etc/nginx.confA fundamental change compared to the current sync mechanism is that one source path may by sync'ed to multiple destinations in the future.
Builders know where to put static files in the container. This information needs to be pulled from the builders and made available in Skaffold. Currently, we know how to do this for Docker artifacts (@r2d4 suggested this in #1166). Jib transforms some input files and includes others as static resources. The specific sync maps need to be detailed out but are straightforward. Whether the bazel builder supports this, is unclear at the moment.
The builder implementations need to fulfill a new requirement:
They need to provide a default sync map for changed or deleted files.
For that, the Builder interface (pkg/skaffold/build/build.go) will receive the new method SyncMap.
type Builder interface {
Labels() map[string]string
Build(ctx context.Context, out io.Writer, tags tag.ImageTags, artifacts []*latest.Artifact) ([]Artifact, error)
DependenciesForArtifact(ctx context.Context, artifact *latest.Artifact) ([]string, error)
// ADDED
SyncMap(ctx context.Context, artifact *latest.Artifact) (map[string][]string, error)
}
SyncMap returns a map whose keys are paths of source files for the container.
These host paths will be relative to the workingdir of the respective artifact.
The map values list all absolute remote destination paths where the input files are placed in the container.
This must be a list, because a single input file may be copied to several locations in the container.
Note that this does not mean that every file will be sync'ed.
The intent to sync a file needs to be explicitly specified by user-provided sync rules (see config changes).
In the future, some builders may specify default sync rules, but this is out of scope for this document.
For example, given the above Dockerfile, SyncMap will return something like
map[string][]string{
"static/page1.html": []string{"/var/www/page1.html"},
"static/assets/style.css": []string{"/var/www/assets/style.css"},
"index.html": []string{"/var/www/index.html"},
"nginx.conf": []string{"/var/nginx.conf", "/etc/nginx.conf"},
}
Unsupported builders will return a not-supported error.
The artifact.sync config needs to support additional requirements:
smart sync mode.The artifact.sync config will be upgraded as follows:
sync:
# Existing use-case: Copy all js under 'src' into 'dest', re-creating the directory structure below 'src'
# This corresponds to the current `'src/***/*.js': app/`
manual: # oneOf=sync
- src: 'src/**/*.js' # required
dest: app/ # required
strip: 'src/' # optional, default to ""
# New use-case: Copy all python files and infer their destination.
infer: # oneOf=sync
- '**/*.py'
# New use-case: smart sync mode for jib (out of scope)
smart: {} # oneOf=sync
In the sync config, exactly one of the modes manual or infer may be specified, but not both.
Both manual and infer contain a list of sync rules.
When determining the destination for a given file, all sync rules will be considered and not just the first match.
Thus, a single file may be copied to multiple destinations.
Error cases:
strip with a prefix which which does not match the static prefix of the from field.
- src: 'src/**/*.js'
dest: dest/
strip: src/sub # ERR
- src: 'src/**/*.js'
dest: dest/
strip: app/ # ERR
Open questions about the configuration:
<Is the syntax clear enough?> Is the syntax clear? Is the configuration intuitive to use?
Resolution: YES
<Does it improve clarity to have the inferTo field?>
A different approach could be to fall back to inference, if the to field is left blank.
This would have the advantage that there needs to be no validation about having to and inferTo in the same rule, which should cause an error.
Resolution: Obsolete, the improved version excludes this error case.
<Error for strip and flatten>
Given the new sync config, users may try to combine various flags in unintended ways.
For example, should strip and flatten in the same rule always be an error?
My intuition would say, that strip+flatten is the same as flatten, but this should to be discussed.
Resolution: Obsolete.
<Do we need an excludes rule?>
The current config only suggests additive rules. For some applications it might be easier to also remove matched host paths from the syncable set. For example:
sync:
# include all files below src...
- src: 'src/**/*'
dest: app/
# ...but do not consider js files for sync
- src: 'src/**/*.js'
exclude: true
Also see #1766.
Resolution: Descoped, open separate issue
An automatic schema upgrade shall be implemented. The former default behavior to flatten directories at the destination will no longer be supported. This is possible, because sync is still an alpha feature. Users who are using incompatible sync patterns will receive a warning during upgrade.
<Overlapping copies> Overlapping copies can be caused by at least two ways:
COPY/ADD instructions in the Dockerfile may copy to the same destination. For example:
COPY foo /foo
COPY bar/ / # bar contains a file called "foo"
sync:
- src: foo
dest: /baz
- src: bar
dest: /baz
Should such cases be treated as errors, warnings, or not be detected at all?
Resolution: No detection which is the current behavior.
<Transitive copies> Multi-stage Dockerfiles can lead to quite contrived scenarios. For example:
FROM scratch
ADD foo baz
FROM scratch
COPY --from=0 baz bar
Ideally, the sync destination should be inferred as foo -> bar.
This is quite an edge case and can easily be circumvented. However, it is quite complex to implement. Should we bother about this special case?
Resolution: not now
<Other builders> We don't know how to obtain a mapping of local paths to remote paths for other builders (jib, bazel). This may even have to be implemented upstream. Until we can support those builders, we to handle the case when a user tries to use inference with those builders.
Resolution: Option 1, ideally during schema validation.
<Upgrade hint> The path inference allows to simplify the sync specification considerably. Besides, subdirectory syncing makes the sync functionality even feasible for large projects that need to sync many folders into the container. Therefore, we could consider to advertise this new functionality when running Skaffold or when an old-style sync rule as found.
Resolution: Agree. We should support auto-fixing.
<SyncMap signature>
The new SyncMap interface provides some sync rules from the builder.
However, the suggested signature is func() map[string]string with the convention to not flatten directories.
To be more in line with the other sync functionality, it could also return []SyncRule.
Although being clearer, this has the downside of introducing a dependency from the builder package to the pipeline config package.
If going that route, we should probably duplicate this struct in a proper place.
However, is it necessary at all, or is the current signature good enough?
Resolution: This is an implementation detail.
artifact.sync to the new sync rule structure and test schema validation.
Should already support multiple destination paths. (#1847)ADD or COPY are not followed by a RUN command.implementation step 3 Change one of the existing sync examples so that it uses inference (e.g. #1826). The test should change a local source file. Expect that the change will be reflected in the container.
implementation step 3 Set up automatic destination syncing and delete a local input file. Expect that the input file is also deleted in the container. Update: This expectation cannot be met, because a deleted file is no longer contained in the inferred syncmap. Thus file deletion with inferred sync mode must trigger a rebuild.
implementation step 4 Add a test case that that features builder plugin sync patterns.