Steps to reproduce
Define a pipeline that creates a volume, mounts it, and then destroys it:
import kfp
from kfp import dsl


def glob_volume(volume):
    # Lightweight component: list every file under `directory`.
    @kfp.components.create_component_from_func
    def glob_files(directory: str) -> list:
        import pathlib
        paths = pathlib.Path(directory).glob('**/*')
        filepaths = [str(p) for p in paths if p.is_file()]
        return filepaths

    # Run the component against /volume and mount the PVC at that path.
    return glob_files("/volume").add_pvolumes({"/volume": volume})


@dsl.pipeline(name="vol-cache-bug", description="Demonstrates a caching bug in KF 1.3")
def volume_caching_bug():
    # Step 1: create the PVC (this is the step that gets served from cache on reruns).
    vop = dsl.VolumeOp(name='create-a-volume', resource_name='a-volume', size="1Gi", modes=dsl.VOLUME_MODE_RWO)
    # Step 2: mount the PVC and list its contents.
    glob_vol_op = glob_volume(vop.volume)
    # Step 3: delete the PVC once the mounting step has finished.
    vop.delete().after(glob_vol_op)


if __name__ == "__main__":
    kfp.compiler.Compiler().compile(volume_caching_bug, "volume_caching_bug.yaml")
Create a pipeline and an experiment in the UI from the sample above.
Launch a run and let it complete (delete the mounting pods if the PVC gets stuck in Terminating).
Clone the run.
The first step (volume creation) is served from cache and never runs.
The second step is never scheduled, because the volume was deleted at the end of the previous run and was never created in this one.
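To make the failure mode concrete, here is a toy simulation in plain Python (no kfp dependency) of the original run followed by the cloned run. It is a simplified model under stated assumptions, not Kubeflow's implementation: it assumes a step's cache key is a hash of its spec and that a cache hit skips the step entirely, so the PVC-creation side effect is never replayed. The names `cache_key` and `run_pipeline` are hypothetical.

```python
# Toy model of the reported failure; this is NOT Kubeflow's implementation.
# Simplifying assumptions (hypothetical):
#   * a step's cache key is a hash of its spec;
#   * a cache hit skips the step entirely, so side effects are not replayed.
import hashlib
import json


def cache_key(step: dict) -> str:
    """Hash a canonical JSON encoding of a step spec."""
    payload = json.dumps(step, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def run_pipeline(cache: dict, cluster_pvcs: set) -> list:
    """Simulate one run of create-a-volume -> glob-files -> delete volume."""
    log = []
    create_step = {"name": "create-a-volume", "resource_name": "a-volume", "size": "1Gi"}
    key = cache_key(create_step)
    if key in cache:
        # Cache hit: the PVC is NOT created; only the old output is reused.
        log.append("create-a-volume: cache hit, skipped")
    else:
        cluster_pvcs.add("a-volume")
        cache[key] = "a-volume"
        log.append("create-a-volume: created PVC")

    if "a-volume" not in cluster_pvcs:
        # The mounting pod cannot be scheduled without its PVC.
        log.append("glob-files: unschedulable, PVC missing")
        return log
    log.append("glob-files: ran")

    # Equivalent of vop.delete(): the PVC is removed at the end of the run.
    cluster_pvcs.discard("a-volume")
    log.append("delete-volume: removed PVC")
    return log


if __name__ == "__main__":
    cache, pvcs = {}, set()
    print(run_pipeline(cache, pvcs))  # first run succeeds
    print(run_pipeline(cache, pvcs))  # cloned run gets stuck
```

The cache outlives the PVC it describes, which is exactly the mismatch described in the steps above.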
Expected result
I expected VolumeOps not to be cached at all: if the PVC already exists it should be reused, and if it doesn't it should be created. Alternatively, I expected VolumeOp to support an equivalent of .execution_options.caching_strategy.max_cache_staleness = "P0D".
FWIW: we recently upgraded from KF 1.1 to KF 1.3 and did not experience this on 1.1.
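For context on the requested option: max_cache_staleness takes an ISO 8601-style duration string such as "P30D", and "P0D" (zero days) effectively means that no cached result is fresh enough to reuse. The toy check below illustrates that reading. The function names, the small subset of ISO 8601 it parses, and the strict "younger than" comparison are all assumptions for illustration, not Kubeflow's actual staleness logic.

```python
# Toy staleness check; names and exact semantics are hypothetical.
import re
from datetime import datetime, timedelta, timezone

# Supports only days, hours, minutes and seconds, e.g. "P0D", "P30D", "PT1H30M".
_DURATION = re.compile(r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$")


def parse_duration(text: str) -> timedelta:
    """Parse a small subset of ISO 8601 durations into a timedelta."""
    match = _DURATION.match(text)
    if match is None or not any(match.groups()):
        raise ValueError(f"unsupported duration: {text!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def cache_entry_usable(created_at: datetime, now: datetime, max_staleness: str) -> bool:
    """Reuse a cached result only if it is strictly younger than max_staleness."""
    age = now - created_at
    return age < parse_duration(max_staleness)


if __name__ == "__main__":
    now = datetime(2021, 6, 11, 12, 0, tzinfo=timezone.utc)
    created = now - timedelta(minutes=1)
    print(cache_entry_usable(created, now, "P0D"))   # False: always re-run
    print(cache_entry_usable(created, now, "P30D"))  # True: reuse cached result
```

Under this rule "P0D" can never be satisfied, which is the per-step opt-out the report asks VolumeOp to support.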
Workarounds
Rename the volume-creation step so that it is not fetched from cache. This has to be done after every run (i.e., upload a new pipeline version before issuing each new run).
Disable caching on the whole KF instance.
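Both workarounds can be sketched with a toy cache lookup. The issue does not say how the cache key is derived; this sketch simply assumes the step name is part of the hashed spec, which is what the rename workaround implies. `cache_key`, `lookup`, the `caching_enabled` flag and the name "create-a-volume-v2" are hypothetical, not Kubeflow APIs.

```python
# Toy cache lookup illustrating the two workarounds; NOT Kubeflow's code.
# Assumption: the step name is part of the hashed spec.
import hashlib
import json
from typing import Optional


def cache_key(step: dict) -> str:
    """Hash a canonical JSON encoding of a step spec."""
    payload = json.dumps(step, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def lookup(cache: dict, step: dict, caching_enabled: bool = True) -> Optional[str]:
    """Return a cached output, or None if the step has to actually run."""
    if not caching_enabled:
        # Workaround 2: caching switched off for the whole instance.
        return None
    return cache.get(cache_key(step))


if __name__ == "__main__":
    original = {"name": "create-a-volume", "resource_name": "a-volume", "size": "1Gi"}
    cache = {cache_key(original): "a-volume"}

    # Workaround 1: a renamed step hashes to a different key, so it misses.
    renamed = dict(original, name="create-a-volume-v2")
    print(lookup(cache, original))                         # a-volume (cache hit)
    print(lookup(cache, renamed))                          # None (runs again)
    print(lookup(cache, original, caching_enabled=False))  # None (runs again)
```

The rename only helps once: after the renamed step runs, its own key is cached, which is why the workaround has to be repeated before every run.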
Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.
skogsbrus changed the title from "[backend] Cached volume creations leads to unschedulable runs, if the volum" to "[backend] Cached volume creations leads to unschedulable runs, if the pvc is deleted in the pipeline" on Jun 11, 2021
This is a duplicate of #5257. Should we close this in favor of the other one?
Copying my comment here as well, for the sake of completeness of this issue:
I think the problem here is twofold.
First, the caching mechanism caches steps that it shouldn't.
Second, there is no user-level way to choose whether a specific step is cached, only the global API server configuration, which overrides any per-step configuration.