Replaced "full_refresh" with "dev_mode" #1735

Merged
10 changes: 5 additions & 5 deletions docs/technical/general_usage.md
@@ -47,7 +47,7 @@ Pipeline can be explicitly created and configured via `dlt.pipeline()` that returns
4. dataset_name - name of the dataset where the data goes (see later the default names)
5. import_schema_path - default is None
6. export_schema_path - default is None
-7. full_refresh - if set to True the pipeline working dir will be erased and the dataset name will get the unique suffix (current timestamp). ie the `my_data` becomes `my_data_20221107164856`.
+7. dev_mode - if set to True the pipeline working dir will be erased and the dataset name will get a unique suffix (the current timestamp), e.g. `my_data` becomes `my_data_20221107164856`.

> **Achtung:** as described in `secrets_and_config.md`, the arguments passed to `dlt.pipeline` are configurable and, if skipped, will be injected by the config providers. **Values provided explicitly in the code take full precedence over all config providers.**
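The precedence rule can be sketched as a small resolver — a hypothetical illustration of the behavior described above, not dlt's actual implementation (names like `resolve_argument` are invented here):

```python
def resolve_argument(name, explicit_value, providers):
    """Return the explicit value if given; otherwise fall back to
    config providers in order. Mirrors the precedence rule: explicit
    code values win over every provider."""
    if explicit_value is not None:
        return explicit_value
    for provider in providers:
        if name in provider:
            return provider[name]
    return None


providers = [{"dataset_name": "from_env"}, {"dataset_name": "from_toml"}]
# explicit value wins over any provider
print(resolve_argument("dataset_name", "my_data", providers))  # my_data
# skipped argument is injected by the first provider that has it
print(resolve_argument("dataset_name", None, providers))  # from_env
```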

@@ -101,7 +101,7 @@ In case **there are more schemas in the pipeline**, the data will be loaded into
1. `spotify` tables and `labels` will load into `spotify_data_1`
2. `mel` resource will load into `spotify_data_1_echonest`
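The naming rule implied by the two examples above — the default schema loads into the dataset itself, every other schema into a `_`-suffixed dataset — can be sketched as (a simplified illustration, not dlt's code):

```python
def dataset_for_schema(base_dataset, schema_name, default_schema):
    """Default schema loads into the base dataset; any other schema
    loads into a dataset suffixed with the schema name."""
    if schema_name == default_schema:
        return base_dataset
    return f"{base_dataset}_{schema_name}"


# `spotify` is the default schema in the example above
print(dataset_for_schema("spotify_data_1", "spotify", "spotify"))   # spotify_data_1
print(dataset_for_schema("spotify_data_1", "echonest", "spotify"))  # spotify_data_1_echonest
```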

-The `full_refresh` option: dataset name receives a prefix with the current timestamp: ie the `my_data` becomes `my_data_20221107164856`. This allows a non destructive full refresh. Nothing is being deleted/dropped from the destination.
+The `dev_mode` option: the dataset name receives a suffix with the current timestamp, e.g. `my_data` becomes `my_data_20221107164856`. This allows a non-destructive full refresh; nothing is deleted or dropped from the destination.
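The timestamp suffixing can be sketched as follows (a minimal illustration of the naming scheme, assuming the `%Y%m%d%H%M%S` format shown in the example name; not dlt's actual code):

```python
from datetime import datetime


def dev_mode_dataset_name(dataset_name, now=None):
    """Append a current-timestamp suffix so each dev-mode run writes
    to a fresh dataset instead of overwriting the existing one."""
    ts = (now or datetime.now()).strftime("%Y%m%d%H%M%S")
    return f"{dataset_name}_{ts}"


print(dev_mode_dataset_name("my_data", datetime(2022, 11, 7, 16, 48, 56)))
# my_data_20221107164856
```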

## pipeline working directory and state
Another fundamental concept is the pipeline working directory. This directory keeps the following information:
@@ -117,7 +117,7 @@ The `restore_from_destination` argument to `dlt.pipeline` lets the user restore

The state is stored in the destination together with other data, so it becomes available for restoration only once all pipeline stages have completed.

-The pipeline cannot be restored if `full_refresh` flag is set.
+The pipeline cannot be restored if the `dev_mode` flag is set.

The other way to trigger a full refresh is to drop the destination dataset. `dlt` detects this and resets the pipeline's local working folder.

@@ -155,8 +155,8 @@ The default json normalizer will convert json documents into tables. All the key

❗ [more here](working_with_schemas.md)
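The core idea of normalizing a json document into flat table columns can be sketched roughly like this — a deliberately simplified illustration (it ignores lists, which dlt unpacks into child tables; see `working_with_schemas.md` for the real behavior), using the `__` path separator by assumption:

```python
def flatten_json(doc, parent="", sep="__"):
    """Flatten a nested json document into a single row of
    path-named columns, e.g. {"a": {"b": 1}} -> {"a__b": 1}."""
    flat = {}
    for key, value in doc.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten_json(value, name, sep))
        else:
            flat[name] = value
    return flat


row = flatten_json({"id": 1, "address": {"city": "Berlin", "geo": {"lat": 52.5}}})
print(row)  # {'id': 1, 'address__city': 'Berlin', 'address__geo__lat': 52.5}
```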

-### Full refresh mode
-If `full_refresh` flag is passed to `dlt.pipeline` then
+### Dev mode
+If the `dev_mode` flag is passed to `dlt.pipeline` then
1. the pipeline working dir is fully wiped out (state, schemas, temp files)
2. the dataset name receives a suffix with the current timestamp, e.g. `my_data` becomes `my_data_20221107164856`.
3. pipeline will not be restored from the destination
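Step 1 above — fully wiping the working dir — can be sketched with the standard library (a hypothetical helper for illustration; dlt manages its own working directory internally):

```python
import shutil
from pathlib import Path


def wipe_working_dir(working_dir):
    """Erase the pipeline working dir (state, schemas, temp files)
    and recreate it empty, as dev mode does on each run."""
    working_dir = Path(working_dir)
    if working_dir.exists():
        shutil.rmtree(working_dir)
    working_dir.mkdir(parents=True)
```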
@@ -652,6 +652,6 @@ resource. Below we show you an example on how to pseudonymize the data before it
print(info)
```

-1. Remember to keep the pipeline name and destination dataset name consistent. The pipeline name is crucial for retrieving the [state](https://dlthub.com/docs/general-usage/state) from the last run, which is essential for incremental loading. Altering these names could initiate a "[full_refresh](https://dlthub.com/docs/general-usage/pipeline#do-experiments-with-full-refresh)", interfering with the metadata tracking necessary for [incremental loads](https://dlthub.com/docs/general-usage/incremental-loading).
+1. Remember to keep the pipeline name and destination dataset name consistent. The pipeline name is crucial for retrieving the [state](https://dlthub.com/docs/general-usage/state) from the last run, which is essential for incremental loading. Altering these names could initiate a "[dev_mode](https://dlthub.com/docs/general-usage/pipeline#do-experiments-with-dev-mode)", interfering with the metadata tracking necessary for [incremental loads](https://dlthub.com/docs/general-usage/incremental-loading).
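Why renaming breaks incremental loading can be illustrated with a toy state store keyed by pipeline name — a hypothetical sketch of the principle, not dlt's state mechanism:

```python
# State is looked up by pipeline name: renaming the pipeline orphans
# the old state, so the next run starts fresh (a full refresh).
state_store = {}


def get_state(pipeline_name):
    return state_store.setdefault(pipeline_name, {"last_value": None})


get_state("stripe_pipeline")["last_value"] = "2024-01-01"
print(get_state("stripe_pipeline")["last_value"])   # 2024-01-01
print(get_state("renamed_pipeline")["last_value"])  # None -> incremental cursor lost
```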

<!--@@@DLT_TUBA sql_database-->
2 changes: 1 addition & 1 deletion docs/website/docs/dlt-ecosystem/verified-sources/stripe.md
@@ -232,6 +232,6 @@ verified source.
load_info = pipeline.run(data=[source_single, source_incremental])
print(load_info)
```
-> To load data, maintain the pipeline name and destination dataset name. The pipeline name is vital for accessing the last run's [state](../../general-usage/state), which determines the incremental data load's end date. Altering these names can trigger a [“full_refresh”](../../general-usage/pipeline#do-experiments-with-full-refresh), disrupting the metadata (state) tracking for [incremental data loading](../../general-usage/incremental-loading).
+> To load data, maintain the pipeline name and destination dataset name. The pipeline name is vital for accessing the last run's [state](../../general-usage/state), which determines the incremental data load's end date. Altering these names can trigger a [“dev_mode”](../../general-usage/pipeline#do-experiments-with-dev-mode), disrupting the metadata (state) tracking for [incremental data loading](../../general-usage/incremental-loading).

<!--@@@DLT_TUBA stripe_analytics-->
@@ -272,7 +272,7 @@ To create your data pipeline using single loading and
destination dataset names. The pipeline name helps retrieve the
[state](https://dlthub.com/docs/general-usage/state) of the last run, essential for incremental
data loading. Changing these names might trigger a
-[“full_refresh”](https://dlthub.com/docs/general-usage/pipeline#do-experiments-with-full-refresh),
+[“dev_mode”](https://dlthub.com/docs/general-usage/pipeline#do-experiments-with-dev-mode),
disrupting metadata tracking for
[incremental data loading](https://dlthub.com/docs/general-usage/incremental-loading).
