Fixes for Data Porter feature
meta-paul committed May 1, 2024
1 parent 19248dd commit fe83237
Showing 32 changed files with 1,544 additions and 737 deletions.
@@ -36,3 +36,7 @@ and perhaps `default_strategy_name` value (see `DefaultMergeConflictResolver` as
NOTE: All available providers must be present in `strategies_config`.
Table names (under each provider key) are optional, and if missing, `default_strategy_name`
will be used for all conflicts related to this table.

4. There is an example of a working custom conflict resolver in module `mephisto/tools/db_data_porter/conflict_resolvers/example_merge_conflict_resolver.py`. You can launch it like this:

`mephisto db import ... --conflict-resolver ExampleMergeConflictResolver`
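
The fallback rule in the NOTE above can be sketched roughly as follows. This is only an illustration, not the resolver's actual code: the helper `pick_strategy_name`, the provider and table keys, the strategy names, and the assumption that each table maps directly to a strategy name are all hypothetical. The real layout of `strategies_config` is defined by `DefaultMergeConflictResolver`.

```python
from typing import Dict


def pick_strategy_name(
    strategies_config: Dict[str, Dict[str, str]],
    default_strategy_name: str,
    provider: str,
    table: str,
) -> str:
    """Hypothetical helper: choose a strategy name for a conflict in `table`."""
    # Every available provider must have a key in the config.
    if provider not in strategies_config:
        raise KeyError(f"Provider '{provider}' is missing from strategies_config")
    # Table names are optional; fall back to the default when one is missing.
    return strategies_config[provider].get(table, default_strategy_name)
```

With such a layout, a provider key mapped to an empty dict would mean "use the default strategy for every table of this provider".
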
38 changes: 19 additions & 19 deletions docs/web/docs/guides/how_to_use/merge_dbs/reference.md
@@ -14,7 +14,7 @@ This is a reference describing set of commands under the `mephisto db` command g
## Export

This command exports data from Mephisto DB and provider-specific datastores
as a combination of (i) a JSON file, and (ii) an archived `data` catalog with related files.
as an archived combination of (i) a JSON file, and (ii) a `data` catalog with related files.

If no parameter passed, full data dump (i.e. backup) will be created.

@@ -29,25 +29,25 @@ mephisto db export --export-tasks-by-ids 1 --export-tasks-by-ids 2
mephisto db export --export-task-runs-by-ids 3 --export-task-runs-by-ids 4
mephisto db export --export-task-runs-since-date 2024-01-01
mephisto db export --export-task-runs-since-date 2023-01-01T00:00:00
mephisto db export --export-labels first_dump --export-labels second_dump
mephisto db export --export-tasks-by-ids 1 --delete-exported-data --randomize-legacy-ids --export-indent 2
mephisto db export --labels first_dump --labels second_dump
mephisto db export --export-tasks-by-ids 1 --delete-exported-data --randomize-legacy-ids --export-indent 4
```
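
The examples above pass `--export-task-runs-since-date` both as a plain date and as a full datetime. As a minimal sketch of how a wrapper script might validate such a value before calling the command (the helper `parse_since_date` is hypothetical, and this does not claim to be how Mephisto parses the option), Python's standard `datetime.fromisoformat` accepts both forms:

```python
from datetime import datetime


def parse_since_date(value: str) -> datetime:
    """Hypothetical helper: parse a since-date value in either form shown above."""
    # fromisoformat accepts "YYYY-MM-DD" as well as "YYYY-MM-DDTHH:MM:SS";
    # a date-only value is interpreted as midnight of that day.
    # A malformed value raises ValueError.
    return datetime.fromisoformat(value)
```
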

Options (all optional):

- `-tn/--export-tasks-by-names` - names of Tasks that will be exported
- `-ti/--export-tasks-by-ids` - ids of Tasks that will be exported
- `-tr/--export-task-runs-by-ids` - ids of TaskRuns that will be exported
- `-tri/--export-task-runs-by-ids` - ids of TaskRuns that will be exported
- `-trs/--export-task-runs-since-date` - only objects created after this ISO8601 datetime will be exported
- `-tl/--export-labels` - only data imported under these labels will be exported
- `-de/--delete-exported-data` - after exporting data, delete it from local DB
- `-l/--labels` - only data imported under these labels will be exported
- `-del/--delete-exported-data` - after exporting data, delete it from local DB
- `-r/--randomize-legacy-ids` - replace legacy autoincremented ids with
new pseudo-random ids to avoid conflicts during data merging
- `-i/--export-indent` - make dump easy to read via formatting JSON with indentations
- `-i/--export-indent` - make dump easy to read via formatting JSON with indentations (Default 2)
- `-v/--verbosity` - write more informative messages about progress (Default 0. Values: 0, 1)

Note that the following options cannot be used together:
`--export-tasks-by-names`, `--export-tasks-by-ids`, `--export-task-runs-by-ids`, `--export-task-runs-since-date`, `--export-labels`.
`--export-tasks-by-names`, `--export-tasks-by-ids`, `--export-task-runs-by-ids`, `--export-task-runs-since-date`, `--labels`.
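
The mutual-exclusion rule above can be checked before the command is run. The following is a small sketch under stated assumptions: `build_export_command` is a hypothetical wrapper (not part of Mephisto), it uses the option names from the updated list, and it repeats a flag once per value, as the examples do.

```python
from typing import Dict, List, Sequence

# Filter options that, per the reference, cannot be used together.
EXCLUSIVE_FILTERS = (
    "--export-tasks-by-names",
    "--export-tasks-by-ids",
    "--export-task-runs-by-ids",
    "--export-task-runs-since-date",
    "--labels",
)


def build_export_command(filters: Dict[str, Sequence[str]]) -> List[str]:
    """Hypothetical helper: build an argv list for `mephisto db export`."""
    unknown = [name for name in filters if name not in EXCLUSIVE_FILTERS]
    if unknown:
        raise ValueError(f"Unknown filter options: {', '.join(unknown)}")
    # Only filters that actually carry values count as "used".
    used = [name for name in filters if filters[name]]
    if len(used) > 1:
        raise ValueError(f"Options cannot be used together: {', '.join(used)}")
    argv = ["mephisto", "db", "export"]
    for name in used:
        for value in filters[name]:
            # Multiple values are passed by repeating the flag.
            argv += [name, str(value)]
    return argv
```

Calling it with no filters yields the bare `mephisto db export` command, which corresponds to the full dump described earlier.
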


## Import
@@ -56,21 +56,21 @@ This command imports data from a dump file created by `mephisto db export` comma

Examples:
```
mephisto db import --dump-file <dump_file_name_or_path>
mephisto db import --file <dump_file_name_or_path>
mephisto db import --dump-file 2024_01_01_00_00_01_mephisto_dump.json --verbosity
mephisto db import --dump-file 2024_01_01_00_00_01_mephisto_dump.json --label-name my_first_dump
mephisto db import --dump-file 2024_01_01_00_00_01_mephisto_dump.json --conflict-resolver MyCustomMergeConflictResolver
mephisto db import --dump-file 2024_01_01_00_00_01_mephisto_dump.json --keep-import-metadata
mephisto db import --file 2024_01_01_00_00_01_mephisto_dump.json --verbosity
mephisto db import --file 2024_01_01_00_00_01_mephisto_dump.json --labels my_first_dump
mephisto db import --file 2024_01_01_00_00_01_mephisto_dump.json --conflict-resolver MyCustomMergeConflictResolver
mephisto db import --file 2024_01_01_00_00_01_mephisto_dump.json --keep-import-metadata
```

Options:
- `-d/--dump-file` - location of the __***.json__ dump file (filename if created in
- `-f/--file` - location of the `***.zip` dump file (filename if created in
`<MEPHISTO_REPO>/outputs/export` folder, or absolute filepath)
- `-cr/--conflict-resolver` (Optional) - name of Python class to be used for resolving merging conflicts
(when your local DB already has a row with same unique field value as a DB row in the dump data)
- `-l/--label-name` - a short string serving as a reference for the ported data (stored in `imported_data` table),
so later you can export the imported data with `--export-labels` export option
- `-l/--labels` - one or more short strings serving as a reference for the ported data (stored in `imported_data` table),
so later you can export the imported data with `--labels` export option
- `-k/--keep-import-metadata` - write data from `imported_data` table of the dump (by default it's not imported)
- `-v/--verbosity` - level of logging (default: 0; values: 0, 1)
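
The lookup rule described for `--file` (a bare filename is looked up in the export folder, an absolute path is used as given) can be illustrated like this. It is a sketch only: `resolve_dump_path` is a hypothetical helper, and the export directory is passed in explicitly rather than derived from `<MEPHISTO_REPO>`.

```python
from pathlib import Path


def resolve_dump_path(file_arg: str, export_dir: Path) -> Path:
    """Hypothetical helper mirroring the lookup rule described for `--file`."""
    candidate = Path(file_arg)
    if candidate.is_absolute():
        # An absolute filepath is used as-is.
        return candidate
    # Otherwise treat the value as a filename inside the export folder.
    return export_dir / candidate
```
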

@@ -95,13 +95,13 @@ Note that it will erase all current data, and you may want to run command `mephi

Examples:
```
mephisto db restore --backup-file <backup_file_name_or_path>
mephisto db restore --file <backup_file_name_or_path>
mephisto db restore --backup-file 2024_01_01_00_10_01.zip
mephisto db restore --file 2024_01_01_00_10_01.zip
```

Options:
- `-b/--backup-file` - location of the __*.zip__ backup file (filename if created in
- `-f/--file` - location of the `***.zip` backup file (filename if created in
`<MEPHISTO_REPO>/outputs/backup` folder, or absolute filepath)
- `-v/--verbosity` - level of logging (default: 0; values: 0, 1)

23 changes: 11 additions & 12 deletions docs/web/docs/guides/how_to_use/merge_dbs/simple_usage.md
@@ -45,8 +45,8 @@ mephisto db backup
And you will see text like this

```
Started making backup
Finished successfully! File: '/<MEPHISTO_PATH>/outputs/backup/2024_01_01_00_00_01_mephisto_backup.zip
Started creating backup file ...
Finished successfully! File: /<MEPHISTO_PATH>/outputs/backup/2024_01_01_00_00_01_mephisto_backup.zip
```

Find and copy this file.
@@ -79,31 +79,30 @@ mephisto db export --randomize-legacy-ids
And you will see text like this

```
Started exporting
Run command for all TaskRuns.
Started exporting data ...
No filter for TaskRun specified - exporting all TaskRuns.
Finished successfully!
Files created:
- Database dump - /<MEPHISTO_PATH>/outputs/export/2024_01_01_00_00_01_mephisto_dump.json
- Data files dump - /<MEPHISTO_PATH>/outputs/export/2024_01_01_00_00_01_mephisto_dump.zip
- Dump archive - /<MEPHISTO_PATH>/outputs/export/2024_01_01_00_00_01_mephisto_dump.zip
```

### Import just created dump into main project

Put your dump into export directory `/mephisto/outputs/export/` and you can use just a dump name in the command,
Put your dump into export directory `/<MEPHISTO_PATH>/outputs/export/` and you can use just a dump name in the command,
or use a full path to the file.
Let's just imagine, you put file in export directory:

```shell
mephisto db import --dump-file 2024_01_01_00_00_01_mephisto_dump.json
mephisto db import --file 2024_01_01_00_00_01_mephisto_dump.zip
```

And you will see text like this

```
Started importing from dump '2024_01_01_00_00_01_mephisto_dump.json'
Are you sure? It will affect your databases and related files. Type 'yes' and press Enter if you want to proceed: yes
Just in case, we are making a backup of all your local data. If something went wrong during import, we will restore all your data from this backup
Backup was created successfully! File: '/mephisto/outputs/backup/2024_01_01_00_10_01_mephisto_backup.zip'
Backup was created successfully! File: '/<MEPHISTO_PATH>/outputs/backup/2024_04_25_17_11_56_mephisto_backup.zip'
Started importing from dump file /<MEPHISTO_PATH>/outputs/export/2024_04_25_17_11_43_mephisto_dump.zip ...
Finished successfully
```

@@ -117,14 +116,14 @@ Also, we create a backup automatically just in case too, just before all changes
No worries, just restore everything from your or our backup:

```shell
mephisto db restore --backup-file 2024_01_01_00_10_01.zip
mephisto db restore --file 2024_01_01_00_10_01_mephisto_backup.zip
```

And you will see text like this

```
Started restoring from backup '2024_01_01_00_10_01.zip'
Are you sure? It will affect your databases and related files. Type 'yes' and press Enter if you want to proceed: yes
Started restoring from backup /<MEPHISTO_PATH>/outputs/backup/2024_01_01_00_10_01_mephisto_backup.zip ...
Finished successfully
```
