Stream large snapshots #118

Merged: 7 commits into spatie:main on Apr 20, 2022

Conversation

jasonlfunk

This PR is in the spirit of the abandoned PR #93. I reviewed @pelmered's changes and the code reviews in that PR and came up with this. I intended this to be minimum viable functionality, as opposed to the previous PR, which had error checking, progress indicators, etc. I just need large snapshots not to crash my application after dropping all of the tables.

Please review it and let me know if you'd like any changes.

@freekmurze merged commit 69fa5c7 into spatie:main on Apr 20, 2022
@freekmurze
Member

Thank you!

@eli-s-r
Contributor

eli-s-r commented Apr 22, 2022

    ? gzdecode(gzread($stream, self::STREAM_BUFFER_SIZE))

@jasonlfunk I get a data error on this line when trying to load a fresh compressed snapshot. Any idea what could be causing this?

@jasonlfunk
Author

@eli-s-r Is it possible to share the snapshot? Or can you make a shareable one that reproduces the error?

What is the exact error that you are seeing?

@pelmered

I get the same problem as @eli-s-r. In my case I am loading a gzipped snapshot from an external S3 disk on DigitalOcean Spaces. The compressed snapshot file is about 235 MB. The error doesn't say more than "data error" and points to the line with gzdecode. Googling doesn't give much insight.

I had some similar problems with streaming the file directly into the DB and decompressing it on the fly. What I had to do was save the file locally and stream it from there. I'll see if I can come up with a solution.
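A plausible cause of the data error: gzdecode() expects a complete gzip document, but readStream() hands back the raw compressed bytes in arbitrary-sized chunks, so the first chunk is truncated and every later chunk lacks the gzip header entirely. PHP's incremental inflate API tolerates arbitrary chunk boundaries. A minimal sketch, assuming $stream is the still-compressed stream from the remote disk (the buffer size is illustrative):

    // inflate_add() keeps decoder state between calls, so chunk
    // boundaries don't need to line up with the gzip framing.
    $context = inflate_init(ZLIB_ENCODING_GZIP);
    $sql = '';

    while (! feof($stream)) {
        $sql .= inflate_add($context, fread($stream, 8192));
    }

    $sql .= inflate_add($context, '', ZLIB_FINISH);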

@pelmered

I managed to work out a solution, but it is extremely slow for larger files like mine. I think the better approach is to download the snapshot, decompress it if needed, and then import it with a CLI command, i.e. mysql -u username -p dbname < snapshot.sql. However, that requires a local mysql client, so it would probably have to be configurable in the config file. For even better performance you could do everything in a CLI command.

@jasonlfunk
Author

@pelmered I had considered doing an import like that; however, it would get challenging to support all the various database drivers.

@eli-s-r
Contributor

eli-s-r commented Apr 27, 2022

@jasonlfunk unfortunately I can't share the snapshot as there's lots of protected data that I can't legally share, even if anonymized.

@eli-s-r
Contributor

eli-s-r commented Apr 27, 2022

@pelmered This was actually where the error originally occurred for me -- I was trying to decompress a gzipped snapshot in the SnapshotCreated event because I have to manually modify some lines. I also tried streaming by adapting the code from loadStream() in this PR, and ran into the same data error. At first I thought it was a bug in my code, but then I tried running snapshot:load on one of the (unmodified) gzipped snapshots and got the same error 🙃
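For the SnapshotCreated use case, a minimal sketch of rewriting a gzipped snapshot line by line, assuming a local file; the paths and the $modifyLine callback are illustrative stand-ins, not code from this thread:

    $in = gzopen('snapshot.sql.gz', 'r');
    $out = gzopen('snapshot-modified.sql.gz', 'w');

    // Decompress, transform, and recompress one line at a time so the
    // whole snapshot never has to fit in memory.
    while (($line = gzgets($in)) !== false) {
        gzwrite($out, $modifyLine($line));
    }

    gzclose($in);
    gzclose($out);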

@pelmered

pelmered commented Apr 27, 2022

> @pelmered I had considered doing an import like that; however, it would get challenging to support all the various database drivers.

Yes, that is true.

Here is the code that is working for me, but it probably needs some refactoring:

    protected function loadStream(string $connectionName = null)
    {
        $statementBuffer = '';

        LazyCollection::make(function () {
            if ($this->compressionExtension === 'gz') {
                // gzopen() needs a local, seekable path, so copy the
                // compressed snapshot to a temporary file first.
                $stream = $this->disk->readStream($this->fileName);

                $path = (new TemporaryDirectory(config('db-snapshots.temporary_directory_path')))
                    ->create()
                    ->path('temp-load.tmp') . '.gz';

                $fileDest = fopen($path, 'w');

                while (! feof($stream)) {
                    fwrite($fileDest, fread($stream, self::STREAM_BUFFER_SIZE));
                }

                fclose($fileDest);
                fclose($stream);

                // Decompress the local copy line by line.
                $handle = gzopen($path, 'r');

                while (! gzeof($handle)) {
                    yield gzgets($handle, self::STREAM_BUFFER_SIZE);
                }

                gzclose($handle);
            } else {
                $handle = $this->disk->readStream($this->fileName);

                while (($line = fgets($handle)) !== false) {
                    yield $line;
                }

                fclose($handle);
            }
        })->each(function (string $line) use ($connectionName, &$statementBuffer) {
            if ($this->shouldIgnoreLine($line)) {
                return;
            }

            $statementBuffer .= $line;

            // Keep buffering until the statement is terminated with ';'.
            if (substr(trim($line), -1, 1) !== ';') {
                return;
            }

            DB::connection($connectionName)->unprepared($statementBuffer);

            $statementBuffer = '';
        });
    }

@pelmered

However, I think I will roll my own import command. This package will still be great for creating and storing the snapshots.
Something like this:

        $this->info('Finding latest snapshot...');

        $disk  = Storage::disk('snapshots_remote');
        $files = $disk->files();

        $snapshotName = end($files);

        $this->info(sprintf('Found snapshot: %s', $snapshotName));

        $fileUrl  = $disk->url($snapshotName);
        $tempFile = (new TemporaryDirectory(storage_path('temp/db-snapshots/')))
                        ->create()
                        ->path('temp-load.tmp').'.gz';

        $this->info('Downloading...');
        exec('spaces-cli down '.$fileUrl.' -o '.$tempFile);
        $this->info('Snapshot downloaded.');

        $this->info('Dropping all existing tables...');
        $this->dropAllCurrentTables();
        $this->info('Done.');

        $this->info('Decompressing and importing into DB...');
        exec('pv '.$tempFile.' | gunzip | mysql -u root '.DB::connection()->getDatabaseName());
        $this->info('Import complete.');
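One note on the snippet above: $fileUrl and $tempFile are interpolated into the shell unescaped, so paths containing spaces or metacharacters would break the exec() calls. A hedged variant of the same two commands using escapeshellarg():

        // Same commands, with the interpolated values shell-escaped.
        exec('spaces-cli down ' . escapeshellarg($fileUrl) . ' -o ' . escapeshellarg($tempFile));
        exec('pv ' . escapeshellarg($tempFile) . ' | gunzip | mysql -u root ' . escapeshellarg(DB::connection()->getDatabaseName()));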
