
Electric crashes during loading of large amount of data into Postgres, and restarts with an empty shape. #2073

Open
samwillis opened this issue Dec 2, 2024 · 7 comments

@samwillis
Contributor

Working on the PGlite Linearlite demo. I have a script that resets the Docker images (Postgres + Electric) and then loads a number of issues and comments (I've been aiming for 100k issues and 500k comments). Sometimes Electric crashes (with no error in the log) while that load-data script is running; I think it happens when I have the app open and it's trying to connect and sync. When this happens and I restart Electric, the shape is empty... as in the shape exists but it doesn't have any of the issues/comments that I inserted into the database.

So there are two related things here:

  1. Somehow I'm crashing Electric when I load a large amount of data into Postgres

  2. When it crashes in this situation the shape exists but is empty - it clearly needs to be recycled and recreated

I'm going to dig in a little more and try and create a simple reproduction and some logs.

@samwillis samwillis changed the title Electric crashes during (I think) large snapshots, and restarts with an empty shape. Electric crashes during loading of large amount of data into Postgres, and restarts with an empty shape. Dec 2, 2024
@balegas balegas added the bug label Dec 2, 2024
@samwillis
Contributor Author

samwillis commented Dec 2, 2024

In the video below:

  1. Start Electric
  2. Load an initial batch of 10 issues + comments into Postgres
  3. Start the front end, which creates the shapes
  4. Stop the front end so it's not connected to Electric
  5. Load 100k issues + 500k comments into Postgres
  6. Electric crashes at the 1 min 50 sec mark
  7. I start the front end; it still only has 10 issues
  8. I load more, and Electric crashes again
Screen.Recording.2024-12-02.at.12.17.03.mp4

Log of the crash:

2024-12-02 12:32:50 backend-1   | 12:32:50.960 [info] Received transaction 739 from Postgres at 0/10F0D038
2024-12-02 12:33:02 postgres-1  | 2024-12-02 12:33:02.799 GMT [65] LOG:  could not receive data from client: Connection reset by peer
2024-12-02 12:33:02 postgres-1  | 2024-12-02 12:33:02.799 GMT [65] STATEMENT:  START_REPLICATION SLOT "electric_slot_default" LOGICAL 0/0 (proto_version '1', publication_names 'electric_publication_default')
2024-12-02 12:33:02 postgres-1  | 2024-12-02 12:33:02.802 GMT [65] LOG:  unexpected EOF on standby connection
2024-12-02 12:33:02 postgres-1  | 2024-12-02 12:33:02.802 GMT [65] STATEMENT:  START_REPLICATION SLOT "electric_slot_default" LOGICAL 0/0 (proto_version '1', publication_names 'electric_publication_default')
2024-12-02 12:33:02 backend-1 exited with code 137
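Exit code 137 is the giveaway in that log. As a general Unix/Docker convention (not Electric-specific), exit statuses above 128 mean the process was killed by signal `status - 128`; 137 − 128 = 9 is SIGKILL, which is what the kernel OOM killer (or Docker's container memory limit) delivers. A tiny sketch of the decoding:

```typescript
// Decode a container exit status: codes above 128 indicate the process was
// terminated by a signal numbered (status - 128). 137 - 128 = 9, i.e. SIGKILL,
// typical of the kernel OOM killer or a Docker memory limit being hit.
function signalFromExitCode(code: number): number | null {
  return code > 128 ? code - 128 : null;
}

console.log(signalFromExitCode(137)); // 9 (SIGKILL)
```

This is consistent with the later finding that the large transaction exhausted memory on the machine.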

@samwillis
Contributor Author

This crash is due to the size of the transaction loading the data. The script was loading everything in a single transaction; splitting it into batches prevents the crash.
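A minimal sketch of the workaround described above, assuming a TypeScript loader script. `insertBatch` is a hypothetical callback standing in for whatever client (e.g. `pg` or PGlite) actually runs the INSERTs; the point is simply that each batch gets its own `BEGIN ... COMMIT` instead of one giant transaction that Electric must buffer in full:

```typescript
// Split rows into fixed-size batches.
function chunk<T>(rows: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < rows.length; i += size) {
    batches.push(rows.slice(i, i + size));
  }
  return batches;
}

// Load rows batch by batch; each insertBatch call is expected to be
// one independent transaction (BEGIN ... COMMIT) on the database side.
async function loadInBatches<T>(
  rows: T[],
  batchSize: number,
  insertBatch: (batch: T[]) => Promise<void>,
): Promise<void> {
  for (const batch of chunk(rows, batchSize)) {
    await insertBatch(batch);
  }
}
```

With, say, `batchSize = 5000`, the 100k-issue load becomes 20 small transactions that the replication stream can process incrementally.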

@balegas
Contributor

balegas commented Dec 2, 2024

Okay, we can work around that in the meantime, but we'll keep this issue open to track the problem you've found.

@balegas
Contributor

balegas commented Dec 3, 2024

Investigate how much memory the large transaction consumed and document a size limit for transactions.

@alco alco self-assigned this Dec 3, 2024
@alco
Member

alco commented Dec 3, 2024

I'll investigate this further, but my first attempt at reproducing it wasn't successful: I modified the data-loading script to load all issues and comments in one transaction and followed the reproduction steps. Once the transaction was committed in Postgres, it took Electric 20-30 seconds to fully process it. Watching its memory usage in htop, it peaked at 9.6% of total memory, which is slightly above 3 GB on my machine.

@alco
Member

alco commented Dec 3, 2024

The shape log file ended up 141 MB in size.

By far the least efficient part of the process was serving the shape to a client. Our current chunking algorithm drip-feeds the client with 10 KB chunks, and it takes forever to load even 25k issues.
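To put some rough numbers on that (back-of-envelope only, assuming "141 MB" means 141 MiB and "10 KB" means 10 KiB, which the comment doesn't specify), a shape log that size splits into on the order of fourteen thousand chunks, each of which the client has to fetch and process separately:

```typescript
// Rough count of fixed-size chunks a shape log of a given size turns into.
// Each chunk is a separate response body the client must request, so a small
// fixed chunk size translates into tens of thousands of round trips.
function chunkCount(totalBytes: number, chunkBytes: number): number {
  return Math.ceil(totalBytes / chunkBytes);
}

console.log(chunkCount(141 * 1024 * 1024, 10 * 1024)); // 14439
```

That round-trip count, rather than raw bandwidth, is what makes the initial sync feel so slow, and it motivates the larger snapshot chunks discussed below.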

@balegas
Contributor

balegas commented Dec 5, 2024

We've learned that the machine might have been too small to handle the shape. I opened #2101 and we're working on initial snapshot chunking. Maybe we can close this issue for now.
