Regression: --reporting-period flag fails with timescaledb-parallel-copy 0.6.0+ #96
Comments
Thank you for the detailed explanation. It really helps us reproduce the issue and understand your situation. First, I want to point out that we had already noticed the progress-report bug that was introduced in v0.6.0. A fix for it was released in v0.8.1-rc.1 with PR #93. That version has not been officially released yet, as we are still testing some other issues, in particular one reported last week: a deadlock that occurs when the option implemented in the previous PR is used. Regarding the error when connecting to a database: the deadlock was also identified and fixed in #91, which is included in the same version mentioned above. I'll fix the race condition, and then all the functionality will be back to the expected behavior.
That's great to hear, as we are planning to use this in production for data ingestion. My colleague has encountered some deadlock problems as well. Initially, we thought these were related to Timescale itself, specifically upserts into compressed data, as there were some deadlock issues around compression, but it seems those may have been fixed in Timescale, possibly in version 2.16, though I'm not sure. Even on the latest Timescale version, 2.17, the timescaledb-parallel-copy tool still causes occasional deadlocks for my colleague, although I'm not sure which parallel-copy version is involved. Because of the deadlock issues, we can't use parallel-copy in production. Do you know if parallel-copy is commonly used in production environments? I'm a little worried about these regressions potentially affecting our production setup.
We recently started using timescaledb-parallel-copy as a package in production for Timescale Cloud. It powers the feature for importing CSV files from the web UI. We added a couple of features we needed, mainly around error reporting and package interaction. Those changes introduced some deadlocks that we are aiming to resolve soon. We are eager to improve the package so we can take it to a stable v1 version, mainly by giving it a clean interface, but also by extending test coverage so we do not have regressions. Would you like to share some details on how you plan to use this in production? You may be able to use our API to import CSV data if you are using Timescale Cloud.
Hi, apologies for the late reply. I was on holiday and just got back to working on Timescale projects. We are using the open-source version of Timescale, not the cloud version, so integrating with your API wouldn't apply to our setup. However, we are using the tool for large-scale daily batch inserts in production. We either load from CSV files or stream CSV data via stdin. I'm also using it for load testing, so it would be useful to get the reporting of insert stats (ROW/s) working as before. We manage utility smart meters, collecting meter readings from various sources using multiple protocols (CoAP, DLMS/COSEM, FTP, REST API, AMQP, MQTT, etc.). We are gradually transitioning to real-time streaming, likely using Kafka/MQTT, but we would still like to use parallel-copy for batch-insert jobs to ingest meter readings. It would be great if you could resolve these regression issues. Otherwise, I think the tool has been working great.
I see. It is good to see you are finding timescaledb-parallel-copy useful 🎉 As mentioned before, the latest available version addresses the bug with the reporting period. You can update and let us know if you notice anything not working as expected. In the meantime, we've been working on making the tool more reliable and decided to implement idempotency. This will guarantee that you can retry operations and the data will be inserted only once. In addition, you get a very nice table that reports every successful insert. This will allow you to notice if there are gaps in your data for whatever reason. Here is the PR with that change: #114. I'll be happy to get feedback from you so we can make this work for everyone 😄
Description:
The --reporting-period flag, which worked correctly in versions 0.5.1 and earlier, no longer functions as expected in versions 0.6.0 and above. Additionally, the tool has failed to provide useful SQL error messages since version 0.7.0.
The expected behaviour is that the --reporting-period flag should report intermediate insert stats, and any database errors (e.g., nonexistent database) should return clear SQL error messages.
timescaledb-parallel-copy (linux amd64) 0.7.0 - both period flag and sql errors don't work
timescaledb-parallel-copy (linux amd64) 0.6.0 - reporting-period flag doesn't work
timescaledb-parallel-copy (linux amd64) 0.5.1 - all works no issues
Steps to Reproduce:
go install github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy@latest
Run timescaledb-parallel-copy with the --reporting-period flag on version 0.6.0 or later:
Expected Behavior:
The --reporting-period flag should work as it did in version 0.5.1 and report intermediate insert stats.
Actual Behavior:
Versions 0.6.0+: --reporting-period flag does not function.
Versions 0.7.0+: --period flag still does not function.
Environment:
(--reporting-period fails, SQL errors still reported)
(--reporting-period fails, SQL errors not reported)
Logs/Stack Trace:
From version 0.7.0:
Impact:
--reporting-period flag.
Additional Context:
--period flag.
Suggested Fix:
--reporting-period flag implementation since version 0.6.0.
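For readers unfamiliar with what "intermediate insert stats" means here, the sketch below shows the general idea of periodic rate reporting: at each reporting interval, compare the running row count with the count at the previous report and divide by the elapsed time. This illustrates the concept only and is not the tool's actual code. The class name `RateReporter` and its method are hypothetical.

```python
class RateReporter:
    """Computes rows/s over each reporting period from a running row total."""

    def __init__(self, period_s, start_s=0.0):
        if period_s <= 0:
            raise ValueError("period_s must be positive")
        self.period_s = period_s
        self.last_t = start_s   # time of the previous report
        self.last_rows = 0      # row total at the previous report

    def maybe_report(self, now_s, total_rows):
        """Return rows/s since the last report, or None if the period has not elapsed."""
        elapsed = now_s - self.last_t
        if elapsed < self.period_s:
            return None
        rate = (total_rows - self.last_rows) / elapsed
        self.last_t, self.last_rows = now_s, total_rows
        return rate


if __name__ == "__main__":
    reporter = RateReporter(period_s=10.0)
    # Simulated (timestamp, total rows inserted so far) samples.
    for now_s, total in [(5.0, 1000), (10.0, 5000), (20.0, 9000)]:
        rate = reporter.maybe_report(now_s, total)
        if rate is not None:
            print(f"at {now_s:.0f}s: {total} rows total, {rate:.1f} rows/s over the period")
```

In a real worker pool the row total would typically be a shared counter updated by the insert workers, with the report driven by a timer.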