Incorrect Progress Info #163

Closed
dpakon1 opened this issue Sep 4, 2020 · 4 comments

@dpakon1

dpakon1 commented Sep 4, 2020

Hello, and thank you very much for a great library!

I found a bug in the API method "execute_with_progress": the query iterator reports an incorrect total_rows value.

Progress:
num_rows: 26581978 total_rows: -1140508263
num_rows: 58067973 total_rows: -1140508263
num_rows: 90825722 total_rows: -1140508263
....
num_rows: 3105694729 total_rows: -1140508263
num_rows: 3150702347 total_rows: -1140508263
num_rows: 3154441799 total_rows: -1140508263

Result:
progress.rows: 3154441799
progress.total_rows: -1140508263

@xzkostyan
Member

Hi.

Can you provide a minimal code/SQL example that reproduces the problem?

@dpakon1
Author

dpakon1 commented Sep 15, 2020

I used the example from the docs for executing a query with a progress bar: https://clickhouse-driver.readthedocs.io/en/latest/quickstart.html?highlight=progress#selecting-data-with-progress-statistics

python version: 3.6.5
clickhouse-driver ver: 0.1.4
clickhouse-server ver: 19.14.x.x & 20.8.2.3

My Python code:

from clickhouse_driver import Client
from datetime import datetime

client = Client('server', user='default', password='123', database='default')

progress = client.execute_with_progress('SELECT * from numbers(3000000000)')
# https://clickhouse.tech/docs/v19.14/ru/operations/system_tables/#system-numbers

timeout = 20
started_at = datetime.now()

for num_rows, total_rows in progress:
    print(f"num_rows: {num_rows}\ttotal_rows: {total_rows}")
    if total_rows:
        done = float(num_rows) / total_rows
    else:
        done = total_rows
    print(f"done: {done}")
    now = datetime.now()
    elapsed = (now - started_at).total_seconds()

    if elapsed > timeout and done < 0.5:
        print("CANCELED")
        client.cancel()
        break
else:
    rv = progress.get_result()
    print(len(rv))

And I got this result:

num_rows: 327525        total_rows: -1294967296
done: -0.0002529214452069066
num_rows: 982575        total_rows: -1294967296
done: -0.0007587643356207198
num_rows: 1899645       total_rows: -1294967296
...
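
The negative `done` values in this output come from dividing by a negative total_rows. As a minimal sketch (not part of the driver or its docs), a progress loop could treat a non-positive total as "unknown" instead of dividing by it. The helper name `fraction_done` is hypothetical:

```python
from typing import Optional


def fraction_done(num_rows: int, total_rows: int) -> Optional[float]:
    """Return progress in [0, 1], or None when the total is not usable.

    A total of zero or less is treated as unknown (assumption: a valid
    total is always positive), so the caller never sees a negative ratio.
    """
    if total_rows <= 0:
        return None
    # Clamp in case the reported row count overshoots the estimate.
    return min(num_rows / total_rows, 1.0)


print(fraction_done(327525, -1294967296))   # None
print(fraction_done(1500000000, 3000000000))  # 0.5
```

With such a guard, a cancel condition like `done < 0.5` would only be evaluated when the fraction is not None.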

@xzkostyan
Member

There was a bug in reading big numbers. Your example will work if you lower the number from 3000000000 to 300000000.

The fix is already published in the master branch. You can install the package directly from GitHub.
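
The negative totals reported in this thread are consistent with an unsigned value being read as a signed 32-bit integer: 3000000000 needs more than 31 bits and wraps to -1294967296, while 300000000 fits and is unaffected. The sketch below only illustrates that arithmetic. It is not the driver's code, and `wrap_to_int32` is a hypothetical helper name:

```python
import struct


def wrap_to_int32(value: int) -> int:
    """Reinterpret the low 32 bits of `value` as a signed 32-bit integer."""
    low_bits = value & 0xFFFFFFFF
    return struct.unpack("<i", struct.pack("<I", low_bits))[0]


print(wrap_to_int32(3000000000))  # -1294967296
print(wrap_to_int32(300000000))   # 300000000
```

The same wraparound maps 3154459033 to -1140508263, the total_rows value shown in the first report.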

Output after the fix:

num_rows: 1835008	total_rows: 3000000000
done: 0.0006116693333333333
num_rows: 2686976	total_rows: 3000000000
done: 0.0008956586666666667
num_rows: 3342336	total_rows: 3000000000
done: 0.001114112
num_rows: 3997696	total_rows: 3000000000
done: 0.0013325653333333333
num_rows: 4653056	total_rows: 3000000000
done: 0.0015510186666666667
num_rows: 5308416	total_rows: 3000000000
done: 0.001769472
num_rows: 5963776	total_rows: 3000000000
done: 0.0019879253333333334
num_rows: 6684672	total_rows: 3000000000
done: 0.002228224

@dpakon1
Author

dpakon1 commented Sep 20, 2020

Thank you, the function works fine now! 👍
