query results are incorrect #2027
Forgot to say, I am using the latest master, commit 8e4309b.
It occurred to me to check using a timestamp, and it seems like the issue may be the way events are written out: repeating the same query seconds apart, the count I get keeps increasing.
So now I'm not sure what this issue should be. There's definitely something wrong with the way these queries return inconsistent results, but why does the call to write a batch of points return successfully if the writes aren't actually done? I can sum all the values grouped in yearly buckets and get what seems to be a correct total, yet with no group by it's wrong.
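As an aside, a minimal Go sketch of this kind of check, assuming the HTTP /query endpoint and hypothetical database/measurement names (test, events): run the identical count query several times a few seconds apart and watch the number drift.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"time"
)

func main() {
	// Hypothetical database/measurement names; adjust to match the setup.
	params := url.Values{
		"db": {"test"},
		"q":  {"SELECT count(value) FROM events WHERE time > now() - 30d"},
	}
	endpoint := "http://localhost:8086/query?" + params.Encode()

	// Fire the identical query several times, a few seconds apart.
	// On an idle database the count should be the same every run.
	for i := 1; i <= 5; i++ {
		resp, err := http.Get(endpoint)
		if err != nil {
			fmt.Println("query failed:", err)
			return
		}
		body, _ := io.ReadAll(resp.Body)
		resp.Body.Close()
		fmt.Printf("run %d: %s\n", i, body)
		time.Sleep(5 * time.Second)
	}
}
```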
Thanks @dstrek, any chance you can zip up the data directory and post it somewhere so we can troubleshoot?
Wait, so after your test, are the results inconsistent? Are you sure that you're not posting points in the same series with the same timestamp? I ask that last question because they'll overwrite previous points.
Yes, I know points with the same timestamp in a series overwrite. Each batch has 10,000 chances of randomly drawing the same value between 0 and 2.6m (the number of seconds in a month), and I figured that accounted for the missing points, where the total is 997,960 instead of an even million. I have another measurement with 10 million points that exhibits the same issues; there are ~100k series in that one, though it shouldn't matter, since each query should only merge 1k of them and it responds quickly.
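For what it's worth, the expected loss from those within-batch timestamp collisions can be estimated with the standard birthday-problem formula. A quick sketch, using the batch sizes described in this thread and assuming each batch's app tag is unique, so only within-batch collisions matter:

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	const (
		slots   = 2_592_000.0 // seconds in ~30 days: the possible timestamps
		batch   = 10_000.0    // points per batch, timestamps drawn uniformly
		batches = 100.0       // 1,000,000 points total
	)

	// Expected number of distinct slots hit by `batch` uniform draws
	// over `slots` values: N * (1 - (1 - 1/N)^k).
	unique := slots * (1 - math.Pow(1-1/slots, batch))
	lost := batch - unique
	fmt.Printf("expected duplicates per batch: %.1f\n", lost) // ~19.2
	fmt.Printf("expected surviving points:     %.0f\n", batches*unique)
	// ~998,000 expected survivors, in the same ballpark as the
	// observed 997,960, so overwrites plausibly explain the shortfall.
}
```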
I can't make sense of the dates that come back in these: 2015-02-14 isn't 32d ago, and one more day before that certainly isn't 2015-02-01. I will upload the data, though it may take a bit.
http://cdn.fancy.support/influx_db_11m.tar.bz2
Yeah, I am getting a very similar result to this. Also, on count(distinct(value)) I am getting ERR: json: cannot unmarshal object into Go value of type error, e.g.
If I just count without the 'where time' I get:
However, if I count (without distinct) and use time I get proper results:
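A sketch of the comparison being described, with assumed database/measurement names; the count(distinct(...)) form is the one reported above to return the json unmarshal error:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
)

func main() {
	// db/measurement names are assumed; the distinct form is the one
	// reported above to return the json unmarshal error.
	queries := []string{
		"SELECT count(value) FROM events",
		"SELECT count(value) FROM events WHERE time > now() - 30d",
		"SELECT count(distinct(value)) FROM events WHERE time > now() - 30d",
	}
	for _, q := range queries {
		resp, err := http.Get("http://localhost:8086/query?db=test&q=" + url.QueryEscape(q))
		if err != nil {
			fmt.Println(q, "->", err)
			continue
		}
		body, _ := io.ReadAll(resp.Body)
		resp.Body.Close()
		fmt.Printf("%s\n  -> %s\n", q, body)
	}
}
```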
I ran the linked test program and verified that it is writing duplicate timestamps, which accounts for the missing values. The slow increase in the count is likely due to timestamps in the future; I saw the same thing by writing points in the future. You could try running a query with […]. The timestamp returned by the […]
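If the default query behavior here matches later InfluxDB releases, where a query without an explicit upper time bound only covers up to now(), future-dated points would fall into the count only as the clock catches up to them, which would explain the slow drift. A hedged way to check, assuming the same names as above: run the count with the default bound and again with an explicit far-future bound, and compare.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
)

// count runs one query against the /query endpoint and returns the raw body.
func count(q string) string {
	resp, err := http.Get("http://localhost:8086/query?db=test&q=" + url.QueryEscape(q))
	if err != nil {
		return err.Error()
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	return string(body)
}

func main() {
	// Default (implicit) upper bound: future-dated points are excluded.
	fmt.Println(count("SELECT count(value) FROM events WHERE time > now() - 40d"))
	// Explicit far-future upper bound: future-dated points are included,
	// so a larger count here would confirm writes landing in the future.
	fmt.Println(count("SELECT count(value) FROM events WHERE time > now() - 40d AND time < '2100-01-01'"))
}
```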
I inserted one million 'events' with two fields, value and name, where value is always 1 and name is a random 3-letter string. The time was a random date between now and 30 days ago. Each batch of 10,000 has an 'app' tag with a random string value, so there are 100 unique tag values.
I can see there are roughly a million points inserted. They were inserted in batches of 10,000 using this code: https://gist.github.com/dstrek/e342c32ae1b5dc0b7132 (nothing complex, just a basic use case).
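A rough Go sketch of the write pattern described above, using the line-protocol /write endpoint (a later, stable API; the linked gist may well do this differently):

```go
package main

import (
	"fmt"
	"math/rand"
	"net/http"
	"strings"
	"time"
)

// randString builds a random lowercase string of length n.
func randString(n int) string {
	const letters = "abcdefghijklmnopqrstuvwxyz"
	b := make([]byte, n)
	for i := range b {
		b[i] = letters[rand.Intn(len(letters))]
	}
	return string(b)
}

func main() {
	const (
		batches   = 100
		batchSize = 10_000
		window    = 30 * 24 * time.Hour // spread timestamps over the last 30 days
	)
	now := time.Now()

	for b := 0; b < batches; b++ {
		app := randString(8) // one 'app' tag value per batch
		var sb strings.Builder
		for i := 0; i < batchSize; i++ {
			// Random timestamp in the last 30 days; duplicates within a
			// series silently overwrite, which explains the missing points.
			ts := now.Add(-time.Duration(rand.Int63n(int64(window))))
			fmt.Fprintf(&sb, "events,app=%s value=1,name=\"%s\" %d\n",
				app, randString(3), ts.UnixNano())
		}
		resp, err := http.Post("http://localhost:8086/write?db=test",
			"text/plain", strings.NewReader(sb.String()))
		if err != nil {
			fmt.Println("write failed:", err)
			return
		}
		resp.Body.Close()
	}
}
```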
However, if I try to query for the last 30, 31, 32, or 40 days, the counts are all different and incorrect.
Now, if I group the -40d query into 1-day buckets, there's actually a count.
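For reference, a sketch of that daily-bucket cross-check with the buckets summed client-side (same assumed names as above; the decoding assumes the standard results/series/values JSON shape of the /query response):

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// Just enough of the /query response structure to pull out the buckets.
type queryResp struct {
	Results []struct {
		Series []struct {
			Values [][]interface{} `json:"values"`
		} `json:"series"`
	} `json:"results"`
}

func main() {
	// Daily-bucket variant of the -40d count; summing the buckets is a
	// quick cross-check against the ungrouped total.
	q := "SELECT count(value) FROM events WHERE time > now() - 40d GROUP BY time(1d)"
	resp, err := http.Get("http://localhost:8086/query?db=test&q=" + url.QueryEscape(q))
	if err != nil {
		fmt.Println("query failed:", err)
		return
	}
	defer resp.Body.Close()

	var qr queryResp
	if err := json.NewDecoder(resp.Body).Decode(&qr); err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	total := 0.0
	for _, r := range qr.Results {
		for _, s := range r.Series {
			for _, v := range s.Values {
				// Each row looks like [timestamp, count]; numbers
				// decode as float64 through interface{}.
				if len(v) == 2 {
					if c, ok := v[1].(float64); ok {
						total += c
					}
				}
			}
		}
	}
	fmt.Printf("sum of 1d buckets: %.0f\n", total)
}
```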