[0.10.0-nightly-146f36c] - Missing data a minute before current time. Comes back later #5193
Comments
I'm also seeing "missing data" issues occasionally with TSM1, although it seems to happen under load as well. Let me know if there's any useful debug data I can provide to help troubleshoot.
In this case the data exists, but is reported as null. If you wait an extra period of time, the data will show up. It might be in some other processing state that the query object cannot access. Note: in Grafana I am rendering null as zero, so those dips in the graph are actually no data being returned by InfluxDB. I could print the response from InfluxDB, but found it easier to understand the problem with a graph.
Thanks for the reports. This is probably something to do with WAL caching -- data that still exists in the WAL must be combined with data on disk in TSM files, and there may be some bugs in that area. Once the data has been written completely to TSM files, your data then looks good.
Noticed there were a few commits related to compaction and thought I'd mention this problem still exists with the latest version. This problem wouldn't be as bad if the data were missing only at the beginning, since that could be blamed on slowness; with a sudden loss of data, it appears to users that something happened after a minute.
@gerrickw Thanks for the update. Still looking into this issue.
We were closing the cursor when we read the last block, which caused the internal state to be cleared. In a group by query, we seeked multiple times, so depending on the group by interval and how the data was laid out in the blocks, we would close the cursor and the last block would get skipped. Fixes #5193
No longer seeing a gap after building the latest version from master. Thanks! I tried downloading the nightly build today expecting to see the changes from yesterday, but the build referenced a commit from 4 days ago. Possibly a delay in publishing the latest builds.
There was a fix in 5b1791, but it is not present in the current branch, likely due to a rebase issue. The current code panics with a query like:
select value from cpu group by host order by time desc limit 1
This fixes the panic as well as prevents #5193 from re-occurring. The issue is that aggressively closing the cursors clears out the seeks slice, so re-seeking will fail.
Summary:
It appears the first 10s tick after around a minute will have no data, while the ticks before and after show data. If I wait 10s-60s the data will show up, but then the last minute will have no data. This could also be thought of as: data exists, data disappears around the 1 minute mark, and comes back again. Note that I am only writing data in every 10 seconds, with the same epoch time.
This actually used to be much worse on the Dec 14th nightly build, where the duration of missing data was at times 1-5 minutes, but it looks like it improved after I pulled today's build. I have only run it for 30 minutes, so there is a possibility it could get worse. Curious if it improved from #5142.
Log with build version:
2015/12/21 14:21:56 InfluxDB starting, version 0.10.0-nightly-146f36c, branch master, commit 146f36c,
built 2015-12-21T05:00:47+0000
2015/12/21 14:21:56 Go version go1.5.2, GOMAXPROCS set to 40
Pre-testing:
Data input repro steps:
Example images below showing the issue. Notice how data is missing around the one-minute mark, with data before and after. Then in the 2nd screenshot, notice the time range has progressed and the previously missing data now exists.
![missing-data-1](https://cloud.githubusercontent.com/assets/656611/11942300/679d757a-a7ea-11e5-8ecd-9b340c758a02.png)
![missing-data-2](https://cloud.githubusercontent.com/assets/656611/11942301/679e4cca-a7ea-11e5-91e9-508421bc160c.png)
A few log entries related to compaction if useful:
[tsm1] 2015/12/21 14:44:15 compacted 3 TSM into 1 files in 3m7.324218109s
[tsm1] 2015/12/21 14:44:15 beginning compaction of 9 TSM files
[tsm1] 2015/12/21 14:45:22 compacted 9 TSM into 1 files in 1m6.612098711s
[tsm1] 2015/12/21 14:45:22 beginning compaction of 3 TSM files
[tsm1] 2015/12/21 14:46:11 compacted 3 TSM into 1 files in 49.335513717s
[tsm1] 2015/12/21 14:46:19 beginning compaction of 3 TSM files
[tsm1] 2015/12/21 14:46:49 compacted 3 TSM into 1 files in 30.000324717s
[tsm1] 2015/12/21 14:47:08 beginning compaction of 3 TSM files
[tsm1] 2015/12/21 14:47:40 compacted 3 TSM into 1 files in 32.406410161s
[tsm1] 2015/12/21 14:47:57 beginning compaction of 3 TSM files