Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect value stored in database #6643

Closed
bwight opened this issue May 16, 2016 · 7 comments
Closed

Incorrect value stored in database #6643

bwight opened this issue May 16, 2016 · 7 comments
Assignees
Labels

Comments

@bwight
Copy link

bwight commented May 16, 2016

Bug report

System info:
InfluxDB: 0.12.1
OS: linux 4.4.5-15.26.amzn1.x86_64
Docker Image: tutum/influxdb 0.12 33b36625800c

We're using the statsd protocol and using telegraf's statsd input plugin to write the points to influxdb.

Steps to reproduce:

We are not yet clear exactly how to reproduce this bug. However it does happen frequently due to the amount of metrics that we report. At least a couple times a day we're experiencing a bug that causes the query we run on the data to return values that we never inserted into the database.

We identified this issue because of unexpected negative values that were being reported from counters that we reported using the statsd protocol. Telegraf does not allow negative values to be reported as a counter, only the gauge type allows for negative values.

To help track down this issue to at least rule out Telegraf or our application counters as the issue we enabled write tracing on influx and here was the result.

Query

SELECT * FROM main."default".provider_counter WHERE env = 'development' AND application =     'pipeline' AND entity = 'linev2' AND host = 'ip-172-22-17-221' AND method = 'insert' AND module = 'datawriter' AND provider = 'bet365v2'

Query Result

time    application cluster count   entity  env host    method  metric_type module  provider    size
2016-05-13T21:08:20Z    "pipeline"  "pipelineimporter"  2   "linev2"    "development"   "ip-172-22-17-221"  "insert"    "counter"   "datawriter"    "bet365v2"  
2016-05-13T21:22:20Z    "pipeline"  "pipelineimporter"  2   "linev2"    "development"   "ip-172-22-17-221"  "insert"    "counter"   "datawriter"    "bet365v2"  
2016-05-13T21:44:30Z    "pipeline"  "pipelineimporter"  1   "linev2"    "development"   "ip-172-22-17-221"  "insert"    "counter"   "datawriter"    "bet365v2"  
2016-05-13T21:54:00Z    "pipeline"  "pipelineimporter"  1   "linev2"    "development"   "ip-172-22-17-221"  "insert"    "counter"   "datawriter"    "bet365v2"  
2016-05-13T22:01:30Z    "pipeline"  "pipelineimporter"  1   "linev2"    "development"   "ip-172-22-17-221"  "insert"    "counter"   "datawriter"    "bet365v2"  
2016-05-13T22:28:50Z    "pipeline"  "pipelineimporter"  1   "linev2"    "development"   "ip-172-22-17-221"  "insert"    "counter"   "datawriter"    "bet365v2"  
2016-05-13T22:43:40Z    "pipeline"  "pipelineimporter"  6   "linev2"    "development"   "ip-172-22-17-221"  "insert"    "counter"   "datawriter"    "bet365v2"  
2016-05-13T22:44:40Z    "pipeline"  "pipelineimporter"  2   "linev2"    "development"   "ip-172-22-17-221"  "insert"    "counter"   "datawriter"    "bet365v2"  
2016-05-13T22:48:10Z    "pipeline"  "pipelineimporter"  1   "linev2"    "development"   "ip-172-22-17-221"  "insert"    "counter"   "datawriter"    "bet365v2"  
2016-05-13T22:55:40Z    "pipeline"  "pipelineimporter"  1   "linev2"    "development"   "ip-172-22-17-221"  "insert"    "counter"   "datawriter"    "bet365v2"  
2016-05-13T23:01:40Z    "pipeline"  "pipelineimporter"  3   "linev2"    "development"   "ip-172-22-17-221"  "insert"    "counter"   "datawriter"    "bet365v2"  
2016-05-13T23:04:10Z    "pipeline"  "pipelineimporter"  -4  "linev2"    "development"   "ip-172-22-17-221"  "insert"    "counter"   "datawriter"    "bet365v2"  
2016-05-13T23:10:30Z    "pipeline"  "pipelineimporter"  4   "linev2"    "development"   "ip-172-22-17-221"  "insert"    "counter"   "datawriter"    "bet365v2"  
2016-05-13T23:17:50Z    "pipeline"  "pipelineimporter"  1   "linev2"    "development"   "ip-172-22-17-221"  "insert"    "counter"   "datawriter"    "bet365v2"

As you can see here the unexpected value of -4 at 2016-05-13T23:04:10Z that was inserted. So knowing this we looked at the write tracing and here's a grep of that file to match the query that we show above.

Write Trace

provider_counter,application=pipeline,cluster=pipelineimporter,entity=linev2,env=development,host=ip-172-22-17-221,method=insert,metric_type=counter,module=datawriter,provider=bet365v2 count=2i 1463173700
provider_counter,application=pipeline,cluster=pipelineimporter,entity=linev2,env=development,host=ip-172-22-17-221,method=insert,metric_type=counter,module=datawriter,provider=bet365v2 count=2i 1463174540
provider_counter,application=pipeline,cluster=pipelineimporter,entity=linev2,env=development,host=ip-172-22-17-221,method=insert,metric_type=counter,module=datawriter,provider=bet365v2 count=1i 1463175870
provider_counter,application=pipeline,cluster=pipelineimporter,entity=linev2,env=development,host=ip-172-22-17-221,method=insert,metric_type=counter,module=datawriter,provider=bet365v2 count=1i 1463176440
provider_counter,application=pipeline,cluster=pipelineimporter,entity=linev2,env=development,host=ip-172-22-17-221,method=insert,metric_type=counter,module=datawriter,provider=bet365v2 count=1i 1463176890
provider_counter,application=pipeline,cluster=pipelineimporter,entity=linev2,env=development,host=ip-172-22-17-221,method=insert,metric_type=counter,module=datawriter,provider=bet365v2 count=1i 1463178530
provider_counter,application=pipeline,cluster=pipelineimporter,entity=linev2,env=development,host=ip-172-22-17-221,method=insert,metric_type=counter,module=datawriter,provider=bet365v2 count=6i 1463179420
provider_counter,application=pipeline,cluster=pipelineimporter,entity=linev2,env=development,host=ip-172-22-17-221,method=insert,metric_type=counter,module=datawriter,provider=bet365v2 count=2i 1463179480
provider_counter,application=pipeline,cluster=pipelineimporter,entity=linev2,env=development,host=ip-172-22-17-221,method=insert,metric_type=counter,module=datawriter,provider=bet365v2 count=1i 1463179690
provider_counter,application=pipeline,cluster=pipelineimporter,entity=linev2,env=development,host=ip-172-22-17-221,method=insert,metric_type=counter,module=datawriter,provider=bet365v2 count=1i 1463180140
provider_counter,application=pipeline,cluster=pipelineimporter,entity=linev2,env=development,host=ip-172-22-17-221,method=insert,metric_type=counter,module=datawriter,provider=bet365v2 count=3i 1463180500
provider_counter,application=pipeline,cluster=pipelineimporter,entity=linev2,env=development,host=ip-172-22-17-221,method=insert,metric_type=counter,module=datawriter,provider=bet365v2 count=2i 1463180650
provider_counter,application=pipeline,cluster=pipelineimporter,entity=linev2,env=development,host=ip-172-22-17-221,method=insert,metric_type=counter,module=datawriter,provider=bet365v2 count=1i 1463181030
provider_counter,application=pipeline,cluster=pipelineimporter,entity=linev2,env=development,host=ip-172-22-17-221,method=insert,metric_type=counter,module=datawriter,provider=bet365v2 count=1i 1463181470

We also noticed that its not just the -4 that's incorrect but also the 4 that was inserted on the line after that. The rest of the writes are in line with what was posted as well as the timestamps.

Retention Policy

show retention policies on main
name    duration    shardGroupDuration  replicaN    default
default 0       168h0m0s        1       true

Continuous Queries

show continuous queries
name: main
----------
name    query


name: _internal
---------------
name    query

write-trace-1463180650.txt
influxdb-config.txt

@e-dard
Copy link
Contributor

e-dard commented May 18, 2016

@bwight thanks for the report. I've tried to reproduce with the data provided but I'm unable to. Just to clarify—are you saying that the only points written to produce the bad result are the ones included in your report?

Or was more data written to produce this?

@e-dard e-dard self-assigned this May 18, 2016
@bwight
Copy link
Author

bwight commented May 18, 2016

Hi @e-dard I appreciate your response. Yes there is a lot more data that we're inserting into influx. The data that I uploaded was only for that series for a period of about 2 hours. I'm going to try and make a script that can reproduce this bug and then I will share that here.

@e-dard
Copy link
Contributor

e-dard commented May 18, 2016

That would be great @bwight thanks!

@melor
Copy link

melor commented Nov 16, 2016

Somewhat similar but not necessarily related issue: influxdata/telegraf#2050

@melor
Copy link

melor commented Nov 25, 2016

Similar problem reproduces very often for us on Fedora 23, Linux 4.7.9-100.fc23.x86_64 #1 SMP Thu Oct 20 15:59:59 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux, InfluxDB 1.0.0.

Haven't verified this on InfluxDB 1.1 yet.

I've marked a specific point in time in the example below with "-MARK-". Before the mark the data does not match and there are negative values in influxdb, after the mark the data points match.

Here's a sample of the input data:

disk,cloud_name=google-us-east1,fstype=ext4,host=7026,path=/srv/aiven-persistent/7026-service,service_id=1933,service_type=influxdb free=8276643840i,inodes_free=524222i
,inodes_total=524288i,inodes_used=66i,total=8318803968i,used=25382912i,used_percent=0.30574355826889055 1480078530000000000
disk,cloud_name=google-us-east1,fstype=ext4,host=7026,path=/srv/aiven-persistent/7026-service,service_id=1933,service_type=influxdb free=8276623360i,inodes_free=524222i
,inodes_total=524288i,inodes_used=66i,total=8318803968i,used=25403392i,used_percent=0.3059902450191478 1480078560000000000
disk,cloud_name=google-us-east1,fstype=ext4,host=7026,path=/srv/aiven-persistent/7026-service,service_id=1933,service_type=influxdb free=8276602880i,inodes_free=524222i
,inodes_total=524288i,inodes_used=66i,total=8318803968i,used=25423872i,used_percent=0.30623693176940514 1480078590000000000
disk,cloud_name=google-us-east1,fstype=ext4,host=7026,path=/srv/aiven-persistent/7026-service,service_id=1933,service_type=influxdb free=8276582400i,inodes_free=524222i
,inodes_total=524288i,inodes_used=66i,total=8318803968i,used=25444352i,used_percent=0.30648361851966244 1480078620000000000
disk,cloud_name=google-us-east1,fstype=ext4,host=7026,path=/srv/aiven-persistent/7026-service,service_id=1933,service_type=influxdb free=8276561920i,inodes_free=524222i
,inodes_total=524288i,inodes_used=66i,total=8318803968i,used=25464832i,used_percent=0.3067303052699197 1480078650000000000
disk,cloud_name=google-us-east1,fstype=ext4,host=7026,path=/srv/aiven-persistent/7026-service,service_id=1933,service_type=influxdb free=8276541440i,inodes_free=524222i
,inodes_total=524288i,inodes_used=66i,total=8318803968i,used=25485312i,used_percent=0.30697699202017703 1480078680000000000
disk,cloud_name=google-us-east1,fstype=ext4,host=7026,path=/srv/aiven-persistent/7026-service,service_id=1933,service_type=influxdb free=8276520960i,inodes_free=524222i
,inodes_total=524288i,inodes_used=66i,total=8318803968i,used=25505792i,used_percent=0.3072236787704343 1480078710000000000
disk,cloud_name=google-us-east1,fstype=ext4,host=7026,path=/srv/aiven-persistent/7026-service,service_id=1933,service_type=influxdb free=8276504576i,inodes_free=524222i
,inodes_total=524288i,inodes_used=66i,total=8318803968i,used=25522176i,used_percent=0.3074210281706401 1480078740000000000
disk,cloud_name=google-us-east1,fstype=ext4,host=7026,path=/srv/aiven-persistent/7026-service,service_id=1933,service_type=influxdb free=8276484096i,inodes_free=524222i
,inodes_total=524288i,inodes_used=66i,total=8318803968i,used=25542656i,used_percent=0.3076677149208974 1480078770000000000
disk,cloud_name=google-us-east1,fstype=ext4,host=7026,path=/srv/aiven-persistent/7026-service,service_id=1933,service_type=influxdb free=8276463616i,inodes_free=524222i
,inodes_total=524288i,inodes_used=66i,total=8318803968i,used=25563136i,used_percent=0.30791440167115475 1480078800000000000
disk,cloud_name=google-us-east1,fstype=ext4,host=7026,path=/srv/aiven-persistent/7026-service,service_id=1933,service_type=influxdb free=8276443136i,inodes_free=524222i
,inodes_total=524288i,inodes_used=66i,total=8318803968i,used=25583616i,used_percent=0.308161088421412 1480078830000000000
disk,cloud_name=google-us-east1,fstype=ext4,host=7026,path=/srv/aiven-persistent/7026-service,service_id=1933,service_type=influxdb free=8276422656i,inodes_free=524222i
,inodes_total=524288i,inodes_used=66i,total=8318803968i,used=25604096i,used_percent=0.3084077751716693 1480078860000000000
disk,cloud_name=google-us-east1,fstype=ext4,host=7026,path=/srv/aiven-persistent/7026-service,service_id=1933,service_type=influxdb free=8276402176i,inodes_free=524222i
,inodes_total=524288i,inodes_used=66i,total=8318803968i,used=25624576i,used_percent=0.30865446192192664 1480078890000000000
disk,cloud_name=google-us-east1,fstype=ext4,host=7026,path=/srv/aiven-persistent/7026-service,service_id=1933,service_type=influxdb free=8276381696i,inodes_free=524222i
,inodes_total=524288i,inodes_used=66i,total=8318803968i,used=25645056i,used_percent=0.3089011486721839 1480078920000000000
-MARK-
disk,cloud_name=google-us-east1,fstype=ext4,host=7026,path=/srv/aiven-persistent/7026-service,service_id=1933,service_type=influxdb free=8276361216i,inodes_free=524222i
,inodes_total=524288i,inodes_used=66i,total=8318803968i,used=25665536i,used_percent=0.3091478354224412 1480078950000000000
disk,cloud_name=google-us-east1,fstype=ext4,host=7026,path=/srv/aiven-persistent/7026-service,service_id=1933,service_type=influxdb free=8276340736i,inodes_free=524222i
,inodes_total=524288i,inodes_used=66i,total=8318803968i,used=25686016i,used_percent=0.30939452217269847 1480078980000000000
disk,cloud_name=google-us-east1,fstype=ext4,host=7026,path=/srv/aiven-persistent/7026-service,service_id=1933,service_type=influxdb free=8276324352i,inodes_free=524222i
,inodes_total=524288i,inodes_used=66i,total=8318803968i,used=25702400i,used_percent=0.30959187157290435 1480079010000000000
disk,cloud_name=google-us-east1,fstype=ext4,host=7026,path=/srv/aiven-persistent/7026-service,service_id=1933,service_type=influxdb free=8276303872i,inodes_free=524222i,inodes_total=524288i,inodes_used=66i,total=8318803968i,used=25722880i,used_percent=0.3098385583231616 1480079040000000000
disk,cloud_name=google-us-east1,fstype=ext4,host=7026,path=/srv/aiven-persistent/7026-service,service_id=1933,service_type=influxdb free=8276283392i,inodes_free=524222i,inodes_total=524288i,inodes_used=66i,total=8318803968i,used=25743360i,used_percent=0.3100852450734189 1480079070000000000
disk,cloud_name=google-us-east1,fstype=ext4,host=7026,path=/srv/aiven-persistent/7026-service,service_id=1933,service_type=influxdb free=8276262912i,inodes_free=524222i,inodes_total=524288i,inodes_used=66i,total=8318803968i,used=25763840i,used_percent=0.31033193182367624 1480079100000000000
disk,cloud_name=google-us-east1,fstype=ext4,host=7026,path=/srv/aiven-persistent/7026-service,service_id=1933,service_type=influxdb free=8276242432i,inodes_free=524222i,inodes_total=524288i,inodes_used=66i,total=8318803968i,used=25784320i,used_percent=0.3105786185739335 1480079130000000000

Here's the same data points raw from influxdb using "select time, free from disk where host='7026' and time > now() - 25m and path =~ /service/":

[(('disk', None),
  [{'free': -8276746238, 'time': '2016-11-25T12:58:00Z'},
   {'free': 8276766717, 'time': '2016-11-25T12:58:30Z'},
   {'free': 8276504576, 'time': '2016-11-25T12:59:00Z'},
   {'free': -8276525056, 'time': '2016-11-25T12:59:30Z'},
   {'free': 8276545535, 'time': '2016-11-25T13:00:00Z'},
   {'free': -8276566015, 'time': '2016-11-25T13:00:30Z'},
   {'free': 8276586494, 'time': '2016-11-25T13:01:00Z'},
   {'free': -8276606974, 'time': '2016-11-25T13:01:30Z'},
   {'free': 8276627453, 'time': '2016-11-25T13:02:00Z'},
-MARK-
   {'free': 8276361216, 'time': '2016-11-25T13:02:30Z'},
   {'free': 8276340736, 'time': '2016-11-25T13:03:00Z'},
   {'free': 8276324352, 'time': '2016-11-25T13:03:30Z'},
   {'free': 8276303872, 'time': '2016-11-25T13:04:00Z'},
   {'free': 8276283392, 'time': '2016-11-25T13:04:30Z'},
   {'free': 8276262912, 'time': '2016-11-25T13:05:00Z'},
   {'free': 8276242432, 'time': '2016-11-25T13:05:30Z'},
   {'free': 8276221952, 'time': '2016-11-25T13:06:00Z'},
   {'free': 8276201472, 'time': '2016-11-25T13:06:30Z'},
   {'free': 8276180992, 'time': '2016-11-25T13:07:00Z'},
   {'free': 8276164608, 'time': '2016-11-25T13:07:30Z'},
   {'free': 8276144128, 'time': '2016-11-25T13:08:00Z'},
   {'free': 8276123648, 'time': '2016-11-25T13:08:30Z'},
   {'free': 8276103168, 'time': '2016-11-25T13:09:00Z'},
   {'free': 8276082688, 'time': '2016-11-25T13:09:30Z'},
   {'free': 8276062208, 'time': '2016-11-25T13:10:00Z'},
   {'free': 8276041728, 'time': '2016-11-25T13:10:30Z'},
   {'free': 8276021248, 'time': '2016-11-25T13:11:00Z'},
   {'free': 8276004864, 'time': '2016-11-25T13:11:30Z'},
   {'free': 8275984384, 'time': '2016-11-25T13:12:00Z'},
   {'free': 8275963904, 'time': '2016-11-25T13:12:30Z'},
   {'free': 8275943424, 'time': '2016-11-25T13:13:00Z'},
   {'free': 8275922944, 'time': '2016-11-25T13:13:30Z'},
   {'free': 8275902464, 'time': '2016-11-25T13:14:00Z'},

Visually the error pattern is very similar most of the time, please see the screenshot in influxdata/telegraf#2050

@melor
Copy link

melor commented Nov 27, 2016

In my case (message above) the problem disappeared after upgrading influxdb from 1.0 to 1.1.

@jwilder
Copy link
Contributor

jwilder commented Apr 5, 2017

This is likely fixed by #7397 in 1.0.2. If this is still occurring in 1.2 please log a new issue.

@jwilder jwilder closed this as completed Apr 5, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants