
Allow API to specify overwrite or append #1920

Closed
otoolep opened this issue Mar 11, 2015 · 13 comments

Comments

@otoolep
Contributor

otoolep commented Mar 11, 2015

If a point comes in with exactly the same measurement name, tag set, and timestamp as an existing point, the existing point is overwritten with the new point. The API should allow the user to specify that the point should instead be added to the existing series. This could be done by bumping the timestamp of the new point by a single nanosecond.
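The overwrite behavior described here can be sketched as a map keyed by (measurement, tag set, timestamp). This is a minimal illustrative Python model of the semantics, not InfluxDB code; `store` and `write_point` are hypothetical names:

```python
# Illustrative model: points are keyed by (measurement, tag set, timestamp),
# so a second write with the same key replaces the first.
store = {}

def write_point(measurement, tags, timestamp_ns, value):
    key = (measurement, frozenset(tags.items()), timestamp_ns)
    store[key] = value  # same key -> existing point is silently overwritten

write_point("cpu", {"host": "a"}, 1_426_000_000_000_000_000, 1.0)
write_point("cpu", {"host": "a"}, 1_426_000_000_000_000_000, 2.0)  # overwrites
assert len(store) == 1  # only the second value survives
```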

@otoolep
Contributor Author

otoolep commented Mar 11, 2015

This is only an issue if the target database and retention policy are also the same for both points.

@dashesy

dashesy commented Mar 11, 2015

It is good to have options, of course, but bumping the nanoseconds will only work for the first duplicate.

@pauldix
Member

pauldix commented Mar 11, 2015

This is actually only an issue if the db, retention policy, measurement, tagset, and timestamp are the same for two data points.

This makes it so that you can't have nanosecond precision and have two data points at the same time. The best you could do would be millisecond precision, but this should be fine for all use cases we're targeting.

@dashesy

dashesy commented Mar 11, 2015

@pauldix
Even if the precision is 1h, the first duplicate would become 1h+1ns; the second duplicate cannot also be 1h+1ns, so it would have to be 1h+2ns, and so on. This iterative process could easily get out of hand if a runaway process is generating the points, which makes a keep-duplicates option even less desirable. The first record can be found with a hash of name+tags+timestamp, but each duplicate makes the lookup slower, much like a hash-collision problem; the docs should warn people not to rely on this.

P.S. I actually only need the default behaviour, and 1m precision is all I need.
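The probing problem described above can be sketched as follows: each new duplicate has to search forward for a free timestamp, much like linear probing after a hash collision. A minimal Python sketch; `append_with_bump` is a hypothetical name, not anything in InfluxDB:

```python
# Why a fixed +1ns bump fails for repeated duplicates: each new duplicate
# must probe forward past all earlier bumps to find a free timestamp.
existing = set()

def append_with_bump(timestamp_ns):
    ts = timestamp_ns
    while ts in existing:  # first duplicate lands at +1ns, second at +2ns, ...
        ts += 1
    existing.add(ts)
    return ts

base = 3_600_000_000_000  # some 1h-precision timestamp, in nanoseconds
assert append_with_bump(base) == base        # original point
assert append_with_bump(base) == base + 1    # first duplicate
assert append_with_bump(base) == base + 2    # second duplicate: probes twice
```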

@arobinsongit

I'd like to throw in a use case for consideration. In industrial data environments we will sometimes "backfill" data with "correct" data after, say, the system has taken in all 0's for a period of time. I would like to see some mechanism that allows for versioning of the data. Example below:

Original, Bad/Sensor Offline Data
00:00 1.10
00:01 2.30
00:02 0.00
00:03 0.00
00:04 0.00
00:05 1.70

I backfill the data with a post that looks something like (in terrible pseudo JSON)
{
  "backfillstart": "00:02",
  "backfillend": "00:04",
  "values": [
    { "time": "00:02", "value": 2.2 }
  ]
}

So the result, if I just ask for the data would be
00:00 1.10
00:01 2.30
00:02 2.20
00:05 1.70

But, if I craft the query to ask for original data, or a previous version of the data I get back

00:00 1.10
00:01 2.30
00:02 0.00
00:03 0.00
00:04 0.00
00:05 1.70

This may be totally off the reservation for what you are looking to do but would be extremely valuable in a regulated manufacturing environment like Life Sciences or environmental data. In those environments it's typically ok to update data after the fact but you better have a good audit trail that shows it.

Feel free to smack me around for this comment as it is my first issue comment on someone else's repo.

  • andy

@sammy007

Appending data is a necessary and valuable option; InfluxDB should not overwrite points. In a distributed environment you might have multiple writers. In my case, I have to reduce data on the writer side, push these chunks from multiple instances to avoid thousands of writes, and then aggregate the semi-reduced data to produce the final view.

#2055

Possible deployment: (system1..N) many-to-many collector (UDP), many-to-one InfluxDB (HTTP)

@dashesy

dashesy commented Mar 25, 2015

@sammy007 if data comes from different sources, maybe you can have a source_id tag that differentiates them; that way the points will not be overwritten and you keep track of their source. The feature discussed here is for when records are exactly the same (same timestamp and tags).
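The suggestion above works because tags are part of the series key, so adding a source_id tag makes points from different writers distinct. A small Python sketch of the keying (illustrative only; `series_key` is a hypothetical name):

```python
# Adding a source_id tag changes the series key, so two writers producing
# points with identical measurement, other tags, and timestamp no longer
# collide with each other.
def series_key(measurement, tags, timestamp_ns):
    return (measurement, frozenset(tags.items()), timestamp_ns)

ts = 1_427_000_000_000_000_000
a = series_key("requests", {"region": "us", "source_id": "collector-1"}, ts)
b = series_key("requests", {"region": "us", "source_id": "collector-2"}, ts)
assert a != b  # distinct keys: neither point overwrites the other
```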

@sammy007

@dashesy Thanks. I already figured this out: I use source_id plus an additional timestamp (1s precision) as a tag, because I push data from the collector several times per minute with the same timestamp (minute precision). It feels like a workaround to me, though; I would really love an option to append data.

@beckettsean beckettsean added this to the Longer term milestone May 5, 2015
@beckettsean beckettsean changed the title Allow 0.9 API to specify overwrite or append Allow API to specify overwrite or append May 5, 2015
@ckmaresca

In a similar fashion, I'd like to be able to set the behavior to ignore, so the options would be overwrite, append, or ignore. There is no point in overwriting the data if it already exists and is exactly the same.
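The three write modes requested across this thread could be modeled as below. None of these modes exist in InfluxDB; this is purely an illustrative Python sketch of the requested semantics, with hypothetical names:

```python
# Hypothetical overwrite / append / ignore semantics for a duplicate key.
store = {}

def write(key, value, mode="overwrite"):
    if key not in store:
        store[key] = value
    elif mode == "overwrite":
        store[key] = value          # current InfluxDB behavior
    elif mode == "ignore":
        pass                        # identical key already present: skip
    elif mode == "append":
        m, tags, ts = key
        write((m, tags, ts + 1), value, mode)  # probe forward by 1ns

key = ("cpu", frozenset({("host", "a")}), 0)
write(key, 1.0)
write(key, 2.0, mode="ignore")   # first value kept
write(key, 3.0, mode="append")   # stored at timestamp 0 + 1ns instead
assert store[key] == 1.0
assert store[("cpu", frozenset({("host", "a")}), 1)] == 3.0
```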

@bbinet
Contributor

bbinet commented Sep 18, 2015

It would also be great to be able to merge the new point into the old point, but that would rather deserve an UPDATE query.

@sseveran

sseveran commented Feb 6, 2016

It might be nice to make this clearer somewhere in the docs; maybe it is and I never saw it. I spent a considerable amount of time thinking an app I was building was broken, until I figured out that the telemetry was bad and I needed to move to nanosecond precision.

@pauldix
Member

pauldix commented Feb 6, 2016

We won't be doing this feature. Timestamps can go down to the nanosecond and writers should specify them. If the timestamp is the same, then the values in the write will be updated.

Values not specified in the write won't be touched. So if you write a value for field foo and then later write a value with the same measurement, tagset, and timestamp for field bar, you'd have values for both at that timestamp.

Checking existing values on a write would have such a massive negative impact on performance within the database that we're unlikely to be able to do this in the future.
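The field-level merge behavior described two paragraphs up can be sketched as a per-key field map that is updated, not replaced, on each write. An illustrative Python model only; `write_fields` is a hypothetical name, not an InfluxDB API:

```python
# A write at an existing (measurement, tagset, timestamp) updates only the
# fields it carries; fields not mentioned in the write are left untouched.
store = {}

def write_fields(key, fields):
    store.setdefault(key, {}).update(fields)

key = ("m", frozenset({("host", "a")}), 1_454_700_000_000_000_000)
write_fields(key, {"foo": 1.0})   # first write: field foo only
write_fields(key, {"bar": 2.0})   # second write: field bar only
assert store[key] == {"foo": 1.0, "bar": 2.0}  # both fields coexist
```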

@pauldix pauldix closed this as completed Feb 6, 2016
@sseveran

sseveran commented Feb 8, 2016

@pauldix The only suggestion I have is to make this clearer in the docs. For instance, in the 0.9 Schema Design docs there is no mention that (tags, timestamp) is the primary key. The mental model I had of InfluxDB was that it would just record each event I sent it, without checking the existence of or overwriting previous points. That's not the case, and that's fine; I just think this could be spelled out more clearly in the docs somewhere, if it was not already done for 0.10.
