
Allow API to specify overwrite or append #1920

Closed
otoolep opened this issue Mar 11, 2015 · 13 comments

Comments

@otoolep
Contributor

otoolep commented Mar 11, 2015

If a point comes in with exactly the same measurement name, tag set, and timestamp as an existing point, the existing point is overwritten with the new point. The API should allow the user to specify that the point should instead be added to the existing series. This could be done by bumping the timestamp of the new point by a single nanosecond.
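The overwrite behavior described here can be sketched as a map keyed by (measurement, tag set, timestamp). This is a minimal illustrative Python model of the semantics, not InfluxDB code; `store` and `write_point` are hypothetical names:

```python
# Illustrative model: points are keyed by (measurement, tag set, timestamp),
# so a second write with the same key replaces the first.
store = {}

def write_point(measurement, tags, timestamp_ns, value):
    key = (measurement, frozenset(tags.items()), timestamp_ns)
    store[key] = value  # same key -> existing point is silently overwritten

write_point("cpu", {"host": "a"}, 1_426_000_000_000_000_000, 1.0)
write_point("cpu", {"host": "a"}, 1_426_000_000_000_000_000, 2.0)  # overwrites
assert len(store) == 1  # only the second value survives
```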

@otoolep
Contributor Author

otoolep commented Mar 11, 2015

This is only an issue if the target database and retention policy are also the same for both points.

@dashesy

dashesy commented Mar 11, 2015

It is good to have options, of course, but bumping the nanoseconds will only work for the first duplicate.

@pauldix
Member

pauldix commented Mar 11, 2015

This is actually only an issue if the db, retention policy, measurement, tagset, and timestamp are the same for two data points.

This makes it so that you can't have nanosecond precision and have two data points at the same time. The best you could do would be millisecond precision, but this should be fine for all use cases we're targeting.

@dashesy

dashesy commented Mar 11, 2015

@pauldix
Even if the precision is 1h, the first duplicate would become 1h+1ns; the second duplicate cannot also be 1h+1ns, so it would have to be 1h+2ns, and so on. This iterative process could easily get out of hand if a runaway process is generating the points, which makes a keep-duplicates option even less desirable. The first record can be found with a hash of name+tags+timestamp, but each duplicate makes the lookup slower, much like a hash-collision problem; the docs should warn people not to rely on this.

P.S. I actually only need the default behaviour, and 1m precision is all I need.
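The probing problem described above can be sketched as follows: each new duplicate has to search forward for a free timestamp, much like linear probing after a hash collision. A minimal Python sketch; `append_with_bump` is a hypothetical name, not anything in InfluxDB:

```python
# Why a fixed +1ns bump fails for repeated duplicates: each new duplicate
# must probe forward past all earlier bumps to find a free timestamp.
existing = set()

def append_with_bump(timestamp_ns):
    ts = timestamp_ns
    while ts in existing:  # first duplicate lands at +1ns, second at +2ns, ...
        ts += 1
    existing.add(ts)
    return ts

base = 3_600_000_000_000  # some 1h-precision timestamp, in nanoseconds
assert append_with_bump(base) == base        # original point
assert append_with_bump(base) == base + 1    # first duplicate
assert append_with_bump(base) == base + 2    # second duplicate: probes twice
```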

@arobinsongit

I'd like to throw in a use case for consideration. In industrial data environments we will sometimes "backfill" data with "correct" data after, say, the system has taken in all 0's for a period of time. I would like to see some mechanism that allows for versioning of the data. Example below:

Original, Bad/Sensor Offline Data
00:00 1.10
00:01 2.30
00:02 0.00
00:03 0.00
00:04 0.00
00:05 1.70

I backfill the data with a post that looks something like (in terrible pseudo JSON)
{
  "backfillstart": "00:02",
  "backfillend": "00:04",
  "values": [
    { "time": "00:02", "value": 2.2 }
  ]
}

So the result, if I just ask for the data would be
00:00 1.10
00:01 2.30
00:02 2.20
00:05 1.70

But, if I craft the query to ask for original data, or a previous version of the data I get back

00:00 1.10
00:01 2.30
00:02 0.00
00:03 0.00
00:04 0.00
00:05 1.70

This may be totally off the reservation for what you are looking to do but would be extremely valuable in a regulated manufacturing environment like Life Sciences or environmental data. In those environments it's typically ok to update data after the fact but you better have a good audit trail that shows it.

Feel free to smack me around for this comment as it is my first issue comment on someone else's repo.

  • andy

@sammy007

Appending data is a necessary and valuable option; InfluxDB should not overwrite points. In a distributed environment you might have multiple writers. In my case, I have to reduce data on the writer side, push these chunks from multiple instances to avoid thousands of writes, and then aggregate the semi-reduced data to produce the final view.

#2055

Possible deployment: (system1..N) many-to-many collector (UDP), many-to-one InfluxDB (HTTP)

@dashesy

dashesy commented Mar 25, 2015

@sammy007 if data comes from different sources, maybe you can have a source_id tag that differentiates them; that way the points will not be overwritten and you keep track of their source. The feature discussed here is for when records are exactly the same (same timestamp and tags).
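The suggestion above works because tags are part of the series key, so adding a source_id tag makes points from different writers distinct. A small Python sketch of the keying (illustrative only; `series_key` is a hypothetical name):

```python
# Adding a source_id tag changes the series key, so two writers producing
# points with identical measurement, other tags, and timestamp no longer
# collide with each other.
def series_key(measurement, tags, timestamp_ns):
    return (measurement, frozenset(tags.items()), timestamp_ns)

ts = 1_427_000_000_000_000_000
a = series_key("requests", {"region": "us", "source_id": "collector-1"}, ts)
b = series_key("requests", {"region": "us", "source_id": "collector-2"}, ts)
assert a != b  # distinct keys: neither point overwrites the other
```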

@sammy007

@dashesy Thanks. I already figured this out: I use source_id plus an additional timestamp (1s precision) as a tag, because I push data from the collector several times per minute with the same timestamp (minute precision). It feels like a workaround to me, though; I would really love an option to append data.

@beckettsean beckettsean added this to the Longer term milestone May 5, 2015
@beckettsean beckettsean changed the title Allow 0.9 API to specify overwrite or append Allow API to specify overwrite or append May 5, 2015
@ckmaresca

In a similar fashion, I'd like to be able to set the behavior to ignore, so the options would be overwrite, append, or ignore. There is no point in overwriting the data if it already exists and is exactly the same.
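The three write modes requested across this thread could be modeled as below. None of these modes exist in InfluxDB; this is purely an illustrative Python sketch of the requested semantics, with hypothetical names:

```python
# Hypothetical overwrite / append / ignore semantics for a duplicate key.
store = {}

def write(key, value, mode="overwrite"):
    if key not in store:
        store[key] = value
    elif mode == "overwrite":
        store[key] = value          # current InfluxDB behavior
    elif mode == "ignore":
        pass                        # identical key already present: skip
    elif mode == "append":
        m, tags, ts = key
        write((m, tags, ts + 1), value, mode)  # probe forward by 1ns

key = ("cpu", frozenset({("host", "a")}), 0)
write(key, 1.0)
write(key, 2.0, mode="ignore")   # first value kept
write(key, 3.0, mode="append")   # stored at timestamp 0 + 1ns instead
assert store[key] == 1.0
assert store[("cpu", frozenset({("host", "a")}), 1)] == 3.0
```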

@bbinet
Contributor

bbinet commented Sep 18, 2015

It would also be great to be able to merge the new point into the old point, but that would rather deserve an UPDATE query.

@sseveran

sseveran commented Feb 6, 2016

It might be nice to make this clearer somewhere in the docs; maybe it is and I never saw it. I spent a considerable amount of time thinking an app I was building was broken, until I figured out that the telemetry was bad and I needed to move to nanosecond precision.

@pauldix
Member

pauldix commented Feb 6, 2016

We won't be doing this feature. Timestamps can go down to the nanosecond and writers should specify them. If the timestamp is the same, then the values in the write will be updated.

Values not specified in the write won't be touched. So if you write a value for field foo and then later write a value with the same measurement, tagset, and timestamp for field bar, you'd have values for both at that timestamp.

Checking existing values on a write would have such a massive negative impact on performance within the database that we're unlikely to be able to do this in the future.
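The field-level merge behavior described two paragraphs up can be sketched as a per-key field map that is updated, not replaced, on each write. An illustrative Python model only; `write_fields` is a hypothetical name, not an InfluxDB API:

```python
# A write at an existing (measurement, tagset, timestamp) updates only the
# fields it carries; fields not mentioned in the write are left untouched.
store = {}

def write_fields(key, fields):
    store.setdefault(key, {}).update(fields)

key = ("m", frozenset({("host", "a")}), 1_454_700_000_000_000_000)
write_fields(key, {"foo": 1.0})   # first write: field foo only
write_fields(key, {"bar": 2.0})   # second write: field bar only
assert store[key] == {"foo": 1.0, "bar": 2.0}  # both fields coexist
```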

@pauldix pauldix closed this as completed Feb 6, 2016
@sseveran

sseveran commented Feb 8, 2016

@pauldix The only suggestion I have is to make this clearer in the docs. For instance, in the 0.9 Schema Design docs there is no mention that (tags, timestamp) is the primary key. The mental model I had of InfluxDB was that it would just record each event I sent it, without checking the existence of or overwriting previous points. That's not the case, and that's fine; I just think this could be spelled out more clearly in the docs somewhere, if it was not already done for 0.10.
