Allow API to specify overwrite or append #1920
Comments
This is only an issue if the target database and retention policy are also the same for both points.
It is good to have options of course, but bumping the nanoseconds will only work for the first duplicate.
This is actually only an issue if the db, retention policy, measurement, tagset, and timestamp are the same for two data points. This makes it so that you can't have nanosecond precision and have two data points at the same time. The best you could do would be millisecond precision, but this should be fine for all use cases we're targeting.
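The uniqueness rule described above can be sketched as a dictionary keyed by (db, retention policy, measurement, tag set, timestamp). This is a client-side illustration of the semantics, not InfluxDB's actual storage code; the names are illustrative.

```python
# Sketch of the point-uniqueness rule: a point is identified by
# (database, retention policy, measurement, tag set, timestamp),
# and a later write with the same key replaces the earlier one.

def point_key(db, rp, measurement, tags, ts_ns):
    """Identity key for a point; tags are sorted so that key order
    in the tag dict doesn't matter."""
    return (db, rp, measurement, tuple(sorted(tags.items())), ts_ns)

store = {}

def write(db, rp, measurement, tags, ts_ns, fields):
    store[point_key(db, rp, measurement, tags, ts_ns)] = fields

# Two writes with an identical key: the second silently replaces the first.
write("mydb", "default", "cpu", {"host": "a"}, 1_000_000_000, {"value": 1.0})
write("mydb", "default", "cpu", {"host": "a"}, 1_000_000_000, {"value": 2.0})
assert len(store) == 1

# Bumping the timestamp by a single nanosecond yields a distinct point.
write("mydb", "default", "cpu", {"host": "a"}, 1_000_000_001, {"value": 3.0})
assert len(store) == 2
```

This also shows why the nanosecond-bump workaround degrades precision: a writer that needs to distinguish duplicates must give up the low-order bits of the timestamp.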
@pauldix P.S. I actually only need the default behaviour, and 1m precision is all I need.
I'd like to throw in a use case for consideration. In the industrial data environment, sometimes we will "backfill" data with "correct" data after, say, the system has taken in all 0's for a period of time. I would like to see some mechanism that allows for versioning of the data.

For example: a sensor goes offline and the system records bad data, and I backfill that range with a POST (in terrible pseudo JSON). The result, if I just ask for the data, would be the corrected values; but if I craft the query to ask for the original data, or a previous version of the data, I get back the pre-backfill reading (`00:00 1.10`).

This may be totally off the reservation for what you are looking to do, but it would be extremely valuable in a regulated manufacturing environment like Life Sciences, or for environmental data. In those environments it's typically OK to update data after the fact, but you had better have a good audit trail that shows it. Feel free to smack me around for this comment, as it is my first issue comment on someone else's repo.
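One way to picture the versioned-backfill idea is an extra `version` dimension per timestamp, where the default read returns the latest version and an audit query can pin an older one. This is a hypothetical sketch of the requested feature, not anything InfluxDB provides; all names are illustrative.

```python
# Hypothetical versioned store: each (series, timestamp) holds a map of
# version -> value. Default reads return the newest version; audit reads
# can ask for a specific (e.g. original) version.
from collections import defaultdict

series = defaultdict(dict)  # series key -> {timestamp: {version: value}}

def write(measurement, tags, ts, value, version=0):
    key = (measurement, tuple(sorted(tags.items())))
    series[key].setdefault(ts, {})[version] = value

def read(measurement, tags, ts, version=None):
    key = (measurement, tuple(sorted(tags.items())))
    versions = series[key][ts]
    if version is None:
        return versions[max(versions)]  # default: latest version wins
    return versions[version]            # audit: pin an older version

write("temp", {"unit": "reactor1"}, "00:00", 0.0)               # bad sensor data
write("temp", {"unit": "reactor1"}, "00:00", 1.10, version=1)   # backfill
assert read("temp", {"unit": "reactor1"}, "00:00") == 1.10
assert read("temp", {"unit": "reactor1"}, "00:00", version=0) == 0.0
```

Keeping every version around is what makes the audit trail possible; the cost is that nothing is ever truly overwritten, which is presumably why it conflicts with the performance constraints mentioned later in the thread.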
Appending data is a necessary and valuable option. InfluxDB should not overwrite points. In a distributed environment you might have multiple writers; in my case especially, I have to reduce data on the writer side, push these chunks from multiple instances to avoid thousands of writes, and aggregate this semi-reduced data to produce the final view. Possible deployment: (system1..N) many---many collector (UDP) many---1 InfluxDB (HTTP)
@sammy007 if data comes from different sources, maybe you can have a `source_id` tag.
@dashesy Thanks. I already figured this out: I use `source_id` plus an additional timestamp (1s precision) as a tag, because I am pushing data from the collector several times per minute with the same (minute-precision) timestamp. That sounds like a trick, or a pair of crutches, to me; I really would love an option to append data.
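The workaround described above amounts to moving the disambiguating information into the tag set, so that writes from different collectors form distinct series instead of colliding. A minimal sketch, using a simplified line-protocol formatter (no escaping; names are illustrative):

```python
# Sketch of the source_id-tag workaround: two writers sharing the same
# coarse timestamp produce distinct series because their tag sets differ.

def line(measurement, tags, fields, ts_ns):
    """Build a simplified InfluxDB line-protocol string (no escaping)."""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_str} {field_str} {ts_ns}"

ts = 1_420_000_000 * 10**9  # same minute-precision timestamp for both writers
a = line("requests", {"source_id": "collector1"}, {"count": 10}, ts)
b = line("requests", {"source_id": "collector2"}, {"count": 7}, ts)
assert a != b  # distinct tag sets -> distinct series, so no overwrite
```

The trade-off is cardinality: every distinct `source_id` (and every coarse-timestamp tag value, if used) creates another series.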
In a similar fashion, I'd like to be able to set the behavior to ignore: so overwrite, append, or ignore. There is no point in overwriting data that already exists and is exactly the same.
It would also be great to be able to merge the new point into the old point, but that would rather deserve an UPDATE query.
It might be nice to make this more clear somewhere in the docs. Maybe it is and I never saw it. I spent a considerable amount of time thinking an app I was building was broken until I figured out that the telemetry was bad and I needed to move to nanosecond precision. |
We won't be doing this feature. Timestamps can go down to the nanosecond, and writers should specify them. If the timestamp is the same, then the values in the write will be updated; values not specified in the write won't be touched. So if you write a value for one field at an existing timestamp, the other fields at that timestamp keep their previous values. Checking existing values on a write would have such a massive negative impact on performance within the database that we're unlikely to be able to do this in the future.
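The field-merge behavior described in this comment can be sketched as an update of the field map rather than a wholesale replacement. This is a client-side illustration of the stated semantics, not InfluxDB's implementation:

```python
# Sketch of write semantics at a duplicate (series, timestamp): fields
# named in the new write are updated, unnamed fields are left untouched.

store = {}  # (measurement, tags, ts) -> fields dict

def write(measurement, tags, ts, fields):
    key = (measurement, tuple(sorted(tags.items())), ts)
    store.setdefault(key, {}).update(fields)  # merge, don't replace wholesale

write("cpu", {"host": "a"}, 1, {"user": 10.0, "system": 2.0})
write("cpu", {"host": "a"}, 1, {"user": 12.5})  # only `user` is replaced

key = ("cpu", (("host", "a"),), 1)
assert store[key] == {"user": 12.5, "system": 2.0}
```

Note that this merge never reads or compares the old values, which is consistent with the performance argument: the write path only has to overwrite the named fields, not check what was there before.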
@pauldix The only suggestion I have is to make this more clear in the docs. For instance, in the 0.9 Schema Design docs there is no mention that (tags, timestamp) is the primary key. The mental model I had of InfluxDB was that it would just record each event that I sent it, without checking the existence of or overwriting previous points. That's not the case, and that's fine. I do think that this could be spelled out more clearly in the docs somewhere, if it was not already done for 0.10.
If a point comes in with exactly the same measurement name, tag set, and timestamp as an existing point, the existing point is overwritten with the new point. The API should allow the user to specify that the point should instead be added to the existing series. This could be done by bumping the timestamp of the new point by a single nanosecond.
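The requested append-on-conflict behavior could be sketched as client-side logic that nudges the timestamp forward until it finds a free slot. This is hypothetical, not an InfluxDB feature, and as noted earlier in the thread a single fixed bump only disambiguates the first duplicate, so the sketch loops:

```python
# Hypothetical "append" write mode: on a key collision, bump the
# timestamp by one nanosecond until the slot is free instead of
# overwriting the existing point.

store = {}

def write_append(series_key, ts_ns, fields):
    while (series_key, ts_ns) in store:
        ts_ns += 1  # append mode: nudge forward one nanosecond
    store[(series_key, ts_ns)] = fields
    return ts_ns

t1 = write_append("cpu,host=a", 100, {"value": 1.0})
t2 = write_append("cpu,host=a", 100, {"value": 2.0})
assert (t1, t2) == (100, 101)
assert len(store) == 2
```

The obvious downside, which the maintainers raise above, is that any such check requires reading existing data on the write path, and the bumped timestamps no longer reflect the true observation time.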