Add support for column indexes #582
+1
You lost me on the tags approach.
It's not clear to me how the tags would be stored, or why it would be more efficient. We'd still store them as uints, right, and keep a separate table with the value-to-uint mapping, just like in the first approach? And we'd still need a data structure (B-tree or whatever) to track the locations of all values of all tags?
FWIW: I like the index idea.
@Dieterbe what I meant was that with indexes, the normal write path would be:
Whereas with the approach of specifying it explicitly in the input format as tags, there's no need to look up the series/column tuple to see if it's indexed; you already know from the input. Probably not a big deal.
Here's an alternative: you can extend this idea to multiple indexed columns by (behind the scenes) creating a series for every combination of values of the columns you want to index. Unindexed columns would be regular columns in each of these series. Thoughts?
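A minimal Python sketch of the behind-the-scenes layout proposed above, assuming points arrive as lists aligned with a `columns` list. The helper names `split_by_indexed_columns` and `select` are hypothetical, and keying each hidden sub-series by a tuple of its indexed values is an assumption about how those series might be named. An equality condition on an indexed column only needs to compare sub-series keys, not filter individual points.

```python
from collections import defaultdict

def split_by_indexed_columns(columns, points, indexed):
    """Route each point into a hidden sub-series keyed by its indexed-column values.

    Returns the remaining (unindexed) column names and a dict mapping
    (indexed values...) -> list of points holding only the unindexed values.
    """
    idx_positions = [columns.index(c) for c in indexed]
    other_positions = [i for i in range(len(columns)) if i not in idx_positions]
    sub_series = defaultdict(list)
    for point in points:
        key = tuple(point[i] for i in idx_positions)
        sub_series[key].append([point[i] for i in other_positions])
    return [columns[i] for i in other_positions], dict(sub_series)

def select(sub_series, indexed, **conditions):
    """Equality filter on indexed columns: only sub-series keys are compared."""
    positions = {c: indexed.index(c) for c in conditions}
    rows = []
    for key, sub_points in sub_series.items():
        if all(key[pos] == conditions[c] for c, pos in positions.items()):
            rows.extend(sub_points)
    return rows

# Hypothetical example data.
columns = ["time", "value", "host", "region"]
indexed = ["host", "region"]
points = [
    [1, 512, "serverA", "us-east"],
    [2, 700, "serverB", "us-east"],
    [3, 640, "serverA", "us-east"],
    [4, 900, "serverC", "eu-west"],
]

remaining, subs = split_by_indexed_columns(columns, points, indexed)
print(remaining)                                # ['time', 'value']
print(select(subs, indexed, host="serverA"))    # [[1, 512], [3, 640]]
print(select(subs, indexed, region="us-east"))  # [[1, 512], [3, 640], [2, 700]]
```

Results drawn from several sub-series come back grouped by sub-series rather than in time order, and a condition on an unindexed column is not handled here (`indexed.index` raises `ValueError`).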
That's exactly the idea. Under the covers, this is how indexes will work. (In reply to Dieter Plaetinck, Wed, Jun 18, 2014 at 7:41 AM.)
+1
Correct me if I'm wrong, but this sounds like it would make InfluxDB much more useful for ingesting log data: log data that has been parsed so that key fields like "severity" and "hostname" are pulled out. One could imagine custom parsers extracting metric information from unstructured data and then sending it into InfluxDB.
If it works the way I described, then I'm not sure about using the "index" terminology for this feature. I don't feel strongly about this, but traditionally, database indexes have been data structures with pointers to records, and they come with behaviors people have come to expect: slower writes and extra disk space to maintain the index data structure, while reads without a where clause are just as fast as before (in reality, if the I/O device spends more time doing writes, reads suffer once it saturates).

I'm not too familiar with the implementation details, but it looks like in this case there's just a bit of metadata/glue: no index data structure of pointers, barely any extra disk space, and barely any impact on write speed (except maybe with high cardinality on the indexed column). Reads without a where clause, however, are now always slower by design. They are CPU-bound rather than I/O-bound, so hopefully only slightly slower, but that remains to be seen; especially on SSDs the difference could be very noticeable.

The main thing this has in common with indexes is that reads with a where clause are faster; the rest seems different. I like the idea of calling them tags. People will, however, search for "how do I use indexes with InfluxDB" because that's the familiar term, so we could have a doc page called "indexes/tags" that explains the differences.
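To illustrate the read-side trade-off described above, here is a minimal Python sketch, assuming each hidden sub-series is already sorted by time. A query without a where clause has to k-way merge every sub-series back into time order, which is CPU work (roughly O(n log k) for n points across k sub-series) on top of reading the data. The example data and the `read_all` helper are hypothetical.

```python
import heapq

# Hypothetical per-host sub-series, each already sorted by time: (time, value).
sub_series = {
    ("serverA",): [(1, 512), (3, 640)],
    ("serverB",): [(2, 700), (5, 400)],
    ("serverC",): [(4, 900)],
}

def read_all(sub_series):
    """Answer a query with no where clause by k-way merging all sub-series on time."""
    return list(heapq.merge(*sub_series.values(), key=lambda point: point[0]))

print(read_all(sub_series))
# [(1, 512), (2, 700), (3, 640), (4, 900), (5, 400)]
```

The more distinct value combinations there are, the larger k gets, which is one way high cardinality on an indexed column could show up in read cost.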
+1
+1 - this would make Influx + Grafana killer for metrics and log search.
+1. Any updates on when we might have this?
+1
This will probably get rolled into the API refactor. Please comment on that PR: #1059
+1. It's an important feature and I'm waiting for it.
+1 for me too. I'm thinking about making the switch from MongoDB to InfluxDB, but performance without indexes is too poor for me to make it happen.
+1 for this feature.
+1 from me too for this feature. It would be a massive improvement.
+1. Is the tags implementation ready now? I can't find any documentation on how to use this great feature.
+1 too. What is the current status of this (great) feature? ^^ Thanks
Seems like we have this now, @pauldix?
Something like it. Tags in 0.9.0 should take care of this feature.
Where can I find the documentation on how to use column indexes? All the InfluxDB pages I find lead here.
We should add support for column indexes. Everything will still be indexed by time, but you'll be able to additionally index by column values. For example, if you have a series:
Later, when you're doing queries, if you do
`select sum(value) from memory_used where host = 'serverA'`
it has to do a range scan to get at all of that data. Indexes would make it efficient to pull back the data.

Another possible win with indexes: if a column is indexed, we should convert the value into a `uint64`, which will be more efficient to store. Note that the indexes are intended for hash lookups (i.e. `=`) and not range indexes.

Adding an index could look something like this:
The second example will index any series by host that has `.h.` in the series name. You can then create conventions around putting things in series names to have those values be indexed.

Because of the way I imagine this being implemented, you'd want to keep the total number of indexes in each DB fairly small (< 100). But only real testing will reveal what makes sense.
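The hash-lookup index and `uint64` encoding described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not how InfluxDB implements it: `HashColumnIndex` is a hypothetical name, values are mapped to sequential integer ids as a stand-in for the proposed `uint64` encoding, and point positions are kept in memory. Because it is a plain hash map, it only answers equality (`=`) lookups, not ranges.

```python
from collections import defaultdict

class HashColumnIndex:
    """Equality-only index for one column: value -> positions of matching points."""

    def __init__(self):
        self.value_to_id = {}               # column value -> integer id
        self.id_to_value = []               # integer id -> column value
        self.postings = defaultdict(list)   # integer id -> point positions

    def encode(self, value):
        """Dictionary-encode a value; new values get the next sequential id."""
        if value not in self.value_to_id:
            self.value_to_id[value] = len(self.id_to_value)
            self.id_to_value.append(value)
        return self.value_to_id[value]

    def add(self, position, value):
        """Record that the point at `position` has `value`; return its id."""
        vid = self.encode(value)
        self.postings[vid].append(position)
        return vid

    def lookup(self, value):
        """Hash lookup for `column = value`; unknown values match nothing."""
        vid = self.value_to_id.get(value)
        return [] if vid is None else list(self.postings[vid])

# Hypothetical memory_used points: (value, host).
points = [(512, "serverA"), (700, "serverB"), (640, "serverA")]
host_index = HashColumnIndex()
encoded_hosts = [host_index.add(pos, host) for pos, (_, host) in enumerate(points)]
print(encoded_hosts)  # [0, 1, 0]

# select sum(value) from memory_used where host = 'serverA'
total = sum(points[pos][0] for pos in host_index.lookup("serverA"))
print(total)  # 1152
```

Storing `encoded_hosts` instead of the raw strings is the kind of space saving the proposal mentions; Python ints are unbounded, so a real implementation would pack the ids into fixed-width `uint64` values.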
One other idea I heard floated for how to do this is to update the input format. Instead of specifying this as a configuration option, have a new section on ingestion like this:
The idea is that `tags` would be automatically indexed and would be the last values in the array of a point. This would be more efficient because we wouldn't need to look up indexes every time we write data in.

Would like to hear what other people think about the two approaches.
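A rough Python sketch of the two write paths being compared. Both payload shapes are hypothetical (the original input-format examples are not shown here), as are `INDEXED_COLUMNS` and the helper names; the point is only that the config approach consults index metadata on every write, while the tags approach reads the tag values straight off the end of each point.

```python
# Hypothetical index configuration for the config-based approach.
INDEXED_COLUMNS = {("memory_used", "host")}

def indexed_columns_from_config(series):
    """Config approach: each write looks up which (series, column) pairs are indexed."""
    return [c for c in series["columns"] if (series["name"], c) in INDEXED_COLUMNS]

def split_tagged_point(point, n_tags):
    """Tags approach: the last n_tags values of a point are its tag values."""
    cut = len(point) - n_tags
    return list(point[:cut]), list(point[cut:])

def parse_tagged_series(series):
    """Return (fields, tags) dicts per point; no index lookup is needed."""
    tag_names = series.get("tags", [])
    parsed = []
    for point in series["points"]:
        fields, tag_values = split_tagged_point(point, len(tag_names))
        parsed.append((dict(zip(series["columns"], fields)),
                       dict(zip(tag_names, tag_values))))
    return parsed

# Hypothetical payloads for each approach.
config_series = {"name": "memory_used", "columns": ["value", "host"],
                 "points": [[512, "serverA"]]}
tagged_series = {"name": "memory_used", "columns": ["value"], "tags": ["host"],
                 "points": [[512, "serverA"], [700, "serverB"]]}

print(indexed_columns_from_config(config_series))  # ['host']
print(parse_tagged_series(tagged_series))
# [({'value': 512}, {'host': 'serverA'}), ({'value': 700}, {'host': 'serverB'})]
```

As noted in the discussion, the saved lookup is small (a set membership test per column here), so the bigger difference between the approaches may be how explicit the schema is to the writer.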