Support High Cardinality Tags and Series #7151

jwilder · 2016-08-15T16:47:43Z

Feature Request

The database should be able to support higher levels of cardinality for tags and series. Currently, the full tag set is loaded into an in-memory index for fast query planning. When tags with a large number of values are written, the in-memory index can consume more memory than is available on the host.

Proposal:

The database should not require loading the full tag set into an in-memory index. Higher cardinality series and tags should be able to be stored and queried and not be limited by the amount of RAM on the host.

Current behavior:

Currently, high cardinality data causes the process memory usage to grow quickly, increasing the chances of an OOM. It also slows startup times, as all the stored data needs to be scanned to re-create the in-memory index.

Users also frequently write high cardinality tag data by mistake, causing the server to crash. Once the server is in this state, removing the problem data is also very difficult.

Desired behavior:

Storing high-cardinality data should not cause the process to OOM or adversely affect startup times. Query performance should not be adversely affected by higher cardinality data either.

Use case:

At times it is more natural and convenient to store higher cardinality data. For example, some tag data is ephemeral in nature (Docker container IDs), but can contribute to high cardinality data issues over time.
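To illustrate why an in-memory tag index grows with cardinality, here is a minimal sketch. It is not InfluxDB's actual index structure; the `build_index` function, the `cpu` measurement, and the host and container names are all hypothetical. It maps each tag key/value pair to the set of series carrying it, so every new ephemeral container ID adds both an index entry and a series.

```python
from collections import defaultdict


def build_index(series_keys):
    """Map each (tag key, tag value) pair to the set of series that carry it."""
    index = defaultdict(set)
    for measurement, tags in series_keys:
        # A series is identified by its measurement plus its full tag set.
        series_id = (measurement, tuple(sorted(tags.items())))
        for key, value in tags.items():
            index[(key, value)].add(series_id)
    return index


# Hypothetical workload: 3 hosts, each starting 1,000 short-lived containers.
series = [
    ("cpu", {"host": f"host-{h}", "container_id": f"c-{h}-{c}"})
    for h in range(3)
    for c in range(1000)
]
index = build_index(series)
distinct_series = {sid for ids in index.values() for sid in ids}

print(len(index))            # 3 host values + 3000 container IDs
print(len(distinct_series))  # one series per container
```

In this toy model, nothing is ever evicted, so the index keeps growing as containers come and go, even though only a few are alive at any moment.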

Documentation

benbjohnson · 2016-08-17T22:25:28Z

Proposal added for TSI (Time-Series Index) file format: #7174

jwilder · 2016-08-17T23:17:45Z

Problem statement/requirements docs: #7175

sorrison · 2016-11-08T21:24:21Z

We are getting hit by this pretty hard and I am wondering if there is any way we can prevent influx from consuming all RAM and then getting killed. Is there some setting we can tweak to help with this? I'd be happy with lower performance if it meant that the service stayed up.

VojtechVitek · 2016-11-08T21:27:21Z

For reference:
https://twitter.com/lisiewski/status/793504279063506944

jwilder · 2016-11-08T21:54:53Z

@sorrison 1.1 has a number of memory improvements related to queries, but high memory usage in queries or writes is usually due to schema design issues. Two common problems are querying across too many shards (e.g. shard duration is too low) as well as writing high cardinality tag values and querying too many series at once.

There are a few limits you can enable to prevent high cardinality data from being written or being queried.

In 1.0, there is max-series-per-database which will limit the number of series per database to 1M by default.

[data]
  max-series-per-database = 1000000

In 1.1, there is a max-values-per-tag limit that drops values that would cause the cardinality of any one tag to exceed the limit:

[data]
  max-values-per-tag = 100000

For queries, there are a few others:

[coordinator]
  max-concurrent-queries = 0  # limits the number of concurrently running queries
  query-timeout = "0s"  # limits the length of time a query can execute before being killed
  log-queries-after = "0s"  # logs queries that run longer than the threshold
  max-select-point = 0  # kills any queries that select too many points
  max-select-series = 0  # kills queries that would involve selecting from too many series at once
  max-select-buckets = 0  # kills queries that would create too many group by buckets

If you are having performance issues, please log a new issue using the instructions for a bug report. In order to help, we need all the information requested in the instructions.
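As a rough illustration of the max-values-per-tag behavior described above (values that would push any one tag past the limit are dropped), here is a minimal sketch. It is not InfluxDB's implementation; the `TagValueLimiter` class and its `accept` method are hypothetical names, and the limit of 2 is chosen only to keep the example small.

```python
class TagValueLimiter:
    """Illustrative only: rejects a point if it would add a new value to a
    tag that already holds `max_values_per_tag` distinct values."""

    def __init__(self, max_values_per_tag):
        self.max_values_per_tag = max_values_per_tag
        self.values = {}  # tag key -> set of values seen so far

    def accept(self, tags):
        # Check every tag before mutating any state, so a rejected
        # point leaves the tracked values unchanged.
        for key, value in tags.items():
            seen = self.values.get(key, set())
            if value not in seen and len(seen) >= self.max_values_per_tag:
                return False
        for key, value in tags.items():
            self.values.setdefault(key, set()).add(value)
        return True


limiter = TagValueLimiter(max_values_per_tag=2)
results = [
    limiter.accept({"host": "a", "pid": "1"}),
    limiter.accept({"host": "a", "pid": "2"}),
    limiter.accept({"host": "a", "pid": "3"}),  # third distinct pid: dropped
    limiter.accept({"host": "b", "pid": "1"}),  # known pid, second host: kept
]
print(results)
```

Points that reuse existing tag values keep flowing; only points introducing a new value for a tag that is already at the limit are turned away.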

sorrison · 2016-11-08T22:12:30Z

Thanks @jwilder. I am currently developing a driver for Gnocchi (part of OpenStack) https://github.com/openstack/gnocchi and am dealing with a large amount of data. Basically I have lots of metrics going into influx. Originally I put each metric into its own measurement, but I wanted to do 3 levels of downsampling and didn't want to have 3 continuous queries per measurement (we have on the order of 100,000s of metrics).
So now they all go into one measurement with a tag for metric id, and I run the continuous queries on the one measurement.

I thought having more tag values would be better than having more continuous queries?

Sorry for putting this all in this bug. Is there a better place to discuss these kind of things? IRC?

Just installed the 1.1 RC and it is working well so far, although at the moment it takes about a week for it to die and need restarting. (We are running on a host with 24 cores and 96G RAM.)
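The trade-off between the two layouts described above can be worked through with a small calculation. This is only a sketch: the figure of 100,000 metrics is a stand-in for "100,000s", `schema_cost` is a hypothetical helper, and it assumes no tags other than the metric id.

```python
def schema_cost(num_metrics, downsample_levels, one_measurement_per_metric):
    """Return (continuous queries, raw series) for a given layout."""
    if one_measurement_per_metric:
        # One continuous query per measurement per downsampling level.
        continuous_queries = num_metrics * downsample_levels
    else:
        # A single shared measurement needs one query per level.
        continuous_queries = downsample_levels
    # Either way there is one raw series per metric: a measurement with no
    # tags, or one value of the metric-id tag in the shared measurement.
    raw_series = num_metrics
    return continuous_queries, raw_series


per_metric = schema_cost(100_000, 3, one_measurement_per_metric=True)
shared = schema_cost(100_000, 3, one_measurement_per_metric=False)
print(per_metric)  # (300000, 100000)
print(shared)      # (3, 100000)
```

Under these assumptions, the single-measurement layout cuts the number of continuous queries but leaves the raw series count unchanged, so by itself it does not shrink what the index has to track.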

ivanscattergood · 2016-11-09T08:49:44Z

@sorrison I tried doing something similar earlier this year with influx. In the end I have grouped together related metrics into separate measurements.

I also moved away from continuous queries and now build the downsampled data at the same time as the raw data. This seems to work really well.

Although I am still looking forward to the tag index being cached to disk as at the moment I am storing the data over three separate influxdb instances.

carlo-activia · 2016-11-22T19:18:30Z

@ivanscattergood
When you said that you build the downsampled data at the same time, do you mean that you execute a query and then save the aggregated results into a different retention policy? If so, how do you schedule that query?

Thanks in advance.

ivanscattergood · 2016-11-23T00:08:19Z

Hi,

I use a java client to collect the data and I aggregate it within that code.

I save one summary of data every minute and then a summary every hour.

Currently this allows me to visualise 7 million unique series from 3 months down to 1 minute.

We are expecting to treble the amount of data we visualise over the next 3 months.

Ivan
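A minimal sketch of the client-side aggregation described above, written in Python rather than Java for brevity. The `downsample` helper is hypothetical, and the use of a mean as the summary is an assumption; the comment does not say which aggregate is stored.

```python
from collections import defaultdict
from statistics import mean


def downsample(points, bucket_seconds):
    """Group (unix_ts, value) points into fixed windows and average each."""
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its window.
        buckets[ts - ts % bucket_seconds].append(value)
    return sorted((start, mean(vals)) for start, vals in buckets.items())


# Points held in the client as they are collected (timestamps in seconds).
raw = [(0, 1.0), (30, 3.0), (60, 5.0), (3599, 7.0), (3600, 9.0)]

per_minute = downsample(raw, 60)
per_hour = downsample(raw, 3600)
print(per_minute)  # [(0, 2.0), (60, 5.0), (3540, 7.0), (3600, 9.0)]
print(per_hour)    # [(0, 4.0), (3600, 9.0)]
```

Because the summaries are computed from data the client already holds, nothing has to be read back from the database before the per-minute and per-hour points are written.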

carlo-activia · 2016-11-23T14:28:44Z

Hi Ivan,
Thanks for your quick reply. When the java client collects the data, do you execute just one query to retrieve all the data, or do you execute multiple queries?
I have 200K devices (each with 10 metrics). Data is collected for all devices every 5 minutes, so every 5 minutes I have 200K data points. Each data point has one tag (deviceId) and 10 fields (one for each metric).
If I try to compress the data every hour (2.4M data points) using a Continuous Query, either it never returns or it crashes the server. I wonder how you get the data using your java client.

Thanks in advance.
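The volumes quoted above can be checked with a short calculation. The variable names are illustrative, and the series count assumes a single measurement with deviceId as the only tag.

```python
devices = 200_000
metrics_per_device = 10
collection_interval_minutes = 5

# 60 / 5 = 12 collection rounds per hour.
collections_per_hour = 60 // collection_interval_minutes

# One point per device per round: 200K * 12 = 2.4M points per hour.
points_per_hour = devices * collections_per_hour

# Each point carries 10 fields, so an hourly rollup reads 24M field values.
field_values_per_hour = points_per_hour * metrics_per_device

# One series per deviceId (assumes one measurement and no other tags).
series = devices

print(points_per_hour, field_values_per_hour, series)
```

So a single hourly continuous query over this data has to touch 200K series at once, which is the kind of query the max-select-series limit mentioned earlier in the thread is meant to bound.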

ivanscattergood · 2016-11-23T15:41:19Z

Hi,

I cache the data in the java client rather than re-querying the data. I was using an earlier version of Influxdb at the time I made that change (version 0.9) and I did this to work around the DB crashing.

carlo-activia · 2016-11-23T15:52:33Z

I see, so no queries to retrieve the data.
BTW, Are you still using InfluxDB?

Thanks.

ivanscattergood · 2016-11-23T15:57:22Z

Yes still using influxdb

trinitronx · 2017-03-04T00:32:38Z

This appears to be a problem for things such as Heapster (kubernetes-retired/heapster#605) & Kubernetes (kubernetes/kubernetes#27630) metrics, which appear to use a lot of tags. Based on the pod memory usage pattern for InfluxDB when running in a Kubernetes cluster with Heapster populating data into InfluxDB, it appears to use more memory the more activity there is in the cluster. (More metrics are stored, and ephemeral pods being started & stopped create more tags, using more memory until hitting the OOM limit.) At this point Kubernetes shows: Last State: Terminated, Reason: OOMKilled, and the pod restarts to enjoy its next limited lifespan until the next OOMKilled event.

pauldix · 2017-03-04T00:34:03Z

@trinitronx that's one of the key use cases this is designed to support

ivanscattergood · 2017-03-04T13:28:27Z

Do you know when this will be available in nightly builds?

pauldix · 2017-03-05T20:28:38Z

@ivanscattergood there's been significant work on this so hopefully soon. No set date though.

kattmang · 2017-04-22T05:41:22Z

This feature would really help with handling clickstream data :)

rbetts · 2017-05-30T15:21:13Z

Storage and query level support is available in nightly and will be present for opt-in in 1.3.0. There is additional work required to support SHOW commands for high cardinality data and to integrate some enterprise auth features into TSI.

I'm removing this issue from the 1.3.0 milestone and leaving it open for 1.4 / future work where we will finish up the remaining bits and enable TSI by default.

More information on the current state is available on the blog: https://www.influxdata.com/path-1-billion-time-series-influxdb-high-cardinality-indexing-ready-testing/

jwilder · 2018-03-23T23:07:34Z

TSI shipped in 1.5. It is not currently enabled by default.

jwilder added area/queries RFC area/tsm kind/feature-request labels Aug 15, 2016

jwilder added this to the 1.1.0 milestone Aug 15, 2016

benbjohnson self-assigned this Aug 15, 2016

pauldix self-assigned this Aug 15, 2016

daviesalex mentioned this issue Aug 15, 2016

Add option max-tags-per-database to limit high cardinality data #7146

Closed

jwilder self-assigned this Aug 15, 2016

This was referenced Aug 17, 2016

TSI Proposal #7173

Closed

TSI Proposal #7174

Closed

jwilder mentioned this issue Aug 17, 2016

Add high cardinality requirements doc #7175

Closed

jsternberg mentioned this issue Aug 25, 2016

Feature: GROUP BY <field> #7200

Closed

sparrc mentioned this issue Sep 7, 2016

Removing mesos_tasks metrics. influxdata/telegraf#1686

Merged

ghost mentioned this issue Sep 21, 2016

cgroups path being parsed as metric influxdata/telegraf#1724

Closed

jwilder modified the milestones: 1.2.0, 1.1.0 Oct 6, 2016

sparrc mentioned this issue Oct 11, 2016

Optional pid as tag influxdata/telegraf#1843

Closed

cxreg mentioned this issue Nov 2, 2016

Allow retention policies to act on cardinality #7569

Closed

e-dard self-assigned this Nov 17, 2016

timhallinflux modified the milestones: 1.3.0, 1.2.0 Dec 19, 2016

pauldix added ready in progress and removed ready in progress labels Jan 25, 2017

desa mentioned this issue Feb 21, 2017

Question: Upper limit for tags #3471

Closed

rbetts removed this from the 1.3.0 milestone May 30, 2017

rbetts added the proposed label May 30, 2017

jwilder removed their assignment Mar 23, 2018

e-dard closed this as completed Mar 23, 2018

ghost removed the proposed label Mar 23, 2018

121watts mentioned this issue Aug 26, 2020

feat: flows index #19448

Merged
