Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

buildtsi: Do not escape measurement names #9921

Merged
merged 1 commit into from
Jun 1, 2018
Merged

buildtsi: Do not escape measurement names #9921

merged 1 commit into from
Jun 1, 2018

Conversation

jacobmarble
Copy link
Member

@jacobmarble jacobmarble commented May 29, 2018

When influx_inspect buildtsi is used to create a new tsi1 index, spaces in measurement names are escaped, so measurement "a b" is changed to "a\ b".

This change modifies models.ParseKeyBytes() and models.ParseName() to unescape measurement names. models.ParseKeyBytes() returns unescaped tag keys, so this seems like the natural place to unescape measurement names.

Also followed scanMeasurement() to see what other code could be problematic, and this should be everything (the result of one other use of scanMeasurement() is later escaped).

Removed tsdb.MeasurementFromSeriesKey(). These methods are exported, so checked for side effects in other InfluxData repositories.

After this change, existing TSI indexes that include backslashed measurements created by this bug will need to be rebuilt with influx_inspect buildtsi. This consolidates measurements "a b" and "a\ b" into just "a b".

Fixes #9904

@jacobmarble jacobmarble requested a review from e-dard May 29, 2018 23:36
@ghost ghost assigned jacobmarble May 29, 2018
@ghost ghost added the review label May 29, 2018
@ghost ghost added the review label May 29, 2018
@jacobmarble jacobmarble force-pushed the jgm-escape branch 2 times, most recently from 4f39921 to aac70ba Compare May 30, 2018 00:00
Copy link
Contributor

@e-dard e-dard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to think on this a bit... I'm concerned it could be a breaking change because we call into ParseKeyBytes from the Engine and Index when deleting/dropping series.

Since this change would alter the behaviour of ParseKeyBytes, we might find that users are no longer able to delete certain series that they have stored.

For example, the TSI index might currently store series data under the cp\ u measurement. When dropping the series cp\ u,server=a series we currently would check if cp\ u,server=a was the last series for the measurement cp\ u in the index.

With this proposed change though, the index will look to see if cp\ u,server=a is the last series for the cp u measurement.

As a secondary point—I'm not sure how this change would affect the series writer in the importer tool @tmgordeeva and @stuartcarnie worked on.

When `influx_inspect buildtsi` is used to create a new `tsi1` index, spaces in measurement names are escaped, so measurement "a b" is changed to "a\ b".

This change modifies `models.ParseKeyBytes()` and `models.ParseName()` to unescape measurement names. `models.ParseKeyBytes()` returns unescaped tag keys, so this seems like the natural place to unescape measurement names.

Also followed `scanMeasurement()` to see what other code could be problematic, and this should be everything (the result of one other use of `scanMeasurement()` is later escaped).

Removed `tsdb.MeasurementFromSeriesKey()`. These methods are exported, so checked for side effects in other InfluxData repositories.
@jacobmarble
Copy link
Member Author

jacobmarble commented May 31, 2018

Identified any side effects of this change.

influx-tools import (master)

  • seriesWriter.AddSeries calls models.ParseKeyBytes
  • works properly before and after change

tsm1.Engine.deleteSeriesRange calls models.ParseKeyBytes

  • called by delete from "a b"
  • called by drop series from "a b"
  • TSI - Index.DropSeries
    • called by Engine.deleteSeriesRange, but never hit when name contains space because of the current bug, but the input arg is exactly the same pointer as the call from
      models.ParseKeyBytes
  • broken before change
    • calls ParseKeyBytes with escaped measurement name
    • Returned name used to call SeriesFile.SeriesID, still escaped
    • this calls AppendSeriesKey with escaped name
    • SeriesFile.SeriesID can't find the series, so returns zero
    • show measurements still lists the relevant measurement
    • FIXED

influx_inspect buildtsi - processShard AND processTSMFile call models.ParseKey

  • processShard calls ParseKey to populate index from WAL/cache, uses returned escaped name to call TSI Index.CreateSeriesIfNotExists
  • processTSMFile calls ParseKey to populate index from TSM, uses returned escaped name exactly as processShard
  • resulting measurement name is escaped
  • FIXED

influx_inspect deletetsm (master) - Command.process calls ParseKey

  • influx_inspect deletetsm -measurement "a b" ~/.influxdb/data/db/autogen/2/000000001-000000001.tsm
  • measurement not removed
  • influx_inspect deletetsm -measurement "a\ b" ~/.influxdb/data/db/autogen/2/000000001-000000001.tsm (notice one backslash)
  • measurement removed
  • influx_inspect deletetsm -measurement "a\\ b" ~/.influxdb/data/db/autogen/2/000000001-000000001.tsm (notice two backslashes)
  • measurement removed
  • FIXED (one backslash and two backslashes do not work; zero backslashes works)

influx_inspect report - Command.Run calls models.ParseKey

  • influx_inspect report -detailed ~/.influxdb/data/db/autogen/1/
  • escaped measurement names are printed
  • FIXED

tsm1.Engine.createVarRefSeriesIterator calls models.ParseKey

  • measurement name thrown away
  • nothing to do

tsdb.MeasurementFromSeriesKey calls models.ParseName

  • unescapes the result of ParseName
  • only called from Engine.addToIndexFromKey, so safe to refactor
  • REFACTORED

show series shows escaped measurement names

  • working as intended - series keys have escaped measurement names
  • NOT FIXED

@jacobmarble
Copy link
Member Author

Spent some more time confirming each of the "FIXED" cases in my last comment, discovered a bug in deletetsm, but otherwise everything checks out.

Copy link
Contributor

@e-dard e-dard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants