Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compaction fixes #6653

Merged
merged 8 commits into from
May 18, 2016
Merged

Compaction fixes #6653

merged 8 commits into from
May 18, 2016

Conversation

jwilder
Copy link
Contributor

@jwilder jwilder commented May 17, 2016

Required for all non-trivial PRs
  • Rebased/mergable
  • Tests pass
  • CHANGELOG.md updated

This PR re-introduces the fix for #6556 that was reverted in #6637. The original fix could drop points during compactions as well as cause an interface conversion panic (#6652) in some cases. What happened was that merged values from one series could get mixed with the values for the next series. If the series were different types, a panic could occur. If it was the last block to be merged, the values would be lost. Both of these issues should be fixed now as well as the original issue of overwriting a point in a large series triggering memory spikes.

This also fixes #6406 that can occur if enough points are written into a single series to cause the TSM index to exceed the max index entries allotment. It will now handle that error and rollover to a new file.

@mention-bot
Copy link

By analyzing the blame information on this pull request, we identified @joelegasse to be a potential reviewer

@jwilder
Copy link
Contributor Author

jwilder commented May 17, 2016

@aderumier @daviesalex @easyrasta if you have a dataset that can easily reproduce #6652, would you be able to test this PR on it?

@jwilder jwilder force-pushed the jw-compact-fix branch 6 times, most recently from e80dcf5 to 0b25541 Compare May 18, 2016 00:15
@jwilder
Copy link
Contributor Author

jwilder commented May 18, 2016

@e-dard @benbjohnson

@daviesalex
Copy link
Contributor

daviesalex commented May 18, 2016

So this fixed the panic, but not the missing data.

The startup was very fast, but much of our data (most really) only exists from midnight on the 16th (not all , though). See example data:

image

FWIW, during the startup, most of the time (by a long way) is spent in sync/atomic.AddUint32. This compares to runtime.mapiternext and runtime.indexbytebody in 0.13 stable. I submitted #6658 to track this:

Samples: 4M of event 'cycles', Event count (approx.): 430757131359
  28.53%  influxd                       [.] sync/atomic.AddUint32
  11.64%  influxd                       [.] strings.Index
   5.80%  influxd                       [.] scanblock
   4.72%  [kernel]                      [k] _raw_spin_lock
   4.60%  influxd                       [.] runtime.MSpan_Sweep
   2.33%  influxd                       [.] github.com/influxdata/influxdb/tsdb.(*DatabaseIndex).Series
   1.82%  influxd                       [.] runtime.memmove
   1.76%  influxd                       [.] runtime.aeshashbody
   1.54%  influxd                       [.] runtime.memeqbody
   1.52%  [kernel]                      [k] down_read_trylock
   1.39%  influxd                       [.] runtime.mallocgc
   1.34%  influxd                       [.] runtime.mapaccess1_faststr
   1.22%  influxd                       [.] runtime.deferreturn
   1.18%  influxd                       [.] runtime.newdefer
   1.14%  influxd                       [.] runtime.writebarrierptr
   1.10%  influxd                       [.] runtime.cas64
   1.02%  influxd                       [.] runtime.xchg
   0.95%  influxd                       [.] runtime.readvarint
   0.89%  influxd                       [.] github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).WalkKeys
   0.82%  [kernel]                      [k] page_fault
   0.80%  influxd                       [.] github.com/influxdata/influxdb/tsdb/engine/tsm1.(*indirectIndex).KeyAt
   0.78%  [kernel]                      [k] up_read
   0.76%  influxd                       [.] runtime.findfunc
   0.74%  influxd                       [.] runtime.gentraceback
   0.73%  influxd                       [.] runtime.releasem
   0.69%  influxd                       [.] getfull
   0.66%  influxd                       [.] runtime.freedefer
   0.55%  influxd                       [.] runtime.acquirem
   0.53%  influxd                       [.] runtime.atomicload64
   0.53%  influxd                       [.] runtime.memclr

@daviesalex
Copy link
Contributor

Log showing the start with this PR (branch unknown in log, but it was built from branch jw-compact-fix - 0b2554) that works but is missing most data, and then a start with the 0.13 stable release at https://gist.github.com/daviesalex/f9e70754fb2c0ac85454b6db341b34a4 which works and has the data.

return err
}

return ErrMaxBlocksExceeded
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems redundant since there is an err != nil check below.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind the redundant return but I think specifying ErrMaxBlocksExceeded twice is redundant. You can just do a return err instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, stylistically I prefer grouping the entire thing together so the err is confined to a specific block:

if err := w.WriteBlock(key, minTime, maxTime, block); err == ErrMaxBlocksExceeded {
    if err := w.WriteIndex(); err != nil {
        return err
    }
    return ErrMaxBlocksExceeded
} else if err != nil {
    return err
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/return ErrMaxBlocksExceeded/return err though 😉

@e-dard
Copy link
Contributor

e-dard commented May 18, 2016

Just a nit, but LGTM 👍

@aderumier
Copy link
Contributor

@jwilder It's starting fine with this PR. (still around 10min of startup time)

// Filter out only the values for overlapping block
v = Values(v).Include(first.minTime, first.maxTime)
if len(v) > 0 {
// Recoder that we read a subset of the block
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this meant to say, "Record"?

@benbjohnson
Copy link
Contributor

Minor nits but lgtm 👍

@jwilder
Copy link
Contributor Author

jwilder commented May 18, 2016

@daviesalex With the same dataset/host, does 0.13 load all the data? I'm not sure why you would be missing data using this branch as it does not have any changes related to startup.

@jwilder
Copy link
Contributor Author

jwilder commented May 18, 2016

Also, were there any errors in the logs?

@daviesalex
Copy link
Contributor

@jwilder yes. 0.13 loads all the data. This branch does not (nor does the nightly).

The full log (a start under this version, which fails to load all the data, and 0.13, which does) is in the gist above - https://gist.github.com/daviesalex/f9e70754fb2c0ac85454b6db341b34a4

@jwilder
Copy link
Contributor Author

jwilder commented May 18, 2016

@daviesalex If the current nightly has missing data, then it's not caused by this PR. Any chance you could git bisect 0.13 and master to see if you can find the commit that is introduced the issue? I suspect 0dbd489 might have introduced the bug.

Would you be able to test that commit and just before it?

@jwilder
Copy link
Contributor Author

jwilder commented May 18, 2016

I was able to reproduce the missing data on startup. It is due to 0dbd489#diff-82e4e65bd72f00208a49c4797d466ecdR383.

@jwilder jwilder force-pushed the jw-compact-fix branch 2 times, most recently from 3fa355e to b5cfb0e Compare May 18, 2016 17:37
@jwilder
Copy link
Contributor Author

jwilder commented May 18, 2016

@daviesalex I pushed a fix for the data not reloading on this branch. Let me know if that resolves it for you.

@daviesalex
Copy link
Contributor

daviesalex commented May 18, 2016

@jwilder good news. Confirmed that data is there.

[run] 2016/05/18 13:12:08 InfluxDB starting, version unknown, branch unknown, commit unknown
[run] 2016/05/18 13:28:48 Listening for signals

Startup down from about 55 minutes to about 15. I tested on two machines and saw the same just over 3x speedup.

Great success! Thank you.

If a large series contains a point that is overwritten, the compactor
would load the whole series into RAM during a full compaction.  If
the series was large, it could cause very large RAM spikes and OOMs.

The change reworks the compactor to merge blocks more incrementally
similar to the fix done in #6556.

Fixes #6557
jwilder added 7 commits May 18, 2016 15:25
Switched the max keys test to write int64 of the same value so RLE
would kick in and the file size will be smaller (84MB vs 3.8MB).

Removed the chunking test which was skipped because the code will
not downsize a block into smaller chunks now.

Skip MaxKeys tests in various environments because it needs to
write too much data to run reliably.
* tsm1 engine test is no longer needed as it's the default now
* go1.6 build from git is not needed as we build with go1.6.2 released
version now.
The optimization to speed up shard loading had the side effect of
skipping adding series to the index that already exist.  The skipping
was in the wrong location and also skipped the shards measurementFields
index which is required in order to query that series in the shard.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

After intense write load: "error compacting TSM files: ... exceeds max index entries"
6 participants