HBASE-26265 Update ref guide to mention the new store file tracker im… (
wchevreuil authored and apurtell committed Mar 18, 2022
1 parent 072fcf4 commit f550d0a
Showing 2 changed files with 146 additions and 0 deletions.
145 changes: 145 additions & 0 deletions src/main/asciidoc/_chapters/store_file_tracking.adoc
@@ -0,0 +1,145 @@
////
/**
*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
////
[[storefiletracking]]
= Store File Tracking
:doctype: book
:numbered:
:toc: left
:icons: font
:experimental:
== Overview
This feature introduces an abstraction layer for tracking the store files that store engines still
use or need, making it possible to plug in different approaches for identifying the store
files required by a given store.
Historically, HBase internals have relied on creating hfiles in temporary directories first and then renaming
those files into the actual store directory at operation commit time. That's a simple and convenient
way to separate transient files from finalised files that are ready to serve client reads.
This approach works well on strongly consistent file systems, but with the growing popularity of less consistent
file systems, mainly object stores that can be used like file systems, the dependency on atomic rename operations introduces
performance penalties. Deployments on the Amazon S3 object store have been the most affected,
due to its lack of atomic renames. The HBase community temporarily worked around this problem by building a distributed locking layer called HBOSS
to guarantee atomicity of operations against S3.
With *Store File Tracking*, the decision on where to initially create new hfiles and how to proceed upon
commit is delegated to the specific Store File Tracking implementation.
The implementation can be set at the HBase service level in *hbase-site.xml*, or at the
Table or Column Family level via the TableDescriptor configuration.

NOTE: When the store file tracking implementation is specified in *hbase-site.xml*, this configuration is also propagated into a table's configuration
at table creation time. This avoids dangerous configuration mismatches between processes, which
could potentially lead to data loss.
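
The propagation described in the note can be pictured with a small Python model. This is only a sketch under stated assumptions, not HBase code: the helper names `table_conf_at_creation` and `effective_tracker` are hypothetical, and the rule that an explicit table-level value is left untouched is an assumption made for illustration.

```python
from typing import Dict

TRACKER_KEY = "hbase.store.file-tracker.impl"


def table_conf_at_creation(global_conf: Dict[str, str],
                           requested: Dict[str, str]) -> Dict[str, str]:
    """Return the configuration stored with a newly created table (toy model)."""
    table_conf = dict(requested)
    # Assumption: an explicit table-level value is kept as-is.
    if TRACKER_KEY not in table_conf and TRACKER_KEY in global_conf:
        # Propagate the global tracker into the table's own configuration.
        table_conf[TRACKER_KEY] = global_conf[TRACKER_KEY]
    return table_conf


def effective_tracker(table_conf: Dict[str, str]) -> str:
    """The tracker a table uses; DEFAULT applies when nothing is configured."""
    return table_conf.get(TRACKER_KEY, "DEFAULT")


# The table is created while the global setting is FILE.
global_conf = {TRACKER_KEY: "FILE"}
my_table = table_conf_at_creation(global_conf, {})

# A later, mismatched global setting does not change the table's tracker,
# because the value was copied into the table configuration at creation time.
global_conf[TRACKER_KEY] = "DEFAULT"
print(effective_tracker(my_table))  # prints FILE
```

Because the value travels with the table, a process that starts with a different global setting would still see the tracker the table was created with.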

== Available Implementations

The initial version of Store File Tracking provides three built-in implementations:

* DEFAULT
* FILE
* MIGRATION

### DEFAULT

As the name suggests, this is the Store File Tracking implementation used by default when no explicit
configuration has been defined. The DEFAULT tracker implements the standard approach using temporary
directories and renames. This is how HBase implicitly tracked store files in all previous versions.
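
As a rough illustration of the temporary-directory-and-rename pattern, the following Python sketch commits a file on a local file system. It is a simplified model, not the HBase implementation: the function names and the directory layout are hypothetical, and `os.replace` on a local disk stands in for the file system rename.

```python
import os
import tempfile
from typing import List


def commit_with_rename(tmp_dir: str, store_dir: str, name: str, data: bytes) -> str:
    """Write a file in tmp_dir, then rename it into store_dir to commit it."""
    tmp_path = os.path.join(tmp_dir, name)
    with open(tmp_path, "wb") as f:
        f.write(data)
    final_path = os.path.join(store_dir, name)
    # The rename is the commit point: anything listed in store_dir is complete.
    os.replace(tmp_path, final_path)
    return final_path


def committed_files(store_dir: str) -> List[str]:
    """With this approach, a plain directory listing is the list of committed files."""
    return sorted(os.listdir(store_dir))


# Hypothetical layout: both directories live under one scratch root.
root = tempfile.mkdtemp()
tmp_dir = os.path.join(root, "tmp")
store_dir = os.path.join(root, "store")
os.makedirs(tmp_dir)
os.makedirs(store_dir)

commit_with_rename(tmp_dir, store_dir, "hfile-1", b"cells")
print(committed_files(store_dir))  # prints ['hfile-1']
```

The sketch depends on the rename being atomic, which is the property that object stores such as S3 do not provide.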

### FILE

A file tracker implementation that creates new files directly in the store directory, avoiding the
need for rename operations. It keeps a list of committed hfiles in memory, backed by meta files in
each store directory. Whenever a new hfile is committed, the list of _tracked files_ in the given
store is updated and a new meta file is written with the contents of this list, discarding the previous
meta file, which now contains an outdated list.
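
The idea of a file list backed by meta files can be sketched in Python as follows. This is a toy model under stated assumptions: the class name `FileListTracker`, the `.meta` directory, the numbered JSON file names, and the on-disk format are all hypothetical and say nothing about how HBase actually names or encodes its meta files.

```python
import json
import os
import tempfile
from typing import List


class FileListTracker:
    """Toy tracker: files are written in place, a meta file lists the committed ones."""

    def __init__(self, store_dir: str) -> None:
        self.store_dir = store_dir
        self.meta_dir = os.path.join(store_dir, ".meta")  # hypothetical name
        os.makedirs(self.meta_dir, exist_ok=True)
        self.seq = 0
        self.tracked: List[str] = []
        metas = self._meta_files()
        if metas:
            # Reload the in-memory list from the newest meta file.
            self.seq = int(metas[-1].split(".")[0])
            with open(os.path.join(self.meta_dir, metas[-1])) as f:
                self.tracked = json.load(f)

    def _meta_files(self) -> List[str]:
        return sorted(os.listdir(self.meta_dir), key=lambda n: int(n.split(".")[0]))

    def commit(self, name: str, data: bytes) -> None:
        # The new file is created directly in the store directory: no rename.
        with open(os.path.join(self.store_dir, name), "wb") as f:
            f.write(data)
        previous = self._meta_files()
        self.seq += 1
        new_list = self.tracked + [name]
        # Write a new meta file with the updated list ...
        with open(os.path.join(self.meta_dir, f"{self.seq}.json"), "w") as f:
            json.dump(new_list, f)
        self.tracked = new_list
        # ... then discard the previous, now outdated, meta file.
        for old in previous:
            os.remove(os.path.join(self.meta_dir, old))


store_dir = tempfile.mkdtemp()
tracker = FileListTracker(store_dir)
tracker.commit("hfile-a", b"1")
tracker.commit("hfile-b", b"2")

# A file that was written but never committed is not in the tracked list.
open(os.path.join(store_dir, "hfile-partial"), "wb").close()
print(tracker.tracked)
```

In this model, a reader trusts the tracked list rather than the directory listing, so a partially written file in the store directory is ignored.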

### MIGRATION

A special implementation to be used when switching between Store File Tracking implementations on
pre-existing tables that already contain data and, therefore, files that are being tracked under a specific
logic.

== Usage

For fresh deployments that don't yet contain any user data, the *FILE* implementation can simply be set as the
value of the *hbase.store.file-tracker.impl* property in the global *hbase-site.xml* configuration, prior
to the first HBase start. Omitting this property selects the *DEFAULT* implementation.

For clusters with data that are upgraded to a version of HBase containing the store file tracking
feature, the Store File Tracking implementation can only be changed through the *MIGRATION*
implementation, so that the _new tracker_ can safely build its list of tracked files based on the
list of the _current tracker_.
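
One way to picture this hand-over is the Python sketch below. It is a conceptual model only: `ListTracker` and `MigrationTracker` are hypothetical classes, and the choice to record new files in both trackers while the migration is active is an assumption made for illustration, not a statement about HBase internals.

```python
from typing import Iterable, List


class ListTracker:
    """Hypothetical stand-in for any tracker: it just holds a list of file names."""

    def __init__(self, files: Iterable[str] = ()) -> None:
        self._files = list(files)

    def load(self) -> List[str]:
        return list(self._files)

    def set(self, files: Iterable[str]) -> None:
        self._files = list(files)

    def add(self, name: str) -> None:
        self._files.append(name)


class MigrationTracker:
    """Toy model: the new (dst) tracker is seeded from the current (src) tracker."""

    def __init__(self, src: ListTracker, dst: ListTracker) -> None:
        self.src = src
        self.dst = dst

    def load(self) -> List[str]:
        # The current tracker's list is authoritative; copy it to the new tracker.
        files = self.src.load()
        self.dst.set(files)
        return files

    def add(self, name: str) -> None:
        # Assumption: both trackers are kept in step during the migration.
        self.src.add(name)
        self.dst.add(name)


current = ListTracker(["hfile-a", "hfile-b"])
new = ListTracker()
migration = MigrationTracker(current, new)

migration.load()
migration.add("hfile-c")
print(new.load())  # prints ['hfile-a', 'hfile-b', 'hfile-c']
```

Once the new tracker holds the full list, the table can be switched to it directly, which corresponds to the final step of the procedure.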

NOTE: The MIGRATION tracker should NOT be set in the global configuration. To use it, follow the section below
about setting Store File Tracking in the Table or Column Family configuration.


### Configuring for Table or Column Family

Setting the Store File Tracking configuration globally may not always be possible or desired, for example,
in the case of upgraded clusters with pre-existing user data.
Store File Tracking can instead be set in the Table or Column Family level configuration.
For example, to specify the *FILE* implementation in the table configuration at table creation time,
run the following:

----
create 'my-table', 'f1', 'f2', {CONFIGURATION => {'hbase.store.file-tracker.impl' => 'FILE'}}
----

To define *FILE* for a specific Column Family:

----
create 'my-table', {NAME => 'f1', CONFIGURATION => {'hbase.store.file-tracker.impl' => 'FILE'}}
----

### Switching trackers at Table or Column Family

A very common scenario is enabling Store File Tracking on pre-existing HBase deployments that have
been upgraded to a version that supports this feature. To apply the FILE tracker, tables effectively
need to be migrated from the DEFAULT tracker to the FILE tracker. As explained previously, this
process requires the special MIGRATION tracker implementation, which can only be
specified at the Table or Column Family level.

For example, to switch _tracker_ from *DEFAULT* to *FILE* in a table configuration:

----
alter 'my-table', CONFIGURATION => {'hbase.store.file-tracker.impl' => 'MIGRATION',
'hbase.store.file-tracker.migration.src.impl' => 'DEFAULT',
'hbase.store.file-tracker.migration.dst.impl' => 'FILE'}
----

To apply a similar switch in the Column Family level configuration:

----
alter 'my-table', {NAME => 'f1', CONFIGURATION => {'hbase.store.file-tracker.impl' => 'MIGRATION',
'hbase.store.file-tracker.migration.src.impl' => 'DEFAULT',
'hbase.store.file-tracker.migration.dst.impl' => 'FILE'}}
----

Once all table regions are back online, don't forget to disable MIGRATION by setting
*hbase.store.file-tracker.impl* to the value previously used for *hbase.store.file-tracker.migration.dst.impl*. In the above
example, that would be as follows:

----
alter 'my-table', CONFIGURATION => {'hbase.store.file-tracker.impl' => 'FILE'}
----
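
When many tables need to be switched, the two shell statements above could be generated by a small script. The Python helper below is a hypothetical convenience, not part of HBase: it only builds the same `alter` strings shown in this section for table-level configuration, and the function names are invented for illustration.

```python
from typing import Dict

IMPL = "hbase.store.file-tracker.impl"
SRC_IMPL = "hbase.store.file-tracker.migration.src.impl"
DST_IMPL = "hbase.store.file-tracker.migration.dst.impl"


def shell_map(conf: Dict[str, str]) -> str:
    """Render a dict as an HBase shell map literal, keeping insertion order."""
    return "{" + ", ".join(f"'{k}' => '{v}'" for k, v in conf.items()) + "}"


def start_migration(table: str, src: str, dst: str) -> str:
    """First step: put the table on the MIGRATION tracker."""
    conf = {IMPL: "MIGRATION", SRC_IMPL: src, DST_IMPL: dst}
    return f"alter '{table}', CONFIGURATION => {shell_map(conf)}"


def finish_migration(table: str, dst: str) -> str:
    """Second step, after all regions are back online: set the destination tracker."""
    conf = {IMPL: dst}
    return f"alter '{table}', CONFIGURATION => {shell_map(conf)}"


print(start_migration("my-table", "DEFAULT", "FILE"))
print(finish_migration("my-table", "FILE"))
```

The generated statements are meant to be pasted into, or piped to, the HBase shell; the script itself does not check that regions are back online between the two steps.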
1 change: 1 addition & 0 deletions src/main/asciidoc/book.adoc
@@ -87,6 +87,7 @@ include::_chapters/zookeeper.adoc[]
include::_chapters/community.adoc[]
include::_chapters/hbtop.adoc[]
include::_chapters/tracing.adoc[]
include::_chapters/store_file_tracking.adoc[]
= Appendix