This repository has been archived by the owner on Jun 12, 2020. It is now read-only.

TokuDB Files and File Descriptors

Rik Prohaska edited this page Apr 24, 2015 · 6 revisions

Most operating systems impose an upper bound on the number of file descriptors that a process may have open. Usually, the default values for the MySQL server open file limit and the number of open MySQL tables work fine. However, there are cases where the default values do not work, and one needs to understand how TokuDB uses file descriptors in more detail in order to configure resources appropriately.

This document describes how TokuDB uses file descriptors. There is a fixed set of files that are opened when the TokuDB storage engine is initialized. These files are described first. The bulk of the file descriptors are used when TokuDB tables are opened. The file descriptors related to MySQL tables are described next. Finally, TokuDB uses a bulk loader for some operations, and the bulk loader uses temporary files while it is running.

TokuDB Storage Engine Files

TokuDB uses a handful of files to lock various directories. These files have names like 'tokudb_lock_dont_delete_me'. These files are opened and locked when the TokuDB storage engine is initialized, and closed when the TokuDB storage engine is terminated. The intent of these files is to provide exclusive access to the various directories used by TokuDB.

The TokuDB rollback log for small transactions resides in memory. However, if the rollback log gets too big, it is spilled to the 'tokudb.rollback' file. The 'tokudb.rollback' file is opened when the TokuDB storage engine is initialized.

The TokuDB recovery log stores the transactional state of the active transactions and is used to recover from crashes. The recovery log is implemented as a sequence of files with names like 'log*.tokulog*'. Usually, only the most recent log file is opened by TokuDB.

The TokuDB directory is used to map each internal fractal tree name to the file used to store that fractal tree. The 'tokudb.directory' file is opened when the TokuDB storage engine is initialized.

The 'tokudb.environment' file stores version information in a special fractal tree that is used when the fractal tree software is upgraded.

TokuDB Tables

Each TokuDB table is mapped onto multiple fractal trees. The files that store these fractal trees are kept in the MySQL data directory unless the 'tokudb_data_dir' MySQL variable says to store them somewhere else. Each of the fractal tree files is suffixed with '.tokudb'.

One fractal tree, the table's 'status' fractal tree, stores the meta-data for the table. For example, if there is an auto increment column defined for the table, then the current value is stored in this fractal tree. A copy of the table's schema (frm data) is also stored in the 'status' fractal tree, as is the last computed cardinality. There are other rows in the 'status' fractal tree, but we won't talk about them here.

Each key defined for the table is stored in a fractal tree. The fractal tree for the primary key is stored in the 'main' fractal tree. If the table does not define a primary key, then TokuDB manufactures a hidden primary key and uses it as the primary key. This hidden primary key is never seen outside of the TokuDB storage engine.

Each secondary key is stored in its 'key' fractal tree.

The current TokuDB software stores each fractal tree in its own file. One can imagine storing multiple fractal trees in the same file, or storing a single fractal tree in multiple files. We might do something more complicated than storing a single fractal tree in a single file someday, but for today's discussion, there is one file for each fractal tree.

For a table with a primary key and no secondary keys, 2 file descriptors are used when the table is opened. For a table with 'F' keys including the primary key, 'F+1' file descriptors are used when the table is opened: 'F' file descriptors for the key fractal tree files, including the primary key, and 1 file descriptor for the status fractal tree.
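
The count above can be written as a one-line function. This is only a sketch of the arithmetic in Python; the function name 'table_fds' is hypothetical and is not part of TokuDB or MySQL.

```python
def table_fds(num_keys: int) -> int:
    """File descriptors used by one open TokuDB table.

    num_keys is 'F' in the text: the number of keys including the primary
    key. A hidden primary key counts as one key when none is defined.
    """
    if num_keys < 1:
        raise ValueError("a table always has at least a primary key")
    # One file per key fractal tree, plus one for the status fractal tree.
    return num_keys + 1


print(table_fds(1))  # primary key only
print(table_fds(4))  # primary key plus three secondary keys
```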

When MySQL opens a table, all of the fractal tree files that store the fractal trees for the table plus the status fractal tree file are opened. All of the MySQL clients that open the same table share these open file descriptors.

When the last MySQL client closes a table, TokuDB closes the fractal tree files for that table.

When MySQL reaches its limit on the number of open tables, it picks one to close.

Partitioned TokuDB Tables

A partitioned TokuDB table consists of a TokuDB table for each partition. If there are 'P' partitions, then 'P' times 'F+1' files are needed to store the partitioned table. Since MySQL opens all of the partitions when the table is opened, 'P' times 'F+1' file descriptors are used.

For a partitioned TokuDB table with 1000 partitions and 9 keys, TokuDB will use 10,000 file descriptors when the table is opened by MySQL.
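
To estimate the descriptors needed for a whole set of tables, the same formula can be summed over the tables. The sketch below assumes that every listed table is open at the same time, and it leaves out the fixed storage engine files and any bulk loader temporary files. The function name 'estimate_open_fds' and the (partitions, keys) input format are hypothetical.

```python
def estimate_open_fds(tables):
    """Sum P * (F + 1) over tables given as (partitions, keys) pairs.

    Use partitions=1 for a table that is not partitioned. 'keys' includes
    the primary key, as 'F' does in the text.
    """
    return sum(partitions * (keys + 1) for partitions, keys in tables)


# The example from the text: 1000 partitions and 9 keys.
print(estimate_open_fds([(1000, 9)]))  # 10000

# A made-up mix: that table plus two unpartitioned tables with 1 and 3 keys.
print(estimate_open_fds([(1000, 9), (1, 1), (1, 3)]))  # 10006
```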

TokuDB Bulk Loader

The TokuDB bulk loader is used when inserting data into an empty table, or when creating an index on an existing table. There are additional use cases. The TokuDB bulk loader stores data in temporary files. The rows in these temporary files are sorted using an in-memory sort. When all of the rows have been handed to TokuDB, the TokuDB bulk loader does a multi-phase merge sort on these temporary files. The maximum number of open files occurs during the merge phase. This number depends on the number of temporary files and the amount of memory used by the bulk loader. Suffice it to say that each bulk loader instance uses a handful of file descriptors when merging some of its temporary files.
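
To illustrate why the merge phase determines the number of open files, here is a generic multi-phase external merge sort in Python. It is not the TokuDB bulk loader's implementation. The row format (integers, one per line), the function names, and the 'fan_in' parameter are all assumptions made for the example. Each merge step holds at most 'fan_in' input files plus one output file open, no matter how many temporary files exist.

```python
import heapq
import os
import tempfile


def write_run(rows, tmpdir):
    """Sort one batch in memory, write it to a temporary file, return its path."""
    with tempfile.NamedTemporaryFile("w", dir=tmpdir, delete=False) as f:
        f.writelines(f"{row}\n" for row in sorted(rows))
        return f.name


def merge_pass(paths, fan_in, tmpdir):
    """Merge groups of at most fan_in sorted runs; return the new run paths."""
    merged = []
    for i in range(0, len(paths), fan_in):
        group = paths[i:i + fan_in]
        inputs = [open(p) for p in group]  # at most fan_in inputs open here
        try:
            with tempfile.NamedTemporaryFile("w", dir=tmpdir, delete=False) as out:
                # heapq.merge reads the inputs lazily, one line at a time.
                out.writelines(heapq.merge(*inputs, key=int))
                merged.append(out.name)
        finally:
            for f in inputs:
                f.close()
        for p in group:
            os.remove(p)
    return merged


def external_sort(batches, fan_in=4):
    """Sort batches of integers using temporary files and repeated merge passes."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    with tempfile.TemporaryDirectory() as tmpdir:
        paths = [write_run(batch, tmpdir) for batch in batches]
        if not paths:
            return []
        while len(paths) > 1:
            paths = merge_pass(paths, fan_in, tmpdir)
        with open(paths[0]) as f:
            return [int(line) for line in f]


print(external_sort([[5, 1, 9], [4, 8], [7, 2], [3, 6], [0]], fan_in=2))
```

A larger 'fan_in' means fewer merge passes but more files open at once, which is the trade-off between temporary file count, memory, and file descriptors that the paragraph above describes.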

Note that there may be multiple bulk load operations in progress concurrently.

Recommendation

Running TokuDB near the open file limit is not a good idea. Bulk load operations will fail when they cannot create or open a file. This is especially disappointing after waiting several hours for a really big database to be loaded. In some other cases, TokuDB will crash, since it was not designed to run in a process in which file descriptors are a scarce resource.
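
One way to check for headroom is to compare an estimated descriptor count against the open file limit. The Python sketch below reads the soft limit of the process that runs the script, which may differ from the limit the MySQL server actually runs under. The function name 'has_headroom' and the 25% reserve are arbitrary choices for illustration, not TokuDB recommendations.

```python
import resource  # POSIX only; not available on Windows


def has_headroom(estimated_fds: int, soft_limit: int, reserve: float = 0.25) -> bool:
    """Return True if estimated_fds leaves at least `reserve` of the limit unused."""
    if soft_limit == resource.RLIM_INFINITY:
        return True  # no limit is imposed
    return estimated_fds <= soft_limit * (1.0 - reserve)


# Soft limit of *this* process; the MySQL server's limit may be different.
soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)

# 10,000 is the partitioned table example from the text.
print(soft_limit, has_headroom(10_000, soft_limit))
```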
