
Async Commits #108

Merged
merged 1 commit into realm:master on Sep 24, 2013

Conversation

@astigsen
Contributor

This is the initial implementation of async commits. It is still very rough, so I am putting it up so that we can discuss the approach.

The core idea is that all processes that open the db can commit, but only to memory. We then start a separate process that keeps a read lock to ensure that no data that has not yet been committed to disk gets overwritten. It watches for changes to the in-memory mapping and syncs them to disk as they come up (so we get continuous writing to disk).

During a commit it holds a second read lock on the version it is committing, and when done it uses this to replace the old lock, since the old version no longer has to be protected. In this way we leapfrog ahead, always ensuring that the on-disk representation cannot be corrupted.
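
For illustration, a minimal, self-contained sketch of the leapfrog scheme described above; SharedDb, ReadLock, and their members are hypothetical stand-ins, not the actual TightDB API:

```cpp
#include <cstdint>

// Hypothetical stand-ins for the shared db state and read locks.
struct ReadLock {
    std::uint64_t version = 0; // version pinned by this lock
};

struct SharedDb {
    std::uint64_t disk_version = 0;   // last version synced to disk
    std::uint64_t memory_version = 0; // latest version committed to memory

    ReadLock acquire_read_lock(std::uint64_t v) { return ReadLock{v}; }
    void wait_for_new_version() { /* block until memory_version grows */ }
    void sync_to_disk(std::uint64_t v) { disk_version = v; }
};

// Backend process: repeatedly pin the version being synced with a second
// read lock, flush it, then drop the old lock. At no point is an
// unsynced version left unprotected, so the on-disk image stays intact.
void async_commit_loop(SharedDb& db)
{
    ReadLock guard = db.acquire_read_lock(db.disk_version);
    for (;;) {
        db.wait_for_new_version();
        std::uint64_t v = db.memory_version;
        ReadLock next = db.acquire_read_lock(v); // second lock, on version being synced
        db.sync_to_disk(v);                      // continuous write-out to disk
        guard = next; // leapfrog: the older version may now be overwritten
    }
}
```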

The main point of contention is how we start the backend async_commit process. Right now I have implemented it as a straight fork from the process that first opens the db, but that has the following potential problems:

  1. Process name: The backend process will have the same name as the parent, which could confuse users when they look for it with ps (or Activity Monitor on Mac).
  2. It pulls in a lot of state from the parent process, which could cause problems at shutdown.

The alternative would be to exec a separate executable as the backend. That would fix the two problems above, but might introduce problems of its own (primarily how to ensure that the executable can always be found in a multiuser environment).

@astigsen
Contributor Author

One further problem with forking from the main process (which will also be there if we follow up with an exec) is that we may leave zombie processes around. To avoid the child process hanging around, the parent has to handle the SIGCHLD signal. But since the parent process is the user's and not ours, it could be quite problematic to inject a signal handler (without conflicting with handlers set by the user).

Do any of you guys have experience with this?

@kspangsege
Contributor

Why is it necessary to spawn a new process? Would a thread not suffice?

I do have a lot of experience with forking and signalling on Linux (POSIX) systems. I fear that it is going to be hard to achieve any level of robustness.

Is forking available on Windows? Even if it is, it probably has quite different semantics than on Linux and Darwin.

Forking without exec'ing seems like a bad idea to me, especially considering the fact that we are a library that becomes part of an unknown application.

Forking will wreak havoc with allocated resources other than memory, unless it can be guaranteed that the forking function never returns in the child (not even exceptionally), nor exits via std::exit() or any other way that involves resource cleanup. In some cases a process is terminated by an unhandled exception; it must be ensured that such a termination handler performs no resource cleanup. This is further complicated by the fact that the unknown application is free to install any termination handler for uncaught exceptions.

Even if we can control all these factors, forking may still clone a huge number of memory pages, depending on the application, and these will effectively be copied as the parent process modifies its memory.

Signalling is an inherently unpleasant business.

@kspangsege
Contributor

Spawning a thread instead of a process has the disadvantage of not being able to continue after the application terminates.

Another disadvantage is that we have to be careful about thread safety issues.

Besides that, as far as I can see, using a thread is possible, but still hard. We have to carefully consider how to behave at application termination time.

@kspangsege
Contributor

I wonder whether it would be better to implement a form of "group commit".

"Group commit" is about detecting multiple overlapping transactions and finalizing all of them with a single flush to disk.

One approach to "group commit" would be as follows:

  • Process A begins a write transaction.
  • Process B begins a write transaction.
  • Process A ends its transaction, but since there is another transaction in progress, it waits.
  • Process B ends its transaction, and since there are other transactions in progress, it flushes to disk.
  • Process A sees the flush, and returns to the caller.

One could add a timeout to the wait with no extra complication.

Another option is to wait a little bit as part of the ending of a transaction, to see if a new transaction comes along.

This scheme requires no autonomous processes or threads.

"Group commit" is supported by many databases including MySQL.

@kspangsege
Contributor

Note that group commit in no way lowers the durability of transactions compared to regular commits.

@kspangsege
Contributor

With respect to async commit, I can say with absolute certainty that if I were a customer, I would prefer to start a daemon process manually rather than have the library start processes by itself.

@astigsen
Contributor Author

Since we don't know the number of processes that will use the db, nor when or in which order they will close down, we probably have no choice but to make it a separate process. If we did it with threads, we would very quickly get into complicated handover semantics when programs closed while others were still using the same db.

exec'ing a new program is probably best, to avoid all the state pulled in by fork. That will also make it more consistent with Windows, where you do not have fork but can start new processes.

One problem with group (or delayed) commits is that you still get a stall of the entire thread when it actually has to write to disk (it is just unpredictable when that happens).

I am not sure that it would be a good idea to let the user start the daemon manually. What if he first has to open the db (or maybe several dbs) when the app has been running for some time? Then he is the one who has to learn about forking, exec'ing, and handling signals, adding a whole new layer of complexity to his app.

@astigsen
Contributor Author

Kristian, do you have any idea of how to avoid zombie processes when you can not add signal handlers to the parent process?

@kspangsege
Contributor

It is possible to detach completely from the parent process. This is a standard part of becoming what is known as a daemon process.

Richard Stevens has good information on this subject.

Here is the standard procedure for becoming a daemon process:

#include <cerrno>     // errno
#include <stdexcept>  // std::runtime_error
#include <sys/stat.h> // umask
#include <unistd.h>   // sysconf, fork, setsid, chdir, close, _exit

using std::runtime_error;

// error_to_string() is assumed to be a helper that formats an errno value.

// A process that wishes to become a daemon can call this function to
// detach completely from the invoking environment.
void daemon_init()
{
    errno = 0;
    long m = sysconf(_SC_OPEN_MAX);
    if (m < 0) {
        if (errno) {
            int err = errno; // Avoid clobbering
            throw runtime_error("'sysconf(_SC_OPEN_MAX)' failed: " + error_to_string(err));
        }
        throw runtime_error("'sysconf(_SC_OPEN_MAX)' failed: It's indeterminate");
    }
    pid_t pid = fork();
    if (pid < 0) {
        int err = errno; // Avoid clobbering
        throw runtime_error("'fork' failed: " + error_to_string(err));
    }
    if (0 < pid)
        _exit(0); // Quit if parent
    setsid();   // Detach from session (signal handling)
    chdir("/"); // Avoid pinning the current directory
    umask(S_IWGRP | S_IWOTH);
    // Close any inherited file descriptors
    for (long i = 0; i < m; ++i)
        close(i);
}

The easy and safe thing to do is to call this function after doing the fork+exec that you already want to do. However, it will almost certainly be possible to merge the two forks. The order of things is important, though, so you have to be careful.

I have to say, though, that I do not approve of the idea of having the library spawn processes. If it were up to me, it would be an absolute NO. As soon as additional processes are involved, serious customers will want to have full control over when and how such processes run.

We should in my opinion offer functionality that gives the customer full flexibility with respect to how to spawn the async committer process. We could provide both a library function and a stand-alone executable daemon process launcher.


@kspangsege
Contributor

It turns out that you cannot merge the two forks!

A little elaboration:

The main purpose of the daemonizer fork() is to detach from the parent process (becoming an orphan process). This effectively happens when the parent exits, and it can only happen this way; that is why you need a double fork.

An orphan process cannot become a zombie.

Another thing of major importance is to detach from the controlling terminal (setsid()). It is intricate, and I do not have all the details in fresh memory.


@astigsen
Contributor Author

Do you have any suggestions about how it should work if the daemon should be manually launched?

From my perspective it just introduces a whole host of coordination problems. How do you know that someone has not already started one for the same db? If you do detect that there already is a daemon running, how do you avoid it closing down on you? What happens if you open the db and no daemon is running?

@astigsen
Contributor Author

With the double fork approach, won't the middle process still be a zombie, since its parent is still alive?

@kspangsege
Contributor

No, the middle process will not become a zombie as long as the first fork does a waitpid() on the child process. It must do that.


@kspangsege
Contributor

That is, the parent of the first fork must do a waitpid() on the child. Since that child terminates immediately, the wait will be short.
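
Putting the pieces together, a minimal sketch (an illustration, not the code in this PR) of the double fork with the waitpid() described above:

```cpp
#include <stdexcept>
#include <sys/wait.h> // waitpid
#include <unistd.h>   // fork, setsid, _exit

void spawn_daemon(void (*daemon_main)())
{
    pid_t pid = fork(); // first fork
    if (pid < 0)
        throw std::runtime_error("'fork' failed");
    if (pid > 0) {
        // Parent: reap the intermediate child. It exits immediately,
        // so this wait is short and leaves no zombie behind.
        int status = 0;
        waitpid(pid, &status, 0);
        return;
    }
    // Intermediate child: start a new session, then fork again.
    setsid(); // detach from the controlling terminal
    pid_t pid2 = fork(); // second fork
    if (pid2 != 0)
        _exit(0); // intermediate child quits; the grandchild is orphaned
    // Grandchild: now an orphan (adopted by init), it cannot become a
    // zombie of the application process.
    daemon_main(); // run the async committer
    _exit(0);
}
```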


@kspangsege
Contributor

I believe it is possible to get a manually launched daemon process to work well. It is a very widely used scheme on UNIX, and there are lots of standard ways of launching and managing them.

Of course, if the daemon isn't running, there will be no one to sync the changes to disk.

Rather than having a separate daemon process for each database file, it would probably be better to have one that manages several database files. In that case, there has to be a way of submitting new files to it. There are many ways it can be done. One would be to tell the daemon process to look for database files in any of a set of directories. It can then use inotify (or similar) to discover appearing files.


@kspangsege
Contributor

I suspect that in some cases the customer would rather develop their own asynchronously executing thread/process and just call a function in our library to do the disk flushing.


@finnschiermer
Contributor

How about providing an interface through which the library can request a thread from the client? If a client wants to use async commits, he registers a delegate which is invoked on the first use of async commit. The delegate creates the thread, and the thread enters the library. We use it to do async work, and when we don't need it anymore, we return. The client does the actual creation and deletion of the thread. When a client shuts down, it must inform the library and wait for any async work to complete.
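
A minimal sketch of what such an interface could look like (hypothetical names, not an API from this PR):

```cpp
#include <functional>
#include <thread>

class AsyncCommitService {
public:
    using ThreadDelegate = std::function<void(AsyncCommitService&)>;

    // Client registers how a thread is created.
    void set_thread_delegate(ThreadDelegate d) { m_delegate = std::move(d); }

    // Library side: invoked on the first use of async commit.
    void request_thread() { if (m_delegate) m_delegate(*this); }

    // Entered by the donated thread; returns when the async work is done.
    void run_async_work() { /* sync commits until told to stop */ }

    // Client calls this at shutdown and waits for async work to complete.
    void stop_and_wait() { /* signal stop, then wait for run_async_work */ }

private:
    ThreadDelegate m_delegate;
};

// Client side: the delegate creates the thread and lets it enter the library.
void install_delegate(AsyncCommitService& svc)
{
    svc.set_thread_delegate([](AsyncCommitService& s) {
        std::thread([&s] { s.run_async_work(); }).detach();
    });
}
```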

@finnschiermer
Contributor

I think we should get rid of the use of file locking. Kristian and I discussed it, and I agree with him that the right approach is to build a daemon the traditional Unix way. This has the drawback of potentially leaving behind the .lock file in some situations, but it is an established approach, and it does not need file locking. To get rid of file locking even in the synchronous case, we could consider always using the daemon and just having the client wait for it.

@astigsen
Contributor Author

I am very reluctant to start adding daemon processes to the mix. One of the key selling points for embedded databases is that they require no setup or administration:

"There are advantages and disadvantages to being serverless. The main advantage is that there is no separate server process to install, setup, configure, initialize, manage, and troubleshoot. This is one reason why SQLite is a "zero-configuration" database engine. Programs that use SQLite require no administrative support for setting up the database engine before they are run. Any program that is able to access the disk is able to use an SQLite database." - http://www.sqlite.org/serverless.html (note that sqlite also uses file locks for coordination)

Daemon processes are fine for databases that run all the time, but embedded databases may be used in short, discrete periods of time. Imagine a user installing an application that he runs once a week, or maybe tries once and never runs again. Should the daemon just keep running in the background regardless? And what happens when the app is uninstalled? How do you know if it is OK to shut down the daemon (there may or may not be other apps using tightdb)?

And then we have the permission problem. What if the user does not have root access? We could have one daemon per user, but what happens when two users start working with the same file? Whose daemon should handle that, and how should they coordinate?

If we choose to go the daemon route, we should have some really good answers for these kinds of questions.

@finnschiermer
Contributor

I'll try to provide some answers to the questions, which I think mostly arise because I was being imprecise. I'm actually NOT proposing a really classic Unix daemon, although I admit writing just that :-(

The idea is to start one daemon per database file. The daemon can be started by the library as part of creating a SharedGroup. No config is needed for a default setup, and the user need not "install" a daemon through init scripts or anything like that. I agree that this makes the daemon somewhat less of a daemon.

I don't see the permission problem: the daemon is spawned from the process opening the database file and runs with the same access rights. Root access is not required; during its lifetime it only accesses the database file and the "lock" file. The daemon will detach so that it survives the exit of the parent process.

About shutdown: I think we can arrange for the daemon to shut itself down after a period of no activity. The library will just spawn it again when/if needed.
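
A minimal sketch of such an idle shutdown (an assumption of how it could look, not code from this PR): the daemon waits for work with a timeout and exits when nothing arrives. In reality the notification would have to be process-shared; std::mutex keeps the sketch short.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Daemon main loop: 'work_pending' would be set (and 'cv' notified)
// whenever a client commits to memory.
void daemon_loop(std::mutex& m, std::condition_variable& cv, bool& work_pending)
{
    using namespace std::chrono_literals;
    std::unique_lock<std::mutex> lock(m);
    for (;;) {
        // Wait up to 10 seconds for a commit to sync.
        if (!cv.wait_for(lock, 10s, [&] { return work_pending; }))
            return; // idle timeout: shut down; the library respawns us on demand
        work_pending = false;
        // ... sync the pending commit(s) to disk ...
    }
}
```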

finnschiermer merged commit db10ce5 into realm:master on Sep 24, 2013
tgoyne pushed a commit that referenced this pull request Jul 11, 2018
New BPlusTree implementation