Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add IsMutable method back to Configurable class #8928

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

mrambacher
Copy link
Contributor

This method allows an object to state that it can no longer be "changed", meaning any non-mutable options can no longer be updated. There are a couple of use cases where this could be important (some current, some future):

  • Given a ColumnFamilyOptions used to create/open a database, the settings inside the TableFactory could be updated outside of the database. This would result in immutable changes potentially being applied to a database.
  • Given a Cache, the cache settings could be updated while the cache is active outside of any cache mutex. (Note that since the Cache is not yet Customizable, this is a future scenario).

The default implementation of IsMutable is lightweight, only checking if the class has options. The TableFactory implementation uses a bool/mutex to get/set/protect the mutability of an object. Classes like Cache may be able to be "configured and prepared" in one step, eliminating the need for an additional mutex (still TBD).

@pdillinger
Copy link
Contributor

This method allows an object to state that it can no longer be "changed", meaning any non-mutable options can no longer be updated. There are a couple of use cases where this could be important (some current, some future):

  • Given a ColumnFamilyOptions used to create/open a database, the settings inside the TableFactory could be updated outside of the database. This would result in immutable changes potentially being applied to a database.
  • Given a Cache, the cache settings could be updated while the cache is active outside of any cache mutex. (Note that since the Cache is not yet Customizable, this is a future scenario).

The default implementation of IsMutable is lightweight, only checking if the class has options.

Reading this literally it sounds like you're making "non-mutable" options actually mutable unless IsMutable is overridden. That sounds like it would lead to many race conditions. Why do we want to make "immutable changes"?

@mrambacher
Copy link
Contributor Author

This method allows an object to state that it can no longer be "changed", meaning any non-mutable options can no longer be updated. There are a couple of use cases where this could be important (some current, some future):

  • Given a ColumnFamilyOptions used to create/open a database, the settings inside the TableFactory could be updated outside of the database. This would result in immutable changes potentially being applied to a database.
  • Given a Cache, the cache settings could be updated while the cache is active outside of any cache mutex. (Note that since the Cache is not yet Customizable, this is a future scenario).

The default implementation of IsMutable is lightweight, only checking if the class has options.

Reading this literally it sounds like you're making "non-mutable" options actually mutable unless IsMutable is overridden. That sounds like it would lead to many race conditions. Why do we want to make "immutable changes"?

As currently implemented and deployed, one can change the value of all configurable settings whether they are mutable or not. My understanding of the original design is that only mutable options should be changeable during certain operations/times. The problem arises from options within objects that are shared (as opposed to copied), such as a Cache or TableFactory. In this case, the the original holder of the options object (an Options created by the user) and the DB share a common pointer to an object. So even though you cannot change it settings from INSIDE the DB "Options" code, you could change the values from the OUTSIDE (by modifying a property in the input object).

This PR attempts to close some of those gaps. Once an object is "sealed", it is no longer Mutable, and calls to set the immutable options will fail. These calls will now fail whether done from the inside or outside of the DB.

In an attempt to make IsMutable lightweight, objects must define their own means of "sealing" off. For TableFactory, the logical place for this sealing was in ValidateOptions since Validate may reject the settings for being incompatible (there is no reason to seal something that is not good as it would not be possible to fix). For classes like Cache, this "sealing" could potentially be done as part of create/prepare and hidden/embedded inside of Cache::CreateFromString (meaning no need for a mutex).

This is one solution to sealing these objects. Another "solution" is to document that it is bad coding practice to change values inside of Options once a DB is opened (except via the DB APIs).

This is not a new problem, but has more paths of exposure with the Configurable classes (from table.h in 6.10):
// Note that changing the underlying TableFactory options while the
// TableFactory is currently used by any open DB is undefined behavior.
// Developers should use DB::SetOption() instead to dynamically change
// options while the DB is open.
virtual void* GetOptions() { return nullptr; }

For most classes, changing the options while they are running will probably not be catastrophic. But for something like Cache, I could see it being more of an issue. This PR gives a general way for sealing the object that it is up to individual implementations to decide if they should implement.

@@ -244,6 +244,10 @@ class Configurable {
// initialized.
virtual Status PrepareOptions(const ConfigOptions& config_options);

// Returns true if the object is mutable. An object may be mutable until it
// is prepared and/or validated.
virtual bool IsMutable() const;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the problem we're trying to solve but I don't think this method really helps solve it. The problem is that there is an inherent race condition in this interface: in a multi-threaded context, the value returned by IsMutable is potentially stale by the time the caller acts on it. That is, the following snippet does not guarantee that PrepareOptions will be called only once since another thread might have swooped in between us checking IsMutable's return value and calling PrepareOptions:

if (config->IsMutable()) {
   Status status = config->PrepareOptions(opts);
   // ...
}

The way around this is to perform the check and prepare the options atomically (under the same mutex lock) in PrepareOptions. Seems to me this is already done in e.g. BlockBasedTableFactory::PrepareOptions, so I don't think IsMutable itself buys us anything.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not having IsMutable in the interface would also eliminate the question of what the default implementation is supposed to do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have a better ide

Not having IsMutable in the interface would also eliminate the question of what the default implementation is supposed to do.

There are two issues that this PR attempts to address:

  1. Some objects can only be initialized once. These include things like Cache and MemoryAllocator.
  2. Once an object is "prepared", immutable options cannot be changed via the "configure" methods.

The first issue could/can be solved in a thread-safe way without the IsMutable being in the interface. The solution there is to do a create/configure/prepare in a single step where there can be no other threads accessing the new object. There is no race and this is how I implemented it in the MemoryAllocator and Cache customizable PRs. The TableFactory methods theoretically still could have this potential race, though in reality I think most of that is avoided (since I do not believe a TableFactory can be shared between databases and will be prepared and validated at DB setup).

The only way I can see to solve the second issue is to put IsMutable into the API (either public or protected). Ideally this would simply set a bool (atomic ?). If we allow non-mutable options to be changed, then the values used by the system may not be updated correctly. For example, you should not be able to set the capacity of the Cache the same way once it is initialized and you cannot update any setting of the JEMallocator once it is initialized.

Do you have an alternative way of solving the second issue?

Copy link
Contributor

@ltamasi ltamasi Oct 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be missing something but I don't see a conceptual difference between the two issues. I think the second issue could be solved (in a thread-safe fashion) the same way as the first: by performing the mutability check and configuring the object in a single "atomic" operation

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P.S. The current Configurable::ConfigureFromMap code in the PR seems thread-unsafe for the same reason as above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be missing something but I don't see a conceptual difference between the two issues. I think the second issue could be solved (in a thread-safe fashion) the same way as the first: by performing the mutability check and configuring the object in a single "atomic" operation

Options can be configured (or serialized or compared) after an object is made immutable by many APIs. The DB code does exactly this internally currently and only allows mutable options to be updated. There is nothing that would prevent the user from changing immutable ones outside of the DB however.

For the objects that are "initialize only once" (Cache, MemoryAllocator, etc), the process is create, configure, prepare, lock, return the new object. There are no threading issues to deal with. There can be no race because the object cannot yet be shared amongst multiple objects. But to prevent the object from being configured later, the Configurable class (or each "initialize only once" class -- which would require a lot of overridden methods instead of a single one) needs to know that the object has been prepared.

P.S. The current Configurable::ConfigureFromMap code in the PR seems thread-_un_safe for the same reason as above

Yes, there is the potential for some objects (like TableFactory) to have Configure and Validate/Prepare methods called concurrently by multiple threads. Since this is a highly unlikely event (since these objects are not shared between databases), we could probably live with it. For TableFactory, there are typically two ways they are created -- either NewXXXFactory or loaded from the Options file (a two-stage process). For the former, we could "lock" the configurations (Prepare and mark immutable without ramifications -- outside of a few odd tests that do not expect "PreparedOptions"). For the latter, we could mark them immutable when they are loaded from the options file.

Note that the user typically has to go out of their way to create objects that are not prepared. The default option is to prepare options when they are created, so the potential race condition could be documented as a warning to the user on a potentially undefined behavior.

In other words, the race conditions you are concerned about are things we can prevent from happening by default, prevent from happening in many internal cases, and document as potential undefined behavior if the user choses to override the defaults.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Options can be configured (or serialized or compared) after an object is made immutable by many APIs. The DB code does exactly this internally currently and only allows mutable options to be updated. There is nothing that would prevent the user from changing immutable ones outside of the DB however.

Sure, I understand the issue. What I'm saying is that the two problems you outlined are similar in the sense that some configuration operations are not allowed after a certain point, namely once the configurable is prepared. And in the presence of multiple threads, we would need method(s) that combine the check and the operation in question in order to avoid race conditions. This is the same kind of interface problem that affects multi-threaded data structures like queues or stacks: there's no point in providing, say, separate IsEmpty and Pop methods because the value returned by IsEmpty can go stale by the time the caller acts on it and calls (or doesn't call) Pop. (The solution is the same there too: fold the emptiness check and the pop operation into a TryPop method.)

Yes, there is the potential for some objects (like TableFactory) to have Configure and Validate/Prepare methods called concurrently by multiple threads. Since this is a highly unlikely event (since these objects are not shared between databases), we could probably live with it.

I believe we do support sharing a single TableFactory between multiple databases.

In other words, the race conditions you are concerned about are things we can prevent from happening by default, prevent from happening in many internal cases, and document as potential undefined behavior if the user choses to override the defaults.

In my opinion, the chances of the application attempting to "illegally" reconfigure an object via a pointer they hold are very slim to start with. In addition, even if we plug every hole in the Configurable framework (which this PR doesn't do), the application can still mess with the objects behind the framework's back, and there's no way we can prevent that. Given all this, I'm not sure if this partial solution is worth the extra complexity.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P.S. One way to completely eliminate unwanted reconfigurations would be to clone the objects provided by the application. Unfortunately, this would also preclude us from sharing objects across DBs etc., so it's not really an option.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Options can be configured (or serialized or compared) after an object is made immutable by many APIs. The DB code does exactly this internally currently and only allows mutable options to be updated. There is nothing that would prevent the user from changing immutable ones outside of the DB however.

Sure, I understand the issue. What I'm saying is that the two problems you outlined are similar in the sense that some configuration operations are not allowed after a certain point, namely once the configurable is prepared. And in the presence of multiple threads, we would need method(s) that combine the check and the operation in question in order to avoid race conditions. This is the same kind of interface problem that affects multi-threaded data structures like queues or stacks: there's no point in providing, say, separate IsEmpty and Pop methods because the value returned by IsEmpty can go stale by the time the caller acts on it and calls (or doesn't call) Pop. (The solution is the same there too: fold the emptiness check and the pop operation into a TryPop method.)

I agree this PR does not combine the "check" and "operate" for TableFactories -- this gap is currently there for backwards compatibility. But for TableFactory there is nothing that goes completely haywire if the operation goes stale. For the objects where it goes stale (MemoryAllocator, Cache), the "check" and "operate" are combined (by always returning an object that is already prepared and cannot be later modified) and there is no race. Without the changes to the Configurable class for IsMutable, I cannot prevent the Configure/Prepare methods from being called when they should not be unless I override all of them for every class that falls under this pattern, which seems like more code that is harder to maintain than to put it into one base class.

In my opinion, the chances of the application attempting to "illegally" reconfigure an object via a pointer they hold are very slim to start with. In addition, even if we plug every hole in the Configurable framework (which this PR doesn't do), the application can still mess with the objects behind the framework's back, and there's no way we can prevent that. Given all this, I'm not sure if this partial solution is worth the extra complexity.

The chances of an application "illegally" reconfiguring an object are much greater than hitting the threading issues (empty/pop). The API is there for every object and can be called from outside of the DB

The only way I am aware of for objects being messed with behind the framework's back is by getting a pointer to the options (Configurable::GetOptions) and changing the values, which is (and was before this framework existed) documented as unsupported behavior.

I fully agree with you that this change does not completely plug the mutable holes for the TableFactory, but it was also not intended to do so. But I believe this change does allow the hole to be plugged completely for objects like MemoryAllocator and Cache in a simpler manner than could otherwise be implemented. Without this change, I believe those classes will be much more complex and harder to make Customizable and fit into the framework.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants