This repository has been archived by the owner on Oct 15, 2024. It is now read-only.

Backend Plugin #3693

Merged
merged 43 commits into new-backend on Nov 24, 2021

Conversation

kodebach
Member

@kodebach kodebach commented Mar 6, 2021

Squashed and rebased version of #2969

Basics

  • Short descriptions of your changes are in the release notes
    (added as entry in doc/news/_preparation_next_release.md which
    contains _(my name)_)
    Please always add something to the release notes.
  • Details of what you changed are in commit messages
    (first line should have module: short statement syntax)
  • References to issues, e.g. close #X, are in the commit messages.
  • The buildservers are happy. If not, fix in this order:
    • add a line in doc/news/_preparation_next_release.md
    • reformat the code with scripts/dev/reformat-all
    • make all unit tests pass
    • fix all memleaks
  • The PR is rebased with current master.

Checklist

  • I added unit tests for my code
  • I fully described what my PR does in the documentation
    (not in the PR description)
  • I fixed all affected documentation
  • I added code comments, logging, and assertions as appropriate (see Coding Guidelines)
  • I updated all meta data (e.g. README.md of plugins and METADATA.ini)
  • I mentioned every code not directly written by me in THIRD-PARTY-LICENSES

Review

@kodebach
Member Author

kodebach commented Mar 6, 2021

The current implementation uses this configuration structure:

system:/elektra/mountpoints/\/hosts
system:/elektra/mountpoints/\/hosts/config
system:/elektra/mountpoints/\/hosts/config/glob/set/#0
system:/elektra/mountpoints/\/hosts/config/glob/set/#1
system:/elektra/mountpoints/\/hosts/config/glob/set/#2
system:/elektra/mountpoints/\/hosts/config/glob/set/#3
system:/elektra/mountpoints/\/hosts/config/glob/set/#4
system:/elektra/mountpoints/\/hosts/config/glob/set/#4/flags
system:/elektra/mountpoints/\/hosts/config/mountpoint
system:/elektra/mountpoints/\/hosts/config/path
system:/elektra/mountpoints/\/hosts/error
system:/elektra/mountpoints/\/hosts/error/rollback
system:/elektra/mountpoints/\/hosts/error/rollback/#0
system:/elektra/mountpoints/\/hosts/error/rollback/#0/label (="resolver")
system:/elektra/mountpoints/\/hosts/error/rollback/#0/name (="resolver_fm_hpu_b")
system:/elektra/mountpoints/\/hosts/get
system:/elektra/mountpoints/\/hosts/get/poststorage
system:/elektra/mountpoints/\/hosts/get/poststorage/#0
system:/elektra/mountpoints/\/hosts/get/poststorage/#0/label (="glob")
system:/elektra/mountpoints/\/hosts/get/poststorage/#0/name (="glob")
system:/elektra/mountpoints/\/hosts/get/resolver
system:/elektra/mountpoints/\/hosts/get/resolver/#0
system:/elektra/mountpoints/\/hosts/get/resolver/#0/reference (="resolver")
system:/elektra/mountpoints/\/hosts/get/storage
system:/elektra/mountpoints/\/hosts/get/storage/#0
system:/elektra/mountpoints/\/hosts/get/storage/#0/label (="hosts")
system:/elektra/mountpoints/\/hosts/get/storage/#0/name (="hosts")
system:/elektra/mountpoints/\/hosts/set
system:/elektra/mountpoints/\/hosts/set/commit
system:/elektra/mountpoints/\/hosts/set/commit/#0
system:/elektra/mountpoints/\/hosts/set/commit/#0/reference (="resolver")
system:/elektra/mountpoints/\/hosts/set/precommit
system:/elektra/mountpoints/\/hosts/set/precommit/#0
system:/elektra/mountpoints/\/hosts/set/precommit/#0/label (="sync")
system:/elektra/mountpoints/\/hosts/set/precommit/#0/name (="sync")
system:/elektra/mountpoints/\/hosts/set/prestorage
system:/elektra/mountpoints/\/hosts/set/prestorage/#0
system:/elektra/mountpoints/\/hosts/set/prestorage/#0/reference (="glob")
system:/elektra/mountpoints/\/hosts/set/prestorage/#1
system:/elektra/mountpoints/\/hosts/set/prestorage/#1/label (="error")
system:/elektra/mountpoints/\/hosts/set/prestorage/#1/name (="error")
system:/elektra/mountpoints/\/hosts/set/prestorage/#2
system:/elektra/mountpoints/\/hosts/set/prestorage/#2/label (="network")
system:/elektra/mountpoints/\/hosts/set/prestorage/#2/name (="network")
system:/elektra/mountpoints/\/hosts/set/resolver
system:/elektra/mountpoints/\/hosts/set/resolver/#0
system:/elektra/mountpoints/\/hosts/set/resolver/#0/reference (="resolver")
system:/elektra/mountpoints/\/hosts/set/storage
system:/elektra/mountpoints/\/hosts/set/storage/#0
system:/elektra/mountpoints/\/hosts/set/storage/#0/reference (="hosts")

I think it would be better if we moved to e.g.

system:/elektra/mountpoints/\/hosts/backend (="#6")
system:/elektra/mountpoints/\/hosts/path (="myhosts")

system:/elektra/mountpoints/\/hosts/plugins/#0/name (="resolver_fm_hpu_b")
system:/elektra/mountpoints/\/hosts/plugins/#1/name (="glob")
system:/elektra/mountpoints/\/hosts/plugins/#1/config/set/#0
system:/elektra/mountpoints/\/hosts/plugins/#1/config/set/#1
system:/elektra/mountpoints/\/hosts/plugins/#1/config/set/#2
system:/elektra/mountpoints/\/hosts/plugins/#1/config/set/#3
system:/elektra/mountpoints/\/hosts/plugins/#1/config/set/#4/flags
system:/elektra/mountpoints/\/hosts/plugins/#2/name (="hosts")
system:/elektra/mountpoints/\/hosts/plugins/#3/name (="sync")
system:/elektra/mountpoints/\/hosts/plugins/#4/name (="error")
system:/elektra/mountpoints/\/hosts/plugins/#5/name (="network")
system:/elektra/mountpoints/\/hosts/plugins/#6/name (="backend")

system:/elektra/mountpoints/\/hosts/positions/get/resolver (="#0")
system:/elektra/mountpoints/\/hosts/positions/get/storage (="#2")
system:/elektra/mountpoints/\/hosts/positions/get/poststorage/#0 (="#1")

system:/elektra/mountpoints/\/hosts/positions/set/resolver (="#0")
system:/elektra/mountpoints/\/hosts/positions/set/prestorage/#0 (="#1")
system:/elektra/mountpoints/\/hosts/positions/set/prestorage/#1 (="#4")
system:/elektra/mountpoints/\/hosts/positions/set/prestorage/#2 (="#5")
system:/elektra/mountpoints/\/hosts/positions/set/storage (="#2")
system:/elektra/mountpoints/\/hosts/positions/set/precommit/#0 (="#3")
system:/elektra/mountpoints/\/hosts/positions/set/commit (="#0")
system:/elektra/mountpoints/\/hosts/positions/set/rollback (="#0")
  • The most important change here is that we use a plugins array and the positions only reference elements from that array. This gets rid of the somewhat confusing label/reference system and also puts the config for a plugin in the same hierarchy as its definition.
  • The .../config/mountpoint value was removed. The mountpoint is extracted from the part after system:/elektra/mountpoints, i.e. even internally backends no longer have a name and a mountpoint, but just a mountpoint.
  • The .../config/path value was moved to .../path.
  • The new .../backend value is a reference into the plugins array. It specifies the backend plugin to use. Currently there is just one of course, but this future-proofs the API and we always know exactly which plugins we need for a mountpoint just from its config.
  • All other keys below system:/elektra/mountpoints/_ will be ignored by libelektra-kdb. Instead the keys will just be given to the configured backend plugin to process. This means that a backend plugin could use different position names than libelektra-kdb. (see below)
  • For the backend plugin: In the positions part, we now just use array element references. I also merged the error and set parts, since rollback is only called in the set phase. More importantly, only the pre* and post* positions are arrays, since the current implementation doesn't support multiple resolvers, storage plugins, commit or rollback plugins per mountpoint anyway.

Regarding the positions:
In libelektra-kdb we will use resolver, prestorage, storage and poststorage as positions in the get phase. In each of these positions we will call the configured backend plugin exactly once and tell it which position to run. Similarly, in the set phase we will use resolver, prestorage, storage, poststorage and then depending on the outcome either precommit, commit, postcommit or prerollback, rollback, postrollback.
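
A minimal sketch of what this could look like inside libelektra-kdb for the get phase (all names, and the way the position is communicated to the plugin, are illustrative assumptions, not the actual API):

#include <kdbprivate.h> // assumption: internal header needed to call the plugin's kdbGet directly

// Illustrative sketch only: libelektra-kdb invokes the configured backend
// plugin exactly once per position and tells it which position to run.
// Passing the position via a metakey on parentKey is an assumption made
// purely for this sketch.
static const char * const getPositions[] = { "resolver", "prestorage", "storage", "poststorage" };

static int runGetPositions (Plugin * backendPlugin, KeySet * ks, Key * parentKey)
{
    for (size_t i = 0; i < sizeof (getPositions) / sizeof (getPositions[0]); ++i)
    {
        keySetMeta (parentKey, "internal/kdb/position", getPositions[i]);
        if (backendPlugin->kdbGet (backendPlugin, ks, parentKey) == ELEKTRA_PLUGIN_STATUS_ERROR)
        {
            return -1; // abort the phase sequence; error information is on parentKey
        }
    }
    return 0;
}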

What the configured backend plugin does in each of these positions and which other plugins it delegates to doesn't matter, as long as it fulfils the general contract for each of the positions. This allows us for example to implement system:/elektra/version as a type of backend plugin that handles everything internally.
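
For illustration, a hypothetical minimal "version"-style backend plugin could look like this (not the actual plugin; key name and value are only examples):

#include <kdbplugin.h>

// Hypothetical sketch: a backend plugin that fulfils the contract entirely
// internally, without delegating to a resolver or storage plugin. The key
// name and value below are examples, not the real version plugin.
int elektraVersionGet (Plugin * handle, KeySet * returned, Key * parentKey)
{
    (void) handle;
    (void) parentKey;
    ksAppendKey (returned, keyNew ("system:/elektra/version/constants/KDB_VERSION", KEY_VALUE, "0.9.5", KEY_END));
    return ELEKTRA_PLUGIN_STATUS_SUCCESS;
}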

This also makes it easier to e.g. further split up the prestorage and poststorage positions. There is no need to change libelektra-kdb at all to do this. It could all be done within the backend plugin, just by changing what happens in these phases. This would make it much easier to test such changes.

We would still need changes to the tooling library, but that is not so much a problem. Essentially, the idea is that once you have a system:/elektra/mountpoints/_ configuration, it will always be handled the same way by libelektra-kdb and backend plugins are responsible for maintaining backwards compatibility. Even if a backend plugin makes breaking changes, you could cheat a bit and e.g. copy libelektra-backend.so to libelektra-backend-old.so and just change the relevant plugins/# key from "backend" to "backend-old". It is not an ideal solution, but it would work in a pinch.


@markus2330 @mpranj What do you think?

@markus2330
Contributor

There are quite a few proposals at once 💖:

  1. splitting up plugin definition and positioning is a very good idea
  2. special handling of "path" does not sound like a good idea; I do not see why it shouldn't be a config like all the others
  3. removing config/mountpoint is a good idea
  4. The new .../backend value: I think this can be introduced when needed?
  5. "Instead the keys will just be given to configured backend plugin to process. This means that a backend plugin could use different position names than libelektra-kdb. (see below)" Can you elaborate? Is there an alternative to the backend plugin processing the whole backend structure? One of the goals of the backend plugin was to also allow non-file based backends.

@kodebach
Member Author

kodebach commented Mar 7, 2021

  2. special handling of "path" does not sound like a good idea; I do not see why it shouldn't be a config like all the others

Yes, you are right, this should have been grouped with the positions part.

  4. The new .../backend value: I think this can be introduced when needed?

Theoretically yes, but it would mean having some awkward fallback code. Better to implement it right away. Using this key also means that libelektra-kdb always needs to load exactly the plugins listed in plugins/#.

Can you elaborate?

I think that got a bit out of hand... I added a (very long) "bit" at the end of doc/decisions/backend_plugin.md

@kodebach
Member Author

kodebach commented Mar 7, 2021

PS. maybe start with "Further Examples" right at the bottom of doc/decisions/backend_plugin.md

@markus2330
Contributor

Before getting too deeply involved in discussions of how to change the mountpoint configs: Is there maybe an option to get this PR merged without huge changes? Or is the current form so broken that bigger changes are needed anyway? Do you see yourself doing these bigger changes?

@kodebach
Member Author

kodebach commented Mar 7, 2021

I haven't looked into the PR very much yet, but I'll try to answer your questions.

Is there maybe an option to get this PR merged without huge changes?

Not sure, but the required changes are probably more than fixing a few small bugs here and there. I suspect something went wrong during the changes to backend.c or mount.c. What I can say, based on my free time and the current state of this PR (as far as I know it), is that the PR won't be ready for a while (at least a few weeks).

Or is the current form so broken that bigger changes are needed anyway?

I think it is so broken that a complete rewrite is needed, but in some places (especially when it comes to split and trie) I suspect I will be faster in writing a new solution than in understanding the existing code.

Do you see yourself doing these bigger changes?

Since nobody has worked on this PR in a while, I don't think anyone else is gonna do it...


Is there a reason that this PR should be merged ASAP (other than it being an important change)? IMO we should probably wait to merge this PR until the summer (even if it is ready earlier) so that we don't disturb the CM2021 work.

@markus2330
Contributor

Is there a reason that this PR should be merged ASAP

For some checker plugins to be written in CM you can easily run into the "too many plugins" error #2133, e.g. in #3461.

(other than it being an important change)?

Exactly, the earlier we start testing the more bugs we can squash before 1.0.

IMO we should probably wait to merge this PR until the summer (even if it is ready earlier) so that we don't disturb the CM2021 work.

It is a delicate question. We also cannot hold back all big changes. And for CM students, changes in the API are probably more annoying than remounting everything. It also depends on timing, e.g. after deadlines bigger changes can be done.

@kodebach
Member Author

kodebach commented Mar 7, 2021

For some checker plugins to be written in CM you can easily run into the "too many plugins" error

This shouldn't be a problem. Developing and testing the plugins is possible without running into the error. You only run into trouble if you actually wanna use many checkers for an application.

Exactly, the earlier we start testing the more bugs we can squash before 1.0.

Of course, but if we want to implement my proposed mountpoint configuration changes (and you basically agreed to at least some of them), we should do so right away. If we implement version A first only to move on to version B, we will only waste time making version A work.

@kodebach
Member Author

kodebach commented Mar 8, 2021

@markus2330 I have now fixed the compile errors and I think fixing this PR is basically the same amount of work as rewriting it (using the existing code as a guideline). It may be because of the rebase (the old version was still from before the keyname change), but not even the testmod_backend test works...

@lgtm-com

lgtm-com bot commented Mar 8, 2021

This pull request introduces 1 alert when merging 830d11f into 591e4ce - view on LGTM.com

new alerts:

  • 1 for FIXME comment

@markus2330
Contributor

This shouldn't be a problem. Developing and testing the plugins is possible without running into the error. You only run into trouble if you actually wanna use many checkers for an application.

Which can easily happen if they try to integrate their H3 plugin into their spec from T1. But you are right; as it is not required, probably nobody will try it.

Of course, but if we want to implement my proposed mountpoint configuration changes (and you basically agreed to at least some of them), we should do so right away. If we implement version A first only to move on to version B, we will only waste time making version A work.

Yes, this is of course true. Probably we should also already consider #1074?

If we can fix these three things (plugin=backend, any number of plugins, user/dir mountpoints) the backends are imho okay for 1.0. Any other issues should be fixable by creating new backends/more plugins.

Thank you so much for working on this 💖

@kodebach
Member Author

Yes, I can take a look at #1074 as well, and at replacing split and trie with a KeySet as proposed in #3492. For details see those issues.

In general, I think we need a more precise and well-defined description of the inner workings of kdbGet and kdbSet. At the moment AFAICT it's all mostly defined by what the tests check for.


PS. I may also do a (partial) rewrite of the tools library in C in this PR (not sure if there is an issue for that). I know it's not really needed for 1.0, but since I have to fix the tools library anyway, I might as well use as little of the C++ binding as possible.

This was referenced Mar 11, 2021
@markus2330
Contributor

In general, I think we need a more precise and well-defined description of the inner workings of kdbGet and kdbSet. At the moment AFAICT it's all mostly defined by what the tests check for.

I could not agree more.

PS. I may also do a (partial) rewrite of the tools library in C in this PR (not sure if there is an issue for that). I know it's not really needed for 1.0, but since I have to fix the tools library anyway, I might as well use as little of the C++ binding as possible.

I am not a fan of partial rewrites. Either we rewrite or we fix the current things. I know that fixing is less popular than rewriting. I am very glad that you are one of the few people who do not shy away from fixing complicated things. 💖

@kodebach
Member Author

I am not a fan of partial rewrites. Either we rewrite or we fix the current things.

I agree. I won't rewrite the whole tools library (unless it is completely broken by this PR). My idea was to only use the C API and use as little C++ (STL, etc.) as possible in the parts that I have to change, so that the eventual move to pure C will be easier. But it was just an idea, not sure if I'll actually do it this way...

@kodebach
Member Author

@markus2330 I have now worked out a "definition of operation" for kdbOpen(), kdbGet(), kdbSet() (and kdbClose()) as well as some info on backend plugins and mountpoint config. It is more a sketch for now, but it should give an idea of how I think everything should work. See the files doc/dev/backend-plugins.md, doc/dev/mountpoints.md and doc/dev/kdb-operations.md.

If you want (and have time) we can discuss details in/after tomorrow's meeting. Before you comment on the proposals, I should probably clarify: The proposals are mostly a "how everything should work". Whether or not this is too big a change to implement is not clear and not something I have considered yet.

@lgtm-com

lgtm-com bot commented Mar 14, 2021

This pull request introduces 1 alert when merging d62fd30 into 52f7d40 - view on LGTM.com

new alerts:

  • 1 for FIXME comment

@lgtm-com

lgtm-com bot commented Mar 28, 2021

This pull request introduces 11 alerts when merging 8b8c948 into ebc9d20 - view on LGTM.com

new alerts:

  • 11 for FIXME comment

@lgtm-com

lgtm-com bot commented Mar 28, 2021

This pull request introduces 10 alerts when merging 917fa77 into ebc9d20 - view on LGTM.com

new alerts:

  • 10 for FIXME comment

@lgtm-com

lgtm-com bot commented Apr 4, 2021

This pull request introduces 18 alerts when merging 0161cb1 into 5702ce0 - view on LGTM.com

new alerts:

  • 18 for FIXME comment

In other words, in a `set` operation the `resolver` phase is also about preparing a transaction in addition to resolving the storage unit.

#### Cache-Check Phase

Member Author

The idea here is that the backend plugin would call the resolver, which would then more or less do this part

if ((global = elektraPluginGetGlobalKeySet (handle)) != NULL && ELEKTRA_STAT_NANO_SECONDS (buf) != 0)
{
    name = elektraCacheKeyName (pk->filename);
    ELEKTRA_LOG_DEBUG ("global-cache: check cache update needed?");
    Key * time = ksLookupByName (global, name, KDB_O_NONE);
    if (time && keyGetValueSize (time) == sizeof (struct timespec))
    {
        struct timespec cached;
        keyGetBinary (time, &cached, sizeof (struct timespec));
        if (cached.tv_sec == ELEKTRA_STAT_SECONDS (buf) && cached.tv_nsec == ELEKTRA_STAT_NANO_SECONDS (buf))
        {
            ELEKTRA_LOG_DEBUG ("global-cache: no update needed, everything is fine");
            ELEKTRA_LOG_DEBUG ("cached.tv_sec:\t%ld", cached.tv_sec);
            ELEKTRA_LOG_DEBUG ("cached.tv_nsec:\t%ld", cached.tv_nsec);
            ELEKTRA_LOG_DEBUG ("buf.tv_sec:\t%ld", ELEKTRA_STAT_SECONDS (buf));
            ELEKTRA_LOG_DEBUG ("buf.tv_nsec:\t%ld", ELEKTRA_STAT_NANO_SECONDS (buf));
            // update timestamp inside resolver
            pk->mtime.tv_sec = ELEKTRA_STAT_SECONDS (buf);
            pk->mtime.tv_nsec = ELEKTRA_STAT_NANO_SECONDS (buf);
            if (name) elektraFree (name);
            errno = errnoSave;
            return ELEKTRA_PLUGIN_STATUS_CACHE_HIT;
        }
    }
}

Maybe we could also put this code directly into the backend plugin. Not sure if that is possible, or if another call to fstat could cause problems.

Other backends (e.g. Database) will use other code to check if the cache is current.

Member

As noted below, this comparison code is not the same for all resolvers. This is only one implementation and I'm pretty sure this should stay in each resolver, because different resolvers might have different notions of file change.

Member Author

As noted below, this comparison code is not the same for all resolvers. This is only one implementation and I'm pretty sure this should stay in each resolver, because different resolvers might have different notions of file change.

The idea was that the backend plugin would only be for file-based backends. For e.g. a git-based backend or a database backend you would use a different plugin. These other plugins could do different comparisons. There may be a different resolver plugin involved, but if the resolver code is simple enough then it might be hardcoded without a different plugin.

That said, I will probably keep this code in resolver but export it as a separate function that is called by backend. That makes it clearer what is happening and would still allow a different file-based resolver (e.g. wresolver) to use a different time check.
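
A rough sketch of what that exported helper could look like (the name, signature and return values are hypothetical, just to illustrate the split between backend and resolver):

#include <kdbplugin.h>

// Hypothetical declaration only: the mtime comparison quoted above would move
// into a helper exported by the resolver. The backend plugin only asks
// "is the cached state still valid?" without knowing about mtimes; a
// different file-based resolver (e.g. wresolver) could export its own check.
typedef enum
{
    RESOLVER_CACHE_HIT,   // cached data is still valid
    RESOLVER_CACHE_MISS,  // the file changed, the cache must be rebuilt
    RESOLVER_CACHE_ERROR  // could not determine (e.g. stat failed)
} ResolverCacheResult;

ResolverCacheResult elektraResolverCheckCache (Plugin * resolver, KeySet * global, Key * parentKey);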

Contributor

I agree that we should focus on file-based backends for this PR and 1.0. Later we might create new backends without resolvers, or different non-file-based resolving mechanisms.

4. From now on ignore all backends, which indicated that there is no update.
5. If all backends are now ignored, **return**.
6. If a global cache plugin is enabled:
Ask the global cache plugin for the cache times for all backends.
Member Author

AFAIK this is what currently happens here

static void elektraCacheLoad (KDB * handle, KeySet * cache, Key * parentKey, Key * initialParent ELEKTRA_UNUSED, Key * cacheParent)
{
    // prune old cache info
    elektraCacheCutMeta (handle);
    if (elektraGlobalGet (handle, cache, cacheParent, PREGETCACHE, MAXONCE) != ELEKTRA_PLUGIN_STATUS_SUCCESS)
    {
        ELEKTRA_LOG_DEBUG ("CACHE MISS: could not fetch cache");
        elektraCacheCutMeta (handle);
        return;
    }
    ELEKTRA_ASSERT (elektraStrCmp (keyName (initialParent), keyName (parentKey)) == 0, "parentKey name differs from initial");
    if (elektraCacheCheckParent (handle->global, cacheParent, parentKey) != 0)
    {
        // parentKey in cache does not match, needs rebuild
        ELEKTRA_LOG_DEBUG ("CACHE WRONG PARENTKEY");
        elektraCacheCutMeta (handle);
        return;
    }
}

Member

No, this part compares whether all parentKeys match the cached versions etc.

Currently the resolver compares the cached times / mtimes and returns this to the core:

case ELEKTRA_PLUGIN_STATUS_CACHE_HIT:
    // Keys in cache are up-to-date
    ++cacheHits;

I'm also quite confident that the resolvers are the best place to do this comparison. It's not clear that mtime is the only possibility: e.g. for gitresolver the actual change may be detected using commit hash or so.

Member Author

Currently the resolver compares the cached times / mtimes and returns this to the core

Yes, the comparison doesn't happen here. (see also #3693 (comment)) But the resolver plugin just reads from the global keyset. Where is the code that loads the cached times into the global keyset? That's what I thought was happening here (among other things).

I'm also quite confident that the resolvers are the best place to do this comparison. It's not clear that mtime is the only possibility: e.g. for gitresolver the actual change may be detected using commit hash or so.

See #3693 (comment).

Member

Ah, the cached times are actually loaded by mmapstorage into the global keyset.

Since the global keyset is just a keyset it is simply also cached and then loaded here:

if (test_bit (mode, MODE_GLOBALCACHE))
{
    KeySet * global = elektraPluginGetGlobalKeySet (handle);
    KeySet * mmapTimeStamps = (KeySet *) (mappedRegion + OFFSET_GLOBAL_KEYSET);
    if (ksGetSize (global) == 0)
    {
        // fast replace KeySet, if Global KeySet was empty
        mmapFastReplace (global, mmapTimeStamps);
    }
    else
    {
        // do not overwrite, but append Keys to existing Global KeySet
        ksAppend (global, mmapTimeStamps);
    }
}

Member Author

Ah, so the cache plugin writes the times into the global keyset when the cache is updated. This is stored by mmapstorage and, when the cache is loaded, everything is there already. That would actually be very helpful; then this

Additionally, the metakey `internal/kdb/cachetime` is set to a value indicating the update time of the cache entry.
TODO: exact format of `internal/kdb/cachetime` TBD

wouldn't be needed and the format wouldn't be a problem.

Member

Ah, so the cache plugin writes the times into the global keyset when the cache is updated.

No, the resolver puts the times into the global keyset when a file is read. Resolver is the only code which knows what a change is (in this case mtime), and decides about all of this.

Putting it into the global keyset happens here:

/* Persist modification times for cache */
if (global != NULL && ELEKTRA_STAT_NANO_SECONDS (buf) != 0)
{
    ELEKTRA_LOG_DEBUG ("global-cache: adding file modufication times");
    Key * time = keyNew (name, KEY_BINARY, KEY_SIZE, sizeof (struct timespec), KEY_VALUE, &(pk->mtime), KEY_END);
    ksAppendKey (global, time);
}

Member Author

No, the resolver puts the times into the global keyset when a file is read. Resolver is the only code which knows what a change is (in this case mtime), and decides about all of this.

Okay, that works too. Then we could actually go back to my original concept of using "cache entry IDs" instead of "cache terms". That might work better for e.g. a git-based backend where you could use a commit hash instead of a timestamp.
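
A sketch of how a git-based resolver might persist such a cache entry ID, mirroring the mtime code quoted above (function name and variables are assumptions, not existing code):

#include <kdbplugin.h>

// Illustrative sketch only: instead of a struct timespec, a git-based resolver
// could store the resolved commit hash as its "cache entry ID" in the global
// keyset. cacheKeyName and commitHash are assumed to come from the resolver.
static void persistCacheEntryId (KeySet * global, const char * cacheKeyName, const char * commitHash)
{
    if (global == NULL) return;
    ksAppendKey (global, keyNew (cacheKeyName, KEY_VALUE, commitHash, KEY_END));
}

The cachecheck phase would then only have to compare two strings instead of timestamps.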

6. If a global cache plugin is enabled:
Ask the global cache plugin for the cache times for all backends.
7. If all backends have an existing cache entry:
Run the `cachecheck` phase on all backends
Member Author

This is the part that is currently implemented in resolver. The goal of this new phase is to make the interaction less magical and to make it easier for other backends (e.g. Database) to support the cache.

7. If all backends have an existing cache entry:
Run the `cachecheck` phase on all backends
8. If all backends indicated the cache is still valid:
Ask the global cache plugin for the cached data and **return**.
Member Author

AFAIK this currently happens here

static int elektraCacheLoadSplit (KDB * handle, Split * split, KeySet * ks, KeySet ** cache, Key ** cacheParent, Key * parentKey,
                                  Key * initialParent, int debugGlobalPositions)
{
    ELEKTRA_LOG_DEBUG ("CACHE parentKey: %s, %s", keyName (*cacheParent), keyString (*cacheParent));
    if (splitCacheCheckState (split, handle->global) == -1)
    {
        ELEKTRA_LOG_DEBUG ("FAIL, have to discard cache because split state / SIZE FAIL, or file mismatch");
        elektraCacheCutMeta (handle);
        return -1;
    }
    ELEKTRA_LOG_DEBUG ("CACHE HIT");
    if (splitCacheLoadState (split, handle->global) != 0) return -1;
    if (debugGlobalPositions)
    {
        keySetName (parentKey, keyName (initialParent));
        elektraGlobalGet (handle, *cache, parentKey, PREGETSTORAGE, INIT);
        elektraGlobalGet (handle, *cache, parentKey, PREGETSTORAGE, MAXONCE);
        elektraGlobalGet (handle, *cache, parentKey, PREGETSTORAGE, DEINIT);
    }
    keySetName (parentKey, keyName (initialParent));
    // TODO: there are no error checks here, see kdbGet
    elektraGlobalGet (handle, *cache, parentKey, PROCGETSTORAGE, INIT);
    elektraGlobalGet (handle, *cache, parentKey, PROCGETSTORAGE, MAXONCE);
    elektraGlobalGet (handle, *cache, parentKey, PROCGETSTORAGE, DEINIT);
    // replace ks with cached keyset
    ksRewind (*cache);
    if (ks->size == 0)
    {
        ELEKTRA_LOG_DEBUG ("replacing keyset with cached keyset");
#ifdef ELEKTRA_ENABLE_OPTIMIZATIONS
        if (ks->opmphm) cacheOpmphmDel (ks->opmphm);
        if (ks->opmphmPredictor) cacheOpmphmPredictorDel (ks->opmphmPredictor);
#endif
        ksClose (ks);
        ks->array = (*cache)->array;
        ks->size = (*cache)->size;
        ks->alloc = (*cache)->alloc;
        ks->flags = (*cache)->flags;
#ifdef ELEKTRA_ENABLE_OPTIMIZATIONS
        ks->opmphm = (*cache)->opmphm;
        ks->opmphmPredictor = (*cache)->opmphmPredictor;
#endif
        elektraFree (*cache);
        *cache = 0;
    }
    else
    {
        ELEKTRA_LOG_DEBUG ("appending cached keyset (ks was not empty)");
        ksAppend (ks, *cache);
        ksDel (*cache);
        *cache = 0;
    }
    keyDel (*cacheParent);
    *cacheParent = 0;
    if (debugGlobalPositions)
    {
        keySetName (parentKey, keyName (initialParent));
        elektraGlobalGet (handle, ks, parentKey, POSTGETSTORAGE, INIT);
        elektraGlobalGet (handle, ks, parentKey, POSTGETSTORAGE, MAXONCE);
        elektraGlobalGet (handle, ks, parentKey, POSTGETSTORAGE, DEINIT);
    }
    return 0;
}

The split will be removed in this PR, so there will have to be some changes to this code (and the code in the cache plugin). The replacement is a KeySet, but it currently doesn't contain anything that has to be cached. The sizes that were stored in the Split are not used right now (they will be in future).

We can't just store the KeySet that replaces the Split (because it contains pointers specific to the KDB instance), but I think I can create a separate KeySet with the data that has to be cached (AFAIK only the sizes). I assume storing a second KeySet in the cache shouldn't be too hard (at least not harder than storing the Split).
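
A minimal sketch of what storing just the cacheable data could look like (the key name prefix and the choice to store only sizes are assumptions, not existing code):

#include <sys/types.h>
#include <kdbplugin.h>

// Illustrative sketch only: collect the data that actually has to be cached
// (apparently just one size per mountpoint) into keys in the global keyset,
// instead of caching the KDB-instance-specific replacement of the Split.
// The key name prefix is a made-up placeholder.
static void storeBackendSize (KeySet * global, const char * mountpoint, ssize_t size)
{
    Key * key = keyNew ("system:/elektra/cache/backendState", KEY_END);
    keyAddBaseName (key, mountpoint);
    keyAddBaseName (key, "size");
    keySetBinary (key, &size, sizeof (ssize_t));
    ksAppendKey (global, key);
}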

Member

Yes, this does some checks and if everything is fine loads the cached Split struct as well as the cached KeySet.

I think removing the split also implies changes in mmapstorage. This PR changes some firm assumptions that were made for the cache. I did not expect the Split going away, nor did I expect the whole backend stuff to be changed.

Member

I assume storing a second KeySet in the cache shouldn't be too hard (at least not harder than storing the Split).

Actually, the split data is already stored within a keyset for the cache. Since I already wrote the serialization code for keysets I tried to squeeze everything into a keyset.

Member Author

@kodebach kodebach Apr 6, 2021

From what I can tell, the cache plugin doesn't know about the Split structure anyway. Instead the Split is written into the global keyset, which is then cached (didn't look into how that works exactly). So if I just write the new data into the global keyset instead, everything should still work.

Basically, I would replace the for loop in this function:

void splitCacheStoreState (KDB * handle, Split * split, KeySet * global, Key * parentKey, Key * initialParent)
{
    Key * mountPoint = mountGetMountpoint (handle, keyName (parentKey));
    const char * mountPointName = mountPoint == NULL ? "" : keyName (mountPoint);
    const char * mountPointValue = mountPoint == NULL ? "default" : keyString (mountPoint);
    Key * lastParentName = keyNew (KDB_CACHE_PREFIX "/lastParentName", KEY_VALUE, mountPointName, KEY_END);
    Key * lastParentValue = keyNew (KDB_CACHE_PREFIX "/lastParentValue", KEY_VALUE, mountPointValue, KEY_END);
    Key * lastInitalParentName = keyNew (KDB_CACHE_PREFIX "/lastInitialParentName", KEY_VALUE, keyName (initialParent), KEY_END);
    Key * lastSplitSize =
        keyNew (KDB_CACHE_PREFIX "/lastSplitSize", KEY_BINARY, KEY_SIZE, sizeof (size_t), KEY_VALUE, &(split->size), KEY_END);
    ksAppendKey (global, lastParentName);
    ksAppendKey (global, lastParentValue);
    ksAppendKey (global, lastInitalParentName);
    ksAppendKey (global, lastSplitSize);
    ELEKTRA_LOG_DEBUG ("SIZE STORAGE STORE STUFF");
    for (size_t i = 0; i < split->size; ++i)
    {
        // TODO: simplify this code below, seems like this affects only the last split anyway
        if (!split->handles[i])
        {
            ELEKTRA_LOG_DEBUG (">>>> Skipping split->handle[%ld]: pseudo-backend or no mountpoint", i);
            ELEKTRA_ASSERT (i == (split->size - 1), "ERROR: NOT THE LAST SPLIT");
            continue;
        }
        if (test_bit (split->syncbits[i], 0))
        {
            /* Dont process keysets which come from the user
               and not from the backends */
            ELEKTRA_LOG_DEBUG (">>>> Skipping split->handle[%ld]: syncbits is 0, keyset from user", i);
            ELEKTRA_ASSERT (i == (split->size - 1), "ERROR: NOT THE LAST SPLIT");
            continue;
        }
        // TODO: simplify this code above, seems like this affects only the last split anyway
        char * name = 0;
        if (split->handles[i]->mountpoint != NULL)
        {
            name = elektraStrConcat (KDB_CACHE_PREFIX "/splitState/mountpoint/", keyName (split->handles[i]->mountpoint));
        }
        else
        {
            name = elektraStrConcat (KDB_CACHE_PREFIX "/splitState/", "default/");
        }
        // Append parent name for uniqueness (spec, dir, user, system, ...)
        char * tmp = name;
        name = elektraStrConcat (name, keyName (split->parents[i]));
        elektraFree (tmp);
        ELEKTRA_LOG_DEBUG (">>>> STORING split->handle[%ld] with name: %s :::: parentName: %s, parentValue: %s", i, name,
                           keyName (split->parents[i]), keyString (split->parents[i]));
        Key * key = keyNew (name, KEY_END);
        keyAddBaseName (key, "splitParentName");
        keySetString (key, keyName (split->parents[i]));
        ksAppendKey (global, key);
        ELEKTRA_LOG_DEBUG (">>>> STORING key: %s, string: %s, strlen: %ld, valSize: %ld", keyName (key), keyString (key),
                           strlen (keyString (key)), keyGetValueSize (key));
        keyDel (key);
        key = keyNew (name, KEY_END);
        keyAddBaseName (key, "splitParentValue");
        keySetString (key, keyString (split->parents[i]));
        ksAppendKey (global, key);
        ELEKTRA_LOG_DEBUG (">>>> STORING key: %s, string: %s, strlen: %ld, valSize: %ld", keyName (key), keyString (key),
                           strlen (keyString (key)), keyGetValueSize (key));
        keyDel (key);
        key = keyNew (name, KEY_END);
        keyAddBaseName (key, "specsize");
        keySetBinary (key, &(split->handles[i]->specsize), sizeof (ssize_t));
        ksAppendKey (global, key);
        ELEKTRA_LOG_DEBUG (">>>> STORING key: %s, string: %s, strlen: %ld, valSize: %ld", keyName (key), keyString (key),
                           strlen (keyString (key)), keyGetValueSize (key));
        keyDel (key);
        key = keyNew (name, KEY_END);
        keyAddBaseName (key, "dirsize");
        keySetBinary (key, &(split->handles[i]->dirsize), sizeof (ssize_t));
        ksAppendKey (global, key);
        ELEKTRA_LOG_DEBUG (">>>> STORING key: %s, string: %s, strlen: %ld, valSize: %ld", keyName (key), keyString (key),
                           strlen (keyString (key)), keyGetValueSize (key));
        keyDel (key);
        key = keyNew (name, KEY_END);
        keyAddBaseName (key, "usersize");
        keySetBinary (key, &(split->handles[i]->usersize), sizeof (ssize_t));
        ksAppendKey (global, key);
        ELEKTRA_LOG_DEBUG (">>>> STORING key: %s, string: %s, strlen: %ld, valSize: %ld", keyName (key), keyString (key),
                           strlen (keyString (key)), keyGetValueSize (key));
        keyDel (key);
        key = keyNew (name, KEY_END);
        keyAddBaseName (key, "systemsize");
        keySetBinary (key, &(split->handles[i]->systemsize), sizeof (ssize_t));
        ksAppendKey (global, key);
        ELEKTRA_LOG_DEBUG (">>>> STORING key: %s, string: %s, strlen: %ld, valSize: %ld", keyName (key), keyString (key),
                           strlen (keyString (key)), keyGetValueSize (key));
        keyDel (key);
        key = keyNew (name, KEY_END);
        keyAddBaseName (key, "syncbits");
        keySetBinary (key, &(split->syncbits[i]), sizeof (splitflag_t));
        ksAppendKey (global, key);
        ELEKTRA_LOG_DEBUG (">>>> STORING key: %s, string: %s, strlen: %ld, valSize: %ld", keyName (key), keyString (key),
                           strlen (keyString (key)), keyGetValueSize (key));
        keyDel (key);
        elektraFree (name);
    }
}

EDIT: Of course I'd also need to change the corresponding load function...

Member

I think this should work. Note that I also needed to store the old parentKey name etc., but basically that's all there is to it.

14. Split data back into individual backends.
15. Run the `poststorage` phase for all backends.
16. Merge the data from all backends.
17. If a global cache plugin is enabled, update cache.
Member Author

AFAIK this currently happens here

if (handle->globalPlugins[POSTGETCACHE][MAXONCE])
{
    splitCacheStoreState (handle, split, handle->global, cacheParent, initialParent);
    KeySet * proc = elektraCutProc (ks); // remove proc keys before caching
    if (elektraGlobalSet (handle, ks, cacheParent, POSTGETCACHE, MAXONCE) != ELEKTRA_PLUGIN_STATUS_SUCCESS)
    {
        ELEKTRA_LOG_DEBUG ("CACHE ERROR: could not store cache");
        // we must remove the stored split state from the global keyset
        // if there was an error, otherwise we get erroneous cache hits
        elektraCacheCutMeta (handle);
    }
    elektraRestoreProc (ks, proc);
}
else
{
    elektraCacheCutMeta (handle);
}

Member

Correct.

Comment on lines 129 to 156
5. Determine which backends contain changed data.
From now on ignore all backends that have not changed.
Member Author

Note: this part is not implemented yet. That means right now we always write all backends if anything has changed. Once this is implemented, it may also be possible to get rid of Step 1 and/or 2.
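
One possible (hypothetical) way to implement that step, using the sync flag that keys already carry:

#include <kdbplugin.h>

// Hypothetical sketch: a backend is considered changed if any key belonging to
// it still has its sync flag set. How the keys are split per backend/mountpoint
// is assumed to happen elsewhere.
static int backendHasChanges (KeySet * backendKs)
{
    Key * cur;
    ksRewind (backendKs);
    while ((cur = ksNext (backendKs)) != NULL)
    {
        if (keyNeedSync (cur)) return 1;
    }
    return 0;
}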

- cascading and `meta:/` keys are always illegal in `ks` (should be enforced via different KeySet types)
- `spec:/` backends only go through `init`, `resolver`, `cache`, `presetstorage` and `storage` phases as normal, but their `poststorage` phase is called earlier.
- `dir:/`, `user:/` and `system:/` go through all phases as described above.
- `proc:/` mountpoints go through all the phases as described above, but they are not stored in the cache.
Member Author

The interaction of proc:/ mountpoints and the cache is not quite clear yet. The easiest solution would be to just ignore the cache (no load, no store) if a proc:/ mountpoint is used. The problem is that gopts kind of acts like a proc:/ mountpoint and an incompatibility between cache and gopts would be bad.

I think it should be safe to just load the cache, run prestorage and storage for proc:/, then call gopts and spec, and finally run poststorage for proc:/. The assumption here is that a change in proc:/ keys can never affect other keys. For proc:/ mountpoints and gopts this is the case (both should only write to proc:/, which needs to be enforced). The question is what we do with spec. The current implementation shouldn't have a problem AFAIK, but IIRC we wanted to generate more metakeys (e.g. array) there, so there might be some edge cases there. I also haven't thought about notifications at all yet.

Contributor

It is unclear which feedback should be given here.

Member Author

This was mostly a note to myself (I had already forgotten about the cache + proc:/ problem). I'm pretty sure the strategy of "if cache-hit -> then load proc:/ in isolation" should work fine. If you have no issues with this either, then there is nothing to discuss.

@kodebach
Member Author

kodebach commented Apr 5, 2021

@markus2330 This branch now contains a working proof-of-concept implementation (*) of the kdbOpen/kdbGet/kdbSet and backend plugin code that is described in doc/dev/backend-plugins.md, doc/dev/mountpoints.md and doc/dev/kdb-operations.md. It would be nice if you could look through these files (most of them aren't very detailed anyway) and check if I forgot something important.

I don't know when I will have time to finish this PR (probably not for a while), so the review is not urgent. It would still be nice to know that the basic concept is sound.


Re global plugins: There are some preparations for the move to specialised/hardcoded positions for global plugins. Whether this will be done in this PR or another PR is still TBD. If the old global plugins are kept for this PR, then the PR will only support these positions: PROCGETSTORAGE MAXONCE, POSTGETSTORAGE MAXONCE, PRESETSTORAGE MAXONCE, POSTCOMMIT MAXONCE. These are the positions that are used for gopts, spec and notifications right now AFAIK.

Re notifications: The code for notifications is completely missing right now.

Re cache: The code for the cache is not there yet. There are some preparations, however. See the other comments for more information. @mpranj Please check that I understand correctly which part of the current code does what in regard to the cache.


(*) A basic kdb get/kdb set should work for the default/hardcoded mountpoints. kdb mount is not updated yet, so mountpoint configs have to be written manually (haven't tested that yet). There are zero tests for the new code and none of the existing tests have been updated. There are also still loads of memory leaks. Be careful when testing. If your environment isn't isolated, you may break existing Elektra installs.

@lgtm-com

lgtm-com bot commented Apr 5, 2021

This pull request introduces 18 alerts when merging 8989908 into 5702ce0 - view on LGTM.com

new alerts:

  • 18 for FIXME comment

@mpranj
Member

mpranj commented Apr 6, 2021

@kodebach thank you so much for working on this.

I left some initial notes regarding the cache stuff. Since this is quite a huge change I think it would be best that I review and leave more notes regarding the cache when I have time. As noted, some assumptions are changing and there are probably more things to be adapted regarding the cache.

I also don't know when I will have time to thoroughly review it and think about the changes. (Probably not soon, but it should not be a problem as you also don't plan to finish this soon.)

@lgtm-com

lgtm-com bot commented Nov 8, 2021

This pull request introduces 25 alerts and fixes 3 when merging c08f581 into c039996 - view on LGTM.com

new alerts:

  • 25 for FIXME comment

fixed alerts:

  • 3 for Comparison result is always the same

@kodebach kodebach changed the base branch from master to new-backend November 17, 2021 14:35
@kodebach kodebach marked this pull request as ready for review November 17, 2021 14:35
@kodebach
Member Author

@markus2330 @mpranj If you agree, I would like to merge this PR into the newly created new-backend branch now. Then I can do the remaining work in smaller increments and separate PRs, without affecting master.

@lgtm-com

lgtm-com bot commented Nov 17, 2021

This pull request introduces 25 alerts and fixes 3 when merging c08f581 into 373a87d - view on LGTM.com

new alerts:

  • 25 for FIXME comment

fixed alerts:

  • 3 for Comparison result is always the same

@kodebach kodebach merged commit cd33c3b into ElektraInitiative:new-backend Nov 24, 2021
@mpranj mpranj added this to the 0.9.12 milestone Jan 19, 2023