-
Notifications
You must be signed in to change notification settings - Fork 123
Binary metadata vs flag #4194
Comments
I agree the binary keys are a bit awkward as two different interfaces are mixed. Using KEY_FLAG_BINARY, however, does not solve the issue: if you have an interface to unset flags, you create the same problem. Solutions would be: 1, disable the possibility to remove These are the two extremes: 1. basically restricts users more so that they cannot do anything wrong, 2. basically allows this usage but makes it less awkward as the API does not automatically do anything I don't agree that it is not metadata, imho it is perfectly metadata: it describes how the value should be interpreted., e.g. value: 5, meta:/type=octet without meta:/binary we interpret the value as 5 |
Yes, the core issue is that you can remove the "binary marker" (whatever that is). Switching to a flag is how I would solve this problem, because:
Yes, and both are equally bad. As stated above, 1 creates an inconsistent API (not all metadata is equal), and 2 almost removes the idea of string vs. binary keys entirely. The worst thing would be that mean that this very innocent piece of code would fail: Key * k;
keySetBinary (k, "abc", 4);
char buf[4];
keyGetBinary (k, buf, 4); // fails, because k is not marked as binary Removing the automatic handling of the "binary marker" would only work, if we remove the distinction between string and binary keys entirely.
Regarding your example: IMO const void *keyValue (const Key *key);
ssize_t keyGetValueSize (const Key *key);
ssize_t keySetValue (Key *key, const void *newValue, size_t valueSize); However, the moment we automatically set the marker OR check the marker for any reason, it is no longer truely metadata. PS. You only addressed the "Metadata vs. data" and "Broken API" issues. I also added a section about the performance issues to the top post. I originally ignored performance, but I now think at least the memory impact can be a big problem. In fact the memory impact of |
I don't like the flag because it bakes a rarely-used feature even more strongly into Elektra. This is also why I did not respond to the memory concerns: the feature does not need any memory if not used.
I like it, maybe better we use char, as strings are the common way to be used as value: const char *keyValue (const Key *key);
ssize_t keyGetValueSize (const Key *key);
ssize_t keySetValue (Key *key, const void *newValue, size_t valueSize); Then we can have a feature where BUT: this would be a huge change... |
I'd argue that
However, in principle I agree that rarely used features should not affect the API (if possible).
But it is used; extensively and within a very core part of Elektra. Every plugin creates a keyset with binary keys for its contract. There are also other more internal uses, but this case is very much a public API. So we can't really call binary keys a "rarely-used feature".
You misunderstood. The code snippet, was NOT a proposal for a change. I just wanted to explain that this would be the only case, in which I'd consider
IMO
We can still keep ssize_t keySetString (Key * key, const char * newString)
{
return keySetValue (key, newString, 0);
}
Yes, it would be a massive time investment. I agree that the simple So in the end, I still think |
I agree that the API in its current form would work better with a KEY_FLAG_BINARY. This is also because the API was designed with a KEY_FLAG_BINARY, metadata was introduced later.
Because it would need a special handling for binary keys at many places, in particular in storage plugins. Currently xmltool, dump and so on "simply work" as they write out value&metadata, so binary keys are supported without any additional code. |
That is a pretty weak argument IMHO, considering that storage plugins are already very complex. Adding this little bit of extra code to the very few plugins that don't already need to handle binary keys separately wouldn't make much of a difference.
Do you have more than one example for these "many places"? I can't think of another use case, where binary vs. string makes a difference, but you don't need extra code. The only thing that comes to mind is Essentially, the only issue would be if (in some form) A gives a key value (and size) and key metadata, but not the whole
This is not true. libelektra/src/plugins/dump/dump.cpp Lines 60 to 87 in 3d7c47a
As to The only two storage plugins I know of, where string vs. binary value could be ignored, are libelektra/src/plugins/quickdump/quickdump.c Lines 434 to 479 in 3d7c47a
There may be another example, but AFAIK all other formats are text-based and therefore already need to encode binary values somehow, if they support them at all. Even if there were such a plugin, I still think the argument "we can't change the core API, because a handful of plugins need to change" is pretty weak. |
It demonstrates very well how different mindsets on these issues can be and how they reflect on code. It looks like you already implemented dumpv2 as-if binary would not be metadata. dump.cpp now has 589 lines, a few years ago it had only 300 lines. In my mindset, binary differs in
With 3. being the most important point but we currently do not specify how the types are actually interpreted as the feature is not used (except of storing some function pointers). Obviously, XML always needs escaping, for both string and binary values. I cannot follow the argument "it is already complicated, so let us make it more complicated" and prefer the mindset "it is complicated, so let us make it more simple". Imho to have binary as metadata made handling binary keys more uniform, especially in serialization/deserialization which is a bigger topic and not only storage plugins. Also UIs need less special treatment for binary keys. I agree it is not perfect, especially not in the API. Is your point that the API cannot be fixed because binary is fundamentally not metadata? |
That doesn't mean much. This could also just be formatting changes (e.g. single line
IMO this is the main difference.
A secondary restriction to make working with string values easier. But not really relevant in practice, since you can use the empty string
The interpretation of values isn't fully fixed for string values either. But yes, a string value can safely be treated as a string. A binary is opaque unless you have additional context.
In principle I agree, however, in the specific case of storage plugins it is not possible to "make it more simple". A lot of the complexity comes from parsing/writing the storage format and the subtle (or sometimes not so subtle) differences and conflicts between the storage format and Elektra's internal format.
I think this is a false conclusion. It seems to be more uniform, but it actually causes subtle issues and often doesn't make things easier. For example, a UI can't display non-print characters without extra code anyway, so it will always need to check for binary values. The fact that in one case, you get an automatic "visual" indicator for a binary key because you display
I actually don't have a problem with the API, but with the (leaky) implementation.
I'd say it is not metadata, IF we keep the current API. Fundamentally, the distinction between data and metadata depends on the context IMHO. For example, in the high-level API Most of all, I don't think it is worth sacrificing large amounts of memory (and some performance) and accepting the "Side effects" and "Broken API" issues, as a trade of for "slightly more uniform and sometimes easier to handle". |
UIs always need to do escaping of various escape characters, the null character is only one of many. Imho removing keyIsBinary, keyIsString, keySetString and keySetBinary would be the simplest solution. People simply add the metadata themselves as needed. Then there won't be any expectation that removing the metadata converts the key to non-binary. The main argument for keeping If the binary data in @kodebach if you don't fundamentally disagree, I would close this issue and create a new issue to remove keyIsBinary, keyIsString, keySetString and keySetBinary and automatic (un)setting of @lawli3t What are your thoughts on that issue? |
As far as I can tell, your main problem is that to you conceptually a "flag" is an in-memory-only thing. The fact that a value is a string/binary is clearly not in-memory-only, so it cannot be a flag. However, consider it like this: whether a value is string or binary is part of the value itself, without this information the value is incomplete. Now it is obvious that this distinction must be preserved with the value (a fact that is not obvious with It seems to me that once again we have a debate of philosophical reasons vs. technical reasons. With the key names I could accept the technical issue that come from allowing any key name part whatsoever, but here I feel like the technical issues are far too much to accept for philosophical reasons. More detailed answers below:
But IMHO this would only make sense, if the new "default" is binary, i.e. a key can contain any chunk of bytes. How these bytes should be interpreted is not defined. Metadata can be used to give some specs. Also this solution (even if the "default" remains string values) is the simplest in principle, but requires by far the biggest time investment.
IMO flags are an implementation detail. I wouldn't say
Again, I'd say flags are an implementation detail. The binary vs. string thing to me is actually an attribute of the key value. It should therefore be documented wherever key values are explained (which probably already contains a note about string vs. bianry).
I don't think this is the best (in terms of cost vs. benefit) solution here. That is unless, you are volunteering to actually do the PR for this massive change. Like I said, I'm fine with just a simple
Not sure what part exactly you are referring to, but the root issue of Changing just the |
Good idea, I totally agree that this the current situation is conceptually wrong. I added a draft for a proposal in #4201 (to be discussed on Monday).
Did you check which test breaks if we do not export them anymore? I think it is a very rare feature to actually call such functions. Anyway, I don't think we should stop offering binary keys.
Unfortunately we didn't write a decision about this. Do you remember where we discussed this? It would be good if you can add this to https://github.com/ElektraInitiative/libelektra/tree/new-backend
This doesn't need to be that way, it would be possible to have "special static keys" like
👍
Complexity is a big technical downside. The binary/string API is unnecessarily difficult to use and has many weird problems like #3797. |
There are at least the
I think it was just a sidenote in one of the comments in #3693. IIRC we more or less agreed the issue is independent of the new backend system and we postponed it.
Of course hacky optimizations are possible. But such work around make things a lot more complicated to think about, required a lot more code, create more potential for errors, etc.
I don't see how the flag would be any more complex than the metakey. The complexity is in different places, but it is there in both cases.
Please note, because of the enormous effort involved in actually implementing this change, I still prefer the solution with a flag. Even though that is not the optimal solution. |
As discussed, see #4201 |
We already had this discussion once (*), but I think we need to have it again before we finish the Rust implementation or release 1.0
Current situation
Currently, a binary key is identified by the presence of the
meta:/binary
meta key. This is a very simply solution, but it has many problems IMO.Problems
Side effects
Certain functions modify metadata as a side-effect. Apart from the possible performance and memory issues (which are discussed below), this also means that
KEY_FLAG_RO_VALUE
andKEY_FLAG_RO_META
are tied together for binary keys.Memory and performance impact
Checking if a metakey is present may not be significantly slower than checking for a flag (*), but everything else is very problematic.
Setting a metakey (that isn't present already) in the best case means creating a new
Key
and doing aksAppendKey
. Even without any benchmarks I can guarantee that this is significantly slower than the single x86 instruction (**) you need to set a bit flag.In the worst case, we also need to create the metadata
KeySet
, before we can add the the newKey
. This adds even moremalloc
s.And then there is the memory impact. A new flag would use a single bit in theory and no memory at all in practice, because we still have space in the current flags field. The metakey on the other hand needs memory for both the
Key
itself and theKeySet
containing it. In the best case, this means we useHowever, in the current setup a
KeySet
will always allocate at least 16 elements in it's array, so we actually use 256 B of memory to store 1 bit of information. In other words, we use 2048 times more memory than necessary.This may not matter, if binary keys are rare. But if a user creates a simple config with 16 binary keys and no metadata, this will use 4096 B of memory just for the
meta:/binary
keys. To make this even more clear: We are only storing 16 bits (= 2 B) of information here. We are wasting 4094 B or 99.95 % (Note: actually we are wasting all 4096B because the flag doesn't need additional memory).Of course the ratio will improve as soon as other metadata is used (then we "only" waste ~72 B per Key). But I would argue that actually storing binary data in the KDB is not that common, while using a binary Key internally as a generic wrapper for data is quite common. For example, we use it for every plugin's contract (via the
KEY_FUNC
shortcut). Such keys will almost never use any actual metadata. For all of these keys, we are wasting huge amounts of memory.(*) I suspect the difference will still be significant. Checking the bit-flag should only be 1-2 instructions.
(**) RISC ISAs will need ~3 instructions (load, or, store)
Metadata vs Data
IMO
meta:/binary
is not actually metadata, but part of the data itself.The distinction between metadata and data (in my mind) is that, if you strip away all metadata the data itself can still be interpreted and does not loose its meaning. Without the metadata some information may be lost, but the core information must remain.
As an analog take a JPEG. There is some raw pixel data in the file. Alone it doesn't make much sense, so we also store the width and height of the image (and various bits of compression data, but let's ignore that). The raw pixel data, together with width and height is enough to display the image. So this is the data.
Some JPEGs also have an EXIF section, which may contain the DPI value of the image. This can be useful, in certain contexts (e.g. when printing the image), but the core information (i.e. the image itself) remains the same, whether we know the DPI or not. Therefore, the DPI is metadata.
Now, imaging we have a binary key in Elektra and remove all metadata. Suddenly, it is no longer a binary key. If its value contains a zero byte, it will be interpreted totally differently and we loose core information. Therefore, whether or not a key is binary cannot be metadata.
Exposing implementation details
The fact that in Elektra Key can contain a string value or binary value is part of the public API. However, how a Key is marked as string/binary should be up to the implementation. It is an implementation detail, that internally we just use a
union
and a single pointer together with some other tag. In another language you might want to use another mechanism (e.g. inheritance/subtyping, sum types, etc.). The fact that all implementations are forced to use themeta:/binary
metakey as a marker, exposes an implementation detail.Broken API
You can do this:
This is simply wrong and broken. It shouldn't be possible to change the value of a key, by changing it's metadata. But here we remove
meta:/binary
and therefore thedata
part of the value will be ignored by anyone usingk
. Effectively the value is now justsome
.Solution
The solution to all of this is incredibly simple. We don't use
meta:/binary
anymore and instead use a newKEY_FLAG_BINARY
.Notes
Many of the arguments above, can also be made for
meta:/type
. Especially, when the high-level API is considered. However, since we also store numbers as strings, the type of the key is actually metadata. The core information remains the same. Only in certain contexts (i.e. when converting the key value to a native type likeint
ordouble
) does the interpretation change.(*) I didn't find the original issue, if someone does, please link it
cc @markus2330 @lawli3t
The text was updated successfully, but these errors were encountered: