Change hashing algorithm in Saved Object Migrations to be FIPS compliant #169734
Pinging @elastic/kibana-core (Team:Core)
This hashing is only used internally as a serialization / comparison mechanism, with internal and controlled data, and used, again, only internally. Also note that swapping the algorithm would be a somewhat complicated challenge, given we would need a way to still compare the two hashes during the transition. Knowing that, @kc13greiner, can we know how critical this request is?
Pinging @elastic/kibana-security (Team:Security)
If we run Node.js in FIPS mode, the `md5` hashing used here will fail. If you're asking about the time-critical aspect, we're still figuring it out, but it's likely to be one of the top priorities since it might prevent Kibana from starting (assuming I understand the code correctly).
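A minimal sketch of that failure mode, assuming Node.js's built-in `crypto` module (the helper name `tryMd5` is hypothetical): when Node runs with FIPS enabled, `createHash('md5')` throws, so any startup path that computes these hashes would fail.

```typescript
import { createHash } from 'crypto';

// When Node.js runs in FIPS mode (e.g. started with --enable-fips), md5 is
// a disallowed algorithm and createHash('md5') throws. Outside FIPS mode
// the call succeeds and returns a 32-character hex digest.
function tryMd5(data: string): string | null {
  try {
    return createHash('md5').update(data).digest('hex');
  } catch {
    // Taken in FIPS mode: the algorithm is rejected by OpenSSL.
    return null;
  }
}
```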
So basically, we need to change the algorithm we're using to generate the SO mapping hashes that we're using for mapping comparisons. I don't think we can do any 2-stage conversion/migration, because it's impacting the traditional offering, and we can't force customers to migrate to a given minor before upgrading higher. Meaning that we will have, at some point, hashes stored in `_meta` that were generated with the old algorithm. So... there are two ways to approach that. The easy way, and the way less easy.
cc @rudolf @gsoldevila what do you think about this? Do you think just blindly changing the algo we're using for the hashes without doing anything else could be acceptable?
Yeah, we use the hashes stored in the _meta and compare them to the hashes calculated from the mappings to decide if everything is up to date; if not, we update the mappings and run an update_by_query to "pick up" the mappings. This is potentially quite an expensive operation, so we optimize it by detecting which types of saved objects had new mappings and then only updating those types.

One way to preserve all of this in a FIPS-compliant way would be to generate FIPS-compliant hash snapshots of all previous versions. That way Kibana can find an md5sum in the _meta, look up what the equivalent sha256 is in the snapshots, and then generate a sha256 from the mappings we expect to compare.

These optimizations only matter if we're not going to have to reindex anyway, so we don't have to generate a snapshot of each and every possible version, just "index compatible" versions. For .kibana_task_manager this means a snapshot for every migration introduced since 7.17, since they're all compatible, but for .kibana (and the other indices) it's just back to the split (8.8.0).

Taking this idea one step further, we could consider if this is a good time to switch from mappings comparisons to comparing model versions. Kibana would then use the md5sum in the _meta and look up the type's model version from the snapshots. If the model versions are different and contain mappings changes, we can run an update_by_query. Instead of storing the hashes in the _meta, we'd then store the model versions in the _meta for future migrations.
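The snapshot lookup described above can be sketched roughly as follows. All names here (`HASH_SNAPSHOT`, `isTypeUpToDate`) are hypothetical, and the md5-to-sha256 pairs would be generated at build time from the historical mappings; this is an illustration, not Kibana's implementation.

```typescript
import { createHash } from 'crypto';

// Hypothetical build-time snapshot: for each index-compatible version,
// map the legacy md5 of a type's mappings to its sha256 equivalent.
// The entry below is a placeholder for illustration only.
const HASH_SNAPSHOT: Record<string, string> = {
  a1b2c3: 'e3b0c4...placeholder',
};

const sha256 = (mappings: object): string =>
  createHash('sha256').update(JSON.stringify(mappings)).digest('hex');

function isTypeUpToDate(storedMd5: string, currentMappings: object): boolean {
  const equivalentSha256 = HASH_SNAPSHOT[storedMd5];
  // An md5 we don't recognize is treated as outdated: pick up the mappings.
  if (equivalentSha256 === undefined) return false;
  return equivalentSha256 === sha256(currentMappings);
}
```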
At first glance, (1) seems like a valid solution. But as Rudolf said, this might be too expensive, so we have to consider possible strategies to mitigate it and allow updating only SO types that have actually changed.
That sounds incredibly tedious, painful, and error-prone, to be totally honest... Even if I won't deny this is a possibility, I think I would first focus on my initial question: "do we think this would be strictly necessary / what would be the impact of NOT doing it?", because I feared it would cause a full reindex (to another index, "old-style", considering the mappings are not compatible), but given @gsoldevila's answer,
I understand that it would "just" trigger a full `update_by_query`.
You've lost me here. What about migration from versions prior to model versions being a thing?
Not sure it's acceptable though. Since we no longer reindex, the `update_by_query` step can be very expensive on large clusters.
I discussed that one with Rudolf, as I was lost too. It would also involve having a hardcoded snapshot with all model versions for each compatible stack version, and comparing current model versions against that.
Circling back to Pierre's original comment: since this hash is not encrypting any sensitive information, but is rather mainly used for comparison, couldn't we have an in-house "comparing" algorithm that happens to be backwards compatible with the existing hashes?
I think we can get away with only storing a snapshot of a single version: the one where we release the update. If we store a mapping of the form:

```ts
// NB it only contains the md5s of the current version types
const MIGRATION_HASHES_TO_MODEL_VERSION = {
  md5_1: '10.0.1', // might correspond to type "T1"
  ...
  md5_n: '10.0.0', // might correspond to type "Tn"
}
```

We can update the logic so that:
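Under that proposal, the comparison logic might look like the following sketch. The names are hypothetical, and `hashToVersion` stands in for a snapshot map like `MIGRATION_HASHES_TO_MODEL_VERSION`; this is an illustration under those assumptions, not the actual implementation.

```typescript
// Given the md5 currently stored in _meta for each type, look up which
// model version it corresponds to, and flag types whose stored version
// differs from the current one (or whose hash is unknown).
function typesNeedingPickup(
  hashToVersion: Record<string, string>,   // md5 -> model version (snapshot)
  storedHashes: Record<string, string>,    // type name -> md5 found in _meta
  currentVersions: Record<string, string>  // type name -> current model version
): string[] {
  return Object.keys(currentVersions).filter((type) => {
    const md5 = storedHashes[type];
    const storedVersion = md5 !== undefined ? hashToVersion[md5] : undefined;
    // Unknown hash or version mismatch: this type's mappings must be picked up.
    return storedVersion !== currentVersions[type];
  });
}
```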
WDYT?
It's definitely only the largest clusters where this is a problem, maybe even less than 0.01%, but we have had SDHs where millions of saved objects took hours to complete the update_by_query step. So if we can come up with a solution that we're confident would be robust, it would be really good.
Yes, that's a good point; we'd just need one md5 hash snapshot to know if a type is up to date or old.
I'm probably missing something, but is that really true for traditional deployments, given a customer can jump versions between upgrades? E.g.
Would that still work?
Yes, I believe that would still work. Given a
Are there any additional questions or concerns that need to be discussed before this work can be planned?
@kc13greiner There are likely still implementation details to be discussed by @elastic/kibana-core, but I don't think we have any more questions for your team. Or, well, maybe one: do we have better visibility on when we'll need this done by?
@pgayvallet We'd like this work complete by "early February" so Ops can begin their work. Please let me know if I can provide any additional information!
@gsoldevila do you think this could realistically be done for "early February"? It feels very close, tbh.
…6803)

## Summary

Address #169734.

We're currently storing information about _Saved Object_ types in the `<index>.mapping._meta`. More specifically, we're storing hashes of the mappings of each type, under the `migrationMappingPropertyHashes` property. This allows us to detect which types' mappings have changed, in order to _update and pick up_ changes only for those types.

**The problem is that `md5` cannot be used if we want to be FIPS compliant.**

Thus, the idea is to stop using hashes to track changes in the SO mappings. Instead, we're going to use the same `modelVersions` introduced by ZDT: whenever mappings change for a given SO type, the corresponding `modelVersion` will change too, so `modelVersions` can be used to determine if mappings have changed.

---------

Co-authored-by: kibanamachine <[email protected]>
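The approach the PR summary describes (tracking `modelVersions` in `_meta` instead of hashes) can be sketched like this; the `mappingVersions` field name and the helper are hypothetical illustrations, not the PR's actual code.

```typescript
// Instead of md5 hashes, store each type's model version in the index
// _meta, and diff it against the versions registered in the current build.
interface IndexMeta {
  mappingVersions?: Record<string, string>; // type name -> model version
}

function getOutdatedTypes(
  meta: IndexMeta,
  registered: Record<string, string> // type name -> model version in this build
): string[] {
  const stored = meta.mappingVersions ?? {};
  // A type is outdated if its stored version is missing or differs.
  return Object.keys(registered).filter((type) => stored[type] !== registered[type]);
}
```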
Resolved via #176803 |
Describe the feature:
On Kibana startup, index mappings are currently compared using an md5 hash. This should be updated to use a compliant algorithm.
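A compliant replacement could hash the serialized mappings with sha256 instead; this is a minimal sketch under that assumption, not Kibana's actual implementation.

```typescript
import { createHash } from 'crypto';

// sha256 is on NIST's approved list, unlike md5, so it works in FIPS mode.
// Serialization must be deterministic for hashes to be comparable; stable
// JSON key order is assumed here for the sake of the sketch.
function hashMappings(mappings: object): string {
  return createHash('sha256').update(JSON.stringify(mappings)).digest('hex');
}
```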
Describe a specific use case for the feature:
To be FIPS compliant, we need to remove all uses of insecure hashing algorithms, which includes md5.
Acceptable hash algorithms can be found here on pg. 18 in Table 8