-
Notifications
You must be signed in to change notification settings - Fork 67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create patient resource using a deterministic id #183
Create patient resource using a deterministic id #183
Conversation
src/lib/Microsoft.Health.Extensions.Fhir.R4/Service/ResourceManagementService.cs
Outdated
Show resolved
Hide resolved
src/lib/Microsoft.Health.Extensions.Fhir.R4/Service/ResourceManagementService.cs
Outdated
Show resolved
Hide resolved
src/lib/Microsoft.Health.Extensions.Fhir.R4/Service/ResourceManagementService.cs
Outdated
Show resolved
Hide resolved
src/lib/Microsoft.Health.Extensions.Fhir.R4/Service/ResourceManagementService.cs
Outdated
Show resolved
Hide resolved
src/lib/Microsoft.Health.Extensions.Fhir.R4/Service/ResourceManagementService.cs
Outdated
Show resolved
Hide resolved
sb.Replace("/", "_"); | ||
sb.Replace("=", "."); | ||
|
||
return sb.ToString().ToLower(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can do ToLower(). Base64 uses upper and lowercase characters. Using ToLower() will open the possibilities of collisions. It looks like upper case letters are supported for FHIR id values, https://www.hl7.org/fhir/datatypes.html#id. Is there a reason for the ToLower()?
It looks like the max length of an id is 64 characters though, so we will need to guard against that. The regex for the id is [A-Za-z0-9-.] so I don't think _ is valid.
The 64 character limit is going to be problematic I think, especially since we are combining the Identifier and System. We could XOR the bytes together after padding the the shorter byte array instead of concatenating the strings themselves.
We may need to look into a high entropy hash function again (I forgot about the id length limitation when I suggested Base64 encoding).
If you want to control the characters you are generating from the byte array you can do something like this (ensure we only get supported characters).
static class OpaqueValueGenerator
{
private const string OpaqueIdCharacters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-.";
private const int CharacterShift = 64;
private const int MaxLength = 64;
public static string GenerateValue(string input)
{
var idBytes = Encoding.UTF8.GetBytes(input);
var value = new BigInteger(idBytes);
var sb = new StringBuilder();
while (value != 0)
{
sb.Append(OpaqueIdCharacters[(byte)(value % CharacterShift)]);
value /= CharacterShift;
}
string id = sb.ToString();
if (id.Length > MaxLength)
{
throw new ArgumentOutOfRangeException(nameof(input));
}
return id;
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Dustin - that is a very good catch regarding the valid fhir id characters and length, I had not thought about that. I think we should go with a sha256 hash. It will always result in a 64 character string with the characters [a-fA-F0-9] and I believe has the one of lowest chances of a collision. It seems to perform decently as well.
src/lib/Microsoft.Health.Extensions.Fhir.R4/Service/ResourceManagementService.cs
Show resolved
Hide resolved
This reverts commit 0600d80.
Previously when the Iot Connector was in Create mode and we received messages at the same time for the same patient and different device ids (which landed in different partitions) it was possible to create a duplicate patient resources. This is because we first lookup the patient and then create the patient if it does not exist. However there is a problematic case when the messages are in different partitions because it is possible for the lookup for to not find any patients (as expected) but then the Iot Connector goes on to create a patient for partition 1 and then create a patient for partition 2. To mitigate this issue/race condition we are now creating a hash of the patient identifier so that the patient id is deterministically generated and the patient resource is saved with this deterministic id. That way we will always create and update a patient resource with the same id. Now, if we were to hit this race condition, it will result in the patient resource being updated twice instead of creating two separate patient resources. While not perfect solution it is one of the more straightforward implementations, and it mitigates the issue.