Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create patient resource using a deterministic id #183

Merged
merged 7 commits into from
Apr 15, 2022

Conversation

wi-y
Copy link
Contributor

@wi-y wi-y commented Apr 11, 2022

Previously when the Iot Connector was in Create mode and we received messages at the same time for the same patient and different device ids (which landed in different partitions) it was possible to create a duplicate patient resources. This is because we first lookup the patient and then create the patient if it does not exist. However there is a problematic case when the messages are in different partitions because it is possible for the lookup for to not find any patients (as expected) but then the Iot Connector goes on to create a patient for partition 1 and then create a patient for partition 2. To mitigate this issue/race condition we are now creating a hash of the patient identifier so that the patient id is deterministically generated and the patient resource is saved with this deterministic id. That way we will always create and update a patient resource with the same id. Now, if we were to hit this race condition, it will result in the patient resource being updated twice instead of creating two separate patient resources. While not perfect solution it is one of the more straightforward implementations, and it mitigates the issue.

@wi-y wi-y requested a review from a team as a code owner April 11, 2022 14:12
sb.Replace("/", "_");
sb.Replace("=", ".");

return sb.ToString().ToLower();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can do ToLower(). Base64 uses upper and lowercase characters. Using ToLower() will open the possibilities of collisions. It looks like upper case letters are supported for FHIR id values, https://www.hl7.org/fhir/datatypes.html#id. Is there a reason for the ToLower()?

It looks like the max length of an id is 64 characters though, so we will need to guard against that. The regex for the id is [A-Za-z0-9-.] so I don't think _ is valid.

The 64 character limit is going to be problematic I think, especially since we are combining the Identifier and System. We could XOR the bytes together after padding the the shorter byte array instead of concatenating the strings themselves.

We may need to look into a high entropy hash function again (I forgot about the id length limitation when I suggested Base64 encoding).

If you want to control the characters you are generating from the byte array you can do something like this (ensure we only get supported characters).

    static class OpaqueValueGenerator
    {
        private const string OpaqueIdCharacters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-.";
        private const int CharacterShift = 64;
        private const int MaxLength = 64;

        public static string GenerateValue(string input)
        {
            var idBytes = Encoding.UTF8.GetBytes(input);
            var value = new BigInteger(idBytes);

            var sb = new StringBuilder();

            while (value != 0)
            {
                sb.Append(OpaqueIdCharacters[(byte)(value % CharacterShift)]);
                value /= CharacterShift;
            }

            string id = sb.ToString();

            if (id.Length > MaxLength)
            {
                throw new ArgumentOutOfRangeException(nameof(input));
            }

            return id;
        }
    }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Dustin - that is a very good catch regarding the valid fhir id characters and length, I had not thought about that. I think we should go with a sha256 hash. It will always result in a 64 character string with the characters [a-fA-F0-9] and I believe has the one of lowest chances of a collision. It seems to perform decently as well.

@wi-y wi-y requested a review from dustinburson April 14, 2022 13:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants