
Updating data type constraints #1174

Merged: 7 commits merged into ocsf:main on Sep 13, 2024

Conversation

floydtree (Contributor) commented Sep 10, 2024

Related Issue: Discussed in the Weekly Call Sep 10, 2024.

Description of changes:

  1. Removing max length limits from file_hash_t, resource_uid_t, and string_t
  2. Fixing the datetime_t regex to allow t/T and z/Z in the timestamp string, permitting the supported variations defined in RFC 3339, section 5.6: https://www.rfc-editor.org/rfc/rfc3339#section-5.6
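As a rough illustration of the regex fix (this is a simplified sketch, not the exact pattern from dictionary.json), an RFC 3339 check that accepts both cases of the date-time separator and the zone designator might look like:

```python
import re

# Hypothetical sketch of an RFC 3339 (section 5.6) timestamp pattern that
# accepts 't'/'T' as the date-time separator and 'z'/'Z' or a numeric
# offset for the time zone. Not the actual dictionary.json regex.
RFC3339 = re.compile(
    r"^\d{4}-\d{2}-\d{2}[Tt]"          # full-date, then T or t
    r"\d{2}:\d{2}:\d{2}(\.\d+)?"       # partial-time with optional fraction
    r"([Zz]|[+-]\d{2}:\d{2})$"         # Z/z or a numeric offset
)

for ts in ("2024-09-10T12:30:00Z",
           "2024-09-10t12:30:00.123z",   # lowercase t and z, now accepted
           "2024-09-10T12:30:00+05:30"):
    assert RFC3339.match(ts)
```

A pattern that only matched uppercase `T` and `Z` would reject the second timestamp above, even though RFC 3339 explicitly permits the lowercase forms.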

@floydtree floydtree added bug Something isn't working non_breaking Non Breaking, backwards compatible changes labels Sep 10, 2024
Signed-off-by: Rajas Panat <[email protected]>
mikeradka previously approved these changes Sep 10, 2024
rmouritzen-splunk (Contributor) commented Sep 10, 2024

What's the rationale for removing the max lengths for these types? I don't mind them, but I'm curious why. (I missed most of today's weekly call. But this should be captured anyway.)

I suspect these were defined as sanity checks. Also, the string_t limit of 65535 seems like another Java-ism. Java has a 65535 "character" (UTF-16 code unit) limit for string constants (just constants, not arbitrary strings).

floydtree (Contributor, Author) commented Sep 11, 2024

What's the rationale for removing the max lengths for these types? I don't mind them, but I'm curious why. (I missed most of today's weekly call. But this should be captured anyway.)
I suspect these were defined as sanity checks. Also, the string_t limit of 65535 seems like another Java-ism. Java has a 65535 "character" (UTF-16 code unit) limit for string constants (just constants, not arbitrary strings).

Sure, good to recap and have it documented.

The current max length limit for file_hash_t was 64 chars, which doesn't make sense even for existing hashing algorithms like SHA-512, which yields 128 chars. Similarly for resource_uid_t, the limit of 64 is arbitrary; that type was added to capture various different uids, e.g. AWS ARNs, for which 64 chars is not a suitable max length. Likewise, the 65,535-char limit for string_t is arbitrary for implementers of the framework, with the most recent example coming from Cisco, who are looking to add environment variable strings, which can easily exceed 65k chars. @pagbabian-splunk feel free to add more context if necessary.
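The hash-length point is easy to verify with the standard library: the hex digest of a hash is twice its byte length, so SHA-512 alone already exceeds a 64-char cap.

```python
import hashlib

# Hex digest lengths of common hash algorithms, illustrating why a 64-char
# cap on file_hash_t was too small. A hex digest has 2 chars per byte.
data = b"example"
lengths = {name: len(hashlib.new(name, data).hexdigest())
           for name in ("md5", "sha1", "sha256", "sha512")}
print(lengths)  # {'md5': 32, 'sha1': 40, 'sha256': 64, 'sha512': 128}
```

SHA-256 fits the old limit exactly, but SHA-512 needs 128 chars, and future algorithms may need more.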

Signed-off-by: Rajas Panat <[email protected]>
rmouritzen-splunk (Contributor) commented Sep 11, 2024

Excellent.

Minor point:

... SHA-512, which yield 128 chars.

SHA-512 is 64 characters (512 bits / 8 bits per byte = 64 bytes). (Edit: This is incorrect -- 64 bytes becomes 128 characters.)

The point stands though. There may well be longer hashes out there now or in the future.

floydtree (Contributor, Author)
SHA-512 is 64 characters (512 bits / 8 bits per byte = 64 bytes). The point stands though. There may well be longer hashes out there now or in the future.

Right, 64 bytes, but wouldn't that be 128 hexadecimal chars?

rmouritzen-splunk (Contributor)

Right, 64 bytes, but wouldn't that be 128 hexadecimal chars?

Heh. Yup. I was thinking bytes, not hexadecimal characters, which double the length.

Signed-off-by: Rajas Panat <[email protected]>
pagbabian-splunk (Contributor)
Right, 64 bytes, but wouldn't that be 128 hexadecimal chars?

Heh. Yup. I was thinking bytes, not hexadecimal characters, which double the length.

Are you assuming Unicode characters? UTF-8 is usually what is considered, and yes, the limits may have been Java-oriented, but Java strings are actually not limited to 65K; I've read they can be around 2 billion characters.

rmouritzen-splunk (Contributor)
Are you assuming Unicode characters? UTF-8 is usually what is considered, and yes, the limits may have been Java-oriented, but Java strings are actually not limited to 65K; I've read they can be around 2 billion characters.

This is getting in the weeds... but hey, we are all engineers and we like getting technical.

I was just saying that the string_t limit of 65,535 seemed suspiciously like Java's limit on string constants (string literals in source files). Java's max string length is, as you mention, limited by its use of a signed 32-bit int for length, so some 2 billion. I also mentioned that Java string lengths are counted in UTF-16 code units (where a Unicode character requires 1 or 2 units), which makes checking string lengths messy in Java anyway.
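The "what unit are we counting?" problem can be shown in a few lines: code points, UTF-16 code units (what Java's String.length() counts), and UTF-8 bytes all give different lengths for the same string once you leave the Basic Multilingual Plane.

```python
# Sketch: "string length" depends on the unit counted. Python's len()
# counts code points; Java's String.length() counts UTF-16 code units,
# so characters outside the BMP count twice there.
s = "G\U0001D11E"  # 'G' plus MUSICAL SYMBOL G CLEF (U+1D11E, outside the BMP)

code_points = len(s)                           # 2 code points
utf16_units = len(s.encode("utf-16-le")) // 2  # 3 units: clef is a surrogate pair
utf8_bytes  = len(s.encode("utf-8"))           # 5 bytes: 1 + 4

print(code_points, utf16_units, utf8_bytes)  # 2 3 5
```

Any fixed max length for a string type therefore has to say which of these units it means, which is part of why such limits get messy.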

But this is all moot. The main point is that the 65,535 length limit for OCSF's string_t should be removed as it was a bit arbitrary and is too short for some known cases.

floydtree merged commit 2238718 into ocsf:main on Sep 13, 2024
3 checks passed
floydtree deleted the data_type_updates branch on Sep 13, 2024