diff --git a/text/0187-data-classification.md b/text/0187-data-classification.md
index 136835d37..44d8ff2ab 100644
--- a/text/0187-data-classification.md
+++ b/text/0187-data-classification.md
@@ -1,13 +1,12 @@
 # Introduce Data Classification for Telemetry
-Adding optional classification to the attributes and resources to support simplified processing of data.
-
+Adding optional classification to the attributes and resources to support simplified processing of data.
 ## Motivation
 As the scope of Observability changes to include user monitoring and analytics (Real User Monitoring for now), ensuring that data is handled correctly becomes harder as the problem space grows. Telemetry increasingly needs pre-processing, because high cardinality values, sensitive user data, and sparse data all affect the reliability and usability of the data. Furthermore, some vendors are not able to selectively remove sensitive user data, and some agreements with customers mandate that sensitive data must not leave the internal edge. Moreover, attribute values that are known to be high cardinality (for example, a container id within a cloud based environment) are expensive to store within observability systems, so making it easier to omit attributes as part of exporter or collector configuration can greatly reduce the cost of the observability suite. Not all attributes and resource data hold the same value or have the same processing requirements.
-Updating the SDK to include the convention of data classification would mean that users can extend their current telemetry via an opt in process to enable enhanced controls over the telemetry data without impacting the developer experience. Instrumentation authors are not limited by what attributes they can use when developing their telemetry and not a hindrance for existing standards within Open Telemetry.
Organisations can define what resource classifications are allowed to be processed by an observability suite, what is subject to additional processing, and understand the types of data being sent. For example, all high cardinality data can be sent to on prem short term storage and then forwarded to a vendor with high cardinality attributes removed so that it can be viewed over a longer time period to show trends.
+Updating the SDK to include the convention of data classification would mean that users can extend their current telemetry via an opt-in process to enable enhanced controls over the telemetry data without impacting the developer experience. Instrumentation authors are not limited in which attributes they can use when developing their telemetry, and the convention does not hinder existing standards within OpenTelemetry. Organisations can define which resource classifications are allowed to be processed by an observability suite and which are subject to additional processing, and can understand the types of data being sent. For example, all high cardinality data can be sent to on-prem short-term storage and then forwarded to a vendor with high cardinality attributes removed, so that it can be viewed over a longer time period to show trends.
 Having the ability to set data classification on resources means that:
@@ -21,8 +20,6 @@ Ensure that sparse data is not sampled out by mistake
 This approach can also be extended by vendors to enable more advanced features in their tools, such as high resolution, short retention, data residency, or any number of other features that can be adopted.
-
-
 ## Explanation
 A service that adopts resource classifications allows the OpenTelemetry exporters to route data, remove problematic attributes, or filter values being sent to downstream systems.
Using classifications simplifies the existing processes, since it does not rely on a lookup table or regular expressions that need to be maintained, on user intervention to decide what is acceptable, or on validating the entire resource object, and therefore avoids the performance cost of those approaches.
@@ -37,9 +34,7 @@ The following are examples of how a service owner or a on call engineer could ad
 > An external observability vendor is not able to offer protections on the resource data it can accept; however, the collector can be set up as a proxy to that vendor and add the required protection that the vendor cannot currently offer.
-
-
-An Instrumentation Author can add additional metadata can be set on the attributes that will be added to the resource object, the example shows how an to configure a middleware in Golang using classifications:
+An Instrumentation Author can set additional metadata on the attributes that will be added to the resource object. The example shows how to configure a middleware in Go using classifications:
 ```go
 // Middleware implements the semantic convention for http trace spans but
@@ -73,7 +68,7 @@ func Middleware(next http.Handler) http.Handler {
 }
 ```
-Once classification have been added to the attribute keys, a tracing sampler can
+Once classifications have been added to the attribute keys, a tracing sampler can use them to decide which spans to export:
 ```go
 func (ts TailBasedSampler) Sample(spans trace.Spans) (export trace.Spans) {
@@ -138,7 +133,6 @@ service:
 - external-trace-vendor
 ```
-
 ## Internal details
 To add classifications support into the SDK, the OTLP definition would need to extend the definitions used by Attributes to include a new field that will be used to store a bit mask, and the Resource type (datapoint, span, message) would also need to be extended to add a bit mask field.
@@ -179,22 +173,22 @@ message LogRecord {
 + int64 classification = 11;
 }
 ```
-Once that has been update in the protobuf definition has been published, a classification list will need to be defined as part of the SDK with space for vendors to allow for their own classification definitions.
+Once the updated protobuf definition has been published, a classification list will need to be defined as part of the SDK, with space reserved for vendors to add their own classification definitions.
 A classification MAY contain multiple classifications; therefore, a classification value MUST NOT overlap with any other classification value, so that they can be combined using bitwise inclusive OR. The SDK will reserve the 32 Least Significant Bits (LSB) to allow for future additions that are yet to be considered, and allow vendors to define values in the 32 Most Significant Bits (MSB) as they see fit. A starting list can look like the following:
-| Classification | Purpose | BitShift | Hint Value (base 16) | bit mask |
-|----------------------------------- |------------------------------------------------------------------------------------------------------ |---------- |---------------------- |--------------------- |
-| No Value | Is the default value when un set | 0 << 0 | 0x0000 | 0000 0000 0000 0000 |
-| Ephemeral | The attributes that are short lived and have high potential to change over extended periods of time.
| 1 << 0 | 0x0001 | 0000 0000 0000 0001 |
-| High Cardinality | The value is an unbounded set | 1 << 1 | 0x0002 | 0000 0000 0000 0001 |
-| Sensitive Value | The value MAY contain information that requires sanitisation | 1 << 2 | 0x0004 | 0000 0000 0000 0100 |
-| Personal Identifiable Information | The value DOES contain Personal Identifiable Information | 1 << 3 | 0x0008 | 0000 0000 0000 1000 |
-| User Generated Content | The value DOES contain User Generated Content | 1 << 4 | 0x0010 | 0000 0000 0001 0000 |
-| Service Level Objective | The value is used to track a service level objective | 1 << 5 | 0x0020 | 0000 0000 0010 0000 |
+| Classification | Purpose | Bit Shift | Hint Value (base 16) | Bit Mask |
+|----------------------------------- |------------------------------------------------------------------------------------------------------ |---------- |---------------------- |--------------------- |
+| No Value | The default value when unset | 0 << 0 | 0x0000 | 0000 0000 0000 0000 |
+| Ephemeral | Attributes that are short lived and likely to change over extended periods of time.
| 1 << 0 | 0x0001 | 0000 0000 0000 0001 |
+| High Cardinality | The value is an unbounded set | 1 << 1 | 0x0002 | 0000 0000 0000 0010 |
+| Sensitive Value | The value MAY contain information that requires sanitisation | 1 << 2 | 0x0004 | 0000 0000 0000 0100 |
+| Personal Identifiable Information | The value DOES contain Personal Identifiable Information | 1 << 3 | 0x0008 | 0000 0000 0000 1000 |
+| User Generated Content | The value DOES contain User Generated Content | 1 << 4 | 0x0010 | 0000 0000 0001 0000 |
+| Service Level Objective | The value is used to track a service level objective | 1 << 5 | 0x0020 | 0000 0000 0010 0000 |
 To support the idea, the following code needs to be implemented:
@@ -279,8 +273,8 @@ func RemoveMatchedAttributes(c Classification) func(t T) T {
 ## Trade-offs and mitigations
-- Going forward with this does imply a lot of ownership on the USER developing the instrumentation code, and a lot of potential cognitive load perceived with it.
- - This only works with USERs implementing their code using OTEL instrumentation.
+- Going forward with this places a lot of ownership on the USER developing the instrumentation code, along with a potentially significant cognitive load.
+ - This only works with USERs implementing their code using OTEL instrumentation.
 - Converting from non OTLP formats to OTLP would depend on processors / converters to support this; however, doing so will likely impact performance.
 - Managing classifications within the semantic convention has not been explored
 - The tolerance of what is considered a given classification could change between organisations
@@ -292,20 +286,20 @@ There was a conversation that was related here on attributes that should be subj
 The alternatives that were considered are:
 - The Semantic Convention defines classification for known attributes
- - this was no considered heavily since it misses USERS (developers) whom are using the SDK natively and then would need to manage their own Semantic Convention that would also need to be understood by the down stream processors and vendors
- - Does not account for version drift with a breaking change involved
- - Does a processor need to know all versions?
- - Does that mean values can not be changed in the semantic convention?
+ - This was not considered heavily since it misses USERs (developers) who are using the SDK natively and would then need to manage their own Semantic Convention, which would also need to be understood by the downstream processors and vendors
+ - Does not account for version drift with a breaking change involved
+ - Does a processor need to know all versions?
+ - Does that mean values cannot be changed in the semantic convention?
 - Appending prefixes/suffixes to attribute keys
- - String matching is expensive operation and would considerable slow down the current perform wins that the project strives for
- - Can not allow for multiple definitions to be set on one attribute
- - Has the potential to overlap with an exciting keys and require USERs to do a migration of what attributes they are already sending
- - Unneeded work that should be avoided
- - Limits what USERs can define their attributes keys
- - The ownership is then moved to processors and vendors to set the definition
- - Goes against the projects goal of being vendor neutral
- - Each vendor processor could define different values (same as above)
+ - String matching is an expensive operation and would considerably slow down the performance wins that the project strives for
+ - Cannot allow for multiple definitions to be set on one attribute
+ - Has the potential to overlap with existing keys and require USERs to migrate the attributes they are already sending
+ - Unneeded work that should be avoided
+ - Limits what USERs can define as their attribute keys
+ - The ownership is then moved to processors and vendors to set the definition
+ - Goes against the project's goal of being vendor neutral
+ - Each vendor processor could define different values (same as above)
 ## Open questions
@@ -327,6 +321,6 @@ Some ideas of future possibilities have been eluded to throughout the proposal,
 - Vendor neutral data protection mechanisms
 - Enforcing Data Regulations on telemetry
 - Classification based routing
- - An exporter can only with certain set of classifications
+ - An exporter can only work with a certain set of classifications
-Thinking more broadly, it would mean that Open Telemetry could easily extend to support analytical based telemetry on user interactions while adhering to data regulation or organisational policies.
\ No newline at end of file
+Thinking more broadly, it would mean that OpenTelemetry could easily extend to support analytics-based telemetry on user interactions while adhering to data regulations or organisational policies.
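
Review note: the bit-mask scheme from the classification table in this diff can be sketched in Go as follows. This is only an illustration under stated assumptions, not the proposal's implementation: the `Classification` type, the constant names, the `Attribute` struct, and the `RemoveMatched` helper are all hypothetical names chosen for this example. Only the bit positions are taken from the table.

```go
package main

import "fmt"

// Classification is a hypothetical bit-mask type. The proposal stores the
// mask in an int64 field, so the same width is used here.
type Classification int64

// Bit positions follow the table in the proposal; the names are assumptions.
const (
	NoValue         Classification = 0
	Ephemeral       Classification = 1 << 0
	HighCardinality Classification = 1 << 1
	SensitiveValue  Classification = 1 << 2
	PII             Classification = 1 << 3
	UserGenerated   Classification = 1 << 4
	SLO             Classification = 1 << 5
)

// Attribute is a simplified stand-in for an SDK attribute (assumption).
type Attribute struct {
	Key   string
	Value string
	Class Classification
}

// RemoveMatched returns the attributes that carry none of the bits in mask.
// Because the values do not overlap, a single bitwise AND is enough to test
// an attribute against any combination of classifications.
func RemoveMatched(attrs []Attribute, mask Classification) []Attribute {
	kept := make([]Attribute, 0, len(attrs))
	for _, a := range attrs {
		if a.Class&mask == 0 {
			kept = append(kept, a)
		}
	}
	return kept
}

func main() {
	attrs := []Attribute{
		{Key: "http.method", Value: "GET"},
		{Key: "container.id", Value: "abc123", Class: Ephemeral | HighCardinality},
		{Key: "user.email", Value: "a@example.com", Class: PII | SensitiveValue},
	}
	// Combine classifications with bitwise inclusive OR before filtering.
	for _, a := range RemoveMatched(attrs, PII|HighCardinality) {
		fmt.Println(a.Key)
	}
}
```

The filter never inspects attribute keys or values, which is the property the proposal relies on when it contrasts classifications with lookup tables and regular expressions.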