Add Performance and Blocking specification (open-telemetry#130)

* Add Performance and Blocking specification

  Performance and Blocking specification is specified in a separate document and is linked from Language Library Design principles document.

  Implements issue: open-telemetry#94

* PR fix (open-telemetry#94).
  - Write about Metrics & Logging to cover entire API
  - Write about shut down / flush operations
  - Leave room for blocking implementation options (should not block "as default behavior")
  - Grammar & syntax fix

* PR fix (open-telemetry#94).
  - Not limit for tracing, metrics.

* PR fix (open-telemetry#94).
  - Mentioned about inevitable overhead
  - Shutdown may block, but it should support configurable timeout also

* PR fix (open-telemetry#94)
  - s/traces/telemetry data/
  - Syntax fix

  Co-Authored-By: Yang Song <[email protected]>

* PR fix (open-telemetry#130)
  - Remove duplication with open-telemetry#186
  - Mention about configurable timeout of flush operation

* PR fix (open-telemetry#130)
  - Not specify default strategy (blocking or information loss)

carlosalberto · Jul 31, 2019 · f5518ea
1 parent 5e2a1e4
Showing 2 changed files with 52 additions and 1 deletion.
diff --git a/specification/library-guidelines.md b/specification/library-guidelines.md
@@ -83,6 +83,13 @@ Note that mocking is also possible by using SDK and a Mock `Exporter` without ne
 
 The mocking approach chosen will depend on the testing goals and at which point exactly it is desirable to intercept the telemetry data path during the test.
 
+## Performance and Blocking
+
+See the [Performance and Blocking](performance.md) specification for
+guidelines on the performance expectations that API implementations should meet, strategies for meeting these expectations, and a description of how implementations should document their behavior under load.
+
 ## Concurrency and Thread-Safety
 
-See [Concurrency and Thread-Safety](concurrency.md) specification for guidelines on what concurrency safeties should API implementations provide and how they should be documented.
+See the [Concurrency and Thread-Safety](concurrency.md) specification for
+guidelines on what concurrency safeties API implementations should provide
+and how they should be documented.
diff --git a/specification/performance.md b/specification/performance.md
@@ -0,0 +1,44 @@
+# Performance and Blocking of OpenTelemetry API
+
+This document defines common principles that will help designers create language libraries that are safe to use. 
+
+## Key principles
+
+Here are the key principles:
+
+- **The library should not block the end-user application by default.**
+- **The library should not consume unbounded memory.**
+
+Although monitoring inevitably adds some overhead, the API should degrade the end-user application as little as possible. In particular, it should not block the end-user application or consume too much memory.
+
+See also [Concurrency and Thread-Safety](concurrency.md) if the implementation supports concurrency.
+
+### Tradeoff between non-blocking and memory consumption
+
+Incomplete asynchronous I/O tasks or background tasks may consume memory to preserve their state. In such a case, there is a tradeoff between dropping some tasks to prevent memory starvation and keeping all tasks to prevent information loss.
+
+If a language library faces this tradeoff, it should offer the end user the following options:
+
+- **Prevent information loss**: Preserve all information, at the cost of possibly consuming many resources.
+- **Prevent blocking**: Drop some information under overwhelming load, and log a warning when information loss starts and when it recovers.
+  - Should provide an option to change the dropping threshold.
+  - Should preferably provide a metric that represents the effective sampling ratio.
+  - The language library might provide this option for logging.
+
+### End-user application should be aware of the size of logs
+
+Logging could consume a lot of memory by default if the end-user application emits too many logs. This default behavior is intended to preserve logs rather than drop them. To keep resource usage bounded, the end user should consider reducing the logs that are passed to the exporters.
+
+Therefore, the language library should provide a way to filter which logs OpenTelemetry captures. End-user applications may want to write a large volume of logs to a log file or stdout (or somewhere else) without sending all of them to OpenTelemetry exporters.
+
+In the documentation of the language library, it is a good idea to point out that too many logs consume many resources by default, and then to explain how to filter logs.
+
+### Shutdown and explicit flushing could block
+
+The language library could block the end-user application when it shuts down. On shutdown, it has to flush data to prevent information loss. The language library should support a user-configurable timeout if it blocks on shutdown.
+
+If the language library supports an explicit flush operation, that operation could also block, but it should support a configurable timeout.
+
+## Documentation
+
+If a language-specific implementation has special characteristics that are not described in this document, those characteristics should be documented.
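
The tradeoff described in `performance.md` between preventing information loss and preventing blocking, together with the configurable flush timeout, can be illustrated with a small sketch. This is not part of the commit or of any OpenTelemetry SDK: `BoundedBuffer`, `offer`, `effective_sampling_ratio`, and `flush` are hypothetical names chosen for illustration, and the sketch assumes a simple in-memory queue between the application and an exporter callback.

```python
import logging
import queue
import threading
import time

logger = logging.getLogger(__name__)


class BoundedBuffer:
    """Hypothetical bounded buffer between application code and an exporter."""

    def __init__(self, max_size, block_when_full=False):
        # max_size is the user-configurable dropping threshold.
        self._queue = queue.Queue(maxsize=max_size)
        # True: "prevent information loss" (callers wait for free space).
        # False: "prevent blocking" (items are dropped when the buffer is full).
        self._block_when_full = block_when_full
        self._lock = threading.Lock()
        self._dropping = False
        self.accepted = 0
        self.dropped = 0

    def offer(self, item):
        """Enqueue an item; return False if it was dropped."""
        try:
            self._queue.put(item, block=self._block_when_full)
        except queue.Full:
            with self._lock:
                self.dropped += 1
                if not self._dropping:
                    self._dropping = True
                    logger.warning("buffer full: telemetry is being dropped")
            return False
        with self._lock:
            self.accepted += 1
            if self._dropping:
                self._dropping = False
                logger.warning("buffer recovered: telemetry is accepted again")
        return True

    def effective_sampling_ratio(self):
        """Fraction of offered items that were accepted so far."""
        with self._lock:
            total = self.accepted + self.dropped
            return self.accepted / total if total else 1.0

    def flush(self, export, timeout_seconds):
        """Pass queued items to export(item) until empty or the deadline passes.

        Returns True if the buffer was fully drained. The deadline is only
        checked between items, so one slow export call can overrun it.
        """
        deadline = time.monotonic() + timeout_seconds
        while time.monotonic() < deadline:
            try:
                item = self._queue.get_nowait()
            except queue.Empty:
                return True
            export(item)
        return self._queue.empty()


# Non-blocking mode with room for two items: the last three offers are dropped.
buf = BoundedBuffer(max_size=2)
results = [buf.offer(n) for n in range(5)]
exported = []
drained = buf.flush(exported.append, timeout_seconds=1.0)
```

In this sketch the non-blocking mode never makes the caller wait, at the cost of lost items that are counted and reported through `effective_sampling_ratio`. Constructing the buffer with `block_when_full=True` would instead make `offer` wait until a consumer frees space, which preserves every item but can stall the application.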