Add Performance and Blocking specification (open-telemetry#130)
* Add Performance and Blocking specification

Performance and Blocking specification is specified in a separate document and
is linked from Language Library Design principles document.

Implements issue: open-telemetry#94

* PR fix (open-telemetry#94).

- Write about Metrics & Logging to cover entire API
- Write about shut down / flush operations
- Leave room for blocking implementation options (should not block "as default behavior")
- Grammar & syntax fix

* PR fix (open-telemetry#94).

- Not limit for tracing, metrics.

* PR fix (open-telemetry#94).

- Mentioned about inevitable overhead
- Shutdown may block, but it should support configurable timeout also

* PR fix (open-telemetry#94)

- s/traces/telemetry data/
- Syntax fix

Co-Authored-By: Yang Song <[email protected]>

* PR fix (open-telemetry#130)

- Remove duplication with open-telemetry#186
- Mention about configurable timeout of flush operation

* PR fix (open-telemetry#130)

- Not specify default strategy (blocking or information loss)
saiya authored and carlosalberto committed Jul 31, 2019
1 parent 5e2a1e4 commit f5518ea
Showing 2 changed files with 52 additions and 1 deletion.
9 changes: 8 additions & 1 deletion specification/library-guidelines.md
@@ -83,6 +83,13 @@ Note that mocking is also possible by using SDK and a Mock `Exporter` without ne

The mocking approach chosen will depend on the testing goals and at which point exactly it is desirable to intercept the telemetry data path during the test.

## Performance and Blocking

See the [Performance and Blocking](performance.md) specification for
guidelines on the performance expectations that API implementations should meet, strategies for meeting these expectations, and a description of how implementations should document their behavior under load.

## Concurrency and Thread-Safety

See [Concurrency and Thread-Safety](concurrency.md) specification for guidelines on what concurrency safeties should API implementations provide and how they should be documented.
See the [Concurrency and Thread-Safety](concurrency.md) specification for
guidelines on what concurrency safeties API implementations should provide
and how they should be documented.
44 changes: 44 additions & 0 deletions specification/performance.md
@@ -0,0 +1,44 @@
# Performance and Blocking of OpenTelemetry API

This document defines common principles that will help designers create language libraries that are safe to use.

## Key principles

Here are the key principles:

- **The library should not block the end-user application by default.**
- **The library should not consume unbounded memory.**

Although monitoring inevitably adds some overhead, the API should degrade the end-user application as little as possible. It should therefore neither block the end-user application nor consume too much memory.

See also [Concurrency and Thread-Safety](concurrency.md) if the implementation supports concurrency.

### Tradeoff between non-blocking and memory consumption

Incomplete asynchronous I/O tasks or background tasks may consume memory to preserve their state. In such a case, there is a tradeoff between dropping some tasks to prevent memory starvation and keeping all tasks to prevent information loss.

If a language library faces this tradeoff, it should offer the end user the following options:

- **Prevent information loss**: Preserve all information, at the cost of potentially consuming many resources.
- **Prevent blocking**: Drop some information under overwhelming load, and log a warning when information loss starts and when it recovers.
  - Should provide an option to change the dropping threshold.
  - Should preferably provide a metric that represents the effective sampling ratio.
  - The language library might provide this option for logging.
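
As an illustration only, the two options could be sketched in Python as a buffer whose capacity is configurable. The class `TelemetryBuffer`, its `max_items` parameter, and the logger name are hypothetical and not part of any OpenTelemetry API. The sketch is single-threaded with respect to its counters; a real implementation would need to follow the concurrency guidelines.

```python
import logging
import queue

logger = logging.getLogger("telemetry.buffer")  # hypothetical logger name


class TelemetryBuffer:
    """Hypothetical buffer showing both overflow strategies.

    max_items=None keeps everything (prevent information loss, unbounded memory).
    max_items=N drops new items once N are pending (prevent blocking).
    """

    def __init__(self, max_items=None):
        # queue.Queue treats maxsize=0 as "no upper bound".
        self._queue = queue.Queue(maxsize=max_items or 0)
        self._dropping = False
        self.accepted = 0
        self.dropped = 0

    def offer(self, item):
        """Try to enqueue without blocking; return False if the item was dropped."""
        try:
            self._queue.put_nowait(item)
        except queue.Full:
            self.dropped += 1
            if not self._dropping:
                # Warn once when information loss starts.
                self._dropping = True
                logger.warning("telemetry buffer full: dropping data")
            return False
        self.accepted += 1
        if self._dropping:
            # Warn once when the buffer accepts data again.
            self._dropping = False
            logger.warning("telemetry buffer recovered: no longer dropping data")
        return True

    def take(self):
        """Remove and return the oldest pending item (raises queue.Empty if none)."""
        return self._queue.get_nowait()

    def effective_sampling_ratio(self):
        """Fraction of offered items that were kept."""
        total = self.accepted + self.dropped
        return self.accepted / total if total else 1.0
```

Here `max_items` plays the role of the dropping threshold, and `effective_sampling_ratio()` is one possible way to expose how much data survived under load.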

### End-user application should be aware of the size of logs

Logging could consume much memory by default if the end-user application emits too many logs. This default behavior is intended to preserve logs rather than drop them. To keep resource usage bounded, the end user should consider reducing the logs that are passed to the exporters.

Therefore, the language library should provide a way to filter which logs OpenTelemetry captures. End-user applications may want to write large volumes of logs to a log file or stdout (or somewhere else) without sending all of them to OpenTelemetry exporters.

In the documentation of the language library, it is a good idea to point out that too many logs consume many resources by default, and then to explain how to filter logs.
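
One way such filtering might look, sketched with Python's standard `logging` module: the application logs everything to stdout, while a separate handler with a higher level threshold decides what would reach an exporter. `ExporterHandler` and `build_logger` are hypothetical names used for illustration, and the handler only collects messages in a list instead of calling a real exporter.

```python
import logging
import sys


class ExporterHandler(logging.Handler):
    """Hypothetical stand-in for a handler that forwards records to an exporter."""

    def __init__(self):
        super().__init__()
        self.exported = []

    def emit(self, record):
        # A real handler would hand the record to an exporter here.
        self.exported.append(record.getMessage())


def build_logger(name, export_level=logging.WARNING):
    """Log everything locally, but pass only export_level and above to the exporter."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False

    # Local output keeps every record.
    logger.addHandler(logging.StreamHandler(sys.stdout))

    # The exporter path is filtered by level to bound resource usage.
    exporter_handler = ExporterHandler()
    exporter_handler.setLevel(export_level)
    logger.addHandler(exporter_handler)
    return logger, exporter_handler
```

With `export_level=logging.WARNING`, a `logger.debug(...)` call is still printed to stdout but never reaches `ExporterHandler`.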

### Shutdown and explicit flushing could block

The language library could block the end-user application when it shuts down. On shutdown, it has to flush data to prevent information loss. The language library should support a user-configurable timeout if it blocks on shutdown.

If the language library supports an explicit flush operation, that operation could also block, but it should support a configurable timeout.
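
A minimal sketch of a blocking shutdown with a configurable timeout, assuming a background worker thread that drains a queue. `BackgroundExporter`, `export_fn`, and `timeout_seconds` are hypothetical names, not an OpenTelemetry API.

```python
import queue
import threading


class BackgroundExporter:
    """Hypothetical exporter that sends items from a background thread."""

    def __init__(self, export_fn):
        self._export_fn = export_fn
        self._queue = queue.Queue()
        self._sentinel = object()  # marks the end of the stream
        # Daemon thread: a stuck export cannot keep the process alive.
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, item):
        """Enqueue an item without waiting for it to be exported."""
        self._queue.put(item)

    def _run(self):
        while True:
            item = self._queue.get()
            if item is self._sentinel:
                return
            self._export_fn(item)

    def shutdown(self, timeout_seconds=5.0):
        """Flush pending items, blocking for at most timeout_seconds.

        Returns True if everything was exported in time, False otherwise.
        """
        self._queue.put(self._sentinel)
        self._worker.join(timeout=timeout_seconds)
        return not self._worker.is_alive()
```

The caller decides how long it is willing to wait: `shutdown(timeout_seconds=0.5)` gives up after half a second and reports that data may have been lost by returning `False`.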

## Documentation

If a language-specific implementation has special characteristics that are not described in this document, those characteristics should be documented.
