GoogleCloudPlatform · olavloite · Nov 22, 2023 · Nov 12, 2023 · Nov 13, 2023 · Nov 13, 2023
diff --git a/README.md b/README.md
@@ -47,6 +47,10 @@ See [Frequently Asked Questions](docs/faq.md) for answers to frequently asked qu
 See [Latency Comparisons](benchmarks/latency-comparison/README.md) for benchmark comparisons between
 using PostgreSQL drivers with PGAdapter and using native Cloud Spanner drivers and client libraries.
 
+## Insights
+See [OpenTelemetry in PGAdapter](docs/open_telemetry.md) for how to use `OpenTelemetry` to collect
+and export traces to Google Cloud Trace.
+
 ## Usage
 PGAdapter can be started both as a Docker container, a standalone process as well as an
 in-process server (the latter is only supported for Java and other JVM-based applications).

diff --git a/benchmarks/ycsb/run.sh b/benchmarks/ycsb/run.sh
@@ -10,7 +10,7 @@ if [[ "$DATABASES" != *"$SPANNER_DATABASE"* ]]; then
 fi
 SPANNER_PROJECT=$(gcloud --quiet config get project)
 
-java -jar pgadapter.jar -p $SPANNER_PROJECT -i $SPANNER_INSTANCE -r="minSessions=1000;maxSessions=1000;numChannels=20" &
+java -jar pgadapter.jar -p $SPANNER_PROJECT -i $SPANNER_INSTANCE -enable_otel -r="minSessions=1000;maxSessions=1000;numChannels=20" &
 sleep 6
 export PGDATABASE=$SPANNER_DATABASE
 psql -h localhost -c "CREATE TABLE IF NOT EXISTS usertable (

diff --git a/docs/faq.md b/docs/faq.md
@@ -17,6 +17,9 @@ You can run [these latency comparison benchmarks](../benchmarks/latency-comparis
 the difference between using PostgreSQL drivers with PGAdapter and using native Cloud Spanner drivers
 and client libraries.
 
+### How can I get insights into execution times in PGAdapter?
+PGAdapter supports [using OpenTelemetry](open_telemetry.md) to collect and export traces to Google Cloud Trace.
+
 ## Connection Options
 
 ### How can I specify the credentials that should be used for a connection?

diff --git a/docs/img/dml_batch_trace_sample.png b/docs/img/dml_batch_trace_sample.png
diff --git a/docs/img/dml_trace_sample.png b/docs/img/dml_trace_sample.png
diff --git a/docs/img/query_trace_sample.png b/docs/img/query_trace_sample.png
diff --git a/docs/img/trace_filter_connection_id.png b/docs/img/trace_filter_connection_id.png
diff --git a/docs/img/trace_filter_retry_attempt.png b/docs/img/trace_filter_retry_attempt.png
diff --git a/docs/img/trace_filter_transaction_id.png b/docs/img/trace_filter_transaction_id.png
diff --git a/docs/open_telemetry.md b/docs/open_telemetry.md
@@ -0,0 +1,187 @@
+# Open Telemetry in PGAdapter
+
+PGAdapter supports Open Telemetry tracing. You can enable this by adding the `-enable_otel` command
+line argument when starting PGAdapter.
+
+Optionally, you can also set a trace sample ratio to limit the number of traces that will be
+collected and set with the `-otel_trace_ratio=<ratio>` command line argument. If you omit this
+argument, PGAdapter will use a `0.05` trace ratio. The ratio must be in the range `[0.0, 1.0]`.
+
+Example:
+
+```shell
+docker run \
+  -d -p 5432:5432 \
+  -v /path/to/credentials.json:/credentials.json:ro \
+  gcr.io/cloud-spanner-pg-adapter/pgadapter \
+  -p my-project -i my-instance \
+  -c /credentials.json -x \
+  -enable_otel -otel_trace_ratio=0.1
+```
+
+## Exporter
+PGAdapter will by default export traces to [Google Cloud Trace](https://cloud.google.com/trace).
+PGAdapter uses the [OpenTelemetry SDK Autoconfigure module](https://github.com/open-telemetry/opentelemetry-java/blob/main/sdk-extensions/autoconfigure/README.md)
+for configuring OpenTelemetry. You can use the options of this module to configure another exporter.
+
+PGAdapter uses a different set of default values for the following OpenTelemetry Autoconfigure
+properties:
+- `otel.traces.exporter`: `none`
+- `otel.metrics.exporter`: `none`
+- `otel.logs.exporter`: `none`
+
+In addition, PGAdapter always adds an exporter for Google Cloud Trace.
+
+## Traces
+
+The traces are by default exported to [Google Cloud Trace](https://cloud.google.com/trace). The
+traces follow the structure of the PostgreSQL wire-protocol. Traces typically consist of the
+following spans:
+
+1. `query_protocol_handler`: The [PostgreSQL wire-protocol](https://www.postgresql.org/docs/current/protocol-flow.html#PROTOCOL-FLOW-EXT-QUERY)
+   consists of multiple messages to parse, describe, bind, and execute SQL statements. Clients and
+   drivers will send multiple of these to PGAdapter, and PGAdapter will translate these into a set
+   of RPC invocations on CLoud Spanner. PGAdapter minimizes the number of RPC invocations that is
+   needed to respond to these messages. One span of `query_protocol_handler` consists of all the
+   round-trips that PGAdapter needed to respond to the messages that it received during one query
+   session.
+2. `execute` / `analyze` / `execute_batch`: PGAdapter will translate the incoming messages into
+   `execute`, `analyze`, and `execute_batch` actions on Cloud Spanner. One span of
+   `query_protocol_handler` normally consists of one or more of these actions.
+3. `execute_on_spanner`: This span contains the time when PGAdapter hands off the execution of a
+   statement to the Cloud Spanner Java client, and is roughly identical to the end-to-end
+   execution of the statement.
+4. `execute_batch`: This span indicates a batch of DML or DDL statements that are being received
+   and buffered by PGAdapter. The statements will be sent as one batch to Cloud Spanner for
+   execution. The actual execution of the batch is recorded in the `execute_batch_on_spanner` span.
+5. `send_result_set`: This span is only recorded for queries that send rows back to the client. It
+   shows the time that it took for PGAdapter to receive all rows from Cloud Spanner and send these
+   to the client. This span will be longer for queries that return a large number of rows. The time
+   it takes to send the results to the client also depends on how quickly the client application can
+   consume the rows that are being sent, and on the network speed between PGAdapter and Cloud
+   Spanner.
+
+### Examples
+
+The following sections show samples of commonly executed statements. All spans that are exported by
+Cloud Spanner contain the following attributes:
+1. `db.statement`: The SQL statement that is being executed.
+2. `pgadapter.connection_id`: Each connection in PGAdapter is assigned a random ID. You can use this
+   ID to filter spans for a specific connection.
+3. `pgadapter.transaction_id`: Each transaction in PGAdapter is assigned a random ID. YOu can use
+   this ID to filter spans for a specific transaction. This field is not present for statements that
+   are executed outside an (explicit) transaction.
+
+#### Query (SELECT)
+
+A typical execution of a query consists of the following spans:
+1. `query_protocol_handler`: The front-end query protocol handler of PGAdapter that receives
+   messages from the client and returns results to the client. The query handler normally receives
+   five messages from the client when executing a query:
+    1. `P` (Parse): Parse the query string.
+    2. `B` (Bind): Bind the query parameter values.
+    3. `D` (Describe): Describe the query result (which columns will be returned by the query).
+    4. `E` (Execute): Execute the statement and return the results.
+    5. `S` (Sync): Flush all buffered messages and return all results.
+2. `execute`: The front-end query protocol handler requests the backend connection to Cloud Spanner
+   to execute the query that it constructed from the messages it received.
+3. `execute_on_spanner`: This the actual RPC invocation on Cloud Spanner.
+4. `send_result_set`: Shows the time it takes to receive all rows from Cloud Spanner and the time it
+   takes to send these to the client application.
+
+![PGAdapter Cloud Trace - Query example](img/query_trace_sample.png?raw=true "PGAdapter Cloud Trace - Query example")
+
+The `query_protocol_handler` span also includes the time it takes to send the results to the client.
+This time will also depend on the time that the client requires to actually receive and process the
+results. You can see this time in the example above as the time that the `query_protocol_handler`
+span is active after the statement has been executed.
+
+#### DML (INSERT/UPDATE/DELETE)
+
+A typical execution of a DML statement consists of the following spans:
+1. `query_protocol_handler`: The front-end query protocol handler of PGAdapter that receives
+   messages from the client and returns results to the client. The query handler normally receives
+   five messages from the client when executing a DML statement:
+    1. `P` (Parse): Parse the DML string.
+    2. `B` (Bind): Bind the DML parameter values.
+    3. `D` (Describe): Describe the result of the statement. For DML statement, this is an empty result.
+    4. `E` (Execute): Execute the statement and return the update count.
+    5. `S` (Sync): Flush all buffered messages and return all results.
+2. `execute`: The front-end query protocol handler requests the backend connection to Cloud Spanner
+   to execute the DML statement that it constructed from the messages it received.
+3. `execute_on_spanner`: This the actual RPC invocation on Cloud Spanner.
+
+![PGAdapter Cloud Trace - DML example](img/dml_trace_sample.png?raw=true "PGAdapter Cloud Trace - DML example")
+
+The `query_protocol_handler` span also includes the time it takes to send the update count to the
+client. This time is negligible compared to sending a large result set from PGAdapter to the client.
+
+#### Batch
+
+The PostgreSQL wire-protocol also supports batching multiple statements into one execution. This is
+done by sending multiple `E` (Execute) messages to the server (PGAdapter) without sending an `S`
+(Sync) message. This will allow the server to buffer the statements and wait with sending the
+results of these until it sees an `S` (Sync) message. A typical execution of a batch of DML
+statements consists of the following spans:
+1. `query_protocol_handler`: The front-end query protocol handler of PGAdapter that receives
+   messages from the client and returns results to the client. The query handler normally receives
+   four messages for each DML statement in a batch from the client:
+   1. `P` (Parse): Parse the DML string.
+   2. `B` (Bind): Bind the DML parameter values.
+   3. `D` (Describe): Describe the result of the statement. For DML statement, this is an empty result.
+   4. `E` (Execute): Execute the statement and return the update count.
+2. It then receives an `S` (Sync) message when the client has sent all the DML statements. 
+3. `execute_batch`: The front-end query protocol handler requests the backend connection
+   to create a DML batch, buffer the DML statements on the client, and then execute the statements
+   on Cloud Spanner. The total time of this span is the time it took to buffer all the statements on
+   the client and to execute the batch on Cloud Spanner.  
+4. `buffer`: For each DML statement, a `buffer` span is created within the DML batch. These
+   statements are not yet executed on spanner, but buffered in the Spanner client until
+   all the DML statements have been collected.
+5. `execute_batch_on_spanner`: Once all DML statements have been collected, PGAdapter creates one
+   batch and sends this as one request to Cloud Spanner. This span shows the time it takes to
+   execute this batch request on Cloud Spanner.
+
+![PGAdapter Cloud Trace - Batch DML example](img/dml_batch_trace_sample.png?raw=true "PGAdapter Cloud Trace - Batch DML example")
+
+## Frequently Asked Questions
+
+#### How can I find all the statements that were executed on the same connection as the trace I'm looking at?
+
+All traces include an attribute `pgadapter.connection_id`. This connection ID is randomly chosen,
+but remains the same for a connection during its lifetime. You can use this ID to look up all other
+traces for the same connection by adding a filter `pgadapter.connection_id:<id>`.
+
+![PGAdapter Cloud Trace - Filter connection id](img/trace_filter_connection_id.png?raw=true "PGAdapter Cloud Trace - Filter connection id")
+
+#### How can I find all the statements in the same transaction?
+
+All traces for statements that used an explicit transaction include an attribute `pgadapter.transaction_id`.
+This transaction ID is randomly chosen, but remains the same for a transaction during its lifetime.
+You can use this ID to look up all other traces for the same transaction by adding a filter `pgadapter.transaction_id:<id>`.
+
+![PGAdapter Cloud Trace - Filter transaction id](img/trace_filter_transaction_id.png?raw=true "PGAdapter Cloud Trace - Filter transaction id")
+
+#### How can I find statements that waited for an aborted read/write transaction to retry?
+
+Cloud Spanner can abort any read/write transaction. A transaction can be aborted for various
+reasons:
+- Lock conflicts
+- Database schema changes
+- Too long inactivity (a read/write transaction is aborted after 10 seconds of inactivity)
+
+But it can also be aborted for internal technical reasons in Cloud Spanner. PGAdapter will attempt
+to retry transactions that are aborted by Cloud Spanner. This can cause some statements to seem to
+execute much slower than they should, as they have to wait for the internal retry attempt to finish
+first. You can find statements that waited for a transaction retry by adding the filter
+`HasLabel:pgadapter.retry_attempt`.
+
+![PGAdapter Cloud Trace - Filter for retried transactions](img/trace_filter_retry_attempt.png?raw=true "PGAdapter Cloud Trace - Filter for retried transactions")
+
+Note from the above screenshot:
+1. The retry internal retry is initiated and executed as part of the statement execution. This makes
+   it seem like the statement itself takes long to execute.
+2. Cloud Spanner also sends the client a back-off value that it should wait before retrying the
+   transaction. This is the reason that the total execution time of the statements seems to be much
+   longer than the transaction retry, as the Java client library will wait for this back-off time
+   before actually starting the retry.
diff --git a/pom.xml b/pom.xml
@@ -87,6 +87,13 @@
         <type>pom</type>
         <scope>import</scope>
       </dependency>
+      <dependency>
+        <groupId>io.opentelemetry</groupId>
+        <artifactId>opentelemetry-bom</artifactId>
+        <version>1.31.0</version>
+        <type>pom</type>
+        <scope>import</scope>
+      </dependency>
     </dependencies>
   </dependencyManagement>
 
@@ -105,6 +112,45 @@
       <groupId>com.google.cloud</groupId>
       <artifactId>google-cloud-spanner</artifactId>
     </dependency>
+    <dependency>
+      <groupId>io.opentelemetry</groupId>
+      <artifactId>opentelemetry-api</artifactId>
+    </dependency>
+    <dependency>
+      <groupId>io.opentelemetry</groupId>
+      <artifactId>opentelemetry-sdk</artifactId>
+    </dependency>
+    <dependency>
+      <groupId>io.opentelemetry</groupId>
+      <artifactId>opentelemetry-sdk-metrics</artifactId>
+    </dependency>
+    <dependency>
+      <groupId>io.opentelemetry</groupId>
+      <artifactId>opentelemetry-exporter-otlp</artifactId>
+    </dependency>
+    <dependency>
+      <groupId>io.opentelemetry</groupId>
+      <artifactId>opentelemetry-exporter-logging</artifactId>
+    </dependency>
+    <dependency>
+      <!-- Not managed by opentelemetry-bom -->
+      <groupId>io.opentelemetry.semconv</groupId>
+      <artifactId>opentelemetry-semconv</artifactId>
+      <version>1.22.0-alpha</version>
+    </dependency>
+    <dependency>
+      <groupId>io.opentelemetry</groupId>
+      <artifactId>opentelemetry-sdk-extension-autoconfigure</artifactId>
+    </dependency>
+    <dependency>
+      <groupId>io.opentelemetry</groupId>
+      <artifactId>opentelemetry-sdk-extension-autoconfigure-spi</artifactId>
+    </dependency>
+    <dependency>
+      <groupId>com.google.cloud.opentelemetry</groupId>
+      <artifactId>exporter-trace</artifactId>
+      <version>0.15.0</version>
+    </dependency>
     <dependency>
       <groupId>org.postgresql</groupId>
       <artifactId>postgresql</artifactId>

diff --git a/src/main/java/com/google/cloud/spanner/pgadapter/ConnectionHandler.java b/src/main/java/com/google/cloud/spanner/pgadapter/ConnectionHandler.java
@@ -81,6 +81,7 @@
 import java.util.Map.Entry;
 import java.util.Properties;
 import java.util.Set;
+import java.util.UUID;
 import java.util.concurrent.ConcurrentHashMap;
 import java.util.concurrent.Future;
 import java.util.concurrent.atomic.AtomicInteger;
@@ -123,6 +124,10 @@ public class ConnectionHandler extends Thread {
   // Separate the following from the threat ID generator, since PG connection IDs are maximum
   //  32 bytes, and shouldn't be incremented on failed startups.
   private static final AtomicInteger incrementingConnectionId = new AtomicInteger(0);
+
+  /** Randomly generated UUID that is included in tracing to identify a connection. */
+  private final UUID traceConnectionId = UUID.randomUUID();
+
   private ConnectionMetadata connectionMetadata;
   private WireMessage message;
   private int invalidMessagesCount;
@@ -708,6 +713,10 @@ public int getSecret() {
     return this.secret;
   }
 
+  public UUID getTraceConnectionId() {
+    return traceConnectionId;
+  }
+
   public void setMessageState(WireMessage message) {
     this.message = this.server.recordMessage(message);
   }

diff --git a/src/main/java/com/google/cloud/spanner/pgadapter/ProxyServer.java b/src/main/java/com/google/cloud/spanner/pgadapter/ProxyServer.java
@@ -26,6 +26,7 @@
 import com.google.cloud.spanner.pgadapter.statements.IntermediateStatement;
 import com.google.cloud.spanner.pgadapter.wireprotocol.WireMessage;
 import com.google.common.collect.ImmutableList;
+import io.opentelemetry.api.OpenTelemetry;
 import java.io.File;
 import java.io.IOException;
 import java.net.InetAddress;
@@ -54,6 +55,7 @@ public class ProxyServer extends AbstractApiService {
 
   private static final Logger logger = Logger.getLogger(ProxyServer.class.getName());
   private final OptionsMetadata options;
+  private final OpenTelemetry openTelemetry;
   private final Properties properties;
   private final List<ConnectionHandler> handlers = Collections.synchronizedList(new LinkedList<>());
 
@@ -78,29 +80,33 @@ public class ProxyServer extends AbstractApiService {
   private final ConcurrentLinkedQueue<WireMessage> debugMessages = new ConcurrentLinkedQueue<>();
   private final AtomicInteger debugMessageCount = new AtomicInteger();
 
+  ProxyServer(OptionsMetadata optionsMetadata) {
+    this(optionsMetadata, OpenTelemetry.noop());
+  }
+
   /**
    * Instantiates the ProxyServer from CLI-gathered metadata.
    *
    * @param optionsMetadata Resulting metadata from CLI.
+   * @param openTelemetry The {@link OpenTelemetry} to use to collect metrics
    */
-  public ProxyServer(OptionsMetadata optionsMetadata) {
-    this.options = optionsMetadata;
-    this.localPort = optionsMetadata.getProxyPort();
-    this.properties = new Properties();
-    this.debugMode = optionsMetadata.isDebugMode();
-    addConnectionProperties();
+  public ProxyServer(OptionsMetadata optionsMetadata, OpenTelemetry openTelemetry) {
+    this(optionsMetadata, openTelemetry, new Properties());
   }
 
   /**
    * Instantiates the ProxyServer from metadata and properties. For use with in-process invocations.
    *
    * @param optionsMetadata Resulting metadata from CLI.
-   * @param properties Properties for specificying additional information to JDBC like an external
+   * @param openTelemetry The {@link OpenTelemetry} to use to collect metrics
+   * @param properties Properties for specifying additional information to JDBC like an external
    *     channel provider (see ConnectionOptions in Java Spanner client library for more details on
    *     supported properties).
    */
-  public ProxyServer(OptionsMetadata optionsMetadata, Properties properties) {
+  public ProxyServer(
+      OptionsMetadata optionsMetadata, OpenTelemetry openTelemetry, Properties properties) {
     this.options = optionsMetadata;
+    this.openTelemetry = openTelemetry;
     this.localPort = optionsMetadata.getProxyPort();
     this.properties = properties;
     this.debugMode = optionsMetadata.isDebugMode();
@@ -344,6 +350,10 @@ public OptionsMetadata getOptions() {
     return this.options;
   }
 
+  public OpenTelemetry getOpenTelemetry() {
+    return this.openTelemetry;
+  }
+
   /** @return the JDBC connection properties that are used by this server */
   public Properties getProperties() {
     return (Properties) this.properties.clone();