
feat: Implement PDP/DataProxy tracing protocol in the binary engine #3505

Closed

Conversation

miguelff
Contributor

@miguelff miguelff commented Dec 14, 2022

See #3618 instead

Supersedes #3112
Tracked in https://github.com/prisma/client-planning/issues/158
Sibling to prisma/prisma#16748

To be released after the prep refactoring in #3616
Closes https://github.com/prisma/client-planning/issues/138
Closes https://github.com/prisma/client-planning/issues/144
Closes https://github.com/prisma/client-planning/issues/145

Implements trace and log capturing in accordance with the (private link) contract specification.

The technical design is best explained through the interaction diagram below:

[interaction diagram]

In the diagram, you will see objects whose lifetime is static. The boxes for those have a
double-width margin. These are:

  • The server itself
  • The global TRACER, which handles log! and span! calls and uses the global PROCESSOR to
    process the data that constitutes a trace: Spans and log Events
  • The global PROCESSOR, which manages the Storage set of data structures, holding logs,
    traces (and capture settings) per request.

Then, through the request lifecycle, different objects are created and dropped:

  • When a request comes in, its headers are processed and a [Settings] object is built. This
    object determines, for the request, how logging and tracing are going to be captured: only
    traces, only logs, or both, and which log levels are going to be captured.
  • Based on the settings, a new Capturer is created; a capturer is nothing but an exporter
    wrapped to start capturing / fetch the captures for this particular request (see the sketch
    below).
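As a rough illustration, parsing the X-capture-telemetry header into such a settings object could
look like the sketch below. The names and shapes are illustrative only, not the engine's actual
types:

// Hypothetical sketch: build per-request capture settings from the
// X-capture-telemetry header value, e.g. "query,tracing".
#[derive(Debug, Default)]
struct Settings {
    traces: bool,      // capture trace spans for this request
    logs: Vec<String>, // log levels to capture, e.g. ["query", "info"]
}

fn settings_from_header(header: &str) -> Settings {
    let mut settings = Settings::default();
    for item in header.split(',').map(str::trim) {
        match item {
            "tracing" => settings.traces = true,
            "" => {}
            level => settings.logs.push(level.to_string()),
        }
    }
    settings
}

fn main() {
    // "query,tracing" asks for query logs plus trace spans, as in the examples further down.
    let settings = settings_from_header("query,tracing");
    println!("{settings:?}");
}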

Then the capturing process works as follows (a simplified sketch of the storage involved is shown after the list):

  • The server receives a query [1].
  • It grabs the HTTP headers and builds a Capture object [2], which is configured with the settings
    denoted by the X-capture-telemetry header.
  • Now the server tells the Capturer to start capturing all the logs and traces occurring on
    the request [3] (denoted by a trace_id). The trace_id is either carried in the traceparent
    header or implicitly created on the first span of the request. Starting to capture implies
    creating an entry for the trace_id in two different data structures, logs and traces, and
    storing the settings used to select the Spans and Events to capture [4].
  • The server dispatches the request and, somewhere else in the code, it is processed [5].
  • There the code logs events and emits traces asynchronously as part of the processing [6].
  • Traces and logs arrive at the TRACER and get hydrated in the PROCESSOR [7],
    which writes them into the shard corresponding to the current trace_id in the
    logs and traces storage [8]. The settings previously stored under the trace_id
    key are used to pick which events and spans are captured, based on their level.
  • When the code that dispatches the request is done, it returns a PrismaResponse to the
    server [9].
  • Then the server asks the PROCESSOR to fetch the captures [10],
  • and right after that, it fetches the captures from the Storage [11]. At that time, although
    that's not represented in the diagram, the captures are deleted from the storage, thus
    freeing any memory used for capturing during the request.
  • Finally, the server sets the logs and traces extensions in the PrismaResponse [12],
    serializes the extended response in JSON format, and returns it as an HTTP response
    blob [13].
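A simplified sketch of the storage described in the steps above, using illustrative types (the
real engine has its own Span/Event structures, locking, and exporter plumbing):

// Hypothetical sketch of the per-request storage the PROCESSOR manages:
// one entry per trace_id, holding the request's settings plus the captured data.
use std::collections::HashMap;

type TraceId = String;

#[derive(Debug, Default)]
struct Capture {
    settings: Vec<String>, // which kinds/levels to keep for this request
    spans: Vec<String>,    // captured spans (plain strings here for brevity)
    events: Vec<String>,   // captured log events
}

#[derive(Debug, Default)]
struct Storage {
    requests: HashMap<TraceId, Capture>,
}

impl Storage {
    // Steps [3]/[4]: start capturing for a trace_id with the request's settings.
    fn start_capturing(&mut self, trace_id: &str, settings: Vec<String>) {
        self.requests.insert(
            trace_id.to_string(),
            Capture { settings, ..Default::default() },
        );
    }

    // Steps [7]/[8]: write an incoming span or event into the shard for its trace_id.
    fn record_span(&mut self, trace_id: &str, span: String) {
        if let Some(capture) = self.requests.get_mut(trace_id) {
            capture.spans.push(span);
        }
    }

    fn record_event(&mut self, trace_id: &str, event: String) {
        if let Some(capture) = self.requests.get_mut(trace_id) {
            capture.events.push(event);
        }
    }

    // Steps [10]/[11]: fetch the captures and free the memory used for this request.
    fn fetch_captures(&mut self, trace_id: &str) -> Option<Capture> {
        self.requests.remove(trace_id)
    }
}

fn main() {
    let mut storage = Storage::default();
    let trace_id = "4bf92f3577b34da6a3ce929d0e0e4736";
    storage.start_capturing(trace_id, vec!["query".into(), "tracing".into()]);
    storage.record_span(trace_id, "prisma:engine:db_query".into());
    storage.record_event(trace_id, "SELECT 1".into());
    let captures = storage.fetch_captures(trace_id); // entry is removed, memory is freed
    println!("{captures:?}");
}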

Sample use cases

Server configuration: LOG_QUERIES=y QE_LOG_LEVEL=TRACE query-engine --enable-telemetry-in-response. That log level determines that trace spans and events up to that level of specificity will be generated. If it were e.g. INFO, trace or debug span events wouldn't appear in the response.
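The level check is the usual verbosity comparison. A minimal sketch using the tracing crate's
level ordering (the engine's actual filtering also handles the custom "query" level, which is not
shown here):

use tracing::Level;

// An event or span is generated/captured only if its level is within the configured verbosity.
// In `tracing`, more verbose levels compare as greater: ERROR < WARN < INFO < DEBUG < TRACE.
fn within_level(configured: Level, candidate: Level) -> bool {
    candidate <= configured
}

fn main() {
    assert!(within_level(Level::INFO, Level::WARN));   // kept at INFO
    assert!(!within_level(Level::INFO, Level::DEBUG)); // filtered out at INFO
    println!("INFO keeps error/warn/info and drops debug/trace");
}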

Requested both query logs and tracing events:

query {
  findFirstUser {
    id
  }
}

{"X-capture-telemetry": "query,tracing", "traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
Response
{
  "data": {
    "findFirstUser": {
      "id": "clbmo34ul0000rb9qk1lyds45"
    }
  },
  "extensions": {
    "traces": [
      {
        "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
        "span_id": "7bd99b5853f8d089",
        "parent_span_id": "af3ec494b0b8a4ff",
        "name": "prisma:engine:build_graph",
        "start_time": [
          1672158025,
          881801000
        ],
        "end_time": [
          1672158025,
          884372000
        ]
      },
      {
        "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
        "span_id": "6e699af2b490bdf7",
        "parent_span_id": "3f839889df49a93d",
        "name": "prisma:engine:db_query",
        "start_time": [
          1672158025,
          884831000
        ],
        "end_time": [
          1672158025,
          903142000
        ],
        "attributes": {
          "db.statement": "SELECT 1"
        }
      },
      {
        "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
        "span_id": "3f839889df49a93d",
        "parent_span_id": "af3ec494b0b8a4ff",
        "name": "prisma:engine:connection",
        "start_time": [
          1672158025,
          884444000
        ],
        "end_time": [
          1672158025,
          903852000
        ],
        "attributes": {
          "db.type": "postgres"
        }
      },
      {
        "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
        "span_id": "fca9180d4bf1051c",
        "parent_span_id": "3b4097ee973907f7",
        "name": "prisma:engine:db_query",
        "start_time": [
          1672158025,
          906643000
        ],
        "end_time": [
          1672158025,
          913675000
        ],
        "attributes": {
          "db.statement": "SELECT \"public\".\"User\".\"id\" FROM \"public\".\"User\" WHERE 1=1 LIMIT $1 OFFSET $2 /* traceparent=00-4bf92f3577b34da6a3ce929d0e0e4736-00000000000000003b4097ee973907f7-01 */"
        }
      },
      {
        "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
        "span_id": "3b4097ee973907f7",
        "parent_span_id": "24b4ea79b5d76326",
        "name": "filter read query",
        "start_time": [
          1672158025,
          906075000
        ],
        "end_time": [
          1672158025,
          913943000
        ]
      },
      {
        "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
        "span_id": "24b4ea79b5d76326",
        "parent_span_id": "430a13c07f515f10",
        "name": "prisma:engine:read-execute",
        "start_time": [
          1672158025,
          904796000
        ],
        "end_time": [
          1672158025,
          914101000
        ]
      },
      {
        "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
        "span_id": "430a13c07f515f10",
        "parent_span_id": "af3ec494b0b8a4ff",
        "name": "prisma:engine:interpret",
        "start_time": [
          1672158025,
          904378000
        ],
        "end_time": [
          1672158025,
          914129000
        ]
      },
      {
        "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
        "span_id": "06144a592e692bf3",
        "parent_span_id": "af3ec494b0b8a4ff",
        "name": "prisma:engine:serialize",
        "start_time": [
          1672158025,
          914184000
        ],
        "end_time": [
          1672158025,
          914446000
        ]
      },
      {
        "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
        "span_id": "af3ec494b0b8a4ff",
        "parent_span_id": "00f067aa0ba902b7",
        "name": "prisma:engine",
        "start_time": [
          1672158025,
          875028000
        ],
        "end_time": [
          1672158025,
          914560000
        ]
      }
    ],
    "logs": [
      {
        "span_id": "6e699af2b490bdf7",
        "name": "SELECT 1",
        "level": "query",
        "timestamp": [
          1672158025,
          884831000
        ],
        "attributes": {}
      },
      {
        "span_id": "fca9180d4bf1051c",
        "name": "SELECT \"public\".\"User\".\"id\" FROM \"public\".\"User\" WHERE 1=1 LIMIT $1 OFFSET $2 /* traceparent=00-4bf92f3577b34da6a3ce929d0e0e4736-00000000000000003b4097ee973907f7-01 */",
        "level": "query",
        "timestamp": [
          1672158025,
          906643000
        ],
        "attributes": {}
      }
    ]
  }
}
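For reference, the start_time, end_time and timestamp values above appear to be
[unix seconds, nanoseconds within the second] pairs. Assuming that interpretation, a span
duration can be derived like this:

// Hedged helper: compute a span duration in microseconds from the [secs, nanos] pairs
// found in the extensions payload (the interpretation is assumed, not stated in this PR).
fn duration_micros(start: (u64, u32), end: (u64, u32)) -> u64 {
    let start_ns = start.0 as u128 * 1_000_000_000 + start.1 as u128;
    let end_ns = end.0 as u128 * 1_000_000_000 + end.1 as u128;
    ((end_ns - start_ns) / 1_000) as u64
}

fn main() {
    // prisma:engine:db_query span from the response above: roughly 18.3 ms.
    println!("{} µs", duration_micros((1672158025, 884831000), (1672158025, 903142000)));
}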

Requested only query logs:

query {
  findFirstUser {
    id
  }
}

{"X-capture-telemetry": "query", "traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
Response
{
  "data": {
    "findFirstUser": {
      "id": "clbmo34ul0000rb9qk1lyds45"
    }
  },
  "extensions": {
    "traces": [],
    "logs": [
      {
        "span_id": "699bd0e58f094fac",
        "name": "SELECT 1",
        "level": "query",
        "timestamp": [
          1672158078,
          668690000
        ],
        "attributes": {}
      },
      {
        "span_id": "be0fa29f52252e34",
        "name": "SELECT \"public\".\"User\".\"id\" FROM \"public\".\"User\" WHERE 1=1 LIMIT $1 OFFSET $2 /* traceparent=00-4bf92f3577b34da6a3ce929d0e0e4736-0000000000000000c5086fae7603fd87-01 */",
        "level": "query",
        "timestamp": [
          1672158078,
          684525000
        ],
        "attributes": {}
      }
    ]
  }
}

Requested only traces:

query {
  findFirstUser {
    id
  }
}

{"X-capture-telemetry": "tracing, "traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
Response
{
  "data": {
    "findFirstUser": {
      "id": "clbmo34ul0000rb9qk1lyds45"
    }
  },
  "extensions": {
    "traces": [
      {
        "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
        "span_id": "d99a53cde24529d8",
        "parent_span_id": "e6a6a50a20617362",
        "name": "prisma:engine:build_graph",
        "start_time": [
          1672158113,
          501612000
        ],
        "end_time": [
          1672158113,
          501812000
        ]
      },
      {
        "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
        "span_id": "53d6563a75bbedb9",
        "parent_span_id": "f082d56668ff706f",
        "name": "prisma:engine:db_query",
        "start_time": [
          1672158113,
          502055000
        ],
        "end_time": [
          1672158113,
          518957000
        ],
        "attributes": {
          "db.statement": "SELECT 1"
        }
      },
      {
        "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
        "span_id": "f082d56668ff706f",
        "parent_span_id": "e6a6a50a20617362",
        "name": "prisma:engine:connection",
        "start_time": [
          1672158113,
          501861000
        ],
        "end_time": [
          1672158113,
          519340000
        ],
        "attributes": {
          "db.type": "postgres"
        }
      },
      {
        "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
        "span_id": "5686cdfae69e5bef",
        "parent_span_id": "01b5d4ee55b6563d",
        "name": "prisma:engine:db_query",
        "start_time": [
          1672158113,
          519751000
        ],
        "end_time": [
          1672158113,
          525036000
        ],
        "attributes": {
          "db.statement": "SELECT \"public\".\"User\".\"id\" FROM \"public\".\"User\" WHERE 1=1 LIMIT $1 OFFSET $2 /* traceparent=00-4bf92f3577b34da6a3ce929d0e0e4736-000000000000000001b5d4ee55b6563d-01 */"
        }
      },
      {
        "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
        "span_id": "01b5d4ee55b6563d",
        "parent_span_id": "6d506495af6c00e3",
        "name": "filter read query",
        "start_time": [
          1672158113,
          519682000
        ],
        "end_time": [
          1672158113,
          525504000
        ]
      },
      {
        "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
        "span_id": "6d506495af6c00e3",
        "parent_span_id": "371083ff7226fee5",
        "name": "prisma:engine:read-execute",
        "start_time": [
          1672158113,
          519498000
        ],
        "end_time": [
          1672158113,
          525610000
        ]
      },
      {
        "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
        "span_id": "371083ff7226fee5",
        "parent_span_id": "e6a6a50a20617362",
        "name": "prisma:engine:interpret",
        "start_time": [
          1672158113,
          519450000
        ],
        "end_time": [
          1672158113,
          525637000
        ]
      },
      {
        "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
        "span_id": "2c630ba99b665921",
        "parent_span_id": "e6a6a50a20617362",
        "name": "prisma:engine:serialize",
        "start_time": [
          1672158113,
          525684000
        ],
        "end_time": [
          1672158113,
          525789000
        ]
      },
      {
        "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
        "span_id": "e6a6a50a20617362",
        "parent_span_id": "00f067aa0ba902b7",
        "name": "prisma:engine",
        "start_time": [
          1672158113,
          501190000
        ],
        "end_time": [
          1672158113,
          525883000
        ]
      }
    ],
    "logs": []
  }
}

@miguelff miguelff self-assigned this Dec 14, 2022
@miguelff miguelff added and then removed the kind/feature and domain/client labels Dec 14, 2022
@miguelff
Contributor Author

miguelff commented Dec 14, 2022

Status update:

I solved the problem of some query logs not being captured by flushing the global tracer at the end of the request, before grabbing the logs for rendering. There's a tradeoff, though. We are using more than one tracing layer: 1) the one in charge of capturing logs in memory and returning them in the result, and 2) another one that outputs logs to STDOUT. The latter flushes spans by default in batches of 512, but an average request generates fewer spans than that (I don't have metrics to assess how many fewer, and it depends a lot on the distribution). This means we are flushing every N < 512 traces, which can lead to an increased flushing frequency to STDOUT. I don't think that's super important right now, but it might be something to take into account when running this in PDP.

The reason why I did not flush the different SpanExporters selectively is that they are hidden abstractions I don't have access to once the pipeline is built. The GlobalTracerProvider used in the application has its own type and, unlike its private counterpart, doesn't expose the list of span processors used. I would need to do something fancy (read: bad) to access and flush them selectively, and that's something I won't do right now. I created a ticket instead.

With this, the implementation is ready for @aqrln and @danstarns to implement front-end changes.

I will continue by

  • adding tests for this feature
  • getting a review
  • merging

which I hope to finish tomorrow / Friday.

@miguelff miguelff changed the title feat: Implement PDP/DataProxy logging protocol in the binary engine feat: Implement PDP/DataProxy tracing protocol in the binary engine Dec 15, 2022
@miguelff miguelff marked this pull request as ready for review December 16, 2022 12:20
@miguelff miguelff requested review from aqrln and Weakky December 16, 2022 12:20
@miguelff miguelff force-pushed the log-capture branch 2 times, most recently from 488c03a to 0f4ba10 on December 18, 2022 18:04
@miguelff miguelff marked this pull request as draft December 19, 2022 12:27
@miguelff miguelff force-pushed the log-capture branch 6 times, most recently from afa1bf2 to 7ca3af9 on December 25, 2022 22:08
@miguelff
Contributor Author

miguelff commented Dec 28, 2022

Status update: the scope of this PR is frozen to implementing the capture protocol as described in the contract spec.

This is to let prisma/prisma proceed with the integration.

Further improvements are carried out in a separate PR: #3549

@miguelff
Contributor Author

Given that I didn't get a review on this PR and that the diff of #3549 was almost neutral, I proceeded to merge the improvements into this branch so that reviewers only have to review one PR.

tomhoule added a commit that referenced this pull request Jan 6, 2023
This commit is limited to sql-query-connector, it should not interfere
with #3505.

In general, we do not need ownership of the trace id in
sql-query-connector, since we only re-render it in comments. Working
with a reference is easier (it is `Copy`, etc.), and it already saves us
string copies and allocations with this commit.

The other purpose of this PR is that we discussed yesterday about
introducing a `QueryContext<'_>` type struct to hold things like the
trace id and connection info. Experience from designing similar context
structs in the schema team showed it is much easier to do if everything
in the context can be a reference.

On the side, I could not resist making a few public items crate-public,
to make the public surface of the crate smaller and clearer.
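For illustration, a minimal sketch of the `QueryContext<'_>` idea mentioned in the commit message
above, with hypothetical field names (it only shows why a borrowed trace id suffices when it is
merely re-rendered into a SQL comment):

// Hypothetical sketch, not the actual sql-query-connector types.
struct ConnectionInfo {
    schema_name: String,
}

struct QueryContext<'a> {
    trace_id: Option<&'a str>,         // borrowed: only re-rendered into comments
    connection_info: &'a ConnectionInfo,
}

fn traceparent_comment(ctx: &QueryContext<'_>) -> String {
    match ctx.trace_id {
        Some(id) => format!("/* traceparent={id} */"),
        None => String::new(),
    }
}

fn main() {
    let info = ConnectionInfo { schema_name: "public".to_string() };
    let ctx = QueryContext {
        trace_id: Some("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"),
        connection_info: &info,
    };
    println!("SELECT 1 {} -- schema: {}", traceparent_comment(&ctx), ctx.connection_info.schema_name);
}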
@miguelff miguelff force-pushed the log-capture branch 2 times, most recently from 40d3dd8 to 70023df on January 17, 2023 13:24
@miguelff miguelff force-pushed the log-capture branch 2 times, most recently from c2dd630 to 2fc3873 on January 18, 2023 16:06
@miguelff miguelff requested a review from a team January 19, 2023 12:34
@miguelff
Contributor Author

miguelff commented Jan 19, 2023

From 423cba4

[Screenshot 2023-01-19 at 13:54:23]

I think logs are working on the server but not properly handled by either mini-proxy or the client: https://github.com/prisma/prisma/pull/16748/files#r1081177632

@miguelff miguelff changed the base branch from main to log-capture-prep-refactoring January 20, 2023 10:52
@miguelff miguelff closed this Jan 20, 2023
@miguelff miguelff deleted the log-capture branch January 20, 2023 11:01