
feat: Implement PDP/DataProxy tracing protocol in the binary engine #3505

Closed

Conversation

miguelff
Contributor

@miguelff miguelff commented Dec 14, 2022

See #3618 instead

Supersedes #3112
Tracked in https://github.com/prisma/client-planning/issues/158
Sibling to prisma/prisma#16748

To be released after the prep refactoring in #3616
Closes https://github.com/prisma/client-planning/issues/138
Closes https://github.com/prisma/client-planning/issues/144
Closes https://github.com/prisma/client-planning/issues/145

Implements trace and log capturing in accordance with the (private link) contract specification.

The technical design is best explained through the interaction diagram below:

[interaction diagram]

In the diagram, you will see objects whose lifetime is static. The boxes for those have a
double-width margin. These are:

  • The server itself
  • The global TRACER, which handles log! and span! calls and uses the global PROCESSOR to
    process the data that constitutes a trace: Spans and log Events
  • The global PROCESSOR, which manages the Storage set of data structures, holding logs,
    traces (and capture settings) per request.

Then, through the request lifecycle, different objects are created and dropped:

  • When a request comes in, its headers are processed and a [Settings] object is built. This
    object determines, for the request, how logging and tracing are going to be captured: only
    traces, only logs, or both, and which log levels are going to be captured.
  • Based on the settings, a new Capturer is created; a capturer is nothing but an exporter
    wrapped to start capturing / fetch the captures for this particular request (see the sketch
    below).
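As a rough illustration, parsing the X-capture-telemetry header into such a settings object could
look like the sketch below. The names and shapes are illustrative only, not the engine's actual
types:

// Hypothetical sketch: build per-request capture settings from the
// X-capture-telemetry header value, e.g. "query,tracing".
#[derive(Debug, Default)]
struct Settings {
    traces: bool,      // capture trace spans for this request
    logs: Vec<String>, // log levels to capture, e.g. ["query", "info"]
}

fn settings_from_header(header: &str) -> Settings {
    let mut settings = Settings::default();
    for item in header.split(',').map(str::trim) {
        match item {
            "tracing" => settings.traces = true,
            "" => {}
            level => settings.logs.push(level.to_string()),
        }
    }
    settings
}

fn main() {
    // "query,tracing" asks for query logs plus trace spans, as in the examples further down.
    let settings = settings_from_header("query,tracing");
    println!("{settings:?}");
}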

Then the capturing process works as follows (a simplified sketch of the storage involved is shown after the list):

  • The server receives a query [1].
  • It grabs the HTTP headers and builds a Capture object [2], which is configured with the settings
    denoted by the X-capture-telemetry header.
  • Now the server tells the Capturer to start capturing all the logs and traces occurring on
    the request [3] (denoted by a trace_id). The trace_id is either carried in the traceparent
    header or implicitly created on the first span of the request. Starting to capture implies
    creating an entry for the trace_id in two different data structures, logs and traces, and
    storing the settings used to select the Spans and Events to capture [4].
  • The server dispatches the request and, somewhere else in the code, it is processed [5].
  • There the code logs events and emits traces asynchronously as part of the processing [6].
  • Traces and logs arrive at the TRACER and get hydrated in the PROCESSOR [7],
    which writes them into the shard corresponding to the current trace_id in the
    logs and traces storage [8]. The settings previously stored under the trace_id
    key are used to pick which events and spans are captured, based on their level.
  • When the code that dispatches the request is done, it returns a PrismaResponse to the
    server [9].
  • Then the server asks the PROCESSOR to fetch the captures [10],
  • and right after that, it fetches the captures from the Storage [11]. At that time, although
    that's not represented in the diagram, the captures are deleted from the storage, thus
    freeing any memory used for capturing during the request.
  • Finally, the server sets the logs and traces extensions in the PrismaResponse [12],
    serializes the extended response in JSON format, and returns it as an HTTP response
    blob [13].
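A simplified sketch of the storage described in the steps above, using illustrative types (the
real engine has its own Span/Event structures, locking, and exporter plumbing):

// Hypothetical sketch of the per-request storage the PROCESSOR manages:
// one entry per trace_id, holding the request's settings plus the captured data.
use std::collections::HashMap;

type TraceId = String;

#[derive(Debug, Default)]
struct Capture {
    settings: Vec<String>, // which kinds/levels to keep for this request
    spans: Vec<String>,    // captured spans (plain strings here for brevity)
    events: Vec<String>,   // captured log events
}

#[derive(Debug, Default)]
struct Storage {
    requests: HashMap<TraceId, Capture>,
}

impl Storage {
    // Steps [3]/[4]: start capturing for a trace_id with the request's settings.
    fn start_capturing(&mut self, trace_id: &str, settings: Vec<String>) {
        self.requests.insert(
            trace_id.to_string(),
            Capture { settings, ..Default::default() },
        );
    }

    // Steps [7]/[8]: write an incoming span or event into the shard for its trace_id.
    fn record_span(&mut self, trace_id: &str, span: String) {
        if let Some(capture) = self.requests.get_mut(trace_id) {
            capture.spans.push(span);
        }
    }

    fn record_event(&mut self, trace_id: &str, event: String) {
        if let Some(capture) = self.requests.get_mut(trace_id) {
            capture.events.push(event);
        }
    }

    // Steps [10]/[11]: fetch the captures and free the memory used for this request.
    fn fetch_captures(&mut self, trace_id: &str) -> Option<Capture> {
        self.requests.remove(trace_id)
    }
}

fn main() {
    let mut storage = Storage::default();
    let trace_id = "4bf92f3577b34da6a3ce929d0e0e4736";
    storage.start_capturing(trace_id, vec!["query".into(), "tracing".into()]);
    storage.record_span(trace_id, "prisma:engine:db_query".into());
    storage.record_event(trace_id, "SELECT 1".into());
    let captures = storage.fetch_captures(trace_id); // entry is removed, memory is freed
    println!("{captures:?}");
}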

Sample use cases

Server configuration: LOG_QUERIES=y QE_LOG_LEVEL=TRACE query-engine --enable-telemetry-in-response. That log level determines that trace spans and events up to that level of specificity will be generated. If it were e.g. INFO, trace or debug span events wouldn't appear in the response.
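The level check is the usual verbosity comparison. A minimal sketch using the tracing crate's
level ordering (the engine's actual filtering also handles the custom "query" level, which is not
shown here):

use tracing::Level;

// An event or span is generated/captured only if its level is within the configured verbosity.
// In `tracing`, more verbose levels compare as greater: ERROR < WARN < INFO < DEBUG < TRACE.
fn within_level(configured: Level, candidate: Level) -> bool {
    candidate <= configured
}

fn main() {
    assert!(within_level(Level::INFO, Level::WARN));   // kept at INFO
    assert!(!within_level(Level::INFO, Level::DEBUG)); // filtered out at INFO
    println!("INFO keeps error/warn/info and drops debug/trace");
}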

Requested both query logs and tracing events:

query {
  findFirstUser {
    id
  }
}

{"X-capture-telemetry": "query,tracing", "traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
Response
{
  "data": {
    "findFirstUser": {
      "id": "clbmo34ul0000rb9qk1lyds45"
    }
  },
  "extensions": {
    "traces": [
      {
        "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
        "span_id": "7bd99b5853f8d089",
        "parent_span_id": "af3ec494b0b8a4ff",
        "name": "prisma:engine:build_graph",
        "start_time": [
          1672158025,
          881801000
        ],
        "end_time": [
          1672158025,
          884372000
        ]
      },
      {
        "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
        "span_id": "6e699af2b490bdf7",
        "parent_span_id": "3f839889df49a93d",
        "name": "prisma:engine:db_query",
        "start_time": [
          1672158025,
          884831000
        ],
        "end_time": [
          1672158025,
          903142000
        ],
        "attributes": {
          "db.statement": "SELECT 1"
        }
      },
      {
        "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
        "span_id": "3f839889df49a93d",
        "parent_span_id": "af3ec494b0b8a4ff",
        "name": "prisma:engine:connection",
        "start_time": [
          1672158025,
          884444000
        ],
        "end_time": [
          1672158025,
          903852000
        ],
        "attributes": {
          "db.type": "postgres"
        }
      },
      {
        "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
        "span_id": "fca9180d4bf1051c",
        "parent_span_id": "3b4097ee973907f7",
        "name": "prisma:engine:db_query",
        "start_time": [
          1672158025,
          906643000
        ],
        "end_time": [
          1672158025,
          913675000
        ],
        "attributes": {
          "db.statement": "SELECT \"public\".\"User\".\"id\" FROM \"public\".\"User\" WHERE 1=1 LIMIT $1 OFFSET $2 /* traceparent=00-4bf92f3577b34da6a3ce929d0e0e4736-00000000000000003b4097ee973907f7-01 */"
        }
      },
      {
        "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
        "span_id": "3b4097ee973907f7",
        "parent_span_id": "24b4ea79b5d76326",
        "name": "filter read query",
        "start_time": [
          1672158025,
          906075000
        ],
        "end_time": [
          1672158025,
          913943000
        ]
      },
      {
        "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
        "span_id": "24b4ea79b5d76326",
        "parent_span_id": "430a13c07f515f10",
        "name": "prisma:engine:read-execute",
        "start_time": [
          1672158025,
          904796000
        ],
        "end_time": [
          1672158025,
          914101000
        ]
      },
      {
        "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
        "span_id": "430a13c07f515f10",
        "parent_span_id": "af3ec494b0b8a4ff",
        "name": "prisma:engine:interpret",
        "start_time": [
          1672158025,
          904378000
        ],
        "end_time": [
          1672158025,
          914129000
        ]
      },
      {
        "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
        "span_id": "06144a592e692bf3",
        "parent_span_id": "af3ec494b0b8a4ff",
        "name": "prisma:engine:serialize",
        "start_time": [
          1672158025,
          914184000
        ],
        "end_time": [
          1672158025,
          914446000
        ]
      },
      {
        "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
        "span_id": "af3ec494b0b8a4ff",
        "parent_span_id": "00f067aa0ba902b7",
        "name": "prisma:engine",
        "start_time": [
          1672158025,
          875028000
        ],
        "end_time": [
          1672158025,
          914560000
        ]
      }
    ],
    "logs": [
      {
        "span_id": "6e699af2b490bdf7",
        "name": "SELECT 1",
        "level": "query",
        "timestamp": [
          1672158025,
          884831000
        ],
        "attributes": {}
      },
      {
        "span_id": "fca9180d4bf1051c",
        "name": "SELECT \"public\".\"User\".\"id\" FROM \"public\".\"User\" WHERE 1=1 LIMIT $1 OFFSET $2 /* traceparent=00-4bf92f3577b34da6a3ce929d0e0e4736-00000000000000003b4097ee973907f7-01 */",
        "level": "query",
        "timestamp": [
          1672158025,
          906643000
        ],
        "attributes": {}
      }
    ]
  }
}
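For reference, the start_time, end_time and timestamp values above appear to be
[unix seconds, nanoseconds within the second] pairs. Assuming that interpretation, a span
duration can be derived like this:

// Hedged helper: compute a span duration in microseconds from the [secs, nanos] pairs
// found in the extensions payload (the interpretation is assumed, not stated in this PR).
fn duration_micros(start: (u64, u32), end: (u64, u32)) -> u64 {
    let start_ns = start.0 as u128 * 1_000_000_000 + start.1 as u128;
    let end_ns = end.0 as u128 * 1_000_000_000 + end.1 as u128;
    ((end_ns - start_ns) / 1_000) as u64
}

fn main() {
    // prisma:engine:db_query span from the response above: roughly 18.3 ms.
    println!("{} µs", duration_micros((1672158025, 884831000), (1672158025, 903142000)));
}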

Requested only query logs:

query {
  findFirstUser {
    id
  }
}

{"X-capture-telemetry": "query", "traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
Response
{
  "data": {
    "findFirstUser": {
      "id": "clbmo34ul0000rb9qk1lyds45"
    }
  },
  "extensions": {
    "traces": [],
    "logs": [
      {
        "span_id": "699bd0e58f094fac",
        "name": "SELECT 1",
        "level": "query",
        "timestamp": [
          1672158078,
          668690000
        ],
        "attributes": {}
      },
      {
        "span_id": "be0fa29f52252e34",
        "name": "SELECT \"public\".\"User\".\"id\" FROM \"public\".\"User\" WHERE 1=1 LIMIT $1 OFFSET $2 /* traceparent=00-4bf92f3577b34da6a3ce929d0e0e4736-0000000000000000c5086fae7603fd87-01 */",
        "level": "query",
        "timestamp": [
          1672158078,
          684525000
        ],
        "attributes": {}
      }
    ]
  }
}

Requested only traces:

query {
  findFirstUser {
    id
  }
}

{"X-capture-telemetry": "tracing, "traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
Response
{
  "data": {
    "findFirstUser": {
      "id": "clbmo34ul0000rb9qk1lyds45"
    }
  },
  "extensions": {
    "traces": [
      {
        "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
        "span_id": "d99a53cde24529d8",
        "parent_span_id": "e6a6a50a20617362",
        "name": "prisma:engine:build_graph",
        "start_time": [
          1672158113,
          501612000
        ],
        "end_time": [
          1672158113,
          501812000
        ]
      },
      {
        "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
        "span_id": "53d6563a75bbedb9",
        "parent_span_id": "f082d56668ff706f",
        "name": "prisma:engine:db_query",
        "start_time": [
          1672158113,
          502055000
        ],
        "end_time": [
          1672158113,
          518957000
        ],
        "attributes": {
          "db.statement": "SELECT 1"
        }
      },
      {
        "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
        "span_id": "f082d56668ff706f",
        "parent_span_id": "e6a6a50a20617362",
        "name": "prisma:engine:connection",
        "start_time": [
          1672158113,
          501861000
        ],
        "end_time": [
          1672158113,
          519340000
        ],
        "attributes": {
          "db.type": "postgres"
        }
      },
      {
        "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
        "span_id": "5686cdfae69e5bef",
        "parent_span_id": "01b5d4ee55b6563d",
        "name": "prisma:engine:db_query",
        "start_time": [
          1672158113,
          519751000
        ],
        "end_time": [
          1672158113,
          525036000
        ],
        "attributes": {
          "db.statement": "SELECT \"public\".\"User\".\"id\" FROM \"public\".\"User\" WHERE 1=1 LIMIT $1 OFFSET $2 /* traceparent=00-4bf92f3577b34da6a3ce929d0e0e4736-000000000000000001b5d4ee55b6563d-01 */"
        }
      },
      {
        "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
        "span_id": "01b5d4ee55b6563d",
        "parent_span_id": "6d506495af6c00e3",
        "name": "filter read query",
        "start_time": [
          1672158113,
          519682000
        ],
        "end_time": [
          1672158113,
          525504000
        ]
      },
      {
        "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
        "span_id": "6d506495af6c00e3",
        "parent_span_id": "371083ff7226fee5",
        "name": "prisma:engine:read-execute",
        "start_time": [
          1672158113,
          519498000
        ],
        "end_time": [
          1672158113,
          525610000
        ]
      },
      {
        "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
        "span_id": "371083ff7226fee5",
        "parent_span_id": "e6a6a50a20617362",
        "name": "prisma:engine:interpret",
        "start_time": [
          1672158113,
          519450000
        ],
        "end_time": [
          1672158113,
          525637000
        ]
      },
      {
        "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
        "span_id": "2c630ba99b665921",
        "parent_span_id": "e6a6a50a20617362",
        "name": "prisma:engine:serialize",
        "start_time": [
          1672158113,
          525684000
        ],
        "end_time": [
          1672158113,
          525789000
        ]
      },
      {
        "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
        "span_id": "e6a6a50a20617362",
        "parent_span_id": "00f067aa0ba902b7",
        "name": "prisma:engine",
        "start_time": [
          1672158113,
          501190000
        ],
        "end_time": [
          1672158113,
          525883000
        ]
      }
    ],
    "logs": []
  }
}

@miguelff miguelff self-assigned this Dec 14, 2022
@miguelff miguelff added and then removed the kind/feature and domain/client labels Dec 14, 2022
@miguelff
Contributor Author

miguelff commented Dec 14, 2022

Status update:

I solved the problem of some query logs not being captured by flushing the global tracer at the end of the request, before grabbing the logs for rendering. There's a tradeoff, though. We are using more than one tracing layer: 1) the one in charge of capturing logs in memory and returning them in the result, and 2) another one that outputs logs to STDOUT. The latter flushes spans by default in batches of 512, but an average request generates fewer spans than that (I don't have metrics to assess how many fewer, and it depends a lot on the distribution). This means we are flushing every N < 512 traces, which can lead to an increased flushing frequency to STDOUT. I don't think that's super important right now, but it might be something to take into account when running this in PDP.

The reason why I did not flush the different SpanExporters selectively is that they are hidden abstractions I don't have access to once the pipeline is built. The GlobalTracerProvider used in the application has its own type and, unlike its private counterpart, doesn't expose the list of span processors used. I would need to do something fancy (read: bad) to access and flush them selectively, and that's something I won't do right now. I created a ticket instead.

With this, the implementation is ready for @aqrln and @danstarns to implement front-end changes.

I will continue by

  • adding tests for this feature
  • getting a review
  • merging

which I hope to finish tomorrow / Friday.

@miguelff miguelff changed the title feat: Implement PDP/DataProxy logging protocol in the binary engine feat: Implement PDP/DataProxy tracing protocol in the binary engine Dec 15, 2022
@miguelff miguelff marked this pull request as ready for review December 16, 2022 12:20
@miguelff miguelff requested review from aqrln and Weakky December 16, 2022 12:20
@miguelff miguelff force-pushed the log-capture branch 2 times, most recently from 488c03a to 0f4ba10 on December 18, 2022 18:04
@miguelff miguelff marked this pull request as draft December 19, 2022 12:27
@miguelff miguelff force-pushed the log-capture branch 6 times, most recently from afa1bf2 to 7ca3af9 on December 25, 2022 22:08
@miguelff
Contributor Author

miguelff commented Dec 28, 2022

Status update: the scope of this PR is frozen to implementing the capture protocol as described in the contract spec.

This is to let prisma/prisma proceed with the integration.

Further improvements are carried out in a separate PR: #3549

@miguelff
Contributor Author

Given that I didn't get a review on this PR and that the diff of #3549 was almost neutral, I proceeded to merge the improvements into this branch so that reviewers only have to review one PR.

tomhoule added a commit that referenced this pull request Jan 6, 2023
This commit is limited to sql-query-connector, it should not interfere
with #3505.

In general, we do not need ownership of the trace id in
sql-query-connector, since we only re-render it in comments. Working
with a reference is easier (it is `Copy`, etc.), and it already saves us
string copies and allocations with this commit.

The other purpose of this PR is that we discussed yesterday about
introducing a `QueryContext<'_>` type struct to hold things like the
trace id and connection info. Experience from designing similar context
structs in the schema team showed it is much easier to do if everything
in the context can be a reference.

On the side, I could not resist making a few public items crate-public,
to make the public surface of the crate smaller and clearer.
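For illustration, a minimal sketch of the `QueryContext<'_>` idea mentioned in the commit message
above, with hypothetical field names (it only shows why a borrowed trace id suffices when it is
merely re-rendered into a SQL comment):

// Hypothetical sketch, not the actual sql-query-connector types.
struct ConnectionInfo {
    schema_name: String,
}

struct QueryContext<'a> {
    trace_id: Option<&'a str>,         // borrowed: only re-rendered into comments
    connection_info: &'a ConnectionInfo,
}

fn traceparent_comment(ctx: &QueryContext<'_>) -> String {
    match ctx.trace_id {
        Some(id) => format!("/* traceparent={id} */"),
        None => String::new(),
    }
}

fn main() {
    let info = ConnectionInfo { schema_name: "public".to_string() };
    let ctx = QueryContext {
        trace_id: Some("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"),
        connection_info: &info,
    };
    println!("SELECT 1 {} -- schema: {}", traceparent_comment(&ctx), ctx.connection_info.schema_name);
}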
@miguelff miguelff force-pushed the log-capture branch 2 times, most recently from 40d3dd8 to 70023df on January 17, 2023 13:24
@miguelff miguelff force-pushed the log-capture branch 2 times, most recently from c2dd630 to 2fc3873 on January 18, 2023 16:06
@miguelff miguelff requested a review from a team January 19, 2023 12:34
@miguelff
Contributor Author

miguelff commented Jan 19, 2023

From 423cba4

[Screenshot 2023-01-19 at 13:54:23]

I think logs are working on the server but not properly handled by either mini-proxy or the client: https://github.com/prisma/prisma/pull/16748/files#r1081177632

@miguelff miguelff changed the base branch from main to log-capture-prep-refactoring January 20, 2023 10:52
@miguelff miguelff closed this Jan 20, 2023
@miguelff miguelff deleted the log-capture branch January 20, 2023 11:01