Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

serialize & deserialize from connect server #2

Open
wants to merge 35 commits into
base: main
Choose a base branch
from
Open

serialize & deserialize from connect server #2

wants to merge 35 commits into from

Conversation

hotpocket
Copy link
Owner

this demonstrates using the connect protocol to make requests and receive responses.

@jlewi
Copy link

jlewi commented Nov 11, 2024

This is great progress. What work is needed to get to a fully functional implementation of the ConnectSerializer? In particular a working implementation of saveNotebook? Right now it looks like your code is just creating a fake notebook to save. We'd like to save the actual notebook.

It looks like the GRPCSerializer implements a lot of functionality that will need to be reused by the ConnectSerializer.
For example here in SaveNotebook it calls marshallNotebook
https://github.com/stateful/vscode-runme/blob/3b21ee8912b9973d9a066fb39cd4a6d185b2925d/src/extension/serializer.ts#L743

marshalNotebook is probably converting from vscode's data structure representing the notebook into the proto representation so that it can be used in the request. So you will likely need to reuse that functionality in your implementation of the serializer.

I suspect this means you will likely want to refactor the GRPCSerializer to put common functionality into a base class that can be reused by the GRPCSerializer and ConnectSerializer.

In order to reuse existing code you will probably run into the fact that you are using a different generated client library for the protocol buffers then what the existing code uses. Therefore you will need to convert the protocol buffers back and forth in order to exchange them.

You can take a look at https://github.com/stateful/vscode-runme/blob/main/src/extension/ai/protos.ts.


```

Not sure how to modify the service to remove the `runme.parser.v1.ParserService` part but this works to serialize a very simple notebook structure
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would you want to remove it? I believe that's a critical part of the protocol in terms of being able to route different requests to the appropriate handler.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must have misunderstood our prior conversations. I was under the impression that we wanted to have the url formed similar to http://localhost:1234/Deserialize. If not then there is no issue here.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The URL is determined by the connect protocol.


Open Questions:

? I see createPromiseClient is used to create a connection to agent but don't see this method being used by runme to connect to it's native grpc service. Is this OK?
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The gRPC library and the connect libraries are two different libraries so the code for consuming them will be different.

src/extension/serializer.ts Show resolved Hide resolved
src/extension/serializer.ts Outdated Show resolved Hide resolved
src/extension/serializer.ts Outdated Show resolved Hide resolved
src/extension/serializer.ts Outdated Show resolved Hide resolved
return -1
}

public async saveNotebookOutputs(_uri: Uri): Promise<number> {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these functions be implemented? It looks like they are implemented for the gRPC serializer so we probably need them for the connect serializer as well.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will be addressing output caching tomorrow.

this.log.info(`Serializer: Client pointed to: ${this.serializerServiceUrl}`)
return createPromiseClient(
ParserService,
createConnectTransport({
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For manual testing purposes, as discussed in other places I think it will be helpful if you can point this at a fully functional implementation of the serializer server so you can verify this is working.
The Runme server has a working implementation which uses the gRPC protocol.
With the connect client you can optionally select what protocol to use. Right now you are using the Connect Transport. However, you can optionally use gRPC by calling the function to create a gRPC transport. That would allow you to test that your implementation is actually working.

Copy link
Owner Author

@hotpocket hotpocket Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after our discussion about the tls issues and disabling it, I have discovered the source of the issue. for the issue & solution see fake_server/client2.ts and fake_server/readme.md . I subsequently incorporated a tls connection into serializer.ts also

@jlewi
Copy link

jlewi commented Nov 17, 2024

@hotpocket My suggestion is to break this down. There are two main functions we need to make work

  1. serializing a notebook (i.e. saveNotebook)
  2. loading a notebook (i.e. reviveNotebook)

The logic for loading a notebook looks much simpler. So I'd probably start there.

To verify that function is working, I would try to make sure that you can run the extension and load a notebook using the connect serializer. That will require a working implementation of the server. However, as I mentioned here
#2 (comment)
using the connect client it should be trivial to swap out the GRPC transport for the connect transport. If you configure it to use the gRPC transport then you should be able to reuse the serialization server inside the Runme server.

So my suggestion would be to aim for the following

  1. Configure connectserializer to use gRPC transport
  2. Run the extension in debug mode configured to use the connect serializer
  3. Try to open an existing markdown
  4. Does the markdown properly load? If not try to figure out why not and iterate on the code?

@jlewi
Copy link

jlewi commented Nov 20, 2024

@hotpocket how's this going? Have you had a chance to try running the extension in debug mode and loading a document using the connect serializer?

@hotpocket
Copy link
Owner Author

@hotpocket how's this going? Have you had a chance to try running the extension in debug mode and loading a document using the connect serializer?

I have. I have been met with cert errors, and now just a general rejection of the connection. I have committed my current incantation for your inspection.

Furthermore the magic of marshalling seems to happen in the serialize and deserialize method from the timostamm buf/lib, which does not use the connect protocol AFAIK, and if I understand correctly, this will inhibit our aim to get this to run on the web if these libs are used to marshall.

I think I would benefit from some pair coding session if you have time. I keep hitting brick walls.

@jlewi
Copy link

jlewi commented Nov 20, 2024

What is the specific error you are getting?
What address are you connecting to?

What happens if you try to connect to your fake server? Do you get the same error?

@hotpocket
Copy link
Owner Author

hotpocket commented Nov 20, 2024

What is the specific error you are getting? What address are you connecting to?

ApplicationInsights:An invalid instrumentation key was provided. There may be resulting telemetry loss (1) [undefined]

What happens if you try to connect to your fake server?

My fake server connects just fine using createConnectTransport but when trying createGrpcTransport I get an additional error to the one above that is

Error: Unexpected SIGPIPE {stack: 'Error: Unexpected SIGPIPE at process.<anonym…Trampoline (node:internal/async_hooks:130:17)', message: 'Unexpected SIGPIPE'}

If I change the http version to 1.1 I am able to connect to my fake server using the createGrpcTransport

Using http 1.1 and trying to connect to the go grpc server with createGrpcTransport yields the above SIGPIPE error with http and just the telemetry error with https. Specifying a cert seems to make no difference when trying to connect with https. I have tried config options of ca and cert as ways to specify the cert. ca seems to be the wrong way AFAICT but was the code snippet I found first when trying to get this to work .

@jlewi
Copy link

jlewi commented Nov 26, 2024

How's this coming?

@hotpocket
Copy link
Owner Author

hotpocket commented Nov 26, 2024

How's this coming?

need to address caching as this is implemented by grpc serializer. marshaling is working for connect & grpc serializers. tls is working. need to clean things up & split into sane PR's before this part is "done". if you see anything that needs a course correction let me know plz & ty!

@@ -979,6 +984,9 @@ export class ConnectSerializer extends SerializerBase {
marshalFrontmatter,
}) as typeof this.protoNotebookType

// cast so when we serialize we don't get missing methods due to it not being a real proto
notebook.frontmatter = new es_proto.Frontmatter(notebook.frontmatter)
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this cast and the one below for the maskedNotebook squash a very annoying bug.

const maskedNotebook = Marshal.notebook(data, maskedNotebookProto, {
marshalFrontmatter,
}) as typeof this.protoNotebookType
const cacheOutputs = this.cacheNotebookOutputs(notebook, maskedNotebook, cacheId)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come the signature of this.cacheNotebookOutputs changed?

Copy link
Owner Author

@hotpocket hotpocket Nov 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some background:
upstream marshalNotebook augments and converts a vscode notebook to a protobuf-ts Notebook. I wanted to have it be proto agnostic for reuse in grpc and connect serializers. This was the start of my attempt, which created a Marshal class allowing abstraction via composition(see comment below). Unfortunately I underestimated the level of integration of the operations of the class and the proto types. Since then, I have started to store type information directly in the class via properties and methods.

the method signature changed due to my original efforts at abstraction via composition. Since then, given the class properties & methods, I could likely re-integrate this and move the method signature back to the original(this is on the todo list). I need to clean this up for public consumption.

})

const plainReq = this.getSerializationRequest({ notebook, options })
const plainRes = this.client!.serialize(plainReq)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to make two requests one for the masked data and one for the unmasked requests?
What does the original code do?
https://github.com/stateful/vscode-runme/blob/c3c135e1c47351f0e56f266efde296b354d26f5e/src/extension/serializer.ts#L756
Is it only making a single request?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the original code makes two requests. I assume this is because the masked version has had parts modified and thus is not the same notebook.

// convert vscode.NotebookData to one of two runme Notebook proto type bindings
public static notebook(
data: NotebookData,
notebookProto: es_proto.Notebook | Notebook,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to reuse code without taking a union type? What is the code doing that it can take either type?

Copy link
Owner Author

@hotpocket hotpocket Nov 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think extracting the code that relies on particulars of each proto transpiler type, within it's host class, and re-incorporating the methods currently contained in Marshal will eliminate this union type. The proto types will then be specific to that class (for example: protoNotebookType as it exist differently in GrpcSerializer and ConnectSerializer, and would thus be used as the type in each classes method signatures). The union type were required here because there were two serializers that called it, each w/ their own proto types. My plan is to eliminate this in the next day or two.

@jlewi
Copy link

jlewi commented Nov 28, 2024

Left a couple comments.

It seems like some of the logic for handling masking is changing. It looks like there are a couple places where masked and unmasked data are now handled in parallel? In particular, it looks like you are sending to serialize requests one for masked and one for un masked data.

One thing that would help would be to update the PR description with some explanation of the key design points.
Here are some top level questions

  1. How are you refactoring the caching functionality to be reusable across the two serializers?
  2. How are you handling common logic for converting to/from NotebookData structures to protos? i.e. the Marshal class?


// convert vscode.NotebookData to a runme Notebook type as generated by protoc-gen-es
public static marshalNotebook<T extends es_proto.Notebook | Notebook>(
class Marshal {
Copy link
Owner Author

@hotpocket hotpocket Nov 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

playing with abstraction via composition. referenced here to compliment prior comments made

@hotpocket
Copy link
Owner Author

Left a couple comments.

It seems like some of the logic for handling masking is changing. It looks like there are a couple places where masked and unmasked data are now handled in parallel? In particular, it looks like you are sending to serialize requests one for masked and one for un masked data.

One thing that would help would be to update the PR description with some explanation of the key design points. Here are some top level questions

  1. How are you refactoring the caching functionality to be reusable across the two serializers?
  2. How are you handling common logic for converting to/from NotebookData structures to protos? i.e. the Marshal class?

I have started to address this in prior comments about changing method signatures and masked & plain serialization calls. I will have a more cohesive summary when I clean things up here in the next day.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants