Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Connect refactor #4

Open
wants to merge 22 commits into
base: main
Choose a base branch
from
Open

Connect refactor #4

wants to merge 22 commits into from

Conversation

hotpocket
Copy link
Owner

Created GrpcSerializerBase and moved as much as the proto types would allow into it from GrpcSerializer.

start runme server on port 9999 to test this
./bin/runme server --address localhost:9999 --runner --tls ~/git/forks/vscode-runme/tls/

seems to do everything the GrpcSerializer does.

src/extension/serializer.ts Outdated Show resolved Hide resolved
src/extension/serializer.ts Outdated Show resolved Hide resolved
src/extension/serializer.ts Outdated Show resolved Hide resolved
src/extension/serializer.ts Outdated Show resolved Hide resolved
src/extension/serializer.ts Outdated Show resolved Hide resolved
src/extension/serializer.ts Outdated Show resolved Hide resolved
src/extension/serializer.ts Outdated Show resolved Hide resolved
@jlewi
Copy link

jlewi commented Dec 6, 2024

It seems like a lot of code duplication right now is the same function written against each version of the proto.

e.g. You have two versions of marshalFrontMatter one which es_proto and one which returns ts_proto. Rather than duplicate the logic, could you have a single function which returns a ts_proto and then just convert the ts_proto to the es_proto if you need an es_proto?

Similarly if you have somFunc(ts_proto) and you want to create someFunc(es_proto) could you just convert the es_proto to ts_proto and then invoke someFunc(ts_proto)?

Duplicating core business logic (E.g. marshalFrontMatter) will likely lead to problems over time as we will forgot to update both implementations. So the more we can minimize duplication the better.

@jlewi
Copy link

jlewi commented Dec 6, 2024

Actually I think we can make this much simpler. We don't need to maintain two separate classes GrpcSerializer and ConnectSerializer in order to continue to support gRPC. We can have a single class and just use either the Grpc or connect transport. Put another way, we can effectively just refactor the GRPCSerializer to use the bufbuild/es library rather than the timostamp library but continue to use gRPC.

Migrating to es is something we want to do independent of using connect protocol because the es library is the newer maintained library (see stateful/runme#641).

@hotpocket
Copy link
Owner Author

hotpocket commented Dec 6, 2024

Actually I think we can make this much simpler. We don't need to maintain two separate classes GrpcSerializer and ConnectSerializer in order to continue to support gRPC. We can have a single class and just use either the Grpc or connect transport. Put another way, we can effectively just refactor the GRPCSerializer to use the bufbuild/es library rather than the timostamp library but continue to use gRPC.

Migrating to es is something we want to do independent of using connect protocol because the es library is the newer maintained library (see stateful/runme#641).

So if I'm interpreting this correctly, effectively what you're saying is rename ConnectSerializer to GrpcSerializer, add a method called setProtocol or something like that, use this to construct the transport, and call it good?

@hotpocket hotpocket closed this Dec 6, 2024
@hotpocket hotpocket reopened this Dec 6, 2024
@jlewi
Copy link

jlewi commented Dec 6, 2024

@hotpocket The bulk of the changes will be refactoring the code to use the es library for the protos rather than timostamm library. The other thing that will need to change is how the client and transport get started. Right now the client and transport gets instantiated in IServer. We can't simply change that code to use the es GRpc Transport because the execute server will still use the timostamp library. While we could also update it to use the es in one go I think its better to try to limit the scope of the change

@jlewi
Copy link

jlewi commented Dec 10, 2024

How's this coming?

@hotpocket
Copy link
Owner Author

How's this coming?

My apologies, I have not been able to get to this until today.

I was hoping to get some more detail on what level of disruption/modification of the existing code base is tolerable. The prior method we discussed was to duplicate and modify code to get a usable non-interfering connect serializer(I did some refactoring in kernel.ts for clarity, but this is not strictly necessary).

If we want to combine things, we run back into integrations issues. If we want a drop in replacement I imagine we would use the server via the constructor if we receive it, otherwise obtain a transport some other way. Depending on the protocol we want to use (connect|grpc) call something like setProtocol which would create the right client & transport, which was what I was thinking when I mentioned this in a prior comment. If we don't want to duplicate logic, and want to keep our method signatures the same, it seems we would need to construct three proto objects instead of one, per method that uses protos (and some of these are nested and in loops like the marshal methods). (1) obtain the proto (something like Notebook.clone(data) currently), (2) convert to es proto (via code you have shared previously), do some work (as per the current ConnectSerializer), and (3) to convert back to ts for a valid return type. I haven't done any performance testing, and I know premature optimization is the root of all evil, but this seems less than ideal. There doesn't seem to be an ideal solution. The only hard blocker I see in all of this is the missing tag property mentioned in a prior comment. If we must have this property, this needs to be addressed.

Refactoring from ts to es is already done via the ConnectSerializer. I have already talked about the refactoring concerns, but maybe we can have an instance of a ConnectSerializer inside of each instance of GrpcSerializer and forward calls & convert proto types? Server use and conversion overhead notwithstanding.

If we want to change if/how a server gets started (maybe depending on config options?) we should discuss this as this falls outside of the scope of implementing a different serialization mechanism I think.

We can of course get on a call to hash this out if that is a better medium to address these issues.

@jlewi
Copy link

jlewi commented Dec 11, 2024

The intent was to preserve existing functionality; i.e. the ability to use gRPC. Your current code proves we can do this by using the es library and a gRPC transport.

The limitation of the current PR is that in order to have two code paths (1 for es and 1 for timostamp) you are duplicating a bunch of code. e.g applyIdentity. This will be a burden to maintain.

The way to fix that is to drop support for timostamp in the serializer. Basically, replace GRPCSerializer with your connect serializer.

To reduce the amount of change in one PR; I would focus this PR on replacing the use of the timostamp library with the es library in the Serializer. That is I would not introduce an option to use the connect transport or set the serializer address in this PR. I would suggest moving that into a follow on PR.

@hotpocket
Copy link
Owner Author

hotpocket commented Dec 12, 2024

I have needed to fix some of the tests as a result of the GrpcSerializerBase refactor. There very well may be other issues. It may be worth eliminating this base class and integrating everything into the new GrpcSerializer class. That is, unless we will want to support multiple serializers that talk to grpc services. In general, testing likely needs some attention after all these changes.

One test still fails whose solution is unclear "should backfill the output type for buffers". A comment has been added to start the discussion around this test.
src/extension/serializer.ts Outdated Show resolved Hide resolved
@jlewi
Copy link

jlewi commented Dec 14, 2024

Seems like the remaining work is

  1. Get the tests to pass
  2. Minimize the diff by getting rid of the unnecessary new base class and other refactorings that are no longer needed now that we have a single class
  3. Figure out the best solution for initializing the transport and in particular getting the suitable TLS information

@hotpocket
Copy link
Owner Author

hotpocket commented Dec 15, 2024

Seems like the remaining work is

  1. Get the tests to pass
  2. Minimize the diff by getting rid of the unnecessary new base class and other refactorings that are no longer needed now that we have a single class
  3. Figure out the best solution for initializing the transport and in particular getting the suitable TLS information

1 & 2 are "easy". 3 is an executive decision. How do we get a cert in a secure way if we're not reading it from the file system? I'm using the method as gleaned from the server process, as per previous discussions. What do you want this process to look like? see comment earlier in this PR where tls is discussed.

})
let res
try {
res = await this.client.deserialize(deserializeRequest)
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added extra error handling here because without it there was no error shown in vscode when the the request for deserialization failed. the old grpc serializer didn't do any additional error checking ... i can remove it if we feel safe to do so

@@ -187,7 +187,7 @@ export default async function saveCellExecution(
},
},
}
const result = await graphClient.mutate(mutation)
const result = await graphClient.mutate(mutation as MutationOptions)
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am concerned about this because I have no idea why this is needed. The typescript error on this was very hairy. If someone smarter than me knows what is going on here please let me know.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would this code need to change at all if its running on head? What was the error you got?

Copy link
Owner Author

@hotpocket hotpocket Dec 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  The types of 'variables.input.notebook.cells' are incompatible between these types.
    Type '{ outputs: { items: CellOutputItem[]; metadata: { [key: string]: string; }; processInfo?: CellOutputProcessInfo | undefined; }[]; kind: CellKind; ... 4 more ...; executionSummary?: CellExecutionSummary | undefined; }[]' is not assignable to type 'ReporterCellInput[]'.
      Type '{ outputs: { items: CellOutputItem[]; metadata: { [key: string]: string; }; processInfo?: CellOutputProcessInfo; }[]; kind: CellKind; ... 4 more ...; executionSummary?: CellExecutionSummary; }' is not assignable to type 'ReporterCellInput'.
        Types of property 'executionSummary' are incompatible.
          Type 'CellExecutionSummary | undefined' is not assignable to type 'InputMaybe<ReporterCellExecutionSummaryInput> | undefined'.
            Type 'CellExecutionSummary' is not assignable to type 'ReporterCellExecutionSummaryInput'.
              Types of property 'executionOrder' are incompatible.
                Type 'number | undefined' is not assignable to type 'InputMaybe<UInt32> | undefined'.
                  Type 'number' has no properties in common with type 'UInt32'.ts(2345)```

Copy link
Owner Author

@hotpocket hotpocket Dec 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, after testing the wall thoroughly with my forehead I have found a loose brick. It seems this graphql stuff was generated and it seems from the timostam-ts proto type. On line 2014 in graphql.ts there exists a line executionOrder?: InputMaybe<UInt32>; which conforms to the timostam-ts type Notebook.Cell.CellExecutionSummary.executionOrder which is executionOrder?: UInt32Value; which is in contrast to the es type, which is, as you'd guess, a number ... so my guess is this graphql.ts needs to be regenerated.

Can you say more about what you mean by "running on head"? I am on this branch looking for issues caused by this code. I assume you mean the head of the upstream repo? If this is not correct please elaborate.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I mean making sure your branch is up to date with upstream changes

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this graphql code continue to use the timeostamp protos in this PR?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this graphql code continue to use the timeostamp protos in this PR?

This brings us back to the "can we make these two coexist" question as I see it, which was attempted for quite a long time until it was determined they could not coexist, and the original GrpcSerializer was modified so that there would not be two implementations. From what I can tell this will need to be regenerated, or we can force cast (as I have done in this branch) and pray it doesn't cause any runtime errors. I ran the tests for this and didn't get any failures, but I'm not sure this specific case is being tested.

I don't see where graphql.ts gets generated. Should I post a question on discord?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sorry I didn't realize from the diff this was using grpcSerializer. Yeah if its using grpcSerializer it might need to change. Sure ask in discord.

@hotpocket
Copy link
Owner Author

i feel we're getting close here....

src/extension/serializer.ts Outdated Show resolved Hide resolved
try {
if (!getTLSEnabled()) {
return {}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding a method to KernelServer which returns the pem async getPem() and then just call that function here so the logic for getting the PEM is written in a single place inside Kernel?

Copy link
Owner Author

@hotpocket hotpocket Dec 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two methods in kernelServer.ts would need to be made public private static async getTLS(tlsDir: string) , and protected getTLSDir(): string. Adding other method(s) would cause duplication. Are we ok w/ making this public? This has been discussed in this thread above. Thoughts?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to not make it public? If it avoids code duplications what's the downside?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It just makes me feel a bit dirty to enable the server to serve up it's certs to anyone that asks for them. I'll make these methods public and make a note of it in the final PR to have it reviewed.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is done in the most recent commit. let me know if there are any other issues you see here.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't want to expose the methods. Then the other option is to create a second routine on the KernelServer that can return a connect es transport (in addition) to the gRPC transport. I thought that might lead to more duplicate code but maybe that's cleaner like you say than exposing TLSDir, TLS, and serverUrl.

src/extension/serializer.ts Outdated Show resolved Hide resolved
src/extension/serializer.ts Outdated Show resolved Hide resolved
@jlewi
Copy link

jlewi commented Dec 20, 2024

Nice work. I think this is almost there.

  1. I'd suggest going ahead and opening a PR with the updates to serializertest.ts in the upstream repo
  2. I think there are some minor edits about the comment that need to be fixed
  3. I think after fixing 2 you can open up a PR against the main repo and see what Sebastien thinks about the best way to instantiate the transport

@hotpocket
Copy link
Owner Author

hotpocket commented Dec 20, 2024

  1. I'd suggest going ahead and opening a PR with the updates to serializertest.ts in the upstream repo

This is already in the PR queue - stateful#1875

  1. I think there are some minor edits about the comment that need to be fixed

fixed

  1. I think after fixing 2 you can open up a PR against the main repo and see what Sebastien thinks about the best way to instantiate the transport

Submitted here stateful#1876

@jlewi
Copy link

jlewi commented Dec 21, 2024

Fantastic. Looks like the first one got merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants