Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Draft] Add ErrorHandling for LdkServer APIs #19

Closed
wants to merge 6 commits into from

Conversation

G8XSU
Copy link
Contributor

@G8XSU G8XSU commented Oct 30, 2024

This is a draft PR, please ignore some missing docs and field naming etc.
Only last 2 commits are of importance.

  • Adds proto definition for api errors.
    • When HttpStatusCode is not ok (200), the response content contains a serialized ErrorResponse.
  • Add error struct for LdkServerError.
    • It is mainly used as error struct that will be returned from internal layers of ldk-server implementation.
    • It will be converted to proto::error::ErrorResponse at top level service layer.
    • It is intentionally kept flat instead of nesting of fields in enum, so that we can re-use something similar in ldk-server-client. A flat structure without nesting of enums/string might be helpful in case we want to generate ldk-server-client bindings later.

@G8XSU G8XSU requested a review from jkczyz October 30, 2024 21:07
@G8XSU G8XSU mentioned this pull request Oct 30, 2024
13 tasks
Comment on lines +36 to +38
// Used when an internal server error occurred, client is probably at no fault and can safely retry
// this error with exponential backoff.
InternalServerError internal_server_error = 6;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually there is something like "deadline exceeded" if the client should retry, whereas an internal error means something is seriously wrong?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes internal_server_error usually means there is something wrong with server implementation,
an error that shouldn't have happened at all or is unexpected.
It could be a dependency outage, or service outage and is similar to 5xx error, hence it needs to be retried.

OPERATION_FAILED = 1;

// There was a timeout during the requested operation.
OPERATION_TIMED_OUT = 2;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When would we see this over InternalServerError?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were bunch "Timeout" errors in ldk-node, but i think many of them might not be applicable here.
I was going to merge them into OPERATION_TIMED_OUT.
But it is fair, that we can merge them into OPERATION_FAILED, as both might need to be retried and from client perspective it might not be that important.

OPERATION_TIMED_OUT = 2;

// Sending a payment has failed.
PAYMENT_SENDING_FAILED = 3;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use OPERATION_FAILED?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes normally we could just use that, but overtime, i assume we would want to expose more out of "https://github.com/lightningdevkit/rust-lightning/blob/main/lightning/src/ln/outbound_payment.rs#L446-L455"

But i think for now, i could reuse operation_failed.

Comment on lines +56 to +58
message LightningError {
LightningErrorCode lightning_error_code = 1;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we should have this use a oneof containing an enum for each possible operation. Then each enum would be defined specifically for the operation. Otherwise, we are left with values for LightningErrorCode that aren't relevant for the given operation, which results in the user needing to know which values are possible for said operation.

Alternatively, we could make a dedicated ErrorResponse for each operation with the same structure, only using an operation-specific enum here. This would result in a lot of duplication, for the non-LightningError part, though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oneof containing an enum for each possible operation. Then each enum would be defined specifically for the operation.

I tried couple of combinations for oneofs and enums.
Haven't tried this one but my general learning has been one-ofs introduce high-complexity when combined with other nested forms.
If we do this, it will be struct->struct->oneof->enum, which might become unwieldy but i can try.
Also, note that modelling these in protobuf is just part of the problem, we don't have a good way to represent these structures in rust enums. (which are binding safe.)
Another problem with oneofs is when a new field is introduced, they are treated as unset, which might be unexpected.

Otherwise, we are left with values for LightningErrorCode that aren't relevant for the given operation, which results in the user needing to know which values are possible for said operation.

I 100% agree to this concern, as i have been to trying to address it with an error model, but there isn't one right solution here. Not only this, but a user has to handle/understand all or atleast most variants of error for error-handling. This was the main motivation for going with sub_error_code or nesting within lightning_error_code, so that user can choose to just treat lightning_error as another retryable error for most operations.

Another possibility is to introduce List<String> error_tags in ErrorResponse (similar to this),
and they will contain specific tag related to error acc. to the api being used, and we can have enums:
GenericTags, PaymentSendErrorTags etc.. and for every api we can document which tags it can use.
Might look over-complicated, but i think it might be simpler than having api level structs, since most of the time error is generic and not api specific, and client can only consume the tags they are interested in. (also bindings safe.)

we could make a dedicated ErrorResponse for each operation

Yes ideally, we would have dedicated api level error structs, but neither ldk nor ldk-node have api level error structs, so if we do this, there is still a possibility that server might get an error that it didn't expect for a particular operation, and for that it will need "OTHER_ERROR" in each of those structs as a fallback.
Moreover such api level error struct are mostly not that different from each other. In reality, there are very few places where we need such granularity.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried couple of combinations for oneofs and enums. Haven't tried this one but my general learning has been one-ofs introduce high-complexity when combined with other nested forms. If we do this, it will be struct->struct->oneof->enum, which might become unwieldy but i can try.

Currently, it is:

message -> oneof -> message -> enum

which in rust is represented as:

struct -> enum -> struct -> enum

Though, can't we replace LightningError with LightningErrorCode in the oneof to get rid of the second struct? i.e., the enum representing the oneof would have one variant containing the LightningErrorCode enum, while the other variants would contain the marker structs.

message -> oneof -> message|enum

Or in rust:

struct -> enum -> struct|enum

And what I suggested was:

message -> oneof -> message -> oneof -> enum

Or in rust:

struct -> enum -> struct -> enum -> enum

Alternatively, we can combine both oneofs by removing LightningError entirely and adding a field to the oneof for each operation. This gives:

message -> oneof -> message|enum

Or in rust:

struct -> enum -> struct|enum

Also, note that modelling these in protobuf is just part of the problem, we don't have a good way to represent these structures in rust enums. (which are binding safe.)

Could you explain what you mean by this? Some restriction on rust enums that make them bindings-safe?

Another problem with oneofs is when a new field is introduced, they are treated as unset, which might be unexpected.

Not sure what you mean here. Are you saying a new field to the oneof? Internally they are just optional fields where only one can be set and where the rust representation uses an enum, IIUC. Do you mean if the server and client versions of the proto are not in sync?

I 100% agree to this concern, as i have been to trying to address it with an error model, but there isn't one right solution here. Not only this, but a user has to handle/understand all or atleast most variants of error for error-handling. This was the main motivation for going with sub_error_code or nesting within lightning_error_code, so that user can choose to just treat lightning_error as another retryable error for most operations.

Should we simply have a retryable and non-retryable mapping then? The actual error would still be conveyed via the message field.

Another possibility is to introduce List<String> error_tags in ErrorResponse (similar to this), and they will contain specific tag related to error acc. to the api being used, and we can have enums: GenericTags, PaymentSendErrorTags etc.. and for every api we can document which tags it can use. Might look over-complicated, but i think it might be simpler than having api level structs, since most of the time error is generic and not api specific, and client can only consume the tags they are interested in. (also bindings safe.)

FWIW, this is a bit more type safe, but not sure if I'd consider it just yet.

https://github.com/googleapis/googleapis/blob/c7ce97ebdeb85009fed49b1256586dbd3867adc6/google/rpc/status.proto#L48

Yes ideally, we would have dedicated api level error structs, but neither ldk nor ldk-node have api level error structs, so if we do this, there is still a possibility that server might get an error that it didn't expect for a particular operation, and for that it will need "OTHER_ERROR" in each of those structs as a fallback. Moreover such api level error struct are mostly not that different from each other. In reality, there are very few places where we need such granularity.

Could you remind me why we decided to not expose ldk_node::NodeError? Just want to wrap my head around what level of granularity we should expose. Seems like we have a few options:

  • None (i.e., LightningError with no oneof)
  • Retryable / Non-retryable
  • Some mapping from ldk_node::NodeError to a smaller set (i.e., like this PR)
  • Subset per operation (i.e., what I originally suggested, but can be fragile as you pointed out)
  • All (i.e., create a proto enum matching ldk_node::NodeError)

In all cases, ldk_node::NodeError::to_string would populate message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain what you mean by this? Some restriction on rust enums that make them bindings-safe?

Yes, Uniffi error enums cannot have nested fields.
a struct with message, enum, error_code will work fine, but if we move more fields inside of enum, that isn't supported.
Modelling of one-of in rust will involve enums, and we can't have proper fields inside of enums.

I can discuss further details offline.

@G8XSU
Copy link
Contributor Author

G8XSU commented Nov 7, 2024

Closing in favor of #20,
We can add the error_details field later.

@G8XSU G8XSU closed this Nov 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants