-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Draft] Add ErrorHandling for LdkServer APIs #19
Conversation
// Used when an internal server error occurred, client is probably at no fault and can safely retry | ||
// this error with exponential backoff. | ||
InternalServerError internal_server_error = 6; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually there is something like "deadline exceeded" if the client should retry, whereas an internal error means something is seriously wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes internal_server_error usually means there is something wrong with server implementation,
an error that shouldn't have happened at all or is unexpected.
It could be a dependency outage, or service outage and is similar to 5xx error, hence it needs to be retried.
OPERATION_FAILED = 1; | ||
|
||
// There was a timeout during the requested operation. | ||
OPERATION_TIMED_OUT = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When would we see this over InternalServerError
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There were bunch "Timeout" errors in ldk-node, but i think many of them might not be applicable here.
I was going to merge them into OPERATION_TIMED_OUT.
But it is fair, that we can merge them into OPERATION_FAILED, as both might need to be retried and from client perspective it might not be that important.
OPERATION_TIMED_OUT = 2; | ||
|
||
// Sending a payment has failed. | ||
PAYMENT_SENDING_FAILED = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use OPERATION_FAILED
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes normally we could just use that, but overtime, i assume we would want to expose more out of "https://github.com/lightningdevkit/rust-lightning/blob/main/lightning/src/ln/outbound_payment.rs#L446-L455"
But i think for now, i could reuse operation_failed.
message LightningError { | ||
LightningErrorCode lightning_error_code = 1; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like we should have this use a oneof
containing an enum for each possible operation. Then each enum would be defined specifically for the operation. Otherwise, we are left with values for LightningErrorCode
that aren't relevant for the given operation, which results in the user needing to know which values are possible for said operation.
Alternatively, we could make a dedicated ErrorResponse
for each operation with the same structure, only using an operation-specific enum here. This would result in a lot of duplication, for the non-LightningError
part, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oneof containing an enum for each possible operation. Then each enum would be defined specifically for the operation.
I tried couple of combinations for oneofs and enums.
Haven't tried this one but my general learning has been one-ofs introduce high-complexity when combined with other nested forms.
If we do this, it will be struct->struct->oneof->enum, which might become unwieldy but i can try.
Also, note that modelling these in protobuf is just part of the problem, we don't have a good way to represent these structures in rust enums. (which are binding safe.)
Another problem with oneofs is when a new field is introduced, they are treated as unset, which might be unexpected.
Otherwise, we are left with values for LightningErrorCode that aren't relevant for the given operation, which results in the user needing to know which values are possible for said operation.
I 100% agree to this concern, as i have been to trying to address it with an error model, but there isn't one right solution here. Not only this, but a user has to handle/understand all or atleast most variants of error for error-handling. This was the main motivation for going with sub_error_code or nesting within lightning_error_code, so that user can choose to just treat lightning_error as another retryable error for most operations.
Another possibility is to introduce List<String> error_tags
in ErrorResponse (similar to this),
and they will contain specific tag related to error acc. to the api being used, and we can have enums:
GenericTags, PaymentSendErrorTags etc.. and for every api we can document which tags it can use.
Might look over-complicated, but i think it might be simpler than having api level structs, since most of the time error is generic and not api specific, and client can only consume the tags they are interested in. (also bindings safe.)
we could make a dedicated
ErrorResponse
for each operation
Yes ideally, we would have dedicated api level error structs, but neither ldk nor ldk-node have api level error structs, so if we do this, there is still a possibility that server might get an error that it didn't expect for a particular operation, and for that it will need "OTHER_ERROR" in each of those structs as a fallback.
Moreover such api level error struct are mostly not that different from each other. In reality, there are very few places where we need such granularity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried couple of combinations for oneofs and enums. Haven't tried this one but my general learning has been one-ofs introduce high-complexity when combined with other nested forms. If we do this, it will be struct->struct->oneof->enum, which might become unwieldy but i can try.
Currently, it is:
message -> oneof -> message -> enum
which in rust is represented as:
struct -> enum -> struct -> enum
Though, can't we replace LightningError
with LightningErrorCode
in the oneof
to get rid of the second struct? i.e., the enum representing the oneof
would have one variant containing the LightningErrorCode
enum, while the other variants would contain the marker structs.
message -> oneof -> message|enum
Or in rust:
struct -> enum -> struct|enum
And what I suggested was:
message -> oneof -> message -> oneof -> enum
Or in rust:
struct -> enum -> struct -> enum -> enum
Alternatively, we can combine both oneof
s by removing LightningError
entirely and adding a field to the oneof
for each operation. This gives:
message -> oneof -> message|enum
Or in rust:
struct -> enum -> struct|enum
Also, note that modelling these in protobuf is just part of the problem, we don't have a good way to represent these structures in rust enums. (which are binding safe.)
Could you explain what you mean by this? Some restriction on rust enums that make them bindings-safe?
Another problem with oneofs is when a new field is introduced, they are treated as unset, which might be unexpected.
Not sure what you mean here. Are you saying a new field to the oneof
? Internally they are just optional fields where only one can be set and where the rust representation uses an enum, IIUC. Do you mean if the server and client versions of the proto are not in sync?
I 100% agree to this concern, as i have been to trying to address it with an error model, but there isn't one right solution here. Not only this, but a user has to handle/understand all or atleast most variants of error for error-handling. This was the main motivation for going with sub_error_code or nesting within lightning_error_code, so that user can choose to just treat lightning_error as another retryable error for most operations.
Should we simply have a retryable and non-retryable mapping then? The actual error would still be conveyed via the message
field.
Another possibility is to introduce
List<String> error_tags
in ErrorResponse (similar to this), and they will contain specific tag related to error acc. to the api being used, and we can have enums: GenericTags, PaymentSendErrorTags etc.. and for every api we can document which tags it can use. Might look over-complicated, but i think it might be simpler than having api level structs, since most of the time error is generic and not api specific, and client can only consume the tags they are interested in. (also bindings safe.)
FWIW, this is a bit more type safe, but not sure if I'd consider it just yet.
Yes ideally, we would have dedicated api level error structs, but neither ldk nor ldk-node have api level error structs, so if we do this, there is still a possibility that server might get an error that it didn't expect for a particular operation, and for that it will need "OTHER_ERROR" in each of those structs as a fallback. Moreover such api level error struct are mostly not that different from each other. In reality, there are very few places where we need such granularity.
Could you remind me why we decided to not expose ldk_node::NodeError
? Just want to wrap my head around what level of granularity we should expose. Seems like we have a few options:
- None (i.e.,
LightningError
with nooneof
) - Retryable / Non-retryable
- Some mapping from
ldk_node::NodeError
to a smaller set (i.e., like this PR) - Subset per operation (i.e., what I originally suggested, but can be fragile as you pointed out)
- All (i.e., create a proto enum matching
ldk_node::NodeError
)
In all cases, ldk_node::NodeError::to_string
would populate message
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain what you mean by this? Some restriction on rust enums that make them bindings-safe?
Yes, Uniffi error enums cannot have nested fields.
a struct with message, enum, error_code will work fine, but if we move more fields inside of enum, that isn't supported.
Modelling of one-of in rust will involve enums, and we can't have proper fields inside of enums.
I can discuss further details offline.
Closing in favor of #20, |
This is a draft PR, please ignore some missing docs and field naming etc.
Only last 2 commits are of importance.
content
contains a serializedErrorResponse
.proto::error::ErrorResponse
at top level service layer.