Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add event for error statistics #18

Open
farao opened this issue May 23, 2020 · 5 comments
Open

Add event for error statistics #18

farao opened this issue May 23, 2020 · 5 comments
Assignees

Comments

@farao
Copy link
Member

farao commented May 23, 2020

We need a protocol extension for the palava protocol and an implementation of it in the signal tower for receiving error stats from the client and adding them to prometheus. Proposition:

{
  event: "stats",
  type: "peerconnection established" | "peerconnection_error",
  category: ...
}

Of course we can add more types than peerconnection_established and peerconnection_error in the future, but for now, I would start with those 2. The category can be one of different (error) categories and can also be optional(?). For peerconnection_error the categories will most likely be something like "nat_failed", "media_incompatible", "network_error" (just examples, we'll see if we can distinguish these in the in the frontend). For peerconnection_established, these could be "first_connection" or "reconnect"...

@thammi
Copy link
Member

thammi commented May 23, 2020

I am not sure whether this should be integrated into a signaling server. It might be better suited in a small REST server as an additional prometheus endpoint.

@farao
Copy link
Member Author

farao commented May 24, 2020

We already have some prometheus statistics in the signal tower. Of course, this makes statistics more visible because they become part of the protocol, but the event can be optional and would not need to be supported by other palava protocol servers (except that they would need to actvely ignore it).

I actually like that the setup is kept so simple that we only have one server software (and one standing connection).

@erikzenker
Copy link
Member

I can understand both standpoints. Collecting client side errors must not be part of the palava protocol and thus could be collected on a separate service. But this also means another, server , another connection, another source of errors in a distributed system.

I would go for a single endpoint. Since nobody else is implementing the palava protocol. Adding the error event does nö harm and is quite simple, right?

@farao
Copy link
Member Author

farao commented Jun 4, 2020

If we implement TURN, then this will probably be less necessary, because when the client asks for a turn credential, we know that stun failed. The only information that we would be missing out on would be how often even TURN fails...

(and probably different reasons why the connection with stun failed, but I'm not even sure if there are now ways to find that out - there didn't use to be any when we first researched this some years ago)

@farao
Copy link
Member Author

farao commented Nov 22, 2020

New proposition (since we configure turn in the client directly and not after a failed connection now):

{
  event: "stats",
  type: "peerconnection_established_direct" | "peerconnection_established_relay" | "peerconnection_error_ice" | "peerconnection_error_offeranswer"
}

There are new types now for differentiating direct and relayed (turn) connection and the two errors that we can meaningfully differentiate in the client.

Any thoughts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants