-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Please clarify the exact sanitization process for messages. #1286
Comments
We have no current plans to create or maintain documentation on the sanitization techniques we employ with message sending. The existing documentation is broad enough to cover the sanitization as a whole. We are, however, monitoring feedback/reports related to mention abuse through bots internally and may make API changes down the road if it proves to be problematic. For right now, however, the existing sanitization techniques seem to be manageable and many bots have already patched this behavior. |
Many bots, mine included, took drastic measures which break other things to avoid this issue because of the lack of communication not allowing anything better. |
Would it be possible to cite the exact documentation on this subject? Since not everyone has a existing bot here that has solved the issues of sanitation and have nothing official to reference to, what knowledge is there for new developers getting into the Discord API to understand what discord does in-between from the message being send to Discord, and Discord rendering it on the end user's end? |
mikeshardmind quotes it a few messages up. It's not particularly helpful. |
While I did quote that, I also mentioned how it is insufficient and also incorrect on top of it, and would generally agree with the question asked. To be explicitly clear this section from the docs:
Is outright incorrect, as I pointed out:
There's no such thing as "surprising formatting" or invalid unicode when sending something already NFC normalized. This is entirely misleading, and would be better replaced by "don't send messages containing user content, we're not gonna explain anything" |
It seems to me that bot devs have mostly been 'managing' this through manual testing and hearsay from other developers; isn't the point of having an open-source documentation repository that people can contribute things like this to a central source of information? At the very least, failing Discord providing adequate documentation for their own system, shouldn't people be able to list the known properties of the system in the docs? Moreover, the system was 'manageable' for a long time before the recent uptick of mention abuse, but then it wasn't, because a previously unknown property of the system was discovered. People patched things and bots are mostly working again, but what happens when another anomaly is discovered? It's strange to me that Discord is so reticent to provide this information to developers interested in securing their bots against these incidents (and thereby reducing the likelihood of them occurring in the first place, saving everyone at Discord time and effort in response). |
This is a joke, right? Over the course of a few issues you've closed saying it's not an issue and Discord won't do more, you've had multiple library maintainers, a handful of large bot owners, and a couple of people who build self-hosted solutions all chime in that there needs to be more and how this is a problem. You've been presented with multiple options that developers could work with from just being clear about the documentation, to letting developers set the mentions everyone flag on messages sent, rather than that only being part of a payload from discord. The larger bots like tatsumaki solved this by removing some functionality of their bots rather than attempt to guess at your sanitization. |
Although I agree 100% that there should be a solution from Discord on this, I believe it is being blown way out of proportion. I simply solved this issue in my bot by moving any form of content that I can not control aka User Input, Data from API etc inside of an embed. Embeds will automatically remove mentions. |
Embeds can also be disabled on the user and permission levels. I don't see how that's a remotely satisfying solution. |
2020 and we still have problems with sanitation? :pepecoffee: I agree with night on this one. If the safety of the app wasn't at risk, I would publish it, however, if doing so could mean people would find ways around it, I would keep in in-house. |
You're heavily misunderstanding the issue. Because it hasn't been documented, bot devs are being exposed to sanitization issues. This is like if some SQLite interface silently changed your inputs changing the meaning, only in this case only causing message content to change to be heavy annoyances, rather than exploit databses. |
The root cause of mention-spam exploits are bots not implementing input sanitization correctly. The root cause of that is that there is no documentation about how to do it. Unless the system is documented, it's impossible to be reasonably confident that any given sanitization method covers all its bases. This situation is analogous to web developers trying to prevent XSS attacks without having any information about how HTML escaping works. @Gibigbig, your notion that "the safety of the app is at risk" is inaccurate. Bot developers are asking about the full behavior so they can implement the proper precautions. Arguably, the safety of the app is at risk because Discord hasn't documented their system well enough, because as long as the system is a black box, attackers could still be discovering flaws in bots' sanitization methods before bot developers can patch them. |
@night It's been around a week, is there any update on the team's feedback-gathering? Given everything that's been said in this issue and the previous ones, I really think this deserves a second look. |
@Geo1088 Definitely the XSS case. Discord should document theirs santization. It's not that sensitive, it shoule not be a trade secret, it should be something that we can consider how safe it is and how to properly implement in our bots. |
We have no plans to document our message santization methods, however, we are working on exposing expose a new set of fields you can provide when you create a message or post a webhook that will specify what kinds of mentions will be allowed to be processed in a message. More details forthcoming as we reach code completion on that. This will let you specify on a per-message level a white-list of which mentions should be processed (by users, roles, and everyone(/here)), and specifically what users/roles should be mentioned if found in the message content. We think this should solve all of the above feedback, while allowing us to keep our sanitization methods (which change more often than you'd think) undocumented, so as an API end-user you don't have to worry about it. |
@jhgg |
It is likely to change which is why we are not wanting to document it. Our "obscure character" filtering rules change as we discover more obscure characters. Documenting this means having people rely on a specific behavior, which we do not want to have people do. |
Right, I understand you may add more characters, but I meant that it could be documented as a matter of something such as:
This leaves the door open to changing behavior, while noting a class of character which has legitimate use, but that discord does not allow. At least in the case of (intentional use of) directional control characters, you can use things like bidi algorithms to remove them while getting the intended text layout. |
I don't follow. If you want to remove control characters from messages before sending them to us, that's your prerogative, and I think out of scope for whatever documentation we can provide. |
While it's obvious to someone who has run into the issue how to minimize some of the impacts, I figured it might be worth a small additional note that there was a way to minimize issues with the filtering by preprocessing on a bot developer's end. That said, as adding an explicit way to chose which mentions get sent removes all of the abusable edge cases of the lack of documentation, people who run into the other cases should be able to fix it themselves when noticing it without the additional frustrations of users who got mass mentions as a result. It's not a big deal either way with that additional change in the future. |
This sounds like a great solution, thanks so much for the update! I'm happy that the level of control will be so fine, it does seem like pretty much every use case will have the tools needed to keep unwanted mentions out of their messages with this feature. Eagerly awaiting further news. |
And this marks the hopeful conclusion of this issue: #1396 |
The only current documented guidance on this is the following:
However, the behavior of collapsing characters has been observed with NFC-Normalized Unicode with all control characters pre-processed out. With this being the case, the above definition about it happening only with invalid Unicode characters, or with characters with unexpected formatting is clearly not correct.
This leaves no ability for a bot to take in web content, such as from RSS feeds without removing all
@
characters, which changes the intent of a message. Changing this to insert zero-width spaces is also not universally acceptible, as there may be things in web content which users should be able to copy paste without unintended effects.If you don't want to provide alternative ways such as message flags or just rejecting "bad" messages (see #1276), then please at least document the behavior in such a way that developers can properly account for it without breaking normal formatting.
The text was updated successfully, but these errors were encountered: