[bugfix] Use better plaintext representation of status for filtering #3301

Merged
tsmethurst merged 13 commits into main from status_filtering_bugfix
Sep 16, 2024

Conversation


@tsmethurst commented Sep 15, 2024

Description

This pull request updates our filtering logic so that it no longer uses our SanitizeToPlaintext function to reduce status HTML content to plaintext. Instead it uses https://github.com/k3a/html2text, which doesn't cause weird line concatenation and can properly extract links, mentions, and hashtags from the text.

To avoid re-parsing a status from HTML every time we want to filter it, a TTLCache has been added to the converter which stores the parsed-to-text version of statuses.

This also includes some minor fixes to our filter regexes, so that the whole word match takes whitespace and start/end of line into account.
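To make the whole word idea concrete, here is a small sketch of one way such a pattern could be built with Go's standard `regexp` package. The helper name `wholeWordRegexp` and the exact pattern are assumptions for illustration; the regex actually used in the filter code may differ.

```go
package main

import (
	"fmt"
	"regexp"
)

// wholeWordRegexp is a hypothetical helper. It builds a case-insensitive
// pattern that matches keyword only when it is delimited on each side by
// whitespace or by the start/end of a line.
func wholeWordRegexp(keyword string) (*regexp.Regexp, error) {
	// (?i) ignores case; (?m) lets ^ and $ match at line boundaries.
	// QuoteMeta escapes any regex metacharacters in the keyword.
	return regexp.Compile(`(?im)(?:^|\s)` + regexp.QuoteMeta(keyword) + `(?:\s|$)`)
}

func main() {
	re, err := wholeWordRegexp("fnord")
	if err != nil {
		panic(err)
	}
	fmt.Println(re.MatchString("first line\nFnord at line start")) // true
	fmt.Println(re.MatchString("fnordish"))                        // false
}
```

Note that this sketch treats only whitespace and line boundaries as delimiters, so a keyword followed directly by punctuation would not match.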

closes #3298
closes #3128

Checklist

  • I/we have read the GoToSocial contribution guidelines.
  • I/we have discussed the proposed changes already, either in an issue on the repository, or in the Matrix chat.
  • I/we have not leveraged AI to create the proposed changes.
  • I/we have performed a self-review of added code.
  • I/we have written code that is legible and maintainable by others.
  • I/we have commented the added code, particularly in hard-to-understand areas.
  • I/we have made any necessary changes to documentation.
  • I/we have added tests that cover new code.
  • I/we have run tests and they pass locally with the changes.
  • I/we have run go fmt ./... and golangci-lint run.

@tsmethurst merged commit efd1a4f into main Sep 16, 2024
3 checks passed
@tsmethurst deleted the status_filtering_bugfix branch September 16, 2024 12:00
Successfully merging this pull request may close these issues.

[bug] SanitizeToPlaintext concatenates text in unexpected ways
[bug] Filters not always filtering (boosts)
3 participants