[bugfix] Use better plaintext representation of status for filtering #3301

Merged: 13 commits, Sep 16, 2024
4 changes: 2 additions & 2 deletions .drone.yml
@@ -45,7 +45,7 @@ steps:
go test
-failfast
-timeout=20m
-tags "netgo osusergo static_build kvformat timetzdata"
-tags "netgo osusergo static_build kvformat timetzdata purego"
./...
- ./test/envparsing.sh
- ./test/swagger.sh
@@ -207,6 +207,6 @@ steps:

---
kind: signature
hmac: f4008d87e4e5b67251eb89f255c1224e6ab5818828cab24fc319b8f829176058
hmac: 3f3a24557b67760dd0c4091eaaed4842b0545f5aa65f90ce70d5e45da23c5260

...
1 change: 1 addition & 0 deletions .goreleaser.yml
@@ -27,6 +27,7 @@ builds:
- static_build
- kvformat
- timetzdata
- purego
- >-
{{ if and (index .Env "DEBUG") (.Env.DEBUG) }}debugenv{{ end }}
- >-
2 changes: 2 additions & 0 deletions README.md
@@ -240,6 +240,7 @@ For bugs and feature requests, please check to see if there's [already an issue]
The following open source libraries, frameworks, and tools are used by GoToSocial, with gratitude 💕

- [buckket/go-blurhash](https://github.com/buckket/go-blurhash); used for generating image blurhashes. [GPL-3.0 License](https://spdx.org/licenses/GPL-3.0-only.html).
- [cespare/xxhash](https://github.com/cespare/xxhash); xxHash generation. [MIT License](https://spdx.org/licenses/MIT.html).
- [coreos/go-oidc](https://github.com/coreos/go-oidc); OIDC client library. [Apache-2.0 License](https://spdx.org/licenses/Apache-2.0.html).
- [DmitriyVTitov/size](https://github.com/DmitriyVTitov/size); runtime model memory size calculations. [MIT License](https://spdx.org/licenses/MIT.html).
- Gin:
@@ -273,6 +274,7 @@ The following open source libraries, frameworks, and tools are used by GoToSocia
- [jackc/pgconn](https://github.com/jackc/pgconn); Postgres driver. [MIT License](https://spdx.org/licenses/MIT.html).
- [jackc/pgx](https://github.com/jackc/pgx); Postgres driver and toolkit. [MIT License](https://spdx.org/licenses/MIT.html).
- [KimMachineGun/automemlimit](https://github.com/KimMachineGun/automemlimit); cgroups memory limit checking. [MIT License](https://spdx.org/licenses/MIT.html).
- [k3a/html2text](https://github.com/k3a/html2text); HTML-to-text conversion. [MIT License](https://spdx.org/licenses/MIT.html).
- [mcuadros/go-syslog](https://github.com/mcuadros/go-syslog); Syslog server library. [MIT License](https://spdx.org/licenses/MIT.html).
- [microcosm-cc/bluemonday](https://github.com/microcosm-cc/bluemonday); HTML user-input sanitization. [BSD-3-Clause License](https://spdx.org/licenses/BSD-3-Clause.html).
- [miekg/dns](https://github.com/miekg/dns); DNS utilities. [Go License](https://go.dev/LICENSE).
2 changes: 2 additions & 0 deletions go.mod
@@ -28,6 +28,7 @@ require (
github.com/DmitriyVTitov/size v1.5.0
github.com/KimMachineGun/automemlimit v0.6.1
github.com/buckket/go-blurhash v1.1.0
github.com/cespare/xxhash v1.1.0
github.com/coreos/go-oidc/v3 v3.11.0
github.com/gin-contrib/cors v1.7.2
github.com/gin-contrib/gzip v1.0.1
@@ -40,6 +41,7 @@ require (
github.com/gorilla/feeds v1.2.0
github.com/gorilla/websocket v1.5.2
github.com/jackc/pgx/v5 v5.6.0
github.com/k3a/html2text v1.2.1
github.com/microcosm-cc/bluemonday v1.0.27
github.com/miekg/dns v1.1.62
github.com/minio/minio-go/v7 v7.0.76
8 changes: 8 additions & 0 deletions go.sum
@@ -98,6 +98,8 @@ github.com/Masterminds/semver/v3 v3.2.1 h1:RN9w6+7QoMeJVGyfmbcgs28Br8cvmnucEXnY0
github.com/Masterminds/semver/v3 v3.2.1/go.mod h1:qvl/7zhW3nngYb5+80sSMF+FG2BjYrf8m9wsX0PNOMQ=
github.com/Masterminds/sprig/v3 v3.2.3 h1:eL2fZNezLomi0uOLqjQoN6BfsDD+fyLtgbJMAj9n6YA=
github.com/Masterminds/sprig/v3 v3.2.3/go.mod h1:rXcFaZ2zZbLRJv/xSysmlgIM1u11eBaRMhvYXJNkGuM=
github.com/OneOfOne/xxhash v1.2.2 h1:KMrpdQIwFcEqXDklaen+P1axHaj9BSKzvpUUfnHldSE=
github.com/OneOfOne/xxhash v1.2.2/go.mod h1:HSdplMjZKSmBqAxg5vPj2TmRDmfkzw+cTzAElWljhcU=
github.com/ajg/form v1.5.1 h1:t9c7v8JUKu/XxOGBU0yjNpaMloxGEJhUkqFRq0ibGeU=
github.com/ajg/form v1.5.1/go.mod h1:uL1WgH+h2mgNtvBq0339dVnzXdBETtL2LeUXaIv25UY=
github.com/andybalholm/brotli v1.0.0/go.mod h1:loMXtMfwqflxFJPmdbJO0a3KNoPuLBgiu3qAvBg8x/Y=
@@ -118,6 +120,8 @@ github.com/bytedance/sonic/loader v0.1.1/go.mod h1:ncP89zfokxS5LZrJxl5z0UJcsk4M4
github.com/cenkalti/backoff/v4 v4.3.0 h1:MyRJ/UdXutAwSAT+s3wNd7MfTIcy71VQueUuFK343L8=
github.com/cenkalti/backoff/v4 v4.3.0/go.mod h1:Y3VNntkOUPxTVeUxJ/G5vcM//AlwfmyYozVcomhLiZE=
github.com/census-instrumentation/opencensus-proto v0.2.1/go.mod h1:f6KPmirojxKA12rnyqOA5BBL4O983OfeGPqjHWSTneU=
github.com/cespare/xxhash v1.1.0 h1:a6HrQnmkObjyL+Gs60czilIUGqrzKutQD6XZog3p+ko=
github.com/cespare/xxhash v1.1.0/go.mod h1:XrSqR1VqqWfGrhpAt58auRo0WTKS1nRRg3ghfAqPWnc=
github.com/cespare/xxhash/v2 v2.3.0 h1:UL815xU9SqsFlibzuggzjXhog7bL6oX9BbNZnL2UFvs=
github.com/cespare/xxhash/v2 v2.3.0/go.mod h1:VGX0DQ3Q6kWi7AoAeZDth3/j3BFtOZR5XLFGgcrjCOs=
github.com/chzyer/logex v1.1.10/go.mod h1:+Ywpsq7O8HXn0nuIou7OrIPyXbp3wmkHB+jjWRnGsAI=
@@ -384,6 +388,8 @@ github.com/jstemmer/go-junit-report v0.9.1/go.mod h1:Brl9GWCQeLvo8nXZwPNNblvFj/X
github.com/jtolds/gls v4.20.0+incompatible h1:xdiiI2gbIgH/gLH7ADydsJ1uDOEzR8yvV7C0MuV77Wo=
github.com/jtolds/gls v4.20.0+incompatible/go.mod h1:QJZ7F/aHp+rZTRtaJ1ow/lLfFfVYBRgL+9YlvaHOwJU=
github.com/k0kubun/colorstring v0.0.0-20150214042306-9440f1994b88/go.mod h1:3w7q1U84EfirKl04SVQ/s7nPm1ZPhiXd34z40TNz36k=
github.com/k3a/html2text v1.2.1 h1:nvnKgBvBR/myqrwfLuiqecUtaK1lB9hGziIJKatNFVY=
github.com/k3a/html2text v1.2.1/go.mod h1:ieEXykM67iT8lTvEWBh6fhpH4B23kB9OMKPdIBmgUqA=
github.com/kisielk/gotool v1.0.0/go.mod h1:XhKaO+MFFWcvkIS/tQcRk01m1F5IRFswLeQ+oQHNcck=
github.com/klauspost/compress v1.10.4/go.mod h1:aoV0uJVorq1K+umq18yTdKaF57EivdYsUV+/s2qKfXs=
github.com/klauspost/compress v1.10.10/go.mod h1:aoV0uJVorq1K+umq18yTdKaF57EivdYsUV+/s2qKfXs=
@@ -506,6 +512,8 @@ github.com/smartystreets/goconvey v1.6.4 h1:fv0U8FUIMPNf1L9lnHLvLhgicrIVChEkdzIK
github.com/smartystreets/goconvey v1.6.4/go.mod h1:syvi0/a8iFYH4r/RixwvyeAJjdLS9QV7WQ/tjFTllLA=
github.com/sourcegraph/conc v0.3.0 h1:OQTbbt6P72L20UqAkXXuLOj79LfEanQ+YQFNpLA9ySo=
github.com/sourcegraph/conc v0.3.0/go.mod h1:Sdozi7LEKbFPqYX2/J+iBAM6HpqSLTASQIKqDmF7Mt0=
github.com/spaolacci/murmur3 v0.0.0-20180118202830-f09979ecbc72 h1:qLC7fQah7D6K1B0ujays3HV9gkFtllcxhzImRR7ArPQ=
github.com/spaolacci/murmur3 v0.0.0-20180118202830-f09979ecbc72/go.mod h1:JwIasOWyU6f++ZhiEuf87xNszmSA2myDM2Kzu9HwQUA=
github.com/spf13/afero v1.11.0 h1:WJQKhtpdm3v2IzqG8VMqrr6Rf3UYpEF239Jy9wNepM8=
github.com/spf13/afero v1.11.0/go.mod h1:GH9Y3pIexgf1MTIWtNGyogA5MwRIDXGUr+hbWNoBjkY=
github.com/spf13/cast v1.3.1/go.mod h1:Qx5cxh0v+4UWYiBimWS+eyWzqEqokIECu5etghLkUJE=
19 changes: 15 additions & 4 deletions internal/gtsmodel/filter.go
@@ -20,6 +20,8 @@ package gtsmodel
import (
"regexp"
"time"

"github.com/superseriousbusiness/gotosocial/internal/util"
)

// Filter stores a filter created by a local account.
@@ -61,14 +63,23 @@ type FilterKeyword struct {

// Compile will compile this FilterKeyword as a prepared regular expression.
func (k *FilterKeyword) Compile() (err error) {
var wordBreak string
if k.WholeWord != nil && *k.WholeWord {
wordBreak = `\b`
var (
wordBreakStart string
wordBreakEnd string
)

if util.PtrOrZero(k.WholeWord) {
// Either word boundary or
// whitespace or start of line.
wordBreakStart = `(?:\b|\s|^)`
// Either word boundary or
// whitespace or end of line.
wordBreakEnd = `(?:\b|\s|$)`
}

// Compile keyword filter regexp.
quoted := regexp.QuoteMeta(k.Keyword)
k.Regexp, err = regexp.Compile(`(?i)` + wordBreak + quoted + wordBreak)
k.Regexp, err = regexp.Compile(`(?i)` + wordBreakStart + quoted + wordBreakEnd)
return // caller is expected to wrap this error
}
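
Reviewer note: the reason for the wider boundary groups can be seen with a keyword that starts with a non-word character such as `#`. The sketch below is not part of the PR; `compileOld` and `compileNew` are hypothetical helpers that only mirror the old and new pattern strings shown in the diff, using the standard `regexp` package.

```go
package main

import (
	"fmt"
	"regexp"
)

// compileOld mirrors the previous pattern: a bare \b on both sides.
// (Hypothetical helper, for illustration only.)
func compileOld(keyword string) *regexp.Regexp {
	return regexp.MustCompile(`(?i)\b` + regexp.QuoteMeta(keyword) + `\b`)
}

// compileNew mirrors the updated pattern: word boundary, whitespace,
// or start/end of text on each side of the keyword.
// (Hypothetical helper, for illustration only.)
func compileNew(keyword string) *regexp.Regexp {
	return regexp.MustCompile(`(?i)(?:\b|\s|^)` + regexp.QuoteMeta(keyword) + `(?:\b|\s|$)`)
}

func main() {
	text := "doggo doggin' it\n\n#dogsofmastodon"

	// "\n" followed by "#" is not a word boundary, so \b alone fails.
	fmt.Println(compileOld("#dogsofmastodon").MatchString(text)) // false

	// The \s alternative accepts the newline before the "#".
	fmt.Println(compileNew("#dogsofmastodon").MatchString(text)) // true

	// Whole-word behaviour is kept: "dog" does not match inside "doggo".
	fmt.Println(compileNew("dog").MatchString("doggo")) // false
}
```

Because `\b` only fires between a word character and a non-word character, a keyword beginning with `#` could never match after whitespace or at the start of the text under the old pattern.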

29 changes: 20 additions & 9 deletions internal/typeutils/converter.go
@@ -18,26 +18,37 @@
package typeutils

import (
"sync"
"time"

"codeberg.org/gruf/go-cache/v3"
"github.com/superseriousbusiness/gotosocial/internal/filter/interaction"
"github.com/superseriousbusiness/gotosocial/internal/filter/visibility"
"github.com/superseriousbusiness/gotosocial/internal/log"
"github.com/superseriousbusiness/gotosocial/internal/state"
)

type Converter struct {
state *state.State
defaultAvatars []string
randAvatars sync.Map
visFilter *visibility.Filter
intFilter *interaction.Filter
state *state.State
defaultAvatars []string
randAvatars sync.Map
visFilter *visibility.Filter
intFilter *interaction.Filter
statusHashesToFilterableText cache.TTLCache[string, string]
}

func NewConverter(state *state.State) *Converter {
statusHashesToFilterableText := cache.NewTTL[string, string](0, 512, 0)
statusHashesToFilterableText.SetTTL(time.Hour, true)
if !statusHashesToFilterableText.Start(time.Minute) {
log.Panic(nil, "failed to start statusHashesToFilterableText cache")
}

return &Converter{
state: state,
defaultAvatars: populateDefaultAvatars(),
visFilter: visibility.NewFilter(state),
intFilter: interaction.NewFilter(state),
state: state,
defaultAvatars: populateDefaultAvatars(),
visFilter: visibility.NewFilter(state),
intFilter: interaction.NewFilter(state),
statusHashesToFilterableText: statusHashesToFilterableText,
}
}
58 changes: 13 additions & 45 deletions internal/typeutils/internaltofrontend.go
@@ -35,7 +35,6 @@ import (
"github.com/superseriousbusiness/gotosocial/internal/language"
"github.com/superseriousbusiness/gotosocial/internal/log"
"github.com/superseriousbusiness/gotosocial/internal/media"
"github.com/superseriousbusiness/gotosocial/internal/text"
"github.com/superseriousbusiness/gotosocial/internal/uris"
"github.com/superseriousbusiness/gotosocial/internal/util"
)
@@ -939,8 +938,18 @@ func (c *Converter) statusToAPIFilterResults(
return nil, nil
}

// Extract text fields from the status that we will match filters against.
fields := filterableTextFields(s)
// Derive a hash of this status.
statusHash := StatusHash(s)

// Check if we have the filterable
// text stored already for this hash.
statusText, stored := c.statusHashesToFilterableText.Get(statusHash)
if !stored {
// We don't have this filterable text
// cached, calculate + cache it now.
statusText = filterableText(s)
c.statusHashesToFilterableText.Set(statusHash, statusText)
}

// Record all matching warn filters and the reasons they matched.
filterResults := make([]apimodel.FilterResult, 0, len(filters))
@@ -956,14 +965,7 @@ func (c *Converter) statusToAPIFilterResults(
// List all matching keywords.
keywordMatches := make([]string, 0, len(filter.Keywords))
for _, filterKeyword := range filter.Keywords {
var isMatch bool
for _, field := range fields {
if filterKeyword.Regexp.MatchString(field) {
isMatch = true
break
}
}
if isMatch {
if filterKeyword.Regexp.MatchString(statusText) {
keywordMatches = append(keywordMatches, filterKeyword.Keyword)
}
}
@@ -1001,40 +1003,6 @@ func (c *Converter) statusToAPIFilterResults(
return filterResults, nil
}

// filterableTextFields returns all text from a status that we might want to filter on:
// - content
// - content warning
// - media descriptions
// - poll options
func filterableTextFields(s *gtsmodel.Status) []string {
fieldCount := 2 + len(s.Attachments)
if s.Poll != nil {
fieldCount += len(s.Poll.Options)
}
fields := make([]string, 0, fieldCount)

if s.Content != "" {
fields = append(fields, text.SanitizeToPlaintext(s.Content))
}
if s.ContentWarning != "" {
fields = append(fields, s.ContentWarning)
}
for _, attachment := range s.Attachments {
if attachment.Description != "" {
fields = append(fields, attachment.Description)
}
}
if s.Poll != nil {
for _, option := range s.Poll.Options {
if option != "" {
fields = append(fields, option)
}
}
}

return fields
}

// filterAppliesInContext returns whether a given filter applies in a given context.
func filterAppliesInContext(filter *gtsmodel.Filter, filterContext statusfilter.FilterContext) bool {
switch filterContext {
25 changes: 17 additions & 8 deletions internal/typeutils/internaltofrontend_test.go
@@ -1063,15 +1063,22 @@ func (suite *InternalToFrontendTestSuite) TestHideFilteredBoostToFrontend() {

// Test that a hashtag filter for a hashtag in Mastodon HTML content works the way most users would expect.
func (suite *InternalToFrontendTestSuite) testHashtagFilteredStatusToFrontend(wholeWord bool, boost bool) {
testStatus := suite.testStatuses["admin_account_status_1"]
testStatus := new(gtsmodel.Status)
*testStatus = *suite.testStatuses["admin_account_status_1"]
testStatus.Content = `<p>doggo doggin' it</p><p><a href="https://example.test/tags/dogsofmastodon" class="mention hashtag" rel="tag nofollow noreferrer noopener" target="_blank">#<span>dogsofmastodon</span></a></p>`
testStatus.Text = "doggo doggin' it\n\n#dogsofmastodon"

if boost {
// Modify a fixture boost into a boost of the above status.
boostStatus := suite.testStatuses["admin_account_status_4"]
boostStatus.BoostOf = testStatus
boostStatus.BoostOfID = testStatus.ID
testStatus = boostStatus
boost, err := suite.typeconverter.StatusToBoost(
context.Background(),
testStatus,
suite.testAccounts["admin_account"],
"",
)
if err != nil {
suite.FailNow(err.Error())
}
testStatus = boost
}

requestingAccount := suite.testAccounts["local_account_1"]
@@ -1103,9 +1110,11 @@ func (suite *InternalToFrontendTestSuite) testHashtagFilteredStatusToFrontend(wh
[]*gtsmodel.Filter{filter},
nil,
)
if suite.NoError(err) {
suite.NotEmpty(apiStatus.Filtered)
if err != nil {
suite.FailNow(err.Error())
}

suite.NotEmpty(apiStatus.Filtered)
}

func (suite *InternalToFrontendTestSuite) TestHashtagWholeWordFilteredStatusToFrontend() {
75 changes: 75 additions & 0 deletions internal/typeutils/util.go
@@ -19,6 +19,7 @@ package typeutils

import (
"context"
"encoding/hex"
"fmt"
"math"
"net/url"
@@ -27,6 +28,8 @@ import (
"strconv"
"strings"

"github.com/cespare/xxhash/v2"
"github.com/k3a/html2text"
apimodel "github.com/superseriousbusiness/gotosocial/internal/api/model"
"github.com/superseriousbusiness/gotosocial/internal/config"
"github.com/superseriousbusiness/gotosocial/internal/gtsmodel"
@@ -284,3 +287,75 @@ func ContentToContentLanguage(

return contentStr, langTagStr
}

// StatusHash returns an xxhash of text
// from a status, taking account of:
//
// - content warning
// - content
// - media IDs + descriptions
// - poll options
func StatusHash(s *gtsmodel.Status) string {
hash := xxhash.New()

// Content warning / title.
hash.WriteString(s.ContentWarning) // nolint:errcheck

// Status content.
hash.WriteString(s.Content) // nolint:errcheck

// Media IDs + descriptions.
for _, attachment := range s.Attachments {
hash.WriteString(attachment.ID) // nolint:errcheck
hash.WriteString(attachment.Description) // nolint:errcheck
}

// Poll options.
if s.Poll != nil {
for _, option := range s.Poll.Options {
hash.WriteString(option) // nolint:errcheck
}
}

sum := hash.Sum(nil)
return hex.EncodeToString(sum)
}

// filterableText concatenates text from a
// status that we might want to filter on:
//
// - content warning
// - content (converted to plaintext from HTML)
// - media descriptions
// - poll options
func filterableText(s *gtsmodel.Status) string {
fields := []string{}

// Content warning / title.
fields = append(fields, s.ContentWarning)

// Status content; use raw text if available,
// else use text parsed from content HTML.
if s.Text != "" {
fields = append(fields, s.Text)
} else {
text := html2text.HTML2TextWithOptions(
s.Content,
html2text.WithLinksInnerText(),
html2text.WithUnixLineBreaks(),
)
fields = append(fields, text)
}

// Media descriptions.
for _, attachment := range s.Attachments {
fields = append(fields, attachment.Description)
}

// Poll options.
if s.Poll != nil {
fields = append(fields, s.Poll.Options...)
}

return strings.Join(fields, " ")
}
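
Reviewer note: the hashing step in `StatusHash` can be tried out in isolation. The sketch below is an approximation, not the PR's code: it swaps xxhash for the standard library's FNV-1a (`hash/fnv`) so it runs without third-party modules, and `status`, `attachment`, and `statusHash` are hypothetical, trimmed-down stand-ins for the `gtsmodel` types.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"hash/fnv"
)

// attachment and status are hypothetical, trimmed-down stand-ins
// for the real model types; only filter-relevant fields are kept.
type attachment struct {
	ID          string
	Description string
}

type status struct {
	ContentWarning string
	Content        string
	Attachments    []attachment
	PollOptions    []string
}

// statusHash feeds the same fields as StatusHash, in the same order,
// into a 64-bit hash and returns the digest as a hex string.
// FNV-1a is used here in place of xxhash purely for illustration.
func statusHash(s status) string {
	h := fnv.New64a()

	// Content warning, then content.
	h.Write([]byte(s.ContentWarning))
	h.Write([]byte(s.Content))

	// Media IDs + descriptions.
	for _, a := range s.Attachments {
		h.Write([]byte(a.ID))
		h.Write([]byte(a.Description))
	}

	// Poll options.
	for _, o := range s.PollOptions {
		h.Write([]byte(o))
	}

	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	s := status{
		Content:     "<p>doggo doggin' it</p>",
		Attachments: []attachment{{ID: "01ABC", Description: "a dog"}},
	}
	before := statusHash(s)

	// Editing a media description yields a different cache key.
	s.Attachments[0].Description = "a cat"
	after := statusHash(s)

	fmt.Println(len(before), before != after)
}
```

Any change to a hashed field produces a new key, so stale filterable text is simply never looked up again and ages out of the cache.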