[bug] Detection of mentions and hashtags in markdown can be improved #789

Closed
blackle opened this issue Aug 31, 2022 · 1 comment · Fixed by #1406

Labels
bug Something isn't working

Comments

@blackle
Contributor

blackle commented Aug 31, 2022

Describe the bug with a clear and concise description of what the bug is.

When posting markdown:

_#hashtag_

Won't be converted to an italicized link (only italicized). However:

_#hashtag #hashtag #hashtag_

Will convert all three

What's your GoToSocial Version?

v0.4.0

GoToSocial Arch

amd64 binary

What happened?

posting _#hashtag_ doesn't convert the hashtag to a link

What you expected to happen?

posting _#hashtag_ converts the hashtag to a link

How to reproduce it?

post _#hashtag_ with markdown formatting turned on

Anything else we need to know?

No response

@autumnull
Contributor

okay so, after #1267, the behavior of this is slightly different - specifically the _#hashtag_ example stays the same but _#hashtag #hashtag #hashtag_ only turns the latter two into hashtags (because util.IsHashtagBoundary doesn't include _). This issue is specific to a hashtag following an underscore - *#hashtag* works as suggested.

Note that this issue is now also relevant to mentions -- the new mention parser actually requires that mentions be preceded by whitespace. It might make sense to make the allowed preceding characters for mentions and hashtags be the same, for consistency.

Fixing this issue would require changing which characters can precede or close a hashtag.

Aside: there are some parts of util.IsHashtagBoundary that are no longer necessary with the goldmark setup; specifically, the examples in this line are no longer an issue. Also, tab, newline, form feed etc. are already included in whitespace, so it's not necessary to allow control chars, in my opinion, and # is punctuation, so it doesn't need a special case.

So it would be sufficient to say that mentions/hashtags can be preceded only by whitespace or punctuation, and similarly for the character that comes after. That would mean that, after unicode normalisation (thanks @illfygli), anything that is not a number or letter, found before whitespace or punctuation, causes a hashtag to be invalid. Which seems sensible enough.
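To make the proposed rule concrete, here is a minimal sketch of a boundary check that accepts only whitespace and punctuation. The name `isTagBoundary` is hypothetical and this is not the actual body of util.IsHashtagBoundary; it only illustrates the suggestion using Go's standard `unicode` package.

```go
package main

import "unicode"

// isTagBoundary is a hypothetical stand-in for util.IsHashtagBoundary.
// It reports whether r may directly precede or follow a hashtag or mention
// under the proposed rule: only whitespace and punctuation qualify.
func isTagBoundary(r rune) bool {
	// unicode.IsSpace already covers tab, newline, form feed and similar,
	// so control characters need no separate case.
	// unicode.IsPunct covers '_', '*' and '#' (Unicode category P),
	// so '#' needs no special case either.
	return unicode.IsSpace(r) || unicode.IsPunct(r)
}
```

Under this sketch `_` and `*` are both boundaries, so `_#hashtag_` and `*#hashtag*` would be treated alike. One consequence to be aware of: characters such as `+` or `~` fall in Unicode's symbol category rather than punctuation, so they would not count as boundaries unless added explicitly.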

The catch: some of these changes may cause issues with the plaintext format, for which I have a proposal:

What if we used goldmark to create a parser for plaintext too? It allows completely custom parsers, and we could just translate the current plaintext parsing code into a simple goldmark parser which picks out links, mentions, hashtags, linebreaks etc. using the same functions as the markdown parser. It might neaten the code and make the behavior of plaintext vs markdown more consistent.

Summary of suggestions:

  • normalise unicode in hashtags before rendering
  • edit util.IsHashtagBoundary to just allow whitespace and punctuation
  • edit the mention parser to use util.IsHashtagBoundary to check prior characters
  • translate the plaintext parser to be a simple goldmark parser
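
As a rough illustration of how the boundary rule would play out on the examples from this issue, here is a small standalone scanner. It is a sketch only: `isBoundary` and `findHashtags` are hypothetical names, it works on plain strings rather than as a goldmark parser, it skips unicode normalisation, and it assumes tag names consist of letters and digits only.

```go
package main

import "unicode"

// isBoundary is a hypothetical boundary check: whitespace or punctuation only.
func isBoundary(r rune) bool {
	return unicode.IsSpace(r) || unicode.IsPunct(r)
}

// findHashtags returns the names (without '#') of all hashtags in text that
// are both preceded and followed by a boundary character or by the start or
// end of the text.
func findHashtags(text string) []string {
	runes := []rune(text)
	var tags []string
	for i := 0; i < len(runes); i++ {
		if runes[i] != '#' {
			continue
		}
		// The character before '#' must be a boundary (or there is none).
		if i > 0 && !isBoundary(runes[i-1]) {
			continue
		}
		// Consume the tag name: letters and digits only (an assumption).
		j := i + 1
		for j < len(runes) && (unicode.IsLetter(runes[j]) || unicode.IsNumber(runes[j])) {
			j++
		}
		// The name must be non-empty and end at a boundary or end of text;
		// any other trailing character invalidates the hashtag.
		if j > i+1 && (j == len(runes) || isBoundary(runes[j])) {
			tags = append(tags, string(runes[i+1:j]))
		}
		i = j - 1 // resume scanning at the character that ended the name
	}
	return tags
}
```

With this rule, `findHashtags("_#hashtag_")` yields one tag and `findHashtags("_#hashtag #hashtag #hashtag_")` yields three, while `abc#def` yields none because the `#` directly follows a letter. The same check could be shared by the mention parser so that both accept the same preceding characters.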
