URL detection result of find method changed in v3? #351

Closed
sunadoi opened this issue Oct 6, 2021 · 3 comments · Fixed by #353 or #354
Labels: i18n (Internationalization), pending-merge

sunadoi commented Oct 6, 2021

When using the find method to detect URLs, I found that the detection results differ in v3 when there is no space before or after the URL (a minimal reproduction sketch follows the examples below).

v2

// URL
foo http://example.com bar
foo http://example.combar
foohttp://example.com bar
foohttp://example.combar
テストhttp://example.comテスト

v3

// URL
foo http://example.com bar
foo http://example.combar
foohttp://example.com bar

// Not URL
foohttp://example.combar
テストhttp://example.comテスト
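For reference, this is how the difference shows up with linkify's find (a minimal sketch; it assumes the standard find export, with match counts taken from the report above):

import { find } from 'linkifyjs';

find('foo http://example.com bar').length;    // 1 in both v2 and v3
find('foohttp://example.com bar').length;     // 1 in both v2 and v3
find('foohttp://example.combar').length;      // 1 in v2, 0 in v3
find('テストhttp://example.comテスト').length;  // 1 in v2, 0 in v3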

Is this expected behavior?
If so, I would like to see the following fix, even if it only applies to multi-byte characters, because we often write like this in Japanese.

// URL
foo http://example.com bar
foo http://example.combar
foohttp://example.com bar
テストhttp://example.comテスト

// Not URL
foohttp://example.combar

ref: #315

nfrasser (Owner) commented Oct 7, 2021

Hi @sunadoi, thanks for reporting.

The reasons for this regression in v3 are a bit complex, and relate to the extended parsing I added to support Internationalized Domain Names (IDN). The parser now recognizes テスト as a word, whereas in v2 those characters were treated as unknown symbols. The parser is greedy (it tries to identify the longest possible tokens, without backtracking), and since there is no delimiting whitespace it treats テストhttp as a single word and the rest as an invalid URL.
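To illustrate the greedy scan (a hypothetical, simplified sketch; this is not linkify's actual scanner): once テスト counts as word characters, a longest-match pass consumes テストhttp as one token and never backtracks to recover the http scheme.

// Hypothetical sketch of a greedy word scanner (not linkify's real code)
const isWordChar = (ch) => /[\p{L}\p{N}]/u.test(ch);

function nextToken(input, i) {
  if (!isWordChar(input[i])) return { type: 'symbol', value: input[i], end: i + 1 };
  let j = i;
  while (j < input.length && isWordChar(input[j])) j++; // longest match, no backtracking
  return { type: 'word', value: input.slice(i, j), end: j };
}

nextToken('テストhttp://example.com', 0);
// -> { type: 'word', value: 'テストhttp', end: 7 }
// With テストhttp consumed as a single word, no token sequence can start a URL here.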

I believe I can fix this by making a distinction in the parser between ASCII words and non-ASCII words. Unfortunately, because of the ambiguity in examples like these, the best I can get with this plugin is the following (I used {{}} to mark which portions of text will be identified as links; a rough test sketch follows the examples):

foo {{http://example.com}} bar
foo {{http://example.combar}}
foohttp://{{example.com}} bar
テスト{{http://example.comテスト}}
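Written as a rough test sketch against find (hypothetical expectations; assumes match objects expose a value string):

import { find } from 'linkifyjs';

find('foo http://example.com bar')[0].value;    // 'http://example.com'
find('foo http://example.combar')[0].value;     // 'http://example.combar'
find('foohttp://example.com bar')[0].value;     // 'example.com' (scheme absorbed by the fused word)
find('テストhttp://example.comテスト')[0].value; // 'http://example.comテスト'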

I hope that works for you, because unfortunately I cannot think of a good strategy to cover all edge cases like this.

nfrasser self-assigned this Oct 7, 2021
nfrasser added this to the 3.0 milestone Oct 7, 2021
sunadoi (Author) commented Oct 8, 2021

@nfrasser

Thank you for your kind explanation.
The fix you suggested works for me.
I'm looking forward to it 😄

nfrasser (Owner) commented
Fixed in the latest v4 release.
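A quick check against v4 (assuming the same find API; the expected output follows the fix described in this thread):

import { find } from 'linkifyjs'; // v4

find('テストhttp://example.comテスト').map((m) => m.value);
// -> ['http://example.comテスト']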
