Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test() and find() do not find the same strings as valid/invalid links #472

Closed
rfgamaral opened this issue Feb 5, 2024 · 6 comments
Closed

Comments

@rfgamaral
Copy link

Hello there 👋.

Going straight to the point:

find()

linkify.find('1.nf3')

Output:

[{
  "type": "url",
  "value": "1.nf",
  "isLink": true,
  "href": "http://1.nf",
  "start": 0,
  "end": 4
}]

test()

linkify.test("1.nf3")

Output:

false

I understand that Linkify is not 100% spec compliant, but shouldn't test() and find() return the same output? More importantly, linkify.find('1.nf3') should return [] because .nf3 is not a valid TLD while .nf is (ref).

Is this a bug, or is this somehow intended?

@nfrasser
Copy link
Owner

nfrasser commented Feb 5, 2024

Hi @rfgamaral, thanks for the report. The output of find has 1.nf, which is a valid URL. Notice that it excludes the final 3. test returns false because while "1.nf3" does contain a valid URL, the full string is not a URL.

You can see this this clearly when you run linkify.tokenize:

linkify.tokenize("1.nf3").map(token => token.toObject())

Output:

[
    {
        "type": "url",
        "value": "1.nf",
        "isLink": true,
        "href": "http://1.nf",
        "start": 0,
        "end": 4
    },
    {
        "type": "text",
        "value": "3",
        "isLink": false,
        "href": "3",
        "start": 4,
        "end": 5
    }
]

linkify.test runs tokenize and expects exactly one token with isLink: true.

Hope that helps

@nfrasser nfrasser closed this as completed Feb 5, 2024
@rfgamaral
Copy link
Author

@nfrasser Thank you for the quick reply, however, this creates a problem on our side, that I'm not exactly sure how to fix.

This issue was first reported to one of our products in the context of a rich-text editor. A user typed 1.nf3, and ended up having the 1.nf string linked, while the 3 at the end was not. From our perspective (and the users', nothing should be linked in this string). Our rich-text editor uses Tiptap, which in turn uses Linkify and the find function to find links. While typing a string such as 1.nf3<space>, when pressing <space>, this is when the find function is executed and finds the link. From a rich-text editor perspective, typing something like 1.nf3 shouldn't auto link the string because .nf3 is not a valid TLD, which the test function seems to be aware of.

Do you have any suggestions on how to approach "fixing" or improving this behaviour based on the rationale explained above? The auto linking mechanism in Tiptap provides a validate function before doing the actual auto linking in the editor, I could use that and call linkify.test(str) to validate the link based on the test function. Would that make sense, or do you think that could cause issues down the line?

@nfrasser
Copy link
Owner

nfrasser commented Feb 5, 2024

@rfgamaral thanks for the additional explanation, I can definitely empathize with the expected behaviour here. Unfortunately I don't have an easy fix for this. Linkify's parser implementation has to support cases where there are no spaces around the URL. A fix for this would likely mean breaking other valid cases. It comes down to the inherent ambiguity of mixing plain-text with machine-readable strings.

Related issues to showcase the kinds of challenges I have to account for:

If you haven't already, I would suggest reporting this issue with Tiptap. Perhaps they can tweak their editor's implementation to avoid detecting links in the string immediately before the <space> when only more than one token is detected.

Another potential option: Try linkify v3 instead of v4, which has a slightly different parsing implementation. The drawback there is that since it's an older version, it might have other bugs or missing features.

Edit: Typo fix

@rfgamaral
Copy link
Author

If you haven't already, I would suggest reporting this issue with Tiptap.

I'm prepared to tackle this myself and open a PR against Tiptap, I'm just trying to understand what would be the best solution.

Perhaps they can tweak their editor's implementation to avoid detecting links in the string immediately before the <space> when only one token is detected.

I think that might effectively disable auto-linking for things like example.com, which is not the intended behaviour. Instead, I was thinking that I could use tokenize (instead of find, which is what Tiptap currently uses), and add some logic to validate links, perhaps comparing the end and start positions of multiple tokens, or, even better, do the same validation that test already does:

linkify.test runs tokenize and expects exactly one token with isLink: true.

My only concern is that docs mention that tokenize is an internal function, so I'm not sure if I should be using that directly. Right?

However, this feels like the type of link validation that I would expect to exist in a rich-text editor with auto-linking functionality. So, to summarize, I could open a PR in Tiptap to use tokenize instead of find and do the same validation that test does, or, alternatively, use the validation that Tiptap provides to call test to validate the link. The advantage of the latter would be that I could push this change to our users immediately, without waiting for PR approval on the Tiptap side (and they could not agree with the solution), and a new version to be published. Unless, there's some downside, I should be aware?

@rfgamaral
Copy link
Author

FYI, I ended up opening a PR on Tiptap replacing find with tokenize, and do a similar check as test (ref).

@razorshaman909
Copy link

razorshaman909 commented May 9, 2024

Similar situation here:

linkify.test("https://linkify.js.org/docs/ https://linkify.js.org/docs/ https://linkify.js.org/docs/ https://linkify.js.org/docs/")

returns as false, whereas

linkify.find("https://linkify.js.org/docs/ https://linkify.js.org/docs/ https://linkify.js.org/docs/ https://linkify.js.org/docs/")
returns

[
    {
        "type": "url",
        "value": "https://linkify.js.org/docs/",
        "isLink": true,
        "href": "https://linkify.js.org/docs/",
        "start": 0,
        "end": 28
    },
    {
        "type": "url",
        "value": "https://linkify.js.org/docs/",
        "isLink": true,
        "href": "https://linkify.js.org/docs/",
        "start": 32,
        "end": 60
    },
    {
        "type": "url",
        "value": "https://linkify.js.org/docs/",
        "isLink": true,
        "href": "https://linkify.js.org/docs/",
        "start": 65,
        "end": 93
    },
    {
        "type": "url",
        "value": "https://linkify.js.org/docs/",
        "isLink": true,
        "href": "https://linkify.js.org/docs/",
        "start": 97,
        "end": 125
    }
]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants