-
Notifications
You must be signed in to change notification settings - Fork 248
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
URL unfurling (initial implementation) #3471
Conversation
Pull Request Checklist
|
Jenkins BuildsClick to see older builds (21)
|
@ilmotta amazing work, awesome quality!
Moving to a table would be a breaking change most likely, since we don't want to support both, and migrating would be painful.
I believe that's not necessary, we agreed with design/john that it would only be a on/off toggle
This could be important, depending on the size of images, but of course better in a separate PR
It could be, but probably is best to limit to a safe number, but only after we implement compression. Again, amazing work! |
@@ -129,7 +129,7 @@ func (s *ChatTestSuite) TestSerializeJSON() { | |||
|
|||
message.From = "0x04deaafa03e3a646e54a36ec3f6968c1d3686847d88420f00c0ab6ee517ee1893398fca28aacd2af74f2654738c21d10bad3d88dc64201ebe0de5cf1e313970d3d" | |||
message.Clock = 1 | |||
message.Text = "`some markdown text`" | |||
message.Text = "`some markdown text` https://status.im" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change relevant? we just want to check it doesn't choke on links?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not much, I guess it's one of those changes I did right at the beginning and it survived. Let me remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interestingly, I found a bug during development in the markdown parser. I still need to report it.
Extracted from the tests I wrote:
// There is a bug in the code that builds the AST from markdown text,
// because it removes the closing parenthesis, which means it won't be
// possible to unfurl this URL.
{args: "https://en.wikipedia.org/wiki/Status_message_(instant_messaging)", expected: []string{"https://en.wikipedia.org/wiki/Status_message_(instant_messaging"}},
protocol/linkpreview/linkpreview.go
Outdated
// UnfurlURLs assumes clients pass URLs verbatim that were validated and | ||
// processed by GetURLs. | ||
func UnfurlURLs(urls []string) ([]common.LinkPreview, error) { | ||
logger, err := zap.NewDevelopment() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
generally is passed around, so it logs in the right place, not a big deal though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't quite understand @cammellos. In this part I had to instantiate a zap logger and I passed it around. Should I do it differently?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving the comment for reference, as @cammellos explained to me. It is important to get the logger instance from api.service.messenger
, that way, logs are written to geth.log
, otherwise they go to stdout only.
Done a4039c6
Good to know because just limiting is so much simpler also. I talked to John about this, but he hold the opinion that it would be better to avoid hardcoding a limit. We eventually agreed on the number 5. It's open for discussion how we want to limit. And I agree with you, an artificial limit is a safe choice for now.
I didn't quite understand @cammellos. In this part I had to instantiate a zap logger and I passed it around. Should I do it differently?
Cool, I got that wrong. So much simpler.
Yes, it would be painful. I saw a lot of serialized stuff in the user_messages table and I thought it be okayish since the implementation wouldn't be used yet (behind a feature flag). The broken window effect... I think the structure can accrete, for example when we need to store favicons. I was thinking I would just throw away the data in the migration since no user would be using the new unfurling yet. But now that you mentioned, I don't know, I could use a table for sure to avoid all this.
Thank you Andrea! |
c8315fd
to
0491a51
Compare
6d0a41b
to
5895354
Compare
@alexjba, @Samyoul, @caybro, @siphiuel, the implementation is mostly done and under scrutiny by QAs in I would prefer to merge this status-go PR with at least two approvals (but I only got one). Could one of you take a look at it please? Thanks a lot! |
5895354
to
f64e4ec
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lesson in PR-making
f64e4ec
to
54c6f0b
Compare
This is the introductory work to support the new requirements for unfurling URLs (while the message is a draft) and displaying link previews (after the message is sent). Refer to the related status-go PR for a lot more interesting details status-im/status-go#3471. Fixes #15469 ### Notes - The old link preview code will be removed separately, both in status-go and status-mobile. - I did the bulk of the work in status-go status-im/status-go#3471. If you want to understand how this is all implemented, do check out the status-go PR because I heavily documented the solution, rationale, next steps, etc. ### Performance Does the feature perform well? Yes, there's very little overhead because unfurling URLs happen in status-go and the event is debounced. I also payed special attention to use a simple caching mechanism to avoid doing unnecessary RPC requests to status-go if the URLs are cached in the client. I have some ideas on how to improve performance further, but not in this PR which is already screaming for reviews.
Summary
This PR is the initial implementation for the new URL unfurling requirements. The most important one is that only the message sender will pay the privacy cost for unfurling and extracting metadata from websites. Once the message is sent, the unfurled data will be stored at the protocol level and receivers will just profit and happily decode the metadata to render it.
There's a lot to share here because this is the first PR and I accumulated notes throughout development.
Further development of this URL unfurling capability will be mostly guided by issues created on clients. For the moment in
status-mobile
: https://github.com/status-im/status-mobile/labels/url-previewDemo
I have used this branch to implement the mobile experience you see in the video below. The mobile PR will be created once I'm confident (based on your reviews) that the solution is good enough, but
it will exist behind a dev-only feature flag, so that's why this status-go PR should be 100% backwards compatible.Edit: The feature will be available in mobile.link-previews.webm
Terminology
In the code, I've tried to stick to the word "unfurl URL" to really mean the process of extracting metadata from a website, sort of lower level. I use "link preview" to mean a higher level structure which is enriched by unfurled data. "link preview" is also how designers refer to it.
User flows
Please, expand the Details section below to see detailed diagrams about user flows.
👇 See details
wakuext_sendChatMessages
. status-go assumes the link previews are good because it can't and shouldn't attempt to re-unfurl them.linkPreviews
, and the thumbnail URL will point to the local media server.Some limitations of the current implementation
The following points will become separate issues in status-go that I'll work on over the next couple weeks. In no order of importance:
Add support for the sender (only) to build up a dynamic allowlist.No allowlist will be necessary.publicsuffix.EffectiveTLDPlusOne
. This was essential, otherwise the default Gourl.Parse
approach would consider many invalid URLs valid, and thus the server would waste resources trying to unfurl the unfurleable.Other requirements
Decisions
Here are the important decisions I have made for this PR.
While the user typing in the input field, the client is constantly (debounced) asking status-go to parse the text and extract normalized URLs and then the client checks if they're already in its in-memory cache. If they are, no RPC call is made. I chose this approach to achieve the best possible performance in mobile and avoid the whole RPC overhead, since the chat experience is already not smooth enough. The mobile client uses URLs as cache keys in a hashmap, i.e. if the key is present, it means the preview is readily available (naive, but good enough for now). This decision also gave me more flexibility to find the best UX at this stage of the feature.
Due to the requirement that users should be able to see independent loading indicators for each link preview, when status-go can't unfurl a URL, it doesn't return it in the response.
As an initial implementation, I added the
BLOB
columnunfurled_links
to theuser_messages
table. The preview data is then serialized as JSON before being stored in this column. I felt that creating a separate table and the related code for this initial PR would be inconvenient. Is that reasonable to you? Once things stabilize I can create a proper table if we want to avoid this kind of solution with serialized columns.Don't we already have code to unfurl URLs?
Yes, in
protocol/urls/
, but I found it to be far from what we needed, so I created a new package inprotocols/linkpreview
. Once the new implementation stabilizes in the upcoming status-go PRs, I'll remove the old one. Nevertheless, the old implementation most certainly helped during development, since there are similarities.Currently, this PR should have no production impact, because the implementation in status-mobile is behind a feature flag, and status-desktop hasn't started the work yet. So we have time to gradually improve the implementation in follow-up PRs.Edit: the feature will be included in the next mobile release due to priority changes.