Support unfurling more websites #3530

Merged
merged 12 commits into from
Jun 5, 2023

Conversation

Contributor

@ilmotta ilmotta commented May 25, 2023

Summary

This PR adds support for unfurling a wider range of websites. Most code changes are related to the implementation of a new Unfurler, an OEmbedUnfurler, which is necessary to get metadata for Reddit URLs using oEmbed, since Reddit does not support OpenGraph meta tags. The new unfurler will also be useful for other websites, like Twitter. Also the user agent was changed, and now more websites consider status-go reasonably human.
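For readers unfamiliar with oEmbed, here is a minimal sketch of the two steps an oEmbed-based unfurler needs: building the provider request URL and decoding the JSON response. It is not the code from this PR. The names `oEmbedResponse`, `oEmbedRequestURL` and `parseOEmbed` are hypothetical, and the Reddit endpoint URL in `main` is an assumption used only for illustration. The JSON field names come from the oEmbed specification.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// oEmbedResponse holds a subset of the fields defined by the oEmbed spec.
// Hypothetical type for this sketch, not the struct used in status-go.
type oEmbedResponse struct {
	Type         string `json:"type"`
	Title        string `json:"title"`
	AuthorName   string `json:"author_name"`
	ProviderName string `json:"provider_name"`
	ThumbnailURL string `json:"thumbnail_url"`
}

// oEmbedRequestURL builds the URL to query for a page's metadata.
// The endpoint is a parameter because each provider publishes its own.
func oEmbedRequestURL(endpoint, pageURL string) (string, error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return "", fmt.Errorf("invalid oEmbed endpoint: %w", err)
	}
	q := u.Query()
	q.Set("url", pageURL)
	q.Set("format", "json")
	u.RawQuery = q.Encode()
	return u.String(), nil
}

// parseOEmbed decodes the JSON body of an oEmbed response.
// Fields missing from the body (e.g. thumbnail_url) stay empty.
func parseOEmbed(body []byte) (oEmbedResponse, error) {
	var res oEmbedResponse
	if err := json.Unmarshal(body, &res); err != nil {
		return res, fmt.Errorf("failed to decode oEmbed response: %w", err)
	}
	return res, nil
}

func main() {
	// Assumed endpoint and made-up page URL, for illustration only.
	reqURL, err := oEmbedRequestURL("https://www.reddit.com/oembed", "https://www.reddit.com/r/example/comments/abc123/")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(reqURL)

	// A hand-written body standing in for a provider response.
	res, err := parseOEmbed([]byte(`{"type":"rich","title":"Hello","provider_name":"reddit"}`))
	fmt.Println(res.Title, res.ThumbnailURL == "", err)
}
```

Keeping URL construction and decoding as separate functions means each one can be tested without an HTTP call.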

Example hostnames that are now unfurlable: reddit.com, open.spotify.com, music.youtube.com

No breaking changes in this PR. The PR in status-mobile (not yet created) will only bump the status-go version.

Other improvements

  • Better error handling, especially because I wasn't wrapping errors correctly. I also removed the unnecessary custom error UnfurlErr.
  • I made tests truly deterministic by parameterizing the http.Client instance and by customizing its Transport field (except for some failing conditions where it's even good to hit the real servers). I think the solution is pretty decent and highly flexible (although I hardcoded one or two things for the scope of this PR). Do take a look at the StubTransport type in the tests and its usages. The goal was to help our test suite not get worse than it already is in terms of reliability and speed. I know there are other places in status-go doing something similar, and I've been wondering if a more unified HTTP stubbing solution would benefit us all. Feedback is more than welcome! 🌟
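To illustrate the error-wrapping point in the first bullet, here is a minimal sketch assuming the standard `%w` verb of `fmt.Errorf`. The sentinel `errNoUnfurler` and the functions `unfurl`, `unfurlAll` and `isNoUnfurler` are hypothetical and do not come from the PR. The sketch shows that a wrapped error stays inspectable with `errors.Is`, which is one reason a custom error type can become unnecessary.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoUnfurler is a hypothetical sentinel error used only in this sketch.
var errNoUnfurler = errors.New("no unfurler available")

// unfurl stands in for a low-level step; here it always fails.
func unfurl(rawURL string) error {
	return errNoUnfurler
}

// unfurlAll wraps the underlying error with %w, so the caller gets context
// (which URL failed) without losing access to the original error.
func unfurlAll(urls []string) error {
	for _, u := range urls {
		if err := unfurl(u); err != nil {
			return fmt.Errorf("failed to unfurl url='%s': %w", u, err)
		}
	}
	return nil
}

// isNoUnfurler reports whether err wraps errNoUnfurler at any depth.
func isNoUnfurler(err error) bool {
	return errors.Is(err, errNoUnfurler)
}

func main() {
	err := unfurlAll([]string{"https://example.com"})
	fmt.Println(err)
	fmt.Println(isNoUnfurler(err))
}
```

Had the error been formatted with `%v` or `%s` instead of `%w`, `errors.Is` would no longer find the sentinel.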

For the future

There is a lot more to be done, such as better timeout handling and performance improvements when unfurling multiple URLs. There are even scenarios where websites such as Reddit return 429s if more than one request is made in a 2s window, which will happen if users try to unfurl more than one Reddit URL in the same message. Edge case or not, unfurling URLs is a can of worms.
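One possible way to avoid the 429s described above is to space out requests to the same host. The sketch below is only an idea under stated assumptions and is not implemented in this PR. `hostThrottle`, `newHostThrottle`, `reserve` and `redditWindow` are hypothetical names, and the 2s value simply mirrors the window mentioned above.

```go
package main

import (
	"fmt"
	"time"
)

// redditWindow mirrors the 2s window mentioned above (an assumption here).
const redditWindow = 2 * time.Second

// hostThrottle remembers when each host is next scheduled to be contacted.
// It is not safe for concurrent use; a real version would need a mutex.
type hostThrottle struct {
	minInterval time.Duration
	lastRequest map[string]time.Time
}

func newHostThrottle(minInterval time.Duration) *hostThrottle {
	return &hostThrottle{minInterval: minInterval, lastRequest: map[string]time.Time{}}
}

// reserve returns how long the caller should wait before contacting host,
// and records the time at which that request is expected to go out.
func (t *hostThrottle) reserve(host string, now time.Time) time.Duration {
	next := now
	if last, ok := t.lastRequest[host]; ok {
		if earliest := last.Add(t.minInterval); earliest.After(now) {
			next = earliest
		}
	}
	t.lastRequest[host] = next
	return next.Sub(now)
}

func main() {
	throttle := newHostThrottle(redditWindow)
	now := time.Now()
	// Two URLs on the same host in one message, plus one on another host.
	for _, host := range []string{"www.reddit.com", "www.reddit.com", "open.spotify.com"} {
		fmt.Printf("%s: wait %v\n", host, throttle.reserve(host, now))
	}
}
```

Passing `now` in as an argument, instead of calling `time.Now()` inside `reserve`, keeps the logic deterministic in tests, in the same spirit as the stubbed HTTP client.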

Sooner or later we'll need to add support for even more websites, like Twitter, maybe Amazon, etc. In this PR I found out that Amazon sucks (why am I surprised?). They support neither OpenGraph nor oEmbed. They really just want product sellers to paste their own auto-generated embeddable links. So for the moment, I ignored support for Amazon. Twitter is different and requires more handling behind the scenes, even though it supports OpenGraph and oEmbed. For Reddit, I only added basic unfurling support, because they don't offer any thumbnail URL in their oEmbed response and they do not support OpenGraph.

@ilmotta ilmotta self-assigned this May 25, 2023

ghost commented May 25, 2023

Pull Request Checklist

  • Have you updated the documentation, if impacted (e.g. docs.status.im)?
  • Have you tested changes with mobile?
  • Have you tested changes with desktop?

Member

status-im-auto commented May 25, 2023

Jenkins Builds

Older builds (27)
Commit #️⃣ Finished (UTC) Duration Platform Result
✔️ 3a5cb0b #1 2023-05-25 18:22:16 ~5 min ios 📦zip
✔️ 3a5cb0b #1 2023-05-25 18:26:36 ~9 min linux 📦zip
✔️ 3a5cb0b #1 2023-05-25 18:26:43 ~9 min android 📦aar
✖️ 3a5cb0b #1 2023-05-25 18:29:12 ~12 min tests 📄log
✔️ f555a48 #2 2023-05-25 18:54:40 ~2 min linux 📦zip
✔️ f555a48 #2 2023-05-25 18:55:54 ~4 min ios 📦zip
✔️ f555a48 #2 2023-05-25 18:56:19 ~4 min android 📦aar
✔️ f555a48 #2 2023-05-25 19:05:12 ~13 min tests 📄log
✔️ da6f2e7 #3 2023-05-26 12:19:50 ~2 min linux 📦zip
✔️ da6f2e7 #3 2023-05-26 12:21:17 ~3 min ios 📦zip
✔️ da6f2e7 #3 2023-05-26 12:21:30 ~3 min android 📦aar
✔️ da6f2e7 #3 2023-05-26 12:30:29 ~12 min tests 📄log
✔️ 015d88d #4 2023-05-26 12:21:48 ~1 min linux 📦zip
✔️ 015d88d #4 2023-05-26 12:24:07 ~2 min ios 📦zip
✔️ 015d88d #4 2023-05-26 12:24:43 ~3 min android 📦aar
✖️ 015d88d #4 2023-05-26 12:52:25 ~21 min tests 📄log
✔️ effd6b9 #5 2023-05-29 10:22:15 ~2 min ios 📦zip
✔️ effd6b9 #5 2023-05-29 10:28:02 ~8 min android 📦aar
✔️ effd6b9 #5 2023-05-29 10:29:10 ~9 min linux 📦zip
✖️ effd6b9 #5 2023-05-29 10:48:13 ~28 min tests 📄log
✔️ 20ca457 #6 2023-05-31 00:17:38 ~2 min linux 📦zip
✔️ 20ca457 #6 2023-05-31 00:19:29 ~4 min android 📦aar
✔️ 20ca457 #6 2023-05-31 00:19:57 ~4 min ios 📦zip
✔️ 823487b #7 2023-06-01 08:45:41 ~2 min linux 📦zip
✔️ 823487b #7 2023-06-01 08:47:09 ~4 min android 📦aar
✔️ 823487b #7 2023-06-01 08:47:10 ~4 min ios 📦zip
✖️ 823487b #7 2023-06-01 08:52:55 ~10 min tests 📄log
Commit #️⃣ Finished (UTC) Duration Platform Result
✔️ 540e4ae #8 2023-06-02 18:08:55 ~2 min linux 📦zip
✔️ 540e4ae #8 2023-06-02 18:09:24 ~2 min ios 📦zip
✔️ 540e4ae #8 2023-06-02 18:10:46 ~4 min android 📦aar
✖️ 540e4ae #8 2023-06-02 18:19:18 ~12 min tests 📄log
✖️ 540e4ae #9 2023-06-02 19:25:33 ~21 min tests 📄log
✖️ 540e4ae #10 2023-06-02 19:53:15 ~21 min tests 📄log
✔️ 6481196 #9 2023-06-05 10:11:44 ~2 min linux 📦zip
✔️ 6481196 #9 2023-06-05 10:12:22 ~3 min ios 📦zip
✔️ 6481196 #9 2023-06-05 10:13:22 ~4 min android 📦aar
✔️ 6481196 #11 2023-06-05 10:22:58 ~13 min tests 📄log

@ilmotta ilmotta force-pushed the unfurl-more-websites branch from 3a5cb0b to f555a48 Compare May 25, 2023 18:51
Contributor

@cammellos cammellos left a comment

Minor comments, looks great!

@ilmotta ilmotta force-pushed the unfurl-more-websites branch 2 times, most recently from 015d88d to effd6b9 Compare May 29, 2023 10:19
Member

@Samyoul Samyoul left a comment

Lovely work @ilmotta.

Just a minor question: what impact, if any, does this functionality have on protocol/urls/urls.go?

Comment on lines 179 to 184
disabledStubs := false
if disabledStubs {
	expected.Thumbnail.Width = 1280
	expected.Thumbnail.Height = 720
	expected.Thumbnail.DataURI = ""
}
Member

disabledStubs is always false, so the if condition can never be true.

Contributor Author

@ilmotta ilmotta May 30, 2023

Indeed @Samyoul, I used this flag during development, as a way of quickly switching on/off between stubbed and real HTTP calls. Since it's causing some confusion, I'll remove this.

Edit: Done in 20ca457

Comment on lines +165 to +166
url := "https://www.youtube.com/watch?v=lE4UXdJSJM4"
thumbnailURL := "https://i.ytimg.com/vi/lE4UXdJSJM4/maxresdefault.jpg"
Member

Do these tests need to consider segregating getting data from parsing data?

See #3529

Contributor Author

Good question @Samyoul. I'm inferring a little bit what you mean, so please correct me if anything is off. So far, I've focused on testing only the public function UnfurlURLs, but because it does basically everything under the hood, its tests are more sociable than solitary.

I think I could do a better job of testing some of the private functions too, since they have a narrower responsibility, and it would be easier to test this segregation of data fetching vs parsing you mention.

Contributor Author

Slightly off topic, I saw many examples in the code about using a mock server, but in retrospect, I don't remember anymore why I didn't go that route, which in my experience (in other languages) works really well too. In the end, I opted for direct control over a RoundTripper to give me more control (?). Maybe overkill? I must say I've had good results with both approaches.

And your comments on the issue about the flaky tests are on point :)

Member

You know, I really have enjoyed reviewing this PR. Testing HTTP requests is always tricky because the HTTP client interface is generally embedded, and segregating the getting concerns from the parsing concerns isn't a focus.

A mock client request versus a mock server response: both basically remove the need for outbound calls. I've always focused on making the server mockable so that the client behaves as normally as possible during tests. But in your case I think that the server is not required, because you pass in (inject) the client dependency and so you only really test the data parsing concern of your functionality.

Member

Actually I may have just become a convert to client centric testing.

Contributor Author

ilmotta commented May 30, 2023

> Lovely work @ilmotta.
>
> Just a minor question, what impact if any does this functionality have on protocol/urls/urls.go?

No impact in theory @Samyoul. In the first PR for the new unfurling implementation, I decided to not touch the urls package because there were too many differences and I didn't want to cause a regression on the Desktop app. Now that the Desktop team will be starting the work on using the new endpoints, eventually we'll be able to remove the urls implementation.

Comment on lines +35 to +58
// RoundTrip returns a stubbed response if any matcher returns a non-nil
// http.Response. If no matcher is found and fallbackToDefaultTransport is true,
// then it executes the HTTP request using the default http transport.
//
// If StubTransport#disabledStubs is true, the default http transport is used.
func (t *StubTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	if t.disabledStubs {
		return http.DefaultTransport.RoundTrip(req)
	}

	for _, matcher := range t.matchers {
		res := matcher(req)
		if res != nil {
			return res, nil
		}
	}

	if t.fallbackToDefaultTransport {
		return http.DefaultTransport.RoundTrip(req)
	}

	return nil, fmt.Errorf("no HTTP matcher found")
}
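
For context on how a transport like this might be wired into a test, here is a reduced, self-contained sketch. It is not the test code from the PR: `stubTransport` drops the `disabledStubs` and `fallbackToDefaultTransport` fields, and `matcher`, `stubURL`, `newStubbedClient` and `fetch` are hypothetical helpers. The sketch shows an `http.Client` whose `Transport` answers from memory, so no request leaves the process.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// matcher returns a stubbed response for requests it recognizes, or nil.
type matcher func(req *http.Request) *http.Response

// stubTransport is a reduced http.RoundTripper that never touches the network.
type stubTransport struct {
	matchers []matcher
}

func (t *stubTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	for _, m := range t.matchers {
		if res := m(req); res != nil {
			return res, nil
		}
	}
	return nil, fmt.Errorf("no HTTP matcher found")
}

// stubURL (hypothetical helper) answers one exact URL with a fixed body.
func stubURL(url, body string) matcher {
	return func(req *http.Request) *http.Response {
		if req.URL.String() != url {
			return nil
		}
		return &http.Response{
			StatusCode: http.StatusOK,
			Header:     make(http.Header),
			Body:       io.NopCloser(strings.NewReader(body)),
			Request:    req,
		}
	}
}

// newStubbedClient (hypothetical helper) injects the stub via the Transport field.
func newStubbedClient(matchers ...matcher) *http.Client {
	return &http.Client{Transport: &stubTransport{matchers: matchers}}
}

// fetch reads the whole response body for rawURL using the injected client.
func fetch(client *http.Client, rawURL string) (string, error) {
	res, err := client.Get(rawURL)
	if err != nil {
		return "", fmt.Errorf("request failed: %w", err)
	}
	defer res.Body.Close()
	body, err := io.ReadAll(res.Body)
	if err != nil {
		return "", fmt.Errorf("failed to read body: %w", err)
	}
	return string(body), nil
}

func main() {
	client := newStubbedClient(
		stubURL("https://example.com/page", "<html><title>stubbed</title></html>"),
	)
	body, err := fetch(client, "https://example.com/page")
	fmt.Println(body, err)

	// No matcher knows this URL, so the request fails without any network I/O.
	_, err = fetch(client, "https://example.com/other")
	fmt.Println(err)
}
```

Because the code under test only sees an `*http.Client`, the same function works unchanged with a real transport in production.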

Member

image

@ilmotta ilmotta force-pushed the unfurl-more-websites branch 2 times, most recently from 823487b to 540e4ae Compare June 2, 2023 18:06
@ilmotta ilmotta force-pushed the unfurl-more-websites branch from 540e4ae to 6481196 Compare June 5, 2023 10:08
@ilmotta ilmotta merged commit 92b5d83 into develop Jun 5, 2023
@ilmotta ilmotta deleted the unfurl-more-websites branch June 5, 2023 10:46
ilmotta added a commit to status-im/status-mobile that referenced this pull request Jun 5, 2023
Bumps status-go's version to point to the status-go branch in PR
status-im/status-go#3530.

Fixes #15918
4 participants