[SubstackBridge] Add Substack bridge #4174

Merged
merged 5 commits into from
Jul 31, 2024
50 changes: 50 additions & 0 deletions bridges/SubstackBridge.php
@@ -0,0 +1,50 @@
<?php

class SubstackBridge extends FeedExpander
{
    const MAINTAINER = 'sqrtminusone';
    const NAME = 'Substack Bridge';
    const URI = 'https://substack.com/';
    const CACHE_TIMEOUT = 3600; // 1 hour
    const DESCRIPTION = 'Access Substack. Add full content for paywalled posts if you have a session cookie with an active subscription.';

    const CONFIGURATION = [
        'sid' => [
            'required' => false,
        ]
    ];

    const PARAMETERS = [
        '' => [
            'url' => [
                'name' => 'Substack RSS URL',
                'required' => true,
                'type' => 'text',
                'defaultValue' => 'https://newsletter.pragmaticengineer.com/feed',
                'title' => 'Usually https://<blog-url>/feed'
            ]
        ]
    ];

    public function collectData()
    {
        $headers = [];
        if ($this->getOption('sid')) {
            $url_parsed = parse_url($this->getInput('url'));
            $authority = $url_parsed['host'];
            $cookies = [
                'ab_experiment_sampled=%22false%22',
                'substack.sid=' . $this->getOption('sid'),
                'substack.lli=1',
                'intro_popup_last_hidden_at=' . (new DateTime())->format('Y-m-d\TH:i:s.v\Z')
            ];
            $headers = [
                'Authority: ' . $authority,
                'Cache-Control: max-age=0',
                'User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36',
                'Cookie: ' . implode('; ', $cookies)
            ];
        }
        $this->collectExpandableDatas($this->getInput('url'), -1, $headers);
    }
}
18 changes: 18 additions & 0 deletions docs/10_Bridge_Specific/Substack.md
@@ -0,0 +1,18 @@
# SubstackBridge

[Substack](https://substack.com) provides RSS feeds at the `/feed` path, e.g., https://newsletter.pragmaticengineer.com/feed/. However, these feeds have two problems, both of which this bridge addresses:
- They use RSS 2.0 with the draft [content extension](https://web.resource.org/rss/1.0/modules/content/), which isn't supported by some readers;
- They don't have the full content for paywalled posts.
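
As a rough illustration of the first point: in such a feed the full post body sits in a namespaced `content:encoded` element next to the plain `description`. The sketch below uses PHP's SimpleXML (not RSS-Bridge's own parser) and a made-up sample feed; the helper name `extractItemBodies` is hypothetical. It reads the full body and falls back to the description when the element is absent.

```php
<?php

// Made-up sample feed for illustration; not real Substack output.
$sampleFeed = <<<'XML'
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Example</title>
    <item>
      <title>Full post</title>
      <description>Short teaser</description>
      <content:encoded><![CDATA[<p>Full body</p>]]></content:encoded>
    </item>
    <item>
      <title>Truncated post</title>
      <description>Teaser only</description>
    </item>
  </channel>
</rss>
XML;

// Hypothetical helper: returns one body string per <item>.
function extractItemBodies(string $xml): array
{
    // LIBXML_NOCDATA turns CDATA sections into ordinary text nodes.
    $feed = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NOCDATA);
    if ($feed === false) {
        throw new RuntimeException('Could not parse the feed XML');
    }

    $bodies = [];
    foreach ($feed->channel->item as $item) {
        // Elements from the content module live in their own namespace.
        $content = $item->children('http://purl.org/rss/1.0/modules/content/');
        $full = (string) $content->encoded;
        // Fall back to the plain description when content:encoded is missing.
        $bodies[] = $full !== '' ? $full : (string) $item->description;
    }

    return $bodies;
}

$bodies = extractItemBodies($sampleFeed);
```

A reader that ignores the `content` namespace would see only the `description` values, which is the gap the bridge is meant to close.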

Retrieving the full content is only possible _with an active subscription to the blog_. If you have one, Substack returns the full feed when it is fetched with the right set of cookies. Whether this is the intended behaviour is left as an exercise for the reader.

To obtain the session cookie, log in at https://substack.com/, open DevTools, go to Application -> Cookies -> https://substack.com, copy the value of `substack.sid`, and paste it into the RSS-Bridge config:

```
[SubstackBridge]
sid = "<your-sid>"
```

Logging in sometimes requires a CAPTCHA, so this step has to be done manually. The cookie is valid for three months.

After you've done this, the bridge should return full feeds for your subscriptions.
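
For orientation, here is a minimal standalone sketch of how the session cookie ends up in the request headers. It is not the bridge's code: `buildSubstackHeaders` is a hypothetical helper, the cookie list is trimmed for brevity (the bridge sends a few more cookies, and it is an untested assumption that this subset would be enough), and the `Accept` value is a placeholder.

```php
<?php

// Hypothetical helper, not part of RSS-Bridge.
function buildSubstackHeaders(string $feedUrl, string $sid): array
{
    // PHP_URL_HOST yields the host as a string, or null/false on failure.
    $host = parse_url($feedUrl, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        throw new InvalidArgumentException('Feed URL has no host: ' . $feedUrl);
    }

    // Trimmed cookie set (assumption: shown only to illustrate the format).
    $cookies = [
        'substack.sid=' . $sid,
        'substack.lli=1',
    ];

    return [
        'Authority: ' . $host,
        'Cookie: ' . implode('; ', $cookies),
    ];
}

// 'example-sid' stands in for the value copied from DevTools.
$extra = buildSubstackHeaders('https://newsletter.pragmaticengineer.com/feed', 'example-sid');

// Extra headers are appended after the default Accept header (placeholder value).
$all = array_merge(['Accept: application/rss+xml, */*'], $extra);
```

Because the headers are appended with `array_merge`, leaving `sid` unset simply means the feed is fetched with the default headers only.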
4 changes: 2 additions & 2 deletions lib/FeedExpander.php
@@ -7,7 +7,7 @@ abstract class FeedExpander extends BridgeAbstract
 {
     private array $feed;

-    public function collectExpandableDatas(string $url, $maxItems = -1)
+    public function collectExpandableDatas(string $url, $maxItems = -1, $headers = [])
     {
         if (!$url) {
             throw new \Exception('There is no $url for this RSS expander');
@@ -17,7 +17,7 @@ public function collectExpandableDatas(string $url, $maxItems = -1)
             $maxItems = 999;
         }
         $accept = [MrssFormat::MIME_TYPE, AtomFormat::MIME_TYPE, '*/*'];
-        $httpHeaders = ['Accept: ' . implode(', ', $accept)];
+        $httpHeaders = array_merge(['Accept: ' . implode(', ', $accept)], $headers);
         $xmlString = getContents($url, $httpHeaders);
         if ($xmlString === '') {
             throw new \Exception(sprintf('Unable to parse xml from `%s` because we got the empty string', $url), 10);