fix: get Xiaohongshu fulltext #17075

Open · wants to merge 11 commits into base: master

Conversation

rien7 (Contributor) commented Oct 10, 2024

Involved Issue

Close #16300

Example for the Proposed Route(s)

/xiaohongshu/user/5b55f1534eacab0302da1f02/notes/fulltext

New RSS Route Checklist

  • New Route
  • Anti-bot or rate limit
    • If yes, does your code reflect this?
  • Date and time
    • Parsed
    • Correct time zone
  • New package added
  • Puppeteer

Note

Requires XIAOHONGSHU_COOKIE to be set.
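
A minimal sketch of how a route could check for this setting before making any request. Only the name XIAOHONGSHU_COOKIE comes from this PR; the helper name `getXiaohongshuCookie`, its signature, and the error text are hypothetical and not taken from the RSSHub codebase.

```typescript
// Hypothetical helper: look up the cookie in a map of environment variables.
// Taking the map as a parameter keeps the function easy to test.
function getXiaohongshuCookie(env: Record<string, string | undefined>): string {
    const cookie = env.XIAOHONGSHU_COOKIE?.trim();
    if (!cookie) {
        // Fail early with a clear message instead of sending an anonymous request.
        throw new Error('XIAOHONGSHU_COOKIE is not set');
    }
    return cookie;
}
```

In a Node.js process this would typically be called as `getXiaohongshuCookie(process.env)`.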

github-actions bot added the Route label Oct 10, 2024
lib/routes/ipswdev/index.ts Fixed
lib/routes/xiaohongshu/notes.ts Fixed
github-actions bot added the Auto: Route Test Complete label Oct 10, 2024

Successfully generated as following:

http://localhost:1200/xiaohongshu/user/5b55f1534eacab0302da1f02/notes/fulltext - Failed ❌
HTTPError: Response code 503 (Service Unavailable)

Error Message:
Error: Could not get user information and note list
TargetCloseError: Page closed!
Route: /xiaohongshu/user/:user_id/notes/fulltext
Full Route: /xiaohongshu/user/5b55f1534eacab0302da1f02/notes/fulltext
Node Version: v22.9.0
Git Hash: 7c52019d

Successfully generated as following:

http://localhost:1200/xiaohongshu/user/5b55f1534eacab0302da1f02/notes/fulltext - Failed ❌
HTTPError: Response code 503 (Service Unavailable)

Error Message:
Error: Could not get user information and note list
TargetCloseError: Page closed!
Route: /xiaohongshu/user/:user_id/notes/:fulltext
Full Route: /xiaohongshu/user/5b55f1534eacab0302da1f02/notes/fulltext
Node Version: v22.9.0
Git Hash: ccb76194

Successfully generated as following:

http://localhost:1200/xiaohongshu/user/5b55f1534eacab0302da1f02/notes/fulltext - Failed ❌
HTTPError: Response code 503 (Service Unavailable)

Error Message:
Error: Could not get user information and note list
TargetCloseError: Page closed!
Route: /xiaohongshu/user/:user_id/notes/:fulltext
Full Route: /xiaohongshu/user/5b55f1534eacab0302da1f02/notes/fulltext
Node Version: v22.9.0
Git Hash: 06343cfa

Successfully generated as following:

http://localhost:1200/xiaohongshu/user/5b55f1534eacab0302da1f02/notes/fulltext - Failed ❌
HTTPError: Response code 503 (Service Unavailable)

Error Message:
Error: Could not get user information and note list
TargetCloseError: Page closed!
Route: /xiaohongshu/user/:user_id/notes/:fulltext
Full Route: /xiaohongshu/user/5b55f1534eacab0302da1f02/notes/fulltext
Node Version: v22.9.0
Git Hash: 7ad18389

@zucchiniEvader

This comment was marked as off-topic.

rien7 (Contributor, Author) commented Oct 14, 2024

Is there any hope for this?

You need to self-host it.

@zucchiniEvader

Is there any hope for this?

You need to self-host it.

OK, I'll pull your code and give it a try.

Successfully generated as following:

http://localhost:1200/xiaohongshu/user/5b55f1534eacab0302da1f02/notes/fulltext - Failed ❌
HTTPError: Response code 503 (Service Unavailable)

Error Message:
Error: Could not get user information and note list
TargetCloseError: Page closed!
Route: /xiaohongshu/user/:user_id/notes/:fulltext
Full Route: /xiaohongshu/user/5b55f1534eacab0302da1f02/notes/fulltext
Node Version: v22.9.0
Git Hash: 7e168ae9

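@dddaniel1 (Contributor)

Is there any hope for this?

You need to self-host it.

The code still seems to have some problems. I pulled your repository, made a few changes, and now it works.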

@zucchiniEvader

Will this be merged? @DIYgod

async function getUser(url, cookie) {
    const res = await got(url, {
        headers: {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36',
Collaborator

RSSHub will use a randomised user agent of Chrome on mac by default. Does the site only work with this fixed version of Chrome on Windows?
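
One way to act on the reviewer's question, sketched under assumptions: build the request headers so that `User-Agent` is left out unless the site is confirmed to need a fixed value, which lets whatever default the HTTP client applies take effect. The names `HeaderOptions` and `buildHeaders` are hypothetical and are not part of RSSHub or `got`.

```typescript
// Hypothetical options for building request headers.
interface HeaderOptions {
    // Set only if the site is confirmed to reject other user agents.
    fixedUserAgent?: string;
    // Set only for requests that actually need authentication.
    cookie?: string;
}

// Returns a headers object that contains only the fields explicitly requested.
function buildHeaders({ fixedUserAgent, cookie }: HeaderOptions = {}): Record<string, string> {
    const headers: Record<string, string> = {};
    if (fixedUserAgent) {
        headers['User-Agent'] = fixedUserAgent;
    }
    if (cookie) {
        headers.Cookie = cookie;
    }
    return headers;
}
```

Calling `buildHeaders()` with no arguments yields an empty object, so the client's own defaults apply to the request.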

const data = (await cache.tryGet(link, async () => {
    const res = await got(link, {
        headers: {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36',
Collaborator

RSSHub will use a randomised user agent of Chrome on mac by default. Does the site only work with this fixed version of Chrome on Windows?

Comment on lines +67 to +72
    const res = await got(url, {
        headers: {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36',
            Cookie: cookie,
        },
    });
Collaborator

This request does not require a cookie.

Comment on lines +75 to +82
    let script = $('script')
        .filter((i, script) => {
            const text = script.children[0]?.data;
            return text?.startsWith('window.__INITIAL_STATE__=');
        })
        .text();
    script = script.slice('window.__INITIAL_STATE__='.length);
    script = script.replaceAll('undefined', 'null');
Collaborator

Suggested change
-    let script = $('script')
-        .filter((i, script) => {
-            const text = script.children[0]?.data;
-            return text?.startsWith('window.__INITIAL_STATE__=');
-        })
-        .text();
-    script = script.slice('window.__INITIAL_STATE__='.length);
-    script = script.replaceAll('undefined', 'null');
+    const script = $("script:contains('__INITIAL_STATE__')")
+        .text()
+        .match(/window\.__INITIAL_STATE__=(.*)/)?.[1]
+        ?.replaceAll('undefined', 'null');
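
To illustrate the idea behind this suggestion without a DOM library, here is a self-contained sketch that pulls the `window.__INITIAL_STATE__` payload out of a raw HTML string with one regular expression and parses it. The function name `extractInitialState` and the sample HTML used to exercise it are hypothetical; the real page structure may differ. Like the code under review, it rewrites bare `undefined` tokens to `null`, which would also alter any string value that contains the word "undefined".

```typescript
// Hypothetical helper: extract and parse the embedded state object from page HTML.
function extractInitialState(html: string): unknown {
    // Capture everything between the assignment and the closing script tag.
    const match = html.match(/<script[^>]*>\s*window\.__INITIAL_STATE__=([\s\S]*?)<\/script>/);
    if (!match) {
        throw new Error('window.__INITIAL_STATE__ not found');
    }
    const json = match[1]
        .trim()
        .replace(/;$/, '') // drop a trailing semicolon, if any
        .replace(/\bundefined\b/g, 'null'); // `undefined` is not valid JSON
    return JSON.parse(json);
}
```

Throwing when the marker is missing makes a changed page layout show up as an explicit error rather than as a later `JSON.parse` failure.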

Comment on lines +87 to +105
async function renderNotesFulltext(notes, url) {
    const data: any[] = [];
    const promises = notes.flatMap((note) =>
        note.map(async ({ noteCard }) => {
            const link = `${url}/${noteCard.noteId}`;
            const { title, description, pubDate } = await getFullNote(link);
            return {
                title,
                link,
                description,
                author: noteCard.user.nickName,
                guid: noteCard.noteId,
                pubDate,
            };
        })
    );
    data.push(...(await Promise.all(promises)));
    return data;
}
Collaborator

Please remove the unnecessary spread and push.

Suggested change
-async function renderNotesFulltext(notes, url) {
-    const data: any[] = [];
-    const promises = notes.flatMap((note) =>
-        note.map(async ({ noteCard }) => {
-            const link = `${url}/${noteCard.noteId}`;
-            const { title, description, pubDate } = await getFullNote(link);
-            return {
-                title,
-                link,
-                description,
-                author: noteCard.user.nickName,
-                guid: noteCard.noteId,
-                pubDate,
-            };
-        })
-    );
-    data.push(...(await Promise.all(promises)));
-    return data;
-}
+function renderNotesFulltext(notes, url) {
+    const promises = notes.flatMap((note) =>
+        note.map(async ({ noteCard }) => {
+            const link = `${url}/${noteCard.noteId}`;
+            const { title, description, pubDate } = await getFullNote(link);
+            return {
+                title,
+                link,
+                description,
+                author: noteCard.user.nickName,
+                guid: noteCard.noteId,
+                pubDate,
+            };
+        })
+    );
+    return Promise.all(promises);
+}
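
A typed, self-contained sketch of the same pattern, returning `Promise.all(...)` directly instead of pushing awaited results into a temporary array. The interfaces, the function name `renderNotes`, and the injected `fetchNote` callback (standing in for `getFullNote`) are assumptions made so the example can run on its own; they are not the PR's actual types.

```typescript
// Hypothetical shapes, inferred only from the fields used in the snippet above.
interface NoteCard {
    noteId: string;
    user: { nickName: string };
}
interface NoteEntry {
    noteCard: NoteCard;
}
interface FullNote {
    title: string;
    description: string;
    pubDate: Date;
}

// `fetchNote` stands in for getFullNote so the sketch needs no network access.
function renderNotes(notes: NoteEntry[][], url: string, fetchNote: (link: string) => Promise<FullNote>) {
    // Flatten the groups of notes into a single list.
    const entries = ([] as NoteEntry[]).concat(...notes);
    // All fetches start immediately; Promise.all resolves with results in input order.
    return Promise.all(
        entries.map(async ({ noteCard }) => {
            const link = `${url}/${noteCard.noteId}`;
            const { title, description, pubDate } = await fetchNote(link);
            return {
                title,
                link,
                description,
                author: noteCard.user.nickName,
                guid: noteCard.noteId,
                pubDate,
            };
        })
    );
}
```

Because the function returns the promise from `Promise.all`, callers can still `await` it exactly as before, and a single failed fetch rejects the whole batch.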

Comment on lines +117 to +124
    let script = $('script')
        .filter((i, script) => {
            const text = script.children[0]?.data;
            return text?.startsWith('window.__INITIAL_STATE__=');
        })
        .text();
    script = script.slice('window.__INITIAL_STATE__='.length);
    script = script.replaceAll('undefined', 'null');
Collaborator

Same as above.

Successfully generated as following:

http://localhost:1200/xiaohongshu/user/5b55f1534eacab0302da1f02/notes/fulltext - Failed ❌
HTTPError: Response code 503 (Service Unavailable)

Error Message:
Error: Could not get user information and note list
TargetCloseError: Page closed!
Route: /xiaohongshu/user/:user_id/notes/:fulltext
Full Route: /xiaohongshu/user/5b55f1534eacab0302da1f02/notes/fulltext
Node Version: v22.11.0
Git Hash: dca14f95

Labels: Auto: Route Test Complete, Route

Successfully merging this pull request may close this issue: Xiaohongshu article list fails to load (小红书文章列表加载不出来)

4 participants