storyseedling.com no content because adding protection #1560

rizkiv1 · 2024-10-28T01:07:59Z

Bug:
Storyseedling seems to have updated the site. Chapters now need a few seconds to load before the English content appears, and before the English words show up the page seems to load Chinese(?) content.

To Reproduce
Steps to reproduce the behavior:
sample TOC: https://storyseedling.com/series/177491/
sample chapter: https://storyseedling.com/series/177491/v1/1/
When finished, you can't find the content of the chapter.

Expected behavior
The output should contain the content of the chapter.

Desktop:

OS: Windows 10
Browser: Firefox 131.0.3
Version: 1.0.0.0

Smartphone:

Device: Realme C51s
OS: Android 13
Browser: Kiwi Browser 124.0.6327.4
Version: 1.0.0.0

rizkiv1 · 2024-10-28T10:57:03Z

It seems they added encryption using a custom font: copying the text directly results in Chinese characters and alphanumeric words. I don't know how they do that.

dnshipit · 2024-11-01T06:04:09Z

They seem to be doing character replacement on the content in the back end and then using a custom font, generated directly from code, to render everything in English.

Every character of the content is just shifted from an alphanumeric character to some seemingly random Unicode character.

I think there are only a few options left to scrape this site:

decrypt their content using character frequency analysis (as long as they only use direct character replacement)
render the whole page, take a pixel-by-pixel screenshot of the content, and then use some OCR software to turn it back into text
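
A first pass at the frequency-analysis option could look like the sketch below. It is only illustrative: it assumes encoded letters are non-ASCII symbols while spaces, digits and punctuation stay as plain ASCII, and the names `guess_mapping` and `apply_mapping` are made up for this example. The letter ordering is a common textbook approximation, so the result is a starting guess to refine by hand, not a finished decode.

```python
from collections import Counter
from typing import Dict

# Approximate English letter order, most frequent first (varies by corpus).
ENGLISH_BY_FREQUENCY = "etaoinshrdlcumwfgypbvkjxqz"


def guess_mapping(ciphertext: str) -> Dict[str, str]:
    """Pair the most frequent cipher symbols with the most frequent English letters."""
    # Assumption: only non-ASCII characters are encoded letters.
    counts = Counter(ch for ch in ciphertext if not ch.isascii())
    ranked = [symbol for symbol, _ in counts.most_common()]
    # zip() stops at 26 letters, so rarer symbols are left unmapped.
    return dict(zip(ranked, ENGLISH_BY_FREQUENCY))


def apply_mapping(ciphertext: str, mapping: Dict[str, str]) -> str:
    """Substitute every mapped symbol and leave all other characters untouched."""
    return "".join(mapping.get(ch, ch) for ch in ciphertext)
```

If uppercase and lowercase letters are encoded as separate symbols, a plain frequency ranking like this will mostly pick up the lowercase ones, and it needs a fairly long chapter before the guess becomes useful.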

Elthara · 2024-11-07T09:24:41Z

It seems like it's just a character swap, similar to Second Life Translations, just not using English characters.

a:⽜
A:⽂
b:⽝
B:⽃
c:⽞
C:⽄
d:⽟
D:⽅
e:⽠
E:⽆
f:⽡
F:⽇
g:⽢
G:⽈
h:⽣
H:⽉
i:⽤
I:⽊
j:⽥
J:⽋
k:⽦
K:⽌
l:⽧
L:⽍
m:⽨
M:⽎
n:⽩
N:⽏
o:⽪
O:⽐
p:⽫
P:⽑
q:⽬
Q:⽒
r:⽭
R:⽓
s:⽮
S:⽔
t:⽯
T:⽕
u:⽰
U:⽖
v:⽱
V:⽗
w:⽲
W:⽘
x:⽳
X:⽙
y:⽴
Y:⽚
z:⽵
Z:⽛
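
As a minimal sketch of how the table above could be applied: the posted symbols fall in two consecutive runs of the Unicode Kangxi Radicals block (A-Z at U+2F42..U+2F5B, a-z at U+2F5C..U+2F75), so the table can be built from code points instead of typed out. The function names are hypothetical, and this only holds while the site keeps this exact mapping. Digits and punctuation are not in the posted table, so they pass through unchanged.

```python
import string

# Cipher symbols in the same order as the posted table (assumed still current).
CIPHER_UPPER = "".join(chr(0x2F42 + i) for i in range(26))  # symbols for A-Z
CIPHER_LOWER = "".join(chr(0x2F5C + i) for i in range(26))  # symbols for a-z

LATIN = string.ascii_uppercase + string.ascii_lowercase
DECODE_TABLE = str.maketrans(CIPHER_UPPER + CIPHER_LOWER, LATIN)
ENCODE_TABLE = str.maketrans(LATIN, CIPHER_UPPER + CIPHER_LOWER)


def decode(text: str) -> str:
    """Replace each mapped symbol with its Latin letter; leave everything else alone."""
    return text.translate(DECODE_TABLE)


def encode(text: str) -> str:
    """Inverse of decode(), handy for building test input."""
    return text.translate(ENCODE_TABLE)
```

For example, `decode(encode("Chapter 1"))` gives back `"Chapter 1"`, with the space and the digit untouched.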

dnshipit · 2024-11-08T23:55:29Z

Yeah, it's a very simple character swap scheme. However, they can update the character swap mapping at any time, and could even generate it dynamically for every document. That's why in the long run it would be safer to use a cryptographic decoding or OCR approach.

One thing that can help with decoding is the HTML meta tag. I noticed that every chapter has at least a few sentences of the content available in normal English for SEO purposes. Those same sentences then appear encoded in the full content. For a simple character swap cipher like this, that would resolve quite a few characters in advance.
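
The meta-tag idea amounts to a known-plaintext attack, which could be sketched as below. The function name `mapping_from_crib` is made up for this example, and it assumes that only letters are encoded (as non-ASCII symbols) while spaces, digits and punctuation appear unchanged in the encoded text. It slides the known English snippet over the encoded text and returns the first position where a consistent one-to-one mapping exists.

```python
from typing import Dict, Optional


def mapping_from_crib(ciphertext: str, crib: str) -> Optional[Dict[str, str]]:
    """Find where the known plain text `crib` fits in `ciphertext`.

    Returns a {cipher symbol: plain letter} mapping for the first consistent
    alignment, or None when the crib does not fit anywhere.
    """
    for start in range(len(ciphertext) - len(crib) + 1):
        window = ciphertext[start:start + len(crib)]
        forward: Dict[str, str] = {}   # cipher symbol -> plain letter
        backward: Dict[str, str] = {}  # plain letter -> cipher symbol
        for c, p in zip(window, crib):
            if not p.isalpha():
                if c != p:  # assumption: non-letters are not encoded
                    break
                continue
            if c.isascii():  # assumption: every letter becomes a non-ASCII symbol
                break
            # Reject the alignment if either direction is contradicted.
            if forward.setdefault(c, p) != p or backward.setdefault(p, c) != c:
                break
        else:
            return forward
    return None
```

A longer crib (a full sentence or two) makes a false alignment much less likely, and any letters it does not cover would still have to be recovered some other way, for instance by frequency analysis.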

dteviot · 2024-11-09T05:32:37Z

They're not just doing character replacement. The content isn't on the initial page that is downloaded. Instead a second call is made to get the content.

E.g., for the first chapter, with a URL of https://storyseedling.com/series/177491/v1/1/, the content is obtained with a POST to https://storyseedling.com/series/177491/v1/1/content

Unfortunately, I'm having some problems reproducing the call.
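
For reference, the URL relationship described here can be sketched as follows. This only derives the content URL and builds an unsent request object; the helper names are hypothetical, and the body and headers the site actually expects are not known from this thread, so the empty body is a placeholder rather than a working call.

```python
from urllib.parse import urljoin
from urllib.request import Request


def content_url(chapter_url: str) -> str:
    """Append `content` to a chapter URL, tolerating a missing trailing slash."""
    base = chapter_url if chapter_url.endswith("/") else chapter_url + "/"
    return urljoin(base, "content")


def build_content_request(chapter_url: str) -> Request:
    """Build (but do not send) the POST request for a chapter's content."""
    # Placeholder body: the real payload and any required headers are unknown.
    return Request(content_url(chapter_url), data=b"", method="POST")
```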

Time taken: 42 minutes

yuyu-cloud · 2024-11-10T12:23:45Z

I apologise if I've oversimplified this issue, but given the blank output WebToEpub now generates (due to the new protection scheme), would it be possible to still extract the encrypted source content (which presently has the Chinese characters and other HTML miscellany in the HTML data), and then use a built-in script in ePubEditor to undo the character swap scheme with the cryptographic decoding method a user mentioned above, so that any new changes to the site can still possibly be undone? For example, by adding an additional button for Story Seedling, the same as for Chrysanthemum Garden. The extension then wouldn't be attempting to directly bypass these protections, and therefore wouldn't violate Chrome policy.

Just putting in my two cents in the hope there's some good workaround to still extract the source content from this site, since it has such a wide selection of novels... 😭🥺😢

dnshipit · 2024-11-11T00:01:12Z

> e.g. For first chapter with URL of storyseedling.com/series/177491/v1/1, content is obtained with a POST to storyseedling.com/series/177491/v1/1/content

According to my analysis of their website script, they are using Cloudflare Turnstile as the captcha method for the content call. Instead of doing a fetch directly, this might require a slow full page load, with the content extracted after the captcha has been cleared.

bonnetchuu · 2024-11-11T08:07:40Z

Also came for a resolution to this site no longer working (but didn't want to open a duplicate issue)... TT_TT

norabelle101 · 2024-12-09T02:52:08Z

Hello, came to ask if there are any updates on this issue... Will WebToEpub be able to make it work for this site somehow, despite the new copy protection being on the trickier side? 😭😭 Aside from the character swap scheme, the actual chapter content being located elsewhere seems similar to https://readhive.org/, where not clicking a chapter directly on the TOC shows only a few paragraphs of the chapter with the "Continue Reading (3...2...1)" prompt 😔 That site, though, works well with WebToEpub, which is able to grab all text + footnotes (so maybe there's some way for storyseedling too despite the copy security...? hopefully...) 🥺🥲😭

sebbu2 · 2024-12-23T12:32:58Z

They already have a long timeout before continuing if you try to make an epub for a novel with several dozen chapters, so the "slow" full page load won't actually make the whole epub generation slower in any really noticeable manner; it'll all be amortized.

Even if it did, it would still be an acceptable solution to me: make the epub, do something else, and come back after watching an episode, playing a game or reading a book to get that epub.

Edit: even if it were just the content in Chinese characters plus the custom font to make it display correctly and be readable, it'd be an acceptable (temporary?) solution to me, as long as I can read all the chapters of the epub in my epub reader on my tablet.

dteviot · 2024-12-30T01:41:16Z

@norabelle101

> came to ask if there's maybe any updates on this issue.

Short answer: no.
Long answer: The problem (which I'm seeing more and more) is sites where the content isn't in the HTML of the page. Instead, the page uses JavaScript to make JSON calls to fetch the content, which it then modifies and inserts into the page. The way WebToEpub works is I have to reverse engineer the JavaScript and JSON, then put the resulting code into WebToEpub. Obviously, this is a lot of work.
The (obvious, simple, hah!) solution to this is to change WebToEpub to instead open each page as a tab in the browser, so that the browser will run the JavaScript, make the JSON call, etc., and then have WebToEpub extract the content from the finished web page. Unfortunately, doing this is not a small job. And every time I look at doing it, I see how big it is and, well, decide I want to do other things with that much time. I was hoping I would find some time to start this Christmas, but well, RL got in the way so far.

Jokakun · 2025-01-01T10:20:33Z

I know that you ain't paid for this, but I hope you'll come back to this one day, as this site is one of the many sites that have a lot of novels that are updated daily.

bookimp · 2025-01-15T01:59:56Z

> I know that you ain't paid for this, but I hope you'll come back to this one day, as this site is one of the many sites that have a lot of novels that are updated daily.

Agreed, but I'm glad I searched first before opening an issue.

dteviot mentioned this issue Jan 26, 2025

Issue on story seedling #1648

Open
