Store multiple versions of pages in the index, plus a WBM-style copy in case the site goes down. #71

Open
blackforestboi opened this issue Jan 17, 2017 · 4 comments

Comments

@blackforestboi
Member

blackforestboi commented Jan 17, 2017

I got a message from Sahil via email. This is his question/proposal:

Can we save multiple copies of the index? I.e., what happens when a page goes down or a link returns a 404? Will users be redirected to the Wayback Machine?

My answer:
Not yet. So far we do not store the complete page, so you cannot revisit it without a connection, and we do not take multiple versions into account. We only store the text so you can search it again.
But this is a feature planned for the future. (Also, connecting it to the WBM is an excellent idea! :)

Addition to my answer:
I think this is future stuff for now, but we should definitely consider storing a readable version of a website and/or connecting to or feeding into the WBM.
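For reference, the "redirect to the Wayback Machine on a 404" idea could look roughly like the sketch below. It relies on the public Wayback Machine availability endpoint; the function names and the injected `fetch` callable are hypothetical and not part of this project, and the HTTP request itself is left to the caller.

```python
import json
from urllib.parse import urlencode

# Public Wayback Machine availability endpoint.
WAYBACK_AVAILABLE = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Build the lookup URL for the closest archived snapshot of page_url.

    timestamp is optional and uses the YYYYMMDDhhmmss format (a prefix is enough).
    """
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return WAYBACK_AVAILABLE + "?" + urlencode(params)


def closest_snapshot(response_body):
    """Return the snapshot URL from an availability response, or None."""
    data = json.loads(response_body)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def fallback_url(status_code, page_url, fetch):
    """Hypothetical fallback: keep the live URL unless the page returned 404.

    fetch is any callable that takes a URL and returns the response body as text.
    Returns None when the page is gone and no snapshot exists.
    """
    if status_code != 404:
        return page_url
    return closest_snapshot(fetch(availability_url(page_url)))
```

Passing `fetch` in keeps the lookup logic independent of whatever HTTP client the extension ends up using.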



@Droyk

Droyk commented Jan 17, 2017

I think size is the problem: the database file will be pretty big even if you remove all the elements except the text.

We are talking about storing data for hundreds of web pages. At work I view 200 to 300 sites daily.

@blackforestboi
Member Author

Well, the text itself is not that big; it's basically what we store in the DB. But storing the complete HTML so a page can be retrieved exactly as it was is definitely big.

A reader version is conceivable, as it would pull the text that is already in the DB.
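To illustrate how multiple text versions could be kept without the database growing on every visit, here is a minimal sketch that stores a new version only when the page text actually changed. The `TextVersionStore` class and its in-memory dictionaries are assumptions for illustration, not how the DB is structured today.

```python
import hashlib
import time


class TextVersionStore:
    """In-memory stand-in for the DB (hypothetical), keeping text versions per URL."""

    def __init__(self):
        self.blobs = {}     # content hash -> page text, stored once per distinct text
        self.versions = {}  # url -> list of (timestamp, content hash), oldest first

    def save(self, url, text, timestamp=None):
        """Record a visit; return True only if a new version was stored."""
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        history = self.versions.setdefault(url, [])
        if history and history[-1][1] == digest:
            return False  # text unchanged since the last stored version
        self.blobs.setdefault(digest, text)
        stamp = timestamp if timestamp is not None else time.time()
        history.append((stamp, digest))
        return True

    def reader_view(self, url, index=-1):
        """Return the stored text for a version (latest by default)."""
        _, digest = self.versions[url][index]
        return self.blobs[digest]
```

With this layout, revisiting an unchanged page adds nothing, and a reader version is just a lookup of the latest stored text.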

@Droyk

Droyk commented Jan 17, 2017

Or just add an option in the settings menu or the extension icon menu to store the page on your servers, like the Wayback Machine does. Size won't be much of a problem then, and I think it would solve most of the problem. The only disadvantage is privacy, though ;(

@blackforestboi
Member Author

Good idea!

Will keep it in mind. :)
