SOSSE 🦦

SOSSE (Selenium Open Source Search Engine) is web archiving software, a crawler, and a search engine. It’s hosted on both GitLab and GitHub. Feel free to use either platform to submit feature requests, bug reports, or merge requests, or to start a discussion.

Key Features

🌍 Web Page Search: Search the content of web pages, including dynamically rendered ones, with advanced queries. (doc)
🕑 Recurring Crawling: Crawl pages at fixed intervals or adapt the rate based on content changes. (doc)
🔖 Web Page Archiving: Archive HTML content, adjust links for local use, download required assets, and support dynamic content. (doc)
📂 File Downloads: Batch download binary files from web pages. (doc)
🔔 Atom Feeds: Generate content feeds for websites that don’t have them, or receive updates when a new page containing a keyword is published. (doc)
🔒 Authentication: The crawler can authenticate to access private pages and retrieve content. (doc)
👥 Permissions: Admins can configure crawlers and view statistics, while searches can be run by authenticated users or anonymously. (doc)
👤 Search Features: Includes private search history (doc), external search engine shortcuts (doc), and more.

Explore the 📚 documentation and check out some 📷 screenshots.

SOSSE is written in Python and is distributed under the GNU AGPLv3 license. It uses browser-based crawling with Mozilla Firefox or Google Chromium alongside Selenium to index pages that rely on JavaScript. For faster crawling, Requests can also be used. SOSSE uses PostgreSQL for data storage.

Try It Out

To quickly try the latest version with Docker:

docker run -p 8005:80 biolds/sosse:latest

Then, open http://127.0.0.1:8005/ and log in with the username admin and password admin.
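The container may need a moment before the web interface responds. As a minimal sketch (not part of SOSSE itself), the following Python helper polls the URL above until the server answers. The names `fetch_status` and `wait_until_up`, the retry count, and the delay are illustrative assumptions; only the standard library is used.

```python
import http.client
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (4xx/5xx).
        return err.code
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, malformed response, etc.
        return None


def wait_until_up(url, attempts=30, delay=2.0, fetch=fetch_status, sleep=time.sleep):
    """Poll url until it answers with a status below 500; return True on success.

    The fetch and sleep parameters are injectable so the loop can be
    exercised without a network connection.
    """
    for attempt in range(attempts):
        status = fetch(url)
        if status is not None and status < 500:
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

For example, `wait_until_up("http://127.0.0.1:8005/")` would return `True` once the page is being served on the port mapped in the `docker run` command, and `False` if it stays unreachable after all attempts.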

For persistence of Docker data or alternative installation methods, please refer to the installation guide.

Stay Connected

Join the Discord server to get help, share ideas, or discuss SOSSE!
