The Topic Engine is an open-source distributed content intelligence system that enables autonomous discovery, monitoring, and processing of topic-specific content across the internet. It transforms unstructured web content into structured, actionable intelligence by understanding content patterns, site structures, and topic relevance.
- Intelligent Content Discovery: Autonomous identification and monitoring of relevant content sources
- Smart Content Processing: Advanced content extraction with fallback strategies
- Topic Classification: Efficient few-shot learning for content categorization
- Structured Data Extraction: Turn unstructured content into actionable data
- Geographic Context: Location-aware content processing and analysis
- Extensible Architecture: Plugin-based design for easy customization
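The "fallback strategies" idea in the list above can be sketched as an ordered chain of fetchers, where the first one to return content wins. This is a minimal illustration, not the project's actual implementation: `fetch_with_fallback`, `FetchResult`, and the two stand-in strategies are hypothetical names invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Hypothetical type: a strategy takes a URL and returns extracted text,
# or None (or raises) when it cannot produce any.
Strategy = Callable[[str], Optional[str]]


@dataclass
class FetchResult:
    url: str
    text: str
    strategy: str  # name of the strategy that succeeded


def fetch_with_fallback(
    url: str, strategies: Sequence[tuple[str, Strategy]]
) -> FetchResult:
    """Try each (name, strategy) pair in order; return the first non-empty result."""
    errors: list[str] = []
    for name, strategy in strategies:
        try:
            text = strategy(url)
        except Exception as exc:  # one failing strategy should not stop the chain
            errors.append(f"{name}: {exc}")
            continue
        if text:
            return FetchResult(url=url, text=text, strategy=name)
        errors.append(f"{name}: empty result")
    raise RuntimeError(f"all strategies failed for {url}: {'; '.join(errors)}")


# Stand-in strategies for illustration only; real ones might wrap an HTTP
# client or a Playwright browser session.
def plain_http(url: str) -> Optional[str]:
    raise TimeoutError("simulated timeout")


def headless_browser(url: str) -> Optional[str]:
    return f"rendered content of {url}"


result = fetch_with_fallback(
    "https://example.com/article",
    [("plain_http", plain_http), ("headless_browser", headless_browser)],
)
print(result.strategy)
```

Ordering the cheap strategy first and the expensive one last keeps the common case fast while still recovering from failures.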
The Topic Engine aims to be more than a content processing system: it is a step toward democratizing content intelligence and enabling personal AI systems. Our goals include:
- Creating an open, collaborative platform for content processing and analysis
- Enabling individuals and small teams to build sophisticated content monitoring systems
- Fostering a community-driven approach to content intelligence
- Supporting the development of personal AI assistants and tools
The Topic Engine is built with:
- Django for the core framework
- SetFit for few-shot learning and classification
- Playwright for reliable content extraction
- PostgreSQL/PostGIS for data storage
- Modern async architecture for efficient processing
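As a rough illustration of what an async processing architecture can look like, the sketch below processes several sources concurrently while capping how many run at once. It uses only the standard library; `process_source`, `process_all`, and the source names are hypothetical and are not taken from the Topic Engine codebase.

```python
import asyncio


async def process_source(name: str, limiter: asyncio.Semaphore) -> str:
    """Pretend to fetch and process one source while holding a concurrency slot."""
    async with limiter:
        await asyncio.sleep(0.01)  # stands in for network or database I/O
        return f"processed:{name}"


async def process_all(names: list[str], max_concurrent: int = 3) -> list[str]:
    """Process every source concurrently, never more than max_concurrent at once."""
    limiter = asyncio.Semaphore(max_concurrent)
    tasks = [process_source(name, limiter) for name in names]
    # gather returns results in the same order as the awaitables passed in
    return await asyncio.gather(*tasks)


results = asyncio.run(process_all(["feed-a", "feed-b", "feed-c", "feed-d"]))
print(results)
```

The semaphore is the design choice worth noting: it keeps the benefits of concurrency without overwhelming remote sites or the database.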
Key components include:
- Content source management and monitoring
- Multi-strategy content fetching system
- Topic hierarchy and classification
- Content processing pipelines
- Geographic context integration
Prerequisites:
- Python 3.12+
- PostgreSQL 15+ with PostGIS
- Redis (for caching and async tasks)
- Clone the repository:
git clone https://github.com/NimbleMachine-andrew/topic_engine.git
cd topic_engine
- Use uv to manage dependencies:
uv venv
uv sync
- Create a .env file: copy env-example to .env and edit it for your values.
- Create a PostgreSQL/PostGIS database for the application. First, create a user:
sudo -u postgres psql << EOF
CREATE USER topic_engine WITH PASSWORD 'mypassword';
EOF
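Then create the database and enable PostGIS (note that this drops any existing topic_engine database):
sudo -u postgres psql << EOF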
DROP DATABASE IF EXISTS topic_engine;
CREATE DATABASE topic_engine ENCODING='UTF-8' OWNER topic_engine;
\c topic_engine
CREATE EXTENSION postgis;
CREATE EXTENSION postgis_topology;
ALTER DATABASE topic_engine OWNER TO topic_engine;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO topic_engine;
GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO topic_engine;
GRANT ALL PRIVILEGES ON ALL FUNCTIONS IN SCHEMA public TO topic_engine;
EOF
- Set up the database:
uv run manage.py migrate
- Pre-load the data, train the SetFit model, and predict topics:
uv run manage.py loaddata initial_topics
uv run manage.py loaddata initial_model_configs
uv run manage.py train_setfit_model personal_ai_medium
uv run manage.py predict_topics
The predict_topics command also accepts a batch-size parameter, for example --batch-size=50.
- Run the development server:
uv run manage.py runserver
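For orientation, the .env and database steps above might feed Django's settings roughly as follows. This is a sketch under assumptions: the variable names (`DB_NAME`, `DB_USER`, `DB_PASSWORD`, `DB_HOST`, `DB_PORT`) and the helper `database_settings` are hypothetical, so check env-example for the names the project actually reads. The engine path is GeoDjango's PostGIS backend.

```python
from collections.abc import Mapping


def database_settings(env: Mapping[str, str]) -> dict:
    """Build a Django DATABASES dict from environment-style values.

    The variable names used here are assumptions for illustration only.
    """
    return {
        "default": {
            # GeoDjango's PostGIS database backend
            "ENGINE": "django.contrib.gis.db.backends.postgis",
            "NAME": env.get("DB_NAME", "topic_engine"),
            "USER": env.get("DB_USER", "topic_engine"),
            "PASSWORD": env["DB_PASSWORD"],  # no default: fail loudly if unset
            "HOST": env.get("DB_HOST", "localhost"),
            "PORT": env.get("DB_PORT", "5432"),
        }
    }


# In a settings module this would typically receive os.environ instead of a literal.
DATABASES = database_settings({"DB_PASSWORD": "mypassword"})
print(DATABASES["default"]["NAME"])
```

Requiring the password without a default makes a missing .env entry surface as an immediate error rather than a confusing connection failure later.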
We welcome contributions of all kinds! Whether you're fixing bugs, adding features, or improving documentation, your help is appreciated.
- 🐛 Report bugs and suggest features
- 📝 Improve documentation
- 🔧 Submit pull requests
- 🎨 Help with UI/UX design
- 🧪 Add tests and improve coverage
- 🌍 Help with internationalization
See CONTRIBUTING.md for detailed guidelines.
The Topic Engine is more than a software project: it is an experiment in cooperative development and community-driven innovation. We're exploring ways to:
- Build a sustainable cooperative business model
- Create opportunities for contributors
- Foster collaboration and knowledge sharing
- Develop ethical approaches to AI and content processing
Join our community:
- Discord (Coming soon)
- Matrix Chat (Coming soon)
- Forum (Coming soon)
This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0); see the LICENSE file for details.
We chose the AGPL to:
- Ensure the software remains open source
- Protect community contributions
- Support cooperative business models
- Maintain transparency and trust
Check our Project Board for current development priorities.
Near-term goals:
- Improve documentation and examples
- Add more content source types
- Enhance topic classification
- Implement plugin system
- Build community tools and resources
The Topic Engine builds on many excellent open source projects and ideas. Special thanks to:
- The SetFit team for few-shot learning
- Django community
- Playwright developers
- PostGIS contributors
Feel free to open an issue or join our community channels. We're here to help!