The Real-Time Social Media Content Retrieval System retrieves posts from LinkedIn in real time and lets users search them by query. Users enter a query, and the system returns the most relevant posts. They can fetch multiple posts and choose how many similar results to retrieve from the database. While currently limited to LinkedIn posts, the system can be extended to other social media platforms, enabling users to find similar posts across channels.
Currently, the system supports live retrieval of LinkedIn posts only. To extend it to other social media platforms, fetch the data yourself and store it in JSON files within the data folder, using the following format:
{
"Name": "<account_name>",
"Posts": {
"<Post_ID>": {
"text": "<fetched_data>",
"post_owner": "<account_name>",
"source": "<social media handle name like Linkedin>"
}
}
}
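As a minimal sketch of producing and checking a file in the format above, the following uses only the Python standard library. The helper names (`build_record`, `validate_record`, `save_record`), the example account, and the example source are hypothetical and not part of this project; the app's own Pydantic models may validate the data differently.

```python
import json
from pathlib import Path

# Keys every post entry carries in the documented format.
REQUIRED_POST_KEYS = {"text", "post_owner", "source"}


def build_record(account_name, posts, source):
    """Build a dict in the documented format from {post_id: text} pairs."""
    return {
        "Name": account_name,
        "Posts": {
            post_id: {"text": text, "post_owner": account_name, "source": source}
            for post_id, text in posts.items()
        },
    }


def validate_record(record):
    """Return a list of problems; an empty list means the record matches the format."""
    problems = []
    if not isinstance(record.get("Name"), str):
        problems.append("'Name' must be a string")
    posts = record.get("Posts")
    if not isinstance(posts, dict):
        problems.append("'Posts' must be an object keyed by post ID")
        return problems
    for post_id, post in posts.items():
        if not isinstance(post, dict):
            problems.append(f"post {post_id!r} must be an object")
            continue
        missing = REQUIRED_POST_KEYS - set(post)
        if missing:
            problems.append(f"post {post_id!r} is missing {sorted(missing)}")
    return problems


def save_record(record, folder, filename):
    """Write the record as UTF-8 JSON into the given folder and return the path."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / filename
    path.write_text(json.dumps(record, indent=2, ensure_ascii=False), encoding="utf-8")
    return path


if __name__ == "__main__":
    # Hypothetical example data; replace with posts fetched from your platform.
    record = build_record("example_account", {"post-1": "Hello from another platform"}, "Mastodon")
    print(validate_record(record))
    print(json.dumps(record, indent=2))
```

Calling `save_record(record, "data", "example_account.json")` would write the file into the data folder, assuming that is the folder the app reads from.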
The project utilizes several technologies to create a streamlined pipeline:
- Bytewax: Stream-processing framework used to build the real-time ingestion pipeline.
- Qdrant: Serves as the vector database; it is written in Rust for fast data processing.
- Pydantic: Used for data validation and models.
- Streamlit: Provides a simple user interface for the system, developed in Python.
- Selenium: Automates the browser workflow from Python.
- BeautifulSoup: Scrapes the data from the HTML pages.
To run this project on your machine, follow these steps:
- Create a virtual environment:
python3 -m venv venv
- Activate the environment:
- Windows:
venv\Scripts\activate
- macOS and Linux:
source venv/bin/activate
- Install dependencies:
pip install -r requirements.txt
- Ensure Docker is installed and run the Qdrant container:
sudo docker run -d -p 6333:6333 -v qdrant_storage:/qdrant/storage qdrant/qdrant
- Run the Streamlit app:
streamlit run app.py
- Access the UI:
Open your web browser and navigate to localhost:8501 to start using the Real-Time Social Media Content Retrieval System.
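If the UI does not load or searches fail, a quick way to check the setup is to confirm that both services accept connections. The sketch below uses only the standard library; `is_port_open` is a hypothetical helper, and the ports are the ones from the steps above (6333 for the Qdrant container, 8501 for Streamlit).

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 6333 is the port published by the docker command; 8501 is the Streamlit port.
    for name, port in (("Qdrant", 6333), ("Streamlit", 8501)):
        state = "reachable" if is_port_open("localhost", port) else "not reachable"
        print(f"{name} on localhost:{port} is {state}")
```

An open port only shows that something is listening there; it does not confirm that the service itself is healthy.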
To use this app, follow the steps below:
- Enter your LinkedIn username and password, along with the username of the account whose posts you want to fetch.
- Click on the "Fetch Details" button.
- Wait while the app automatically opens LinkedIn and fetches the posts.
- If you have already fetched data and stored it in the Data folder in JSON format, you can migrate it directly without fetching again.
- Ensure that your custom data follows the JSON format described above.
- Use the provided migration tool or script to migrate the data to the vector database, following any instructions that accompany it.
- Once the migration is completed successfully, you can start searching in the database.
- Access the user interface of the application.
- From the left-side panel, select the number of results you want to fetch from the database.
- Enter your query in the search bar and initiate the search.
- The application will retrieve and display relevant posts from the database based on your query.
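To illustrate what "number of results" means in the search steps above, here is a toy top-k similarity search. It is a deliberately simplified stand-in: it ranks posts by cosine similarity over word counts, whereas the app queries vectors stored in Qdrant. All function names and the sample posts are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphanumeric word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_posts(query, posts, k):
    """Return the k (post_id, score) pairs most similar to the query, best first."""
    query_vec = bag_of_words(query)
    scored = [
        (post_id, cosine_similarity(query_vec, bag_of_words(text)))
        for post_id, text in posts.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical posts keyed by post ID, mirroring the "Posts" object in the JSON format.
    posts = {
        "a": "vector databases store embeddings",
        "b": "cooking pasta at home",
        "c": "vector search with embeddings and databases",
    }
    for post_id, score in top_k_posts("vector databases", posts, k=2):
        print(post_id, round(score, 3))
```

Here `k` plays the role of the value selected in the left-side panel: a larger `k` returns more, and progressively less similar, posts.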
Developers are welcome to contribute to this project. Here's how:
- Fork the repository.
- Create a new branch for your feature or bug fix.
- Make your changes and keep the code clean.
- Write tests for your changes (if applicable).
- Commit your changes with a descriptive message.
- Push your branch to your forked repository.
- Create a pull request with a detailed description of your changes.
For any suggestions, comments, or inquiries, please contact [email protected] or reach out via LinkedIn: https://www.linkedin.com/in/manthanbhikadiya/. Your input is highly appreciated and will help make this project more useful for users.
- Many thanks to Paul Iusztin for generously providing the code and an efficient pipeline for the Retrieval Data System. This project wouldn't have been possible without his contribution. I strongly encourage everyone to subscribe to the newsletter.
- GitHub Repo: https://github.com/decodingml/articles-code/tree/main/articles/large_language_models/real_time_retrieval_system_for_social_media_data
- Newsletter: https://decodingml.substack.com/