topSEARCH is a tool for efficiently retrieving, organizing, and storing various types of online resources, including news, apps, videos, and podcasts.
- Features
- Resource sources
- Screenshots
- Demo Video
- Dependencies
- Quickstart
- Project Structure
- License
- Authors
- Contact
The main features of topSEARCH are listed below:
- Selection of Resource Types: users can choose from up to four different resource types, allowing for customized resource selection.
- Advanced Filtering Options: each resource type comes with its own set of filters, giving users full control to narrow down their selections based on specific criteria.
- Resource Count Visualization: a dynamic visualization displays the total number of selected resources, giving users a clear overview of their selection. This helps users track resource volume and know how many resources they are working with at any time.
- Export to ZIP: users can export their selected resources as a ZIP file, which simplifies data transfer and storage and makes it easy to download and share all chosen resources in one organized package.
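The counting and ZIP export features can be sketched with the standard library alone. This is a minimal illustration, not topSEARCH's code: the `selected` records, their fields, and the helpers `count_by_type` and `export_to_zip` are hypothetical, and writing one CSV per resource type is an assumed archive layout.

```python
import csv
import io
import zipfile
from collections import Counter

# Hypothetical selection: each resource is a dict with a "type" key
# (Videos, News, Podcasts or Apps) plus any type-specific fields.
selected = [
    {"type": "Videos", "title": "Intro to Streamlit", "url": "https://youtube.com/watch?v=abc"},
    {"type": "News", "title": "Tech headline", "url": "https://news.google.com/articles/xyz"},
    {"type": "Videos", "title": "Data viz tips", "url": "https://youtube.com/watch?v=def", "channel": "demo"},
]


def count_by_type(resources):
    """Count the selected resources per resource type (input for a chart)."""
    return Counter(r["type"] for r in resources)


def export_to_zip(resources):
    """Return the bytes of a ZIP archive with one CSV file per resource type."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for rtype in sorted({r["type"] for r in resources}):
            rows = [r for r in resources if r["type"] == rtype]
            # Union of keys so rows with extra fields still fit in one CSV.
            fieldnames = sorted({key for row in rows for key in row})
            text = io.StringIO()
            writer = csv.DictWriter(text, fieldnames=fieldnames, restval="")
            writer.writeheader()
            writer.writerows(rows)
            zf.writestr(f"{rtype.lower()}.csv", text.getvalue())
    return buffer.getvalue()


counts = count_by_type(selected)
archive = export_to_zip(selected)
print(dict(counts), len(archive), "bytes")
```

Building the archive in memory avoids temporary files; in a Streamlit page, the resulting bytes could be offered with `st.download_button("Download", data=archive, file_name="resources.zip", mime="application/zip")`.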
topSEARCH aggregates resources from popular platforms, helping users access diverse content with ease. Whether you're looking for videos, news articles, podcasts, or apps, topSEARCH pulls resources from multiple sources, so you don't have to search each platform manually. The table below lists the resource types topSEARCH retrieves and the platforms they come from:
| Resource Type | Source Platform | Webpage |
|---|---|---|
| Videos | YouTube | youtube.com |
| News | Google News | news.google.com |
| Podcasts | Spotify | spotify.com |
| Podcasts | Apple Podcasts | podcasts.apple.com |
| Apps | Google Play | play.google.com |
| Apps | Apple App Store | apps.apple.com |
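The mapping in the table can be expressed as a small lookup structure. This sketch only mirrors the table; the names `SOURCES`, `platforms_for` and `resource_type_of` are hypothetical and are not taken from the topSEARCH code base.

```python
from urllib.parse import urlparse

# Resource type -> list of (platform, domain) pairs, mirroring the sources table.
SOURCES = {
    "Videos": [("YouTube", "youtube.com")],
    "News": [("Google News", "news.google.com")],
    "Podcasts": [("Spotify", "spotify.com"), ("Apple Podcasts", "podcasts.apple.com")],
    "Apps": [("Google Play", "play.google.com"), ("Apple App Store", "apps.apple.com")],
}


def platforms_for(resource_type):
    """Return the platform names that provide a given resource type."""
    return [name for name, _ in SOURCES.get(resource_type, [])]


def resource_type_of(url):
    """Guess the resource type of a URL from its host name (None if unknown)."""
    host = urlparse(url).hostname or ""
    for rtype, entries in SOURCES.items():
        for _, domain in entries:
            # Match the domain itself or any subdomain such as www.youtube.com.
            if host == domain or host.endswith("." + domain):
                return rtype
    return None
```

Matching on the exact domain or a dot-prefixed suffix keeps `apps.apple.com` and `podcasts.apple.com` apart even though both share `apple.com`.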
topSEARCH depends on the following Python packages:
- streamlit: an app framework for Machine Learning and Data Science
- pandas: data manipulation and analysis library
- seaborn: statistical data visualization library
- matplotlib: plotting library for Python
- PIL: Python Imaging Library for opening, manipulating, and saving images
- squarify: library for plotting treemaps
- nltk: Natural Language Toolkit for working with human language data
- altair: declarative statistical visualization library
- unidecode: ASCII transliterations of Unicode text
- langid: language identification tool
- itertools: standard-library module of functions creating iterators for efficient looping
- googleapiclient: Google API Client Library for Python
- youtube_scraping_api: API for scraping YouTube data
- joblib: set of tools to provide lightweight pipelining in Python
- googlenews: library for scraping Google News
- newspaper: library for scraping and parsing newspaper articles
- spotipy: lightweight Python library for the Spotify Web API
- bs4: Beautiful Soup library for web scraping
- google_play_scraper: library for scraping Google Play Store data
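After installing, a quick way to confirm that the packages above are importable is to look each one up with `importlib.util.find_spec`. This is a sketch, not a script shipped with topSEARCH; the import names are taken from the list above, and `GoogleNews` assumes the googlenews package is imported under that capitalized name.

```python
import importlib.util

# Import names for the dependencies listed above. Several differ from the pip
# package names, e.g. PIL (Pillow), bs4 (beautifulsoup4),
# googleapiclient (google-api-python-client).
REQUIRED_MODULES = [
    "streamlit", "pandas", "seaborn", "matplotlib", "PIL", "squarify",
    "nltk", "altair", "unidecode", "langid", "itertools", "googleapiclient",
    "youtube_scraping_api", "joblib",
    "GoogleNews",  # assumed import name of the googlenews package
    "newspaper", "spotipy", "bs4", "google_play_scraper",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All dependencies are available.")
```

`find_spec` only locates modules without importing them, so the check is fast and has no side effects.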
Get started with the topSEARCH tool.
Ensure you have the following installed:
- Python 3.7 or higher
- Pip (Python package installer)
- A Google Cloud API key (see the Google Cloud instructions below)
- Spotify API credentials, i.e. a Client ID and Client Secret (see the Spotify instructions below)
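The interpreter-side prerequisites can be checked from Python itself. This is a minimal sketch; the helper name `check_prerequisites` is hypothetical, and it covers only the Python version and pip, not the API keys.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 7)


def check_prerequisites(version_info=sys.version_info):
    """Return a list of problems with the local Python setup (empty if OK)."""
    problems = []
    major_minor = tuple(version_info[:2])
    if major_minor < MIN_PYTHON:
        problems.append("Python %d.%d found; 3.7 or higher is required" % major_minor)
    # pip is usable as `python -m pip` only if the module can be found.
    if importlib.util.find_spec("pip") is None:
        problems.append("pip is not installed for this interpreter")
    return problems


if __name__ == "__main__":
    print(check_prerequisites() or "Prerequisites OK")
```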
- Clone the repository:
  git clone https://github.com/Vicomtech/topSEARCH.git
  cd topSEARCH
- Create a virtual environment:
  - In the project directory, create a virtual environment to manage dependencies with the following command:
    python -m venv env
  - This creates a folder named env containing all the files needed for the virtual environment.
  - Activate the virtual environment:
    - On Windows:
      .\env\Scripts\activate
    - On macOS/Linux:
      source env/bin/activate
  - After activating the virtual environment, you should see the environment name in your terminal prompt.
- Install the required dependencies:
  pip install -r requirements.txt
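Besides the prompt change, you can confirm from Python that the virtual environment is active before installing the requirements. This relies on documented `venv` behavior: inside an environment, `sys.prefix` points at the environment while `sys.base_prefix` still points at the base installation. The helper name `in_virtualenv` is hypothetical.

```python
import sys


def in_virtualenv():
    """Return True when running inside a venv created with `python -m venv`."""
    # In a venv, sys.prefix is the environment directory, while
    # sys.base_prefix remains the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print("Virtual environment active:", sys.prefix)
    else:
        print("No virtual environment active; activate env before installing.")
```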
To run the application, you have two options: using the terminal or a run configuration file.
Option 1: using the terminal
- Add your API keys:
  - Copy the .env.example file and rename it to .topsearch.env.
  - Open the .topsearch.env file and fill in the required API keys.
  - Save the file. The application needs these keys to interact with external services.
- Start the Streamlit server:
  python -m streamlit run app/topsearch/topsearch.py
- A browser window should open automatically, displaying the running Streamlit app. If it doesn't, open the local URL that Streamlit prints in the terminal (by default http://localhost:8501).
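The .topsearch.env file holds one KEY=VALUE pair per line. The sketch below shows one way such a file can be read into the environment using only the standard library; topSEARCH's own loading code may work differently, and the variable names in `EXPECTED_KEYS` are hypothetical placeholders, so check .env.example for the names the app actually expects.

```python
import os
from pathlib import Path

# Hypothetical variable names; the real ones are listed in .env.example.
EXPECTED_KEYS = ("YOUTUBE_API_KEY", "SPOTIFY_CLIENT_ID", "SPOTIFY_CLIENT_SECRET")


def load_env_file(path=".topsearch.env"):
    """Parse KEY=VALUE lines into os.environ without overwriting existing values."""
    loaded = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments and malformed lines
        key, value = line.split("=", 1)  # split on the first "=" only
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        loaded[key] = value
        os.environ.setdefault(key, value)
    return loaded


def missing_keys(keys=EXPECTED_KEYS):
    """Return the expected keys that are unset or empty in the environment."""
    return [k for k in keys if not os.environ.get(k)]
```

Using `setdefault` means variables already exported in the shell (or set by an IDE run configuration) take precedence over the file.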
Option 2: using a run configuration file
- Set up the run configuration file:
  - Copy Run_configuration_example.xml and specify the environment variables, making sure the API keys are set as in the .topsearch.env file.
  - Open your IDE (e.g., PyCharm, VS Code) and create a new run configuration for the application.
- Run the application:
  - Select the configured run configuration and click "Run" to start the Streamlit app.
  - A browser window should open automatically, displaying the running Streamlit app. If it doesn't, open the local URL that Streamlit prints in the terminal (by default http://localhost:8501).
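A run configuration essentially starts `python -m streamlit run app/topsearch/topsearch.py` with extra environment variables. Outside an IDE, a small launcher script can do the same. This is a hypothetical helper, not part of the repository; it assumes it is run from the repository root and that the API keys are passed in `extra_env`.

```python
import os
import subprocess
import sys

APP_PATH = "app/topsearch/topsearch.py"  # path used in the terminal instructions


def build_command(app_path=APP_PATH):
    """Return the `python -m streamlit run <app>` command for this interpreter."""
    return [sys.executable, "-m", "streamlit", "run", app_path]


def launch(extra_env=None):
    """Start the Streamlit app with additional environment variables set."""
    env = dict(os.environ)
    env.update(extra_env or {})
    return subprocess.run(build_command(), env=env, check=False).returncode


# Example (hypothetical variable name):
# launch({"YOUTUBE_API_KEY": "your-key"})
```

Using `sys.executable` ensures Streamlit runs with the same interpreter, and therefore the same virtual environment, as the launcher.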
Getting a Google Cloud API key:
- Go to the Google Cloud Console:
  - Visit the Google Cloud Console to manage your API credentials.
- Sign in with your Google account:
  - Log in with your Google credentials, or create a new account if you don't have one.
- Create a new project.
- Click Create credentials, then select API key from the menu.
- Activate the API key:
  - Open the page of the required API in the Google Cloud Console.
  - Click Enable.
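As an illustration of how a Google Cloud API key authenticates requests, the sketch below builds a YouTube Data API v3 `search` request using only the standard library. It assumes the key is used for the YouTube Data API (the Videos source); the helper names are hypothetical, and `search_videos` needs network access and an enabled API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_URL = "https://www.googleapis.com/youtube/v3/search"


def build_search_url(api_key, query, max_results=5):
    """Build a YouTube Data API v3 search URL authenticated with an API key."""
    params = {
        "part": "snippet",
        "q": query,
        "type": "video",
        "maxResults": max_results,
        "key": api_key,
    }
    return SEARCH_URL + "?" + urlencode(params)


def search_videos(api_key, query, max_results=5):
    """Call the API and return (video_id, title) pairs; requires network access."""
    with urlopen(build_search_url(api_key, query, max_results)) as resp:
        data = json.load(resp)
    return [(item["id"]["videoId"], item["snippet"]["title"]) for item in data.get("items", [])]
```

With googleapiclient, the equivalent call is `build("youtube", "v3", developerKey=api_key).search().list(part="snippet", q=query).execute()`.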
Getting Spotify API credentials:
- Create a Spotify Developer account:
  - Visit the Spotify Developer Dashboard.
  - If you don't have a Spotify account, sign up for one; otherwise, log in with your existing account and open the dashboard.
- Create a new app.
- Get your Client ID and Client Secret.
- Use your credentials:
  - Use your Client ID and Client Secret in your app to interact with Spotify's API.
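Spotify's Client Credentials flow exchanges the Client ID and Client Secret for a short-lived access token. The sketch below builds that token request with the standard library to show what happens under the hood; the helper names are hypothetical, and `get_access_token` requires network access and valid credentials.

```python
import base64
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

TOKEN_URL = "https://accounts.spotify.com/api/token"


def build_token_request(client_id, client_secret):
    """Build a Client Credentials token request for the Spotify Web API."""
    # HTTP Basic auth: base64("client_id:client_secret").
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    auth = base64.b64encode(credentials).decode("ascii")
    body = urlencode({"grant_type": "client_credentials"}).encode("ascii")
    return Request(
        TOKEN_URL,
        data=body,
        headers={
            "Authorization": f"Basic {auth}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def get_access_token(client_id, client_secret):
    """Exchange the credentials for an access token (requires network access)."""
    with urlopen(build_token_request(client_id, client_secret)) as resp:
        return json.load(resp)["access_token"]
```

With spotipy, the same flow is handled by `spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id=..., client_secret=...))`, which also refreshes the token when it expires.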
Note: Store your API keys securely and do not share them publicly.
The repository is structured as follows:
topSEARCH
│
├── app
│   ├── deploy/
│   │   └── Dockerfile
│   ├── gui/
│   │   ├── config
│   │   │   └── ...
│   │   ├── modules
│   │   │   └── ...
│   │   ├── utils
│   │   │   └── ...
│   │   └── web.py
│   ├── notebook/
│   │   └── ...
│   ├── scripts/
│   │   └── ...
│   └── topsearch/
│       ├── pages
│       │   └── ...
│       ├── utils
│       │   └── ...
│       └── topsearch.py
│
├── docs
│   └── source/
│       └── ...
│
├── resourcescraper
│   ├── filter
│   │   └── ...
│   ├── load
│   │   └── ...
│   ├── resource-scraper
│   │   └── ...
│   ├── source-scraper
│   │   └── ...
│   ├── utils
│   │   └── ...
│   └── __init__.py
│
├── tests
│   ├── filter
│   │   └── ...
│   ├── load
│   │   └── ...
│   ├── scrap
│   │   └── ...
│   ├── utils
│   │   └── ...
│   ├── __init__.py
│   └── test.py
│
├── setup.py
├── README.md
├── CONTRIBUTORS.txt
├── STYLE_GUIDE.md
├── requirements.txt
├── dockerfile
├── makefile
├── config.py
├── .gitignore
├── .dockerignore
└── .gitlab-ci.yml
The tool license is provided in LICENSE.docx. As this tool includes several Python packages, details of the package sublicenses are specified in the LICENSE.md file. If You intend to make commercial use of the Software, You shall contact Vicomtech ([email protected]) in order to sign the proper agreement to this effect.
- Ander Cejudo
- Teresa Garcia-Navarro
- Yone Tellechea
- Amaia Calvo
- Garazi Artola
For any inquiries, feedback, or suggestions regarding the topSEARCH project, please address them to:
Name: Ander Cejudo
Email: [email protected]