topSEARCH: Comprehensive Tool for Online Resource Retrieval

Description

topSEARCH is a tool for efficiently retrieving, organizing, and storing various types of online resources, including news, apps, videos, and podcasts.

Features

The main features of topSEARCH are listed below:

Selection of Resource Types
Users can choose from up to four different resource types, allowing for customized resource selection.
Advanced Filtering Options
Each resource type comes with its own set of filters, giving users full control to narrow down their selections based on specific criteria.
Resource Count Visualization
A dynamic visualization displays the total number of selected resources, giving users a clear overview of their selection. This feature helps in tracking resource volume and ensures that users are aware of the total resources they're working with at any time.
Export to ZIP
Users can export their selected resources in a convenient ZIP format, simplifying data transfer and storage. This feature makes it easy to download and share all chosen resources in one organized package.
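
As an illustration of the export step, here is a minimal sketch built on Python's standard `zipfile` module. It is not topSEARCH's actual implementation: the function name `export_to_zip` and the archive layout (one JSON file per resource type) are assumptions made for the example.

```python
import io
import json
import zipfile


def export_to_zip(resources_by_type):
    """Pack selected resources into an in-memory ZIP archive.

    `resources_by_type` maps a resource type (e.g. "news") to a list of
    dicts. The function name and the layout are hypothetical.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for resource_type, items in resources_by_type.items():
            # One JSON file per resource type keeps the archive organized.
            archive.writestr(f"{resource_type}.json", json.dumps(items, indent=2))
    return buffer.getvalue()


if __name__ == "__main__":
    selection = {
        "news": [{"title": "Example article"}],
        "podcasts": [{"title": "Example episode"}],
    }
    data = export_to_zip(selection)
    # Re-open the archive to list what was written.
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        print(archive.namelist())
```

In a Streamlit app, bytes like these could be handed to `st.download_button` so the user can save the archive from the browser.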

Resource sources

topSEARCH aggregates resources from popular platforms, so users do not have to search each platform manually for videos, news articles, podcasts, or apps. The table below lists the resource types topSEARCH retrieves and the platforms they come from:

| Resource Type | Source Platform | Webpage |
|---|---|---|
| Videos | YouTube | youtube.com |
| News | Google News | news.google.com |
| Podcasts | Spotify | spotify.com |
| Podcasts | Apple Podcasts | podcasts.apple.com |
| Apps | Google Play | play.google.com |
| Apps | Apple App Store | apps.apple.com |
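
The same mapping can be written as a small lookup structure. This is only a sketch of how the table might be represented in code; the names `SOURCES` and `platforms_for` are hypothetical and do not come from the repository.

```python
# Resource types and the webpages they are retrieved from (from the table above).
SOURCES = {
    "Videos": ["youtube.com"],
    "News": ["news.google.com"],
    "Podcasts": ["spotify.com", "podcasts.apple.com"],
    "Apps": ["play.google.com", "apps.apple.com"],
}


def platforms_for(selected_types):
    """Return the platforms to query for the selected resource types."""
    platforms = []
    for resource_type in selected_types:
        for site in SOURCES[resource_type]:
            # Preserve order and avoid duplicates.
            if site not in platforms:
                platforms.append(site)
    return platforms


if __name__ == "__main__":
    print(platforms_for(["Podcasts", "Apps"]))
```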

Screenshots

The home page is as follows:

The output is displayed as follows:

Demo Video:

Dependencies

streamlit: an app framework for Machine Learning and Data Science
pandas: data manipulation and analysis library
seaborn: statistical data visualization library
matplotlib: plotting library for Python
PIL: Python Imaging Library for opening, manipulating, and saving images (installed through the Pillow package)
squarify: library for plotting treemaps
nltk: Natural Language Toolkit for working with human language data
altair: declarative statistical visualization library
unidecode: ASCII transliterations of Unicode text
langid: language identification tool
itertools: functions creating iterators for efficient looping (part of the Python standard library, no installation needed)
googleapiclient: Google API Client Library for Python
youtube_scraping_api: API for scraping YouTube data
joblib: set of tools to provide lightweight pipelining in Python
googlenews: library for scraping Google News
newspaper: library for scraping and parsing newspaper articles
spotipy: lightweight Python library for the Spotify Web API
bs4: Beautiful Soup library for web scraping
google_play_scraper: library for scraping Google Play Store data
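
To see which of these modules are available in the current environment, a check along the following lines may help. It is a sketch only: the module names are copied from the list above and may differ from the actual import names of some packages, and `missing_modules` is a hypothetical helper.

```python
import importlib.util

# Names taken from the dependency list above (assumed to be import names).
MODULES = [
    "streamlit", "pandas", "seaborn", "matplotlib", "PIL", "squarify",
    "nltk", "altair", "unidecode", "langid", "itertools", "googleapiclient",
    "joblib", "spotipy", "bs4", "google_play_scraper",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(MODULES)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```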

Quickstart

Get started with the topSEARCH tool.

Prerequisites

Ensure you have the following installed:

Python 3.7 or higher
Pip (Python package installer)
API keys from Google Cloud (see the "Creating API keys" section for instructions)
API keys from Spotify (see the "Creating API keys" section for instructions)
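
The Python version requirement can be checked from the interpreter itself. The snippet below is a minimal sketch; `python_is_supported` is a hypothetical helper name, and the minimum version is the one stated above.

```python
import sys

MIN_VERSION = (3, 7)  # minimum version stated in the prerequisites


def python_is_supported(version_info=sys.version_info):
    """Return True when the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_VERSION


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if python_is_supported():
        print(f"Python {current} is supported.")
    else:
        print(f"Python {current} is too old; 3.7 or higher is required.")
```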

Preparing the Application

Clone the repository:

 git clone https://github.com/Vicomtech/topSEARCH.git
 cd topSEARCH

Create a virtual environment:
- In the project directory, create a virtual environment to manage dependencies. Use the following command:
```
python -m venv env
```
- This creates a folder named env that holds the virtual environment. Activate it with the command for your operating system:
  - On Windows:
```
.\env\Scripts\activate
```
  - On macOS/Linux:
```
source env/bin/activate
```
- After activating the virtual environment, you should see the environment name in your terminal prompt.
Install the required dependencies:
```
 pip install -r requirements.txt
```
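
If it is unclear whether the virtual environment is really the one in use, Python can report it. This is a small sketch relying on the standard `sys.prefix` and `sys.base_prefix` attributes; `in_virtualenv` is a hypothetical helper name.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment folder while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Virtual environment active:", in_virtualenv())
    print("Interpreter prefix:", sys.prefix)
```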

Running the Application

To run the application, you have two options: using the terminal or a run configuration file.

1. Terminal option

Add your API keys:
- Copy the .example.env file and rename the copy to .topsearch.env.
- Open the .topsearch.env file and fill in the required API keys.
- Save the file. These keys are necessary for the application to interact with external services.
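
Before starting the server, it can be useful to confirm that every key has been filled in. The sketch below parses simple `KEY=VALUE` lines with the standard library only. The variable names in `REQUIRED_KEYS` are assumptions for illustration; use the names that appear in the example env file.

```python
from pathlib import Path

# Hypothetical variable names; replace with those from the example env file.
REQUIRED_KEYS = ["GOOGLE_API_KEY", "SPOTIFY_CLIENT_ID", "SPOTIFY_CLIENT_SECRET"]


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_file = Path(".topsearch.env")
    if env_file.exists():
        print("Missing keys:", missing_keys(parse_env(env_file.read_text())))
    else:
        print(".topsearch.env not found")
```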

Start the Streamlit server:

 python -m streamlit run app/topsearch/topsearch.py

A browser window opens automatically and displays the running Streamlit app. If it does not, open the local URL that Streamlit prints in the terminal (by default http://localhost:8501).
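
When the browser does not open, a quick way to tell whether the server is listening at all is to try a TCP connection. The sketch below assumes Streamlit's default port 8501 has not been changed; `server_is_up` is a hypothetical helper.

```python
import socket


def server_is_up(host="localhost", port=8501, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not resolvable.
        return False


if __name__ == "__main__":
    if server_is_up():
        print("Server reachable at http://localhost:8501")
    else:
        print("Nothing is listening on port 8501; check the terminal output.")
```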

2. Configuration file option

Set up the run configuration file:
- Copy Run_configuration_example.run.xml and set its environment variables to your API keys (the same values as in the .topsearch.env file).
- Open your IDE (e.g., PyCharm, VS Code) and create a new Run Configuration for the application.

Run the application:
- Select the run configuration and click "Run" to start the Streamlit app.

A browser window opens automatically and displays the running Streamlit app. If it does not, open the local URL that Streamlit prints in the run output (by default http://localhost:8501).

Creating API keys:

Google Cloud:

Go to the Google Cloud Console:
1. Visit the Google Cloud Console to manage your API credentials.
Sign in with your Google Account:
1. Log in using your Google credentials, or create a new account if you don’t have one.
Create a New Project:
1. In the Cloud Console, click on the Select a project dropdown at the top of the page, then click New Project.
2. Name your project and choose your organization if needed, then click Create.
Create an API key:
1. Click Create Credentials at the top of the page, then choose API Key from the dropdown menu.
Activate the API key:
1. Click this link
2. Click Enable
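
Once the key is active, it is passed along with each API request. The sketch below only builds a request URL and does not send it. It assumes the enabled API is the YouTube Data API v3, which this section does not state explicitly; `build_search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Search endpoint of the YouTube Data API v3 (assumed to be the enabled API).
YOUTUBE_SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"


def build_search_url(api_key, query, max_results=5):
    """Build a video search URL that carries the API key as a parameter."""
    params = {
        "part": "snippet",
        "q": query,
        "type": "video",
        "maxResults": max_results,
        "key": api_key,
    }
    return f"{YOUTUBE_SEARCH_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder key for illustration; never hard-code a real key.
    print(build_search_url("YOUR_API_KEY", "data science"))
```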

Spotify:

Create a Spotify Developer Account
- Visit the Spotify Developer Dashboard.
- If you don't have a Spotify account, sign up for one. Otherwise, log in with your existing account and go to the dashboard.
Create a New App
- Once logged in, go to the Dashboard and click Create an App.
- Provide an app name and description. Agree to the Developer Terms of Service. Select Web API.
- Click Save to finish.
Get Your Client ID and Client Secret
- After creating your app, navigate to the app’s dashboard and click Settings.
- Your Client ID and Client Secret (your API keys) will be displayed here. These are required to authenticate your app with the Spotify Web API.
Using the API Key
- Use your Client ID and Client Secret in your app to interact with Spotify’s API.
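
For orientation, this is roughly how a Client ID and Client Secret are exchanged for an access token in Spotify's Client Credentials flow. The sketch builds the request without sending it and uses only the standard library; `build_token_request` is a hypothetical helper, and a client library such as spotipy is expected to handle this exchange for you.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

TOKEN_URL = "https://accounts.spotify.com/api/token"


def build_token_request(client_id, client_secret):
    """Build (but do not send) a Client Credentials token request."""
    # The credentials travel as "id:secret", base64-encoded, in a Basic header.
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    headers = {
        "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
        "Content-Type": "application/x-www-form-urlencoded",
    }
    body = urlencode({"grant_type": "client_credentials"}).encode("ascii")
    return Request(TOKEN_URL, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    # Placeholder values for illustration only.
    request = build_token_request("my-client-id", "my-client-secret")
    print(request.get_method(), request.full_url)
```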

Note: Store your API keys securely and do not share them publicly.

Project Structure

The repository is structured as follows:

topSEARCH
│
├── app
│   ├── deploy/
│   │   └── Dockerfile
│   ├── gui/
│   │   ├── config
│   │   │   └── ...
│   │   ├── modules
│   │   │   └── ...
│   │   ├── utils
│   │   │   └── ...
│   │   └── web.py
│   ├── notebook/
│   │   └── ...
│   ├── scripts/
│   │   └── ...
│   └── topsearch/
│       ├── pages
│       │   └── ...
│       ├── utils
│       │   └── ...
│       └── topsearch.py
│
├── docs
│   └── source/
│       └── ...
│
├── resourcescraper
│   ├── filter
│   │   └── ...
│   ├── load
│   │   └── ...
│   ├── resource-scraper
│   │   └── ...
│   ├── source-scraper
│   │   └── ...
│   ├── utils
│   │   └── ...
│   └── __init__.py
│
├── tests
│   ├── filter
│   │   └── ...
│   ├── load
│   │   └── ...
│   ├── scrap
│   │   └── ...
│   ├── utils
│   │   └── ...
│   ├── __init__.py
│   └── test.py
│
├── setup.py
├── README.md
├── CONTRIBUTORS.txt
├── STYLE_GUIDE.md
├── requirements.txt
├── dockerfile
├── makefile
├── config.py
├── .gitignore
├── .dockerignore
└── .gitlab-ci.yml

License

The tool license is provided in LICENSE.docx. As this tool includes several Python packages, details of the package sublicenses are specified in the LICENSE.md file. If You intend to make commercial use of the Software, You shall contact Vicomtech ([email protected]) in order to sign the proper agreement to this effect.

Authors

Ander Cejudo
Teresa Garcia-Navarro
Yone Tellechea
Amaia Calvo
Garazi Artola

Contact

For any inquiries, feedback, or suggestions regarding the topSEARCH project, please address them to:

Name: Ander Cejudo

Email: [email protected]
