Python solution to extract the courses schedules from the different faculties of UPorto. Used to feed our timetable selection platform for students, TTS.


University of Porto Scrapper - UPS!

Python solution to extract the course schedules from the different faculties of the University of Porto.

Requirements

  • docker-ce and docker-compose, or
  • python >= 3.8 and make

If you don't have docker, you can use python locally on your machine or create a virtual environment. In that case, make sure your python version is >= 3.8.

Local python

pip install -r ./src/requirements.txt   # Install dependencies

Virtual environment

python -m venv venv_scrapper            # Create virtual environment
source ./venv_scrapper/bin/activate     # Activate virtual environment
pip install -r ./src/requirements.txt   # Install dependencies
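Either way, the scrapper targets Python >= 3.8. A quick sanity check that the active interpreter (inside or outside the virtual environment) meets that bar:

```python
import sys

# Fail fast if the interpreter is older than the required Python 3.8.
assert sys.version_info >= (3, 8), (
    f"Python 3.8+ required, found {sys.version.split()[0]}"
)
print("Python version OK:", sys.version.split()[0])
```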

Quick start

πŸ”§ Configure

  1. Create a .env file from the provided .env.example:
cd src && cp .env.example .env
  2. Change the following fields in the .env file:
  • TTS_SCRAPY_USER: replace with your UP number (e.g. up201812345).
  • TTS_SCRAPY_YEAR: replace with the year you want to scrape (e.g. 2022 for the 2022/2023 school year).
  • TTS_SCRAPY_PASSWORD: replace with your SIGARRA password.
TTS_SCRAPY_YEAR=2023
TTS_SCRAPY_USER=username
TTS_SCRAPY_PASSWORD=password
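A .env file like the one above boils down to plain KEY=value pairs. As a rough illustration (the parse_env helper below is hypothetical, for this sketch only; the project itself most likely loads .env through a dotenv library):

```python
# Hypothetical helper for illustration only -- not the project's actual loader.
def parse_env(text: str) -> dict:
    """Parse KEY=value lines, skipping blank lines and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip()
    return env

sample = """TTS_SCRAPY_YEAR=2023
TTS_SCRAPY_USER=username
TTS_SCRAPY_PASSWORD=password"""

print(parse_env(sample)["TTS_SCRAPY_YEAR"])  # 2023
```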

πŸ’¨ Run

  • Gathering data:
docker-compose run scrapper make
# or 
cd ./src && make
  • Dumping data:
docker-compose run scrapper make dump
# or 
cd ./src && make dump
  • Upload data to temporary online storage:
docker-compose run scrapper make upload
# or 
cd ./src && make upload
  • Clean database:
docker-compose run scrapper make clean
# or
cd ./src && make clean

πŸ” Inspect

To inspect pages interactively with the Scrapy shell, use scrapy shell "url"

Example:

root@00723f950c71:/scrapper# scrapy shell "https://sigarra.up.pt/fcnaup/pt/cur_geral.cur_planos_estudos_view?pv_plano_id=2523&pv_ano_lectivo=2017&pv_tipo_cur_sigla=D&pv_origem=CUR"
2017-10-24 20:51:35 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: scrapper)
...
>>> open('dump.html', 'wb').write(response.body)
63480
>>> response.xpath('//*[@id="anos_curr_div"]/div').extract()
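The same kind of XPath extraction can be tried offline with the standard library. The HTML below is a stand-in fragment, not real SIGARRA output, and Scrapy's selectors support richer XPath than ElementTree does, so treat this as a sketch:

```python
import xml.etree.ElementTree as ET

# Stand-in fragment; real Sigarra pages are far larger.
html = (
    '<html><div id="anos_curr_div">'
    '<div>Ano 1</div><div>Ano 2</div>'
    '</div></html>'
)
root = ET.fromstring(html)
# Same idea as the response.xpath(...) call above, using stdlib XPath support.
years = [d.text for d in root.findall(".//div[@id='anos_curr_div']/div")]
print(years)  # ['Ano 1', 'Ano 2']
```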

πŸ“ Database design

(Image: entity-relationship diagram of the scrapper database)

πŸ“ƒ More information

  • This repository contains useful scripts. Check the ./src/scripts folder.
  • For information on how the sqlite3 database is generated, check the ./src/scrapper/database/dbs folder.
  • Configurations can be done in the ./src/config.ini file.
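A config.ini file can be read with Python's stdlib configparser. The section and option names below are illustrative only, not the project's actual keys (see ./src/config.ini for the real ones):

```python
import configparser

# Illustrative section/option names -- not taken from the project's config.ini.
sample = """
[scrapper]
year = 2023
"""

cfg = configparser.ConfigParser()
cfg.read_string(sample)
print(cfg["scrapper"]["year"])  # 2023
```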
