Python solution to extract the courses schedules from the different faculties of UPorto. Used to feed our timetable selection platform for students, TTS.


University of Porto Scrapper - UPS!

Python solution to extract the course schedules from the different faculties of the University of Porto.

Requirements

  • docker-ce and docker-compose, or
  • python >= 3.8 and make

If you don't have docker, you can use python locally on your machine or create a virtual environment. In that case, make sure your python version is >= 3.8.

Local python

pip install -r ./src/requirements.txt   # Install dependencies

Virtual environment

python -m venv venv_scrapper            # Create virtual environment
source ./venv_scrapper/bin/activate     # Activate virtual environment
pip install -r ./src/requirements.txt   # Install dependencies
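Either way, the scrapper targets Python >= 3.8. A quick sanity check that the active interpreter (inside or outside the virtual environment) meets that bar:

```python
import sys

# Fail fast if the interpreter is older than the required Python 3.8.
assert sys.version_info >= (3, 8), (
    f"Python 3.8+ required, found {sys.version.split()[0]}"
)
print("Python version OK:", sys.version.split()[0])
```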

Quick start

πŸ”§ Configure

  1. Create a .env file from the provided .env.example:
cd src && cp .env.example .env
  2. Change the following fields in the .env file:
  • TTS_SCRAPY_USER: replace with your UP number (e.g. up201812345).
  • TTS_SCRAPY_YEAR: replace with the year you want to scrape (e.g. 2022 for the 2022/2023 school year).
  • TTS_SCRAPY_PASSWORD: replace with your SIGARRA password.
TTS_SCRAPY_YEAR=2023
TTS_SCRAPY_USER=username
TTS_SCRAPY_PASSWORD=password
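A .env file like the one above boils down to plain KEY=value pairs. As a rough illustration (the parse_env helper below is hypothetical, for this sketch only; the project itself most likely loads .env through a dotenv library):

```python
# Hypothetical helper for illustration only -- not the project's actual loader.
def parse_env(text: str) -> dict:
    """Parse KEY=value lines, skipping blank lines and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip()
    return env

sample = """TTS_SCRAPY_YEAR=2023
TTS_SCRAPY_USER=username
TTS_SCRAPY_PASSWORD=password"""

print(parse_env(sample)["TTS_SCRAPY_YEAR"])  # 2023
```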

πŸ’¨ Run

  • Gathering data:
docker-compose run scrapper make
# or 
cd ./src && make
  • Dumping data:
docker-compose run scrapper make dump
# or 
cd ./src && make dump
  • Upload data to temporary online storage:
docker-compose run scrapper make upload
# or 
cd ./src && make upload
  • Clean database:
docker-compose run scrapper make clean
# or
cd ./src && make clean

πŸ” Inspect

To inspect pages interactively with the Scrapy shell, use scrapy shell "url"

Example:

root@00723f950c71:/scrapper# scrapy shell "https://sigarra.up.pt/fcnaup/pt/cur_geral.cur_planos_estudos_view?pv_plano_id=2523&pv_ano_lectivo=2017&pv_tipo_cur_sigla=D&pv_origem=CUR"
2017-10-24 20:51:35 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: scrapper)
...
>>> open('dump.html', 'wb').write(response.body)
63480
>>> response.xpath('//*[@id="anos_curr_div"]/div').extract()
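The same kind of XPath extraction can be tried offline with the standard library. The HTML below is a stand-in fragment, not real SIGARRA output, and Scrapy's selectors support richer XPath than ElementTree does, so treat this as a sketch:

```python
import xml.etree.ElementTree as ET

# Stand-in fragment; real Sigarra pages are far larger.
html = (
    '<html><div id="anos_curr_div">'
    '<div>Ano 1</div><div>Ano 2</div>'
    '</div></html>'
)
root = ET.fromstring(html)
# Same idea as the response.xpath(...) call above, using stdlib XPath support.
years = [d.text for d in root.findall(".//div[@id='anos_curr_div']/div")]
print(years)  # ['Ano 1', 'Ano 2']
```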

πŸ“ Database design

(Image: entity-relationship diagram of the scrapper database)

πŸ“ƒ More information

  • This repository contains useful scripts. Check the ./src/scripts folder.
  • For information on how the sqlite3 database is generated, check the ./src/scrapper/database/dbs folder.
  • Configurations can be done in the ./src/config.ini file.
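A config.ini file can be read with Python's stdlib configparser. The section and option names below are illustrative only, not the project's actual keys (see ./src/config.ini for the real ones):

```python
import configparser

# Illustrative section/option names -- not taken from the project's config.ini.
sample = """
[scrapper]
year = 2023
"""

cfg = configparser.ConfigParser()
cfg.read_string(sample)
print(cfg["scrapper"]["year"])  # 2023
```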
