A fotocasa web scraper using Selenium and Scrapy with Splash.
- Install the Python dependencies:
pip install -r requirements.txt
- Start the Splash service using Docker:
docker run --net="host" -p 8050:8050 scrapinghub/splash
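If the crawl is wired to Splash through the scrapy-splash plugin, the project settings would typically look like the sketch below. None of these values are taken from this repository; they are the plugin's documented defaults, with the Splash URL pointing at the container started above.

```python
# settings.py -- sketch of a typical scrapy-splash configuration (assumed, not copied from this repo)
SPLASH_URL = "http://localhost:8050"  # the Splash container started above

DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
```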
The Selenium library requires a WebDriver executable, which it uses to issue commands to the host browser. I'm providing a Google Chrome 99.0 Win32 WebDriver inside the /drivers folder (see the sketch after the link below).
- Chrome WebDriver downloads: https://chromedriver.chromium.org/home
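A minimal sketch of how the bundled driver might be loaded with Selenium 4; the chromedriver.exe filename and the target URL are assumptions, not taken from the repository.

```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Point Selenium at the driver shipped in /drivers (assumed filename; adjust for your OS/version).
service = Service(executable_path="drivers/chromedriver.exe")
driver = webdriver.Chrome(service=service)
driver.get("https://www.fotocasa.es/es/alquiler/viviendas/barcelona-capital/todas-las-zonas/l/1")
```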
The program entry point is located in spiders/fotocasa.py. Using Selenium, we navigate to the URL given in the start_urls array; it is expected to be a search results URL, such as: https://www.fotocasa.es/es/alquiler/viviendas/barcelona-capital/todas-las-zonas/l/1
Then we navigate through the page, collecting all the advertisement URLs, which will be scraped using Scrapy.
Finally, Selenium navigates to the next page, and the process repeats until no more pages are available (see the sketch below).
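The real implementation lives in spiders/fotocasa.py; the outline below is only a hedged sketch of the flow described above. The CSS selectors, the SplashRequest usage, and the parsed fields are assumptions, not copied from the repository.

```python
import scrapy
from scrapy_splash import SplashRequest
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException


class FotocasaSpider(scrapy.Spider):
    name = "fotocasa"
    start_urls = [
        "https://www.fotocasa.es/es/alquiler/viviendas/barcelona-capital/todas-las-zonas/l/1",
    ]

    def start_requests(self):
        driver = webdriver.Chrome()  # driver path/options omitted; see the Selenium sketch above
        driver.get(self.start_urls[0])
        while True:
            # Collect every advertisement link on the current results page
            # (the CSS selector is an assumption).
            for link in driver.find_elements(By.CSS_SELECTOR, "a.re-CardPackMinimal-slider"):
                yield SplashRequest(link.get_attribute("href"), callback=self.parse_ad)
            # Move to the next results page, or stop when there is none.
            try:
                driver.find_element(By.CSS_SELECTOR, "a.sui-PaginationBasic-link--next").click()
            except NoSuchElementException:
                break
        driver.quit()

    def parse_ad(self, response):
        # Parse the advertisement page rendered by Splash (field selectors are assumptions).
        yield {
            "title": response.css("h1::text").get(),
            "price": response.css("span.re-DetailHeader-price::text").get(),
            "url": response.url,
        }
```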
- Run the spider and export the results:
scrapy crawl fotocasa -o output.json
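Note that on Scrapy 2.0+ the -o flag appends items to an existing output.json; use -O output.json instead if you want the file overwritten on each run.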