This is a web scraper that collects all available details about the mobile phones listed on a mobile phone website.
The code has three parts:
- script
- spider
- convert
script.py is the main file. It processes the list of phones and gets their respective links. Because the site is dynamic, the script checks whether a specific row number has loaded; if not, it waits until that row is loaded. It then scrapes the page for all links that lead to the phones' specification pages and finally saves the data as a JSON file.
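The wait-then-scrape flow described above can be sketched with the standard library alone. This is only an illustration under assumptions: the real script most likely drives a browser to load the dynamic rows, and the helper names `wait_until`, `extract_phone_links`, and `save_links` are hypothetical, as is the idea that each phone link is a plain `<a href=...>` tag whose text is the phone name.

```python
import json
import time
from html.parser import HTMLParser


def wait_until(condition, timeout=10.0, interval=0.5):
    """Poll `condition` until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    # One last check so a condition that became true at the deadline counts.
    return condition()


class PhoneLinkParser(HTMLParser):
    """Collect {link text: href} for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = {}
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an <a href=...> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            name = "".join(self._text).strip()
            if name:
                self.links[name] = self._href
            self._href = None


def extract_phone_links(html):
    """Return a phone_name -> phone_relative_link mapping from page HTML."""
    parser = PhoneLinkParser()
    parser.feed(html)
    return parser.links


def save_links(links, path):
    """Write the mapping to a JSON file (the path is chosen by the caller)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(links, f, indent=2)
```

In a browser-driven script, `condition` would be a small function that asks the page whether the expected row exists yet; the polling loop itself stays the same.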
spider.py uses the saved JSON file, which has the format phone_name:phone_relative_link. The spider uses this data to crawl the individual phone pages and saves the details of each phone as a dictionary. Finally, the dictionaries for all the phones are collected into a list and saved to a file.
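A rough sketch of the crawl loop is shown below. It is not the project's actual spider: `crawl_phones`, `fetch_html`, and `save_phones` are hypothetical names, the page parser is passed in as `parse_specs` because the layout of the specification pages is not described here, and writing the result as JSON is an assumption about the output file format.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen


def fetch_html(url):
    """Download a page and return its body as text (assumes UTF-8)."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl_phones(links, base_url, parse_specs, fetch=fetch_html):
    """Visit every phone page and return a list with one dict per phone.

    links       -- mapping of phone_name -> phone_relative_link
    base_url    -- site root that the relative links are resolved against
    parse_specs -- function turning page HTML into a dict of specifications
    fetch       -- function turning a URL into HTML (swappable for testing)
    """
    phones = []
    for name, relative_link in links.items():
        url = urljoin(base_url, relative_link)
        specs = {"name": name, "url": url}
        specs.update(parse_specs(fetch(url)))
        phones.append(specs)
    return phones


def save_phones(phones, path):
    """Save the list of phone dictionaries to a file (JSON assumed here)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(phones, f, indent=2)
```

Passing `fetch` as a parameter keeps the loop easy to try out offline, for example with a dictionary of canned pages instead of real requests.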
convert.py converts the saved file into a CSV file so that the data is easier to work with.
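The conversion step could look roughly like the sketch below, which uses the standard `csv` module. The function name `phones_to_csv` is hypothetical, and the sketch assumes the input is a list of flat dictionaries whose keys may differ from phone to phone.

```python
import csv


def phones_to_csv(phones, csv_path):
    """Write a list of phone dictionaries to a CSV file.

    The header is the union of all keys, in first-seen order, so phones
    that lack a given specification simply get an empty cell.
    """
    fieldnames = []
    for phone in phones:
        for key in phone:
            if key not in fieldnames:
                fieldnames.append(key)

    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(phones)
    return fieldnames
```

Building the header from every record, rather than from the first one only, avoids errors when a later phone has a specification that earlier phones do not.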
Note: the URL used is a public website, but it is saved as a variable in the secret.py file.
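For anyone recreating that file locally, it might contain something like the following. The variable name `URL` and the placeholder address are assumptions; use whatever name the scripts actually import.

```python
# secret.py (hypothetical contents)
# Base address of the site to scrape; replace the placeholder with the real one.
URL = "https://example.com/"
```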
- Run script.py
- Run spider.py
- Run convert.py
If you are interested in contributing to the project, please follow the steps and rules below.
- Fork the project 🍴 (Star ⭐ the repo before that 😛)
- Clone it.
https://github.com/<username>/ScrapeIt.git
- Look for open issues under the Issues tab. Go through them and pick one. Make sure you get assigned, or at least say that you are going to work on it.
- Always create a new branch to work on a feature or bug. If you are not familiar with branching, see Git Branching.
- If you use any other module to implement a new feature, please install it in the virtual environment and update requirements.txt using the command below.
pip freeze > requirements.txt
If you have any doubts or issues, let the maintainers know. They will be happy to help.