python_scrapyy

My working environment is Ubuntu 18.04.

url.txt contains the URLs that the scraper will scrape. It extracts only three fields: company name, job title, and location.
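The README does not describe the format of url.txt. Assuming one URL per line, loading the list could be sketched as below. `parse_urls` and `load_urls` are hypothetical helper names, not functions from HtmlParser.py, and skipping lines that start with `#` is an added assumption.

```python
from pathlib import Path


def parse_urls(text):
    """Return the URLs listed in `text`.

    Assumes one URL per line (the README does not specify the format).
    Blank lines and lines starting with '#' (a hypothetical comment
    convention) are ignored; surrounding whitespace is stripped.
    """
    urls = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            urls.append(line)
    return urls


def load_urls(path="url.txt"):
    """Read the URL list file from disk and parse it with parse_urls()."""
    return parse_urls(Path(path).read_text(encoding="utf-8"))
```

Keeping the parsing in a pure function makes it easy to check without touching the file system.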

To identify job titles, the scraper uses titles_combined.txt, which contains nearly 77,000 job titles. Install the find-job-titles package from https://pypi.org/project/find-job-titles/.
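As a rough illustration of dictionary-based title matching against a list like titles_combined.txt, the sketch below assumes one title per line and performs a greedy longest-phrase lookup. It is a simplified stand-in, not the find-job-titles API, and `load_titles` and `find_titles` are hypothetical names.

```python
import re


def load_titles(lines):
    """Build a lowercase lookup set from an iterable of job-title lines.

    Assumes titles_combined.txt holds one title per line.
    """
    return {line.strip().lower() for line in lines if line.strip()}


def find_titles(text, titles, max_words=6):
    """Return job titles found in `text`, preferring the longest match at each position.

    This is a simplified greedy dictionary lookup, not the find-job-titles API.
    """
    words = re.findall(r"[A-Za-z][A-Za-z'-]*", text)
    found = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first so that, for example,
        # "Senior Software Engineer" wins over "Software Engineer".
        for n in range(min(max_words, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase.lower() in titles:
                found.append(phrase)
                i += n  # skip past the matched phrase
                break
        else:
            i += 1  # no title starts at this word
    return found
```

To use the real file, something like `with open("titles_combined.txt", encoding="utf-8") as f: titles = load_titles(f)` would build the set once, after which `find_titles(page_text, titles)` can be called per page.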

To extract locations from the text, the scraper uses the Python GeoText package. Install it by following https://geotext.readthedocs.io/en/latest/installation.html.
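With GeoText itself, detected cities are read from the `cities` attribute, e.g. `GeoText(text).cities`. To show the idea without the dependency, the toy sketch below matches a small, hypothetical list of place names on whole-word boundaries and returns them in order of appearance. It is not how GeoText is implemented, and running one regex per place would be slow for a full city list.

```python
import re

# Hypothetical sample of place names; GeoText uses a far larger city list.
SAMPLE_PLACES = ("London", "New York", "Berlin")


def find_locations(text, places=SAMPLE_PLACES):
    """Return the places from `places` that occur in `text`, in order of appearance.

    Matching is case-sensitive and requires whole-word boundaries, so
    "Berlin" matches in "Berlin-based" but not inside "Berliner".
    """
    hits = []
    for place in places:
        pattern = r"\b" + re.escape(place) + r"\b"
        for match in re.finditer(pattern, text):
            hits.append((match.start(), place))
    # Sort by position in the text so results follow reading order.
    return [place for _, place in sorted(hits)]
```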

Install the requirements: pip install -r requirements.txt

To run the scraper, put all files in the same root directory and run: python HtmlParser.py
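The README does not show how HtmlParser.py turns a fetched page into plain text. A minimal sketch using the standard-library `html.parser` module is below; it assumes the page HTML has already been downloaded (for example with `urllib.request.urlopen`), and `TextExtractor` and `html_to_text` are hypothetical names, not code from the repository.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from an HTML document, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a script or style block.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the visible text of `html` as a single space-joined string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

The resulting plain text is what a job-title or location matcher would then search.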

Repository files:
HtmlParser.py
README.md
requirements.txt
scrapyydb
titles_combined.txt
url.txt

Repository: mrsahabu/python_scrapyy
