
Scraper is an internal tool used to collect data from the Hacker News API for inclusion in the database.


Scheming-Lion/Scraper


Web scraper

Submits GET requests to the Hacker News API endpoints and appends the returned data to a text file, which we will later parse and enter into our database.
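A minimal sketch of that request-and-append loop is shown below. It is not the repo's scraper.js: it assumes Node 18+ (for the global `fetch`), writes one JSON item per line, and uses hypothetical names (`itemUrl`, `scrapeRange`). The endpoint `https://hacker-news.firebaseio.com/v0/item/<id>.json` is the public Hacker News API item endpoint.

```javascript
// Minimal sketch, assuming Node 18+ (global fetch) and one JSON item per line.
// Not the repo's scraper.js; itemUrl and scrapeRange are hypothetical names.
const fs = require("fs");

// Public Hacker News API item endpoint.
const HN_ITEM_URL = "https://hacker-news.firebaseio.com/v0/item";

function itemUrl(id) {
  return `${HN_ITEM_URL}/${id}.json`;
}

// Fetch items start..end (inclusive) one at a time and append each one to
// outFile as a single JSON line. fetchFn is injectable so the loop can be
// exercised without network access. Returns the number of items written.
async function scrapeRange(start, end, outFile, fetchFn = fetch) {
  let written = 0;
  for (let id = start; id <= end; id++) {
    const res = await fetchFn(itemUrl(id));
    if (!res.ok) {
      console.error(`item ${id}: HTTP ${res.status}`);
      continue;
    }
    const item = await res.json();
    if (item === null) continue; // skip ids that have no item (null body)
    fs.appendFileSync(outFile, JSON.stringify(item) + "\n");
    written++;
  }
  return written;
}
```

For example, `scrapeRange(1, 2001, "scraper_data/items-1-n.txt")` would fetch items 1 through 2001 into that file (the folder must already exist). Fetching sequentially is slow but keeps the file in id order; issuing requests in parallel would be faster at the cost of that ordering.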

How to use

  1. Modify scraper.js:
     • Change the filename on line 37. The numbers in the 'items' filename should match the start and end values you enter in index.html.
     • For example, use 'items-1-n.txt' for a start of 1.

  2. Start a Node server:
     • Open your terminal.
     • Navigate to your root folder for Scraper.
     • Enter 'node scraper.js' on the command line to start a local Node server that will write data to your file.

  3. Disable power conservation settings on your Mac:
     • Plug your computer into a charger.
     • Click the Apple () menu in the upper-left corner of the menu bar.
     • Select System Preferences.
     • Select Energy Saver.
     • Check:
        • Prevent computer from sleeping automatically when the display is off
        • Wake for Wi-Fi network access
        • Kindly close the pop-ups warning that you're going to waste power
     • Uncheck:
        • Put hard disks to sleep when possible
        • Enable Power Nap while plugged into a power adapter

  4. Open index.html from the Scraper folder in your browser:
     • Enter start and end values that are 2000 apart in the two boxes.
     • For example, 1 and 2001.

  5. When you're done downloading:
     • Close the browser.
     • Navigate to the scraper_data folder.
     • Open the text file to find the last item downloaded.
     • Rename the file by replacing n with the number of the last item downloaded. For example, if item 5555 was the last item you downloaded, rename the file to items-1-5555.txt.
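The renaming convention above (items-<start>-n.txt, with n replaced by the last item downloaded) could be automated with a sketch like the one below. The helpers `pendingFilename`, `lastItemId`, and `finalizeFile` are hypothetical, not part of the repo, and they assume the data file holds one JSON object per line with a numeric `id` field, as Hacker News API items do; check the real output format of scraper.js before relying on this.

```javascript
// Hypothetical helpers for the items-<start>-n.txt naming convention.
// ASSUMPTION: one JSON object per line, each with a numeric "id" field;
// the actual scraper.js output format may differ.
const fs = require("fs");
const path = require("path");

// Name used while downloading: the end value is the placeholder "n".
function pendingFilename(start) {
  return `items-${start}-n.txt`;
}

// Highest item id found in the file, or null if no complete line parses.
function lastItemId(filePath) {
  let last = null;
  for (const line of fs.readFileSync(filePath, "utf8").split("\n")) {
    if (!line.trim()) continue;
    try {
      const { id } = JSON.parse(line);
      if (Number.isInteger(id) && (last === null || id > last)) last = id;
    } catch {
      // Skip partial or malformed lines, e.g. a write cut off mid-item.
    }
  }
  return last;
}

// Rename items-<start>-n.txt to items-<start>-<last>.txt; returns the new path.
function finalizeFile(filePath) {
  const base = path.basename(filePath);
  if (!/-n\.txt$/.test(base)) {
    throw new Error(`expected an items-<start>-n.txt file, got ${base}`);
  }
  const last = lastItemId(filePath);
  if (last === null) throw new Error(`no items found in ${filePath}`);
  const newPath = path.join(path.dirname(filePath), base.replace(/-n\.txt$/, `-${last}.txt`));
  fs.renameSync(filePath, newPath);
  return newPath;
}
```

For example, `finalizeFile("scraper_data/items-1-n.txt")` would rename the file to items-1-5555.txt if 5555 is the highest id it contains. Using the highest id rather than the final line tolerates items written out of order and a truncated last line left behind when the browser was closed.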

Current index of data

  1. Items 1 to 2,832,730 completed (Justin)
  2. Items 4,879,476 to 8,474,817 completed (Adam)
  • Total on Oct. 23: 6,428,071
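A quick way to recompute the total from the listed ranges is sketched below. It assumes both ranges are inclusive of their endpoints, which gives 6,428,072 items, one more than the stated total; 6,428,071 is what you get if the second range is counted as 8,474,817 minus 4,879,476. The README doesn't say which convention was intended, and `countItems` is a hypothetical helper.

```javascript
// Count items across ranges, ASSUMING each [start, end] pair is inclusive.
function countItems(ranges) {
  return ranges.reduce((total, [start, end]) => total + (end - start + 1), 0);
}

const ranges = [
  [1, 2832730],       // Justin
  [4879476, 8474817], // Adam
];

console.log(countItems(ranges)); // 6428072
```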
