
Web scraper

Submits GET requests to Hacker News' API endpoints, then appends the response data to a text file, which we will later parse and load into our database.
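
To make the flow concrete, here is a minimal sketch of that fetch-and-append loop, assuming Node's built-in `https` and `fs` modules. It is not the actual scraper.js: the filename and item range are placeholders, and it leaves out the local-server / index.html wiring described under "How to use" below. The item endpoint shown is the public Hacker News Firebase API.

```js
// Minimal sketch (not the real scraper.js): GET Hacker News items by id and
// append each raw JSON response as one line of a text file. The file is parsed
// and loaded into the database later.
const https = require('https');
const fs = require('fs');

const OUTPUT_FILE = 'items-1-n.txt'; // placeholder; see step 2 below

// Fetch a single item from the public Hacker News API as a raw JSON string.
function fetchItem(id) {
  return new Promise((resolve, reject) => {
    https.get(`https://hacker-news.firebaseio.com/v0/item/${id}.json`, (res) => {
      let body = '';
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => resolve(body));
    }).on('error', reject);
  });
}

// Walk an id range sequentially and append each item to the output file.
async function scrape(start, end) {
  for (let id = start; id <= end; id++) {
    fs.appendFileSync(OUTPUT_FILE, (await fetchItem(id)) + '\n');
  }
}

scrape(1, 2001).catch(console.error);
```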

How to use

  1. Modify scraper.js

  2. Change the filename on line 37 (illustrated after this list).
     • The 'items' numbers should be the same as your beginning and end values from index.html.
     • For example: 'items-1-n.txt' for a start of 1.

  3. Start a node server

  4. Open your terminal

  5. Navigate to your root folder for Scraper.

  6. Enter 'node scraper.js' in the command line to start a local node server that will write data to your file.

  7. Disable power conservation settings on your Mac.

  8. Plug your computer into a charger.

  9. Click the Apple menu in the upper-left corner of your screen.

  10. Select System Preferences.

  11. Select Energy Saver.

  12. Check:
     • Prevent computer from sleeping automatically when the display is off
     • Wake for Wi-Fi network access
     • Kindly close pop-ups that warn you you're going to waste power

  13. Uncheck:
     • Put hard disks to sleep when possible
     • Enable Power Nap while plugged into a power adapter

  14. Open index.html from the scraper folder in your browser.

  • Enter a start and end value in the boxes that are 2000 apart.
  • For example, 1-2001.

  15. When you're done downloading:
     1. Close the browser.
     2. Navigate to the scraper_data folder.
     3. Open the text file to find the last item downloaded.
     4. Rename the file by replacing n with the last item downloaded.
        • If item 5555 was the last item you downloaded, the file would be renamed items-1-5555.txt.
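
To tie steps 2, 14, and 15 together, here is a hypothetical illustration of the filename convention; the variable name and exact code are assumptions for illustration, not a quote of the real scraper.js.

```js
// Hypothetical example only; the real line 37 of scraper.js may look different.
const START = 1;                         // the start value you will enter in index.html
const FILENAME = `items-${START}-n.txt`; // evaluates to 'items-1-n.txt'

// After the run, replace the literal n with the last item actually downloaded, e.g.:
//   items-1-n.txt  ->  items-1-5555.txt   (if item 5555 was the last item written)
```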

Current index of data

  1. 1 - 2,832,730 items completed (Justin)
  2. 4,879,476 - 8,474,817 items completed (Adam)
  • Total on Oct. 23: 6,428,071