oskarolaszczyk/Data-Processing


Hacker News - data processing

Hacker News

The dataset is described in this section.

Using this data, we try to answer the following questions:

  • When is the best time to post stories?
  • Are similar stories similarly scored and commented?
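To make the first question concrete, here is a minimal sketch of one way "best time to post" could be measured: group stories by posting hour and compare average scores. The `Story` class, its fields, and the method names below are hypothetical and only for illustration; they are not taken from the project's `timestamp_analysis` package, and averaging by UTC hour is an assumption about the metric.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class BestTimeSketch {

    // Hypothetical, simplified story: posting time (Unix seconds) and score.
    static final class Story {
        final long epochSeconds;
        final int score;

        Story(long epochSeconds, int score) {
            this.epochSeconds = epochSeconds;
            this.score = score;
        }
    }

    // Average score per posting hour (0-23, UTC), keyed and sorted by hour.
    static Map<Integer, Double> averageScoreByHour(List<Story> stories) {
        return stories.stream().collect(Collectors.groupingBy(
                s -> Instant.ofEpochSecond(s.epochSeconds).atZone(ZoneOffset.UTC).getHour(),
                TreeMap::new,
                Collectors.averagingInt(s -> s.score)));
    }

    // Hour with the highest average score; throws if the map is empty.
    static int bestHour(Map<Integer, Double> averageByHour) {
        return averageByHour.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .orElseThrow()
                .getKey();
    }

    public static void main(String[] args) {
        // Made-up sample data, not taken from the Hacker News dataset.
        List<Story> stories = List.of(
                new Story(0L, 10),     // 00:00 UTC
                new Story(60L, 20),    // 00:01 UTC
                new Story(3600L, 50)); // 01:00 UTC
        Map<Integer, Double> averages = averageScoreByHour(stories);
        System.out.println("Average score by hour: " + averages);
        System.out.println("Best hour (UTC): " + bestHour(averages));
    }
}
```

A mean is sensitive to a few very high-scoring stories, so a median or a count of stories above a threshold may be a fairer comparison on real data.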

Built With

Getting Started

Installation

  1. Clone the repo
    git lfs clone https://github.com/Problem-Workshop/Data-Processing.git
  2. Run Maven clean and install
    _JAVA_OPTIONS="-Xmx10G" mvn clean install
  3. Run the "similar stories" main class
    _JAVA_OPTIONS="-Xmx10G" mvn exec:java@similar-stories
  4. Run the "best time to post" main class
    _JAVA_OPTIONS="-Xmx10G" mvn exec:java@best-time-to-post

Plot results

  1. Install requirements
    pip install -r requirements.txt
  2. Plot "similar stories" results
    python similar_stories
  3. Plot "best time to post" results
    python best_time_to_post

Software organization into modules/packages/classes

  • knn - all classes needed for k-nearest neighbors, including metrics and classes for computing text distance
  • timestamp_analysis - package for analyzing the best time to post
  • model - package with all model classes (currently only the Story class)
  • dao - data access objects, intended for reading and downloading files
  • utils - package with utilities
  • exceptions - package for all exceptions
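As an illustration of the idea behind the knn package, the sketch below finds the k story titles closest to a query title under a text distance. Levenshtein edit distance is used here purely as an assumption; this README does not say which text metrics the project implements, and the class and method names (`KnnSketch`, `levenshtein`, `nearest`) are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class KnnSketch {

    // Levenshtein edit distance, computed with two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insert, delete
                        prev[j - 1] + cost);                    // substitute or match
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // The k titles with the smallest distance to the query, nearest first.
    static List<String> nearest(String query, List<String> titles, int k) {
        return titles.stream()
                .sorted(Comparator.comparingInt(t -> levenshtein(query, t)))
                .limit(k)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up titles, not taken from the Hacker News dataset.
        List<String> titles = List.of(
                "Show HN: my apps",
                "Ask HN: hiring?",
                "Show HN: my map");
        System.out.println(nearest("Show HN: my app", titles, 2));
    }
}
```

With neighbors found this way, the "similarly scored and commented" question could be approached by comparing each story's score and comment count with those of its nearest neighbors. Sorting the whole list costs O(n log n) distance-ordered comparisons per query, which is fine for a sketch but may need indexing or sampling on the full dataset.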

Usage

Image 1 Image 2 Image 3 Image 4

For more examples, please refer to the Documentation

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Organization - @Problem-Workshop

Contributors

Oskar Olaszczyk - @oskarolaszczyk
Julia Szymańska - @JuliaSzymanska
Przemysław Zdrzalik - @ZdrzalikPrzemyslaw
Szymon Jacoń - @bruderooo
Michał Majchrowski - @DevWithoutKnowledge
Kamil Kiszko-Zgierski - @KiszczixIsCoding
Hubert Gawłowski - @hubertgaw
Martyna Piasecka - @MartynaCys

Project Link: https://github.com/Problem-Workshop/Data-Processing
