The dataset is described in this section.
Using this data, we try to answer the following questions:
- When is the best time to post stories?
- Do similar stories receive similar scores and similar numbers of comments?
- Clone the repo
git lfs clone https://github.com/Problem-Workshop/Data-Processing.git
- Run Maven clean and install
_JAVA_OPTIONS="-Xmx10G" mvn clean install
- Run "similar stories" main class
_JAVA_OPTIONS="-Xmx10G" mvn exec:java@similar-stories
- Run "best time to post" main class
_JAVA_OPTIONS="-Xmx10G" mvn exec:java@best-time-to-post
- Install requirements
pip install -r requirements.txt
- Plot "similar stories" results
python similar_stories
- Plot "best time to post" results
python best_time_to_post
- knn - package with all the classes needed for k-nearest neighbors, including the metrics and the classes for computing text distance
- timestamp_analysis - package for analyzing the best time to post
- model - package with all model classes (currently only the Story class)
- dao - data access objects for reading and downloading files
- utils - package with utilities
- exceptions - package with all exceptions
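As a rough illustration of what the knn package is described as doing (k-nearest neighbors over a text distance), here is a minimal Python sketch. It is not the project's Java code: the Jaccard distance on word sets, the function names, and the sample titles are all assumptions chosen for the example, and the actual metrics in the package may differ.

```python
from typing import List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it on whitespace into a set of tokens."""
    return set(text.lower().split())


def jaccard_distance(a: str, b: str) -> float:
    """Return 1 - |A ∩ B| / |A ∪ B| over the token sets of a and b.

    Assumed metric for illustration only; two empty texts have distance 0.
    """
    tokens_a, tokens_b = tokenize(a), tokenize(b)
    if not tokens_a and not tokens_b:
        return 0.0
    return 1.0 - len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def k_nearest(query: str, titles: List[str], k: int) -> List[Tuple[float, str]]:
    """Return the k titles closest to the query as (distance, title) pairs."""
    ranked = sorted((jaccard_distance(query, title), title) for title in titles)
    return ranked[:k]


# Made-up story titles, not taken from the dataset.
titles = [
    "A tiny text editor in Rust",
    "A tiny terminal emulator in Rust",
    "Why databases use B-trees",
]
neighbors = k_nearest("A tiny text adventure in Rust", titles, k=2)
for distance, title in neighbors:
    print(f"{distance:.3f}  {title}")
```

Once the nearest neighbors of a story are known, their scores and comment counts can be compared with those of the story itself, which is the idea behind the "similar stories" question.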
For more examples, please refer to the Documentation.
Distributed under the MIT License. See LICENSE for more information.
Organization - @Problem-Workshop
Oskar Olaszczyk - @oskarolaszczyk
Julia Szymańska - @JuliaSzymanska
Przemysław Zdrzalik - @ZdrzalikPrzemyslaw
Szymon Jacoń - @bruderooo
Michał Majchrowski - @DevWithoutKnowledge
Kamil Kiszko-Zgierski - @KiszczixIsCoding
Hubert Gawłowski - @hubertgaw
Martyna Piasecka - @MartynaCys
Project Link: https://github.com/Problem-Workshop/Data-Processing