data set

The data set is generated by the players of a game. The data is stored as compressed CSV, split across multiple files. There are two datasets, data/profiles and data/activity, each in its own folder. The files do not have a header row. The profiles dataset contains user profiles with the following columns:

  • player_id (integer) - unique identifier of the player
  • registration_date (yyyy-MM-dd) - date when the player first played the game
  • country code (integer) - country of the user
  • operating system (integer) - operating system of the user
  • device type (integer) - type of device used by the player

The activity dataset records players' daily visits to the game. For example, if the player with ID 123 plays the game at least once on 2018-09-02, then the dataset contains a row with those values. The complete schema of the activity dataset is:

  • event_date (yyyy-MM-dd)
  • player_id (integer) - unique identifier of the player
  • money_spent (float) - total money spent during the day
  • session_count (integer) - number of game sessions for the day
  • purchase_count (integer) - number of purchases during the day
  • time_spent_seconds (integer) - total time spent playing during the day
  • ads_impressions (integer) - total number of ads seen during the day
  • ads_clicks (integer) - total number of ads clicked during the day
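
Because the files have no header row, column names have to be supplied when reading them. The following is a minimal sketch using only the Python standard library. It assumes the files are gzip-compressed and match the pattern `*.gz`; the README does not say which compression is used, so adjust accordingly. The function name `read_parts` and the underscored names `country_code`, `operating_system`, and `device_type` are illustrative choices, not part of the task definition.

```python
import csv
import glob
import gzip
import os

# Column order follows the schemas listed above.
PROFILE_COLUMNS = [
    "player_id", "registration_date", "country_code",
    "operating_system", "device_type",
]
ACTIVITY_COLUMNS = [
    "event_date", "player_id", "money_spent", "session_count",
    "purchase_count", "time_spent_seconds", "ads_impressions", "ads_clicks",
]


def read_parts(folder, columns, pattern="*.gz"):
    """Read all headerless CSV parts in `folder` into a list of dicts.

    Assumption: every part is gzip-compressed. Values are returned as
    strings; type conversion is left to the caller.
    """
    rows = []
    for path in sorted(glob.glob(os.path.join(folder, pattern))):
        with gzip.open(path, mode="rt", newline="") as handle:
            for record in csv.reader(handle):
                rows.append(dict(zip(columns, record)))
    return rows
```

With the folder layout described above, `read_parts("data/profiles", PROFILE_COLUMNS)` and `read_parts("data/activity", ACTIVITY_COLUMNS)` would load the two datasets.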

problem

The goal of this task is to build a machine learning model that identifies churned players. A churned player is one who is not seen after the 7th day from registration.
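
One possible reading of this definition can be sketched as follows. It assumes the registration date counts as day 0, so a player is labeled as churned when no activity row has an event_date later than registration_date plus 7 days. The function name `label_churn` and the dictionary inputs are hypothetical; the exact day-counting convention is not specified here and should be confirmed.

```python
from datetime import date, timedelta


def label_churn(registration_dates, activity_dates, window_days=7):
    """Return {player_id: True if churned, False otherwise}.

    registration_dates: {player_id: date of registration}
    activity_dates:     {player_id: iterable of dates with activity}
    Assumption: a player is churned when no activity falls strictly
    after registration date + window_days.
    """
    labels = {}
    for player_id, registered in registration_dates.items():
        cutoff = registered + timedelta(days=window_days)
        seen_later = any(
            day > cutoff for day in activity_dates.get(player_id, ())
        )
        labels[player_id] = not seen_later
    return labels


registrations = {1: date(2018, 9, 1), 2: date(2018, 9, 1)}
activity = {
    1: [date(2018, 9, 1), date(2018, 9, 3)],   # last seen on day 2
    2: [date(2018, 9, 1), date(2018, 9, 10)],  # seen again on day 9
}
print(label_churn(registrations, activity))  # {1: True, 2: False}
```

Under this reading, players who registered close to the end of the observation period cannot be labeled reliably, since their post-window activity is not yet visible in the data.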

This task tests the complete end-to-end life cycle of building a machine learning model. We suggest including the following items in the deliverable:

  • data example generation

  • label and feature engineering

  • splitting of training/validation/test set

  • model selection and parameter tuning

  • model training and evaluation

  • model deployment and service
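
For the splitting step, one option is to assign each player to a partition deterministically, so that all of a player's rows land in the same set and the split is reproducible across runs. The sketch below hashes player_id into a bucket; the function name `assign_split` and the 80/10/10 proportions are assumptions for illustration, and a split by registration_date would be an equally reasonable alternative.

```python
import hashlib


def assign_split(player_id, validation_share=0.1, test_share=0.1):
    """Map a player_id to 'train', 'validation', or 'test'.

    The SHA-256 digest of the id is reduced to a number in [0, 1),
    so the same player always receives the same partition.
    """
    digest = hashlib.sha256(str(player_id).encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    if bucket < test_share:
        return "test"
    if bucket < test_share + validation_share:
        return "validation"
    return "train"


# The assignment depends only on the id, not on row order or run.
print(assign_split(123) == assign_split(123))  # True
```

Splitting at the player level rather than the row level avoids leaking one player's activity into both training and evaluation data.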

submission

You are expected to submit the following items:

  1. Jupyter notebooks for data processing, model training, and model evaluation

  2. performance metrics from model training and evaluation

  3. a Docker image containing the model files and the model service. The Docker image should be available at https://hub.docker.com/, ready for docker pull

  4. a document describing the model training process and how to use the model service, and

  5. a writeup detailing your choice of performance metrics and methods of model evaluation