In the `data` directory you will find three CSV files containing fictional data:

- `transactions.csv`: card transaction data, including the transaction amount and currency, the type and status of the transaction, device information, and whether the transaction is marked as fraudulent
- `customers.csv`: customer data, including country, state and year of birth
- `transactions_customers.csv`: the mapping between transactions and customers
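
Because customer attributes live in a separate file, the three tables have to be joined through the mapping file before customer-level factors can be related to fraud. The following is a minimal sketch of one way to do that with pandas. It assumes hypothetical key columns named `transaction_id` and `customer_id`, and the helper names `join_tables` and `load_data` are also illustrative; check the real CSV headers before relying on it.

```python
from pathlib import Path

import pandas as pd

# Hypothetical key column names: confirm them against the real CSV headers.
TXN_KEY = "transaction_id"
CUST_KEY = "customer_id"


def join_tables(transactions: pd.DataFrame,
                customers: pd.DataFrame,
                mapping: pd.DataFrame) -> pd.DataFrame:
    """Attach customer attributes to every transaction via the mapping table.

    Left joins keep every transaction, even one with no mapped customer
    (its customer columns become NaN), so unmatched rows can be inspected
    instead of being silently dropped.
    """
    # Each transaction should map to at most one customer; pandas raises
    # MergeError if the transaction key is duplicated on either side.
    with_ids = transactions.merge(mapping, on=TXN_KEY, how="left",
                                  validate="one_to_one")
    # Many transactions may share a customer, but each customer key
    # should appear only once in the customer table.
    return with_ids.merge(customers, on=CUST_KEY, how="left",
                          validate="many_to_one")


def load_data(data_dir: str = "data") -> pd.DataFrame:
    """Read the three CSV files from data_dir and return one joined frame."""
    path = Path(data_dir)
    transactions = pd.read_csv(path / "transactions.csv")
    customers = pd.read_csv(path / "customers.csv")
    mapping = pd.read_csv(path / "transactions_customers.csv")
    return join_tables(transactions, customers, mapping)
```

Under these assumptions, calling `load_data()` from the project root returns one row per transaction. Rows with empty customer columns indicate gaps in the mapping that may be worth noting in the exploratory analysis. If the files share column names other than the keys, pandas adds `_x`/`_y` suffixes to tell them apart.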

Please complete the following tasks:

- Provide an exploratory analysis of the datasets: summarise and explain the key trends in the data, and identify which factors appear to be most important in predicting fraud
- Perform relevant feature engineering and build a model (or several models, if you prefer) to predict whether a transaction is fraudulent
- Report on the model's performance and show which features are most important in the model. Describe, at a high level, how your model would be used in a production setting

Please use Python for this exercise. You can use whatever external software libraries you think are appropriate.

We will be looking for:

- High-quality analysis and modelling
- Readable code
- Clear explanations of the analysis and the conclusions you reached

Please don't spend more than 3-4 hours on this task. You will not be judged on model performance, but rather on your approach to solving the problem. If there are aspects of the data that you can't investigate in the given time, mention them at the end as next steps for improvement.

Please also answer the questions in the `FOLLOW-UP.md` file.

When you have completed the test, please compress your files (zip or tar) and send them as a link or email attachment in reply to your test invitation.

Once we receive your submission, a member of our team will review it and we will get back to you as soon as possible.

Thanks!