Crime and political corruption analysis using data mining, machine learning and complex networks

There has been a remarkable increase in the amount of data stored by private and public companies. On the one hand, these huge amounts of data enable a detailed historical review of the processes under investigation; on the other hand, this excess of data makes it harder to extract summarized information and to make good decisions supported by well-established empirical facts. This modern phenomenon has been called big data, and understanding these systems and extracting patterns from their data requires a multidisciplinary approach. In this course, held at the School of Applied Mathematics of the Institute of Mathematics and Computer Science at the University of São Paulo, we will address topics from computer science, statistics, and physics to understand these systems. Among them, we will focus on the following:

  • Introduction to Python;
  • Web scraping;
  • Data mining;
  • Machine learning;
  • Complex networks.

Using these tools, we will focus on two issues of great relevance in Brazil: predicting homicides in cities and describing the mechanisms behind political corruption networks. In the first, we will use machine learning techniques to predict the number of crimes in Brazilian cities. In the second, we will use complex networks to describe the interactions between politicians investigated in corruption scandals in Brazil from 1987 to 2014.
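The corruption topic rests on turning investigation records into a network. A minimal sketch, assuming the records can be reduced to a mapping from each scandal to the politicians investigated in it (all scandal and politician names below are hypothetical placeholders): two politicians are linked when they appear in the same scandal, and the edge weight counts how many scandals they share. This is a standard bipartite projection, not necessarily the exact construction used in the course notebooks.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical input: scandal -> politicians investigated in it (placeholders, not real data).
scandals = {
    "scandal_A": ["p1", "p2", "p3"],
    "scandal_B": ["p2", "p3"],
    "scandal_C": ["p3", "p4"],
}

def project_to_politicians(scandals):
    """Project the scandal-politician bipartite graph onto politicians.

    Returns {(u, v): weight}, where weight is the number of scandals
    in which politicians u and v were both investigated.
    """
    weights = defaultdict(int)
    for members in scandals.values():
        # set() removes duplicates; sorted() gives each unordered pair one key
        for u, v in combinations(sorted(set(members)), 2):
            weights[(u, v)] += 1
    return dict(weights)

edges = project_to_politicians(scandals)
for (u, v), w in sorted(edges.items()):
    print(u, v, w)
```

Here p2 and p3 end up with weight 2 because they share two scandals; weighted links like this are what later network measures operate on.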

Any comments, questions, or concerns can be directed to:

Course Syllabus

This course is divided into several modules, each with a set of Jupyter notebooks that teach its concepts.

Basics, Collections and Files (Day 1)

  1. Jupyter Notebook
  2. Basic Data Types
  3. Flow Control
  4. Errors
  5. Lists, Tuples, and Sets
  6. File I/O
  7. Section Review (Optional)

Imports, Plots, Functions, Dictionaries, and Web Scraping (Day 2)

  1. The Python Standard Library
  2. Data Visualization
  3. Functions
  4. Review (Optional)
  5. Dictionaries
  6. Review (Optional)
  7. Mini-Project
  8. Web Scraping
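Web scraping comes down to fetching a page and pulling structured values out of its HTML. A minimal sketch of the parsing step, using only the standard library's `html.parser` module and an inline, hypothetical HTML table in place of a real download (the course may well use other libraries, such as requests or BeautifulSoup):

```python
from html.parser import HTMLParser

# Hypothetical HTML standing in for a downloaded page; a real scraper would
# first fetch it (e.g. with urllib.request) before parsing.
HTML = """
<table>
  <tr><th>City</th><th>Homicides</th></tr>
  <tr><td>City A</td><td>12</td></tr>
  <tr><td>City B</td><td>7</td></tr>
</table>
"""

class TableParser(HTMLParser):
    """Collect the text of <td>/<th> cells, one list per <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Whitespace between tags arrives here too; only keep text inside cells.
        if self._in_cell:
            self._row[-1] += data.strip()

parser = TableParser()
parser.feed(HTML)
print(parser.rows)
```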

Data Mining, Statistics, and Data Analysis (Day 3)

  1. Statistical analysis with Python
  2. Bootstrapping MC chains
  3. More stats with Python
  4. The Bootstrap
  5. Structured Data Analysis Pt1
  6. Structured Data Analysis Pt2
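For the bootstrap notebooks, a minimal sketch of one common variant, the percentile bootstrap confidence interval for a mean: resample the data with replacement many times, compute the mean of each resample, and read the interval off the empirical quantiles. The data values are hypothetical, and the course notebooks may use a different bootstrap variant or library.

```python
import random

def bootstrap_ci(sample, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `sample`."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    n = len(sample)
    # Resample with replacement and record the mean of each resample.
    means = sorted(sum(rng.choices(sample, k=n)) / n for _ in range(n_resamples))
    # Cut off alpha/2 of the resampled means in each tail.
    cut = int(round(n_resamples * alpha / 2))
    return means[cut], means[n_resamples - cut - 1]

# Hypothetical data, e.g. yearly homicide counts for one city.
data = [12, 15, 9, 20, 14, 11, 17, 13, 16, 10]
low, high = bootstrap_ci(data)
print(f"mean = {sum(data) / len(data):.2f}, 95% CI ~ ({low:.2f}, {high:.2f})")
```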

Machine Learning Part I (Day 4)

  1. Data Loading
  2. Introduction to Scikit Learn
  3. Unsupervised Transforms
  4. Cross-validation and Grid Search
  5. Preprocessing
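Cross-validation splits the data into k folds, trains on k - 1 of them, and evaluates on the held-out fold, rotating until every fold has been the test set once; grid search repeats this for each candidate hyperparameter setting and keeps the best average score. In scikit-learn this is automated by `sklearn.model_selection.KFold` and `GridSearchCV`; the sketch below spells out the splitting by hand with a deliberately trivial baseline model (predict the training mean), purely for illustration.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Samples are split into k contiguous folds whose sizes differ by at most
    one; each fold serves once as the test set.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

def cross_val_mse(y, k):
    """Cross-validated MSE of a baseline that predicts the training-set mean."""
    errors = []
    for train, test in k_fold_indices(len(y), k):
        pred = sum(y[i] for i in train) / len(train)
        errors.append(sum((y[i] - pred) ** 2 for i in test) / len(test))
    return sum(errors) / len(errors)

for train, test in k_fold_indices(7, 3):
    print("train:", train, "test:", test)
```

Without shuffling, this produces the same folds as scikit-learn's `KFold`; shuffling first is usually advisable when the rows are ordered.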

Machine Learning Part II (Day 5)

  1. Linear Models for Regression
  2. Linear Models for Classification
  3. Trees
  4. Random Forests
  5. Gradient Boosting
  6. Homicides Prediction
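As a toy version of the regression notebooks, the sketch below fits a one-variable least-squares line y ≈ a + b·x and uses it for a prediction. The predictor (population) and all numbers are hypothetical; the course's homicide prediction notebook may use different features and models, such as the trees and ensembles listed above.

```python
def fit_simple_ols(x, y):
    """Ordinary least-squares fit of y ~ a + b * x for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx          # slope
    a = mean_y - b * mean_x  # intercept: the line passes through the means
    return a, b

# Hypothetical toy data: population (thousands) vs. yearly homicides.
population = [10, 20, 30, 40, 50]
homicides = [3, 5, 7, 9, 11]
a, b = fit_simple_ols(population, homicides)
print(a, b)
print(a + b * 60)  # predicted homicides for a hypothetical city of 60 thousand
```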

Complex Networks and Analysis of Corruption Networks (Day 6)

  1. Network Basics
  2. Analysis of Structural Properties
  3. Network Visualization and Queries on Networks
  4. Network Analysis from Data
  5. Corruption Network
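Two of the most basic structural properties are a node's degree (its number of neighbours) and its local clustering coefficient (the fraction of pairs of its neighbours that are themselves connected). A minimal sketch on a small, made-up undirected graph stored as an adjacency dict of sets; libraries such as networkx and igraph provide these measures directly.

```python
from itertools import combinations

def degrees(adj):
    """Degree of every node in an undirected graph given as {node: set(neighbours)}."""
    return {node: len(neigh) for node, neigh in adj.items()}

def local_clustering(adj, node):
    """Fraction of pairs of neighbours of `node` that are linked to each other."""
    neigh = adj[node]
    k = len(neigh)
    if k < 2:
        return 0.0  # convention: undefined for k < 2, reported as 0
    links = sum(1 for u, v in combinations(neigh, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))

# Small example graph: a triangle a-b-c plus a pendant node d attached to c.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print(degrees(adj))
print({n: local_clustering(adj, n) for n in sorted(adj)})
```

Node c has three neighbours but only one of the three possible pairs (a, b) is linked, so its clustering is 1/3.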

Social Network Analysis Using igraph and leidenalg (Extra)

  1. Network Basics
  2. Social Networks
  3. Complex Network Models
  4. Community Detection
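Community detection algorithms such as Leiden (implemented in leidenalg) search for partitions that score well on a quality function; Newman-Girvan modularity is one of the functions leidenalg can optimise. The sketch below does not implement Leiden; it only evaluates modularity for a given partition, using the per-community form Q = Σ_c [L_c/m − (d_c/2m)²], where m is the number of edges, L_c the edges inside community c, and d_c the total degree of its nodes. The example graph is made up.

```python
def modularity(edges, community):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph."""
    m = len(edges)
    internal = {}    # community -> number of edges with both ends inside it
    degree_sum = {}  # community -> total degree of its nodes
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree_sum.items()
    )

# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
merged = {n: "A" for n in range(6)}
print(modularity(edges, split))   # the natural two-triangle split scores higher
print(modularity(edges, merged))  # a single community always gives 0
```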

Software Installation

This bootcamp uses the Anaconda Python 3.7 distribution.

You must have Anaconda Python 3.7 installed before the first day of class.

Downloading Course Materials

The course materials can be downloaded from the repository's GitHub page. Download the zip file, unzip it onto your Desktop, and rename the directory to school-of-applied-math.

Usage of Course Materials

This text and most of the course will be conducted with Jupyter Notebook (http://jupyter.org). Jupyter Notebook is a 'web-based interactive computational environment', meaning that it allows you to write and execute Python code in a web page on your own computer. Jupyter Notebook is a relatively new tool, and we believe it is an excellent way to teach the basics of Python programming and computational data analysis.

Jupyter Notebook is installed by default with the Anaconda Python distribution and can be launched from the Anaconda Navigator program.

Location and period of the course:

Period: July 1 to July 6, 2019.

Hours: 08:00 to 12:00

Location: Institute of Mathematics and Computer Science, University of São Paulo (rooms in block 3).

Approval Criteria: 85% attendance and completion of the proposed activities.

Target Audience: Senior undergraduate and graduate students in applied mathematics, statistics, computer science, and physics who are interested in data science.

Number of places: 20

Enrollment Period: 04/15/2019 to 05/30/2019.

References