---
output:
md_document:
variant: markdown_github
---
# MLDay18: Random Forests and Gradient Boosting Machines in R

Slides for **Machine Learning Day '18**. This talk provides an overview of the following topics, as well as some of their implementations in the R programming language:

* [Decision trees](https://en.wikipedia.org/wiki/Decision_tree_learning)
* [Random forests](https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm)
* [Gradient boosting machines](https://projecteuclid.org/euclid.aos/1013203451)

[Launch slides](https://bgreenwell.github.io/MLDay18/MLDay18.html#1)

# Abstract

Good modeling tools should be universally applicable in classification and regression, have state-of-the-art accuracy, scale well to large data sets, and handle missing values effectively. Additionally, it would be nice for these tools to be able to automatically discover which variables are important, how they interact, and whether there are any novel cases or outliers. In this presentation, we discuss two such modeling tools: random forests and gradient boosting machines. The talk will cover a brief background of both methodologies (including decision trees) as well as various implementations of each in the R software environment for statistical computing. The pros and cons of each implementation will also be covered.
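As a quick taste of the kinds of R implementations covered in the talk, the (non-evaluated) sketch below fits a random forest and a gradient boosting machine on built-in data sets, assuming the `randomForest` and `gbm` packages are installed; the specific tuning values are illustrative, not recommendations:

```{r example, eval=FALSE}
library(randomForest)
library(gbm)

# Random forest for classification on the built-in iris data
set.seed(101)
rf <- randomForest(Species ~ ., data = iris, ntree = 500, importance = TRUE)
print(rf)       # OOB error estimate and confusion matrix
varImpPlot(rf)  # variable importance plot

# Gradient boosting machine for regression on the built-in mtcars data
set.seed(102)
bst <- gbm(mpg ~ ., data = mtcars, distribution = "gaussian",
           n.trees = 1000, interaction.depth = 2, shrinkage = 0.01,
           cv.folds = 5)
best_iter <- gbm.perf(bst, method = "cv")  # CV-chosen number of trees
summary(bst, n.trees = best_iter)          # relative influence of predictors
```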
```{r img, echo=FALSE, fig.align="center", out.width="80%"}
knitr::include_graphics("docs/figures/MLDay18.jpg")
```