This lecture:
For sufficiently wide networks with one hidden layer, gradient descent (GD) learns correct predictions on the training set in time independent of the number of parameters (https://openreview.net/forum?id=S1eK3i09YQ).
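A minimal numerical sketch of this claim, not the construction from the referenced paper. Everything below is an assumption chosen for illustration: a ReLU hidden layer scaled by 1/sqrt(width), fixed random ±1 output weights, unit-norm random inputs, squared loss, and the particular step size, step count and widths. The helper names `make_data` and `train_two_layer` are hypothetical. The same number of full-batch GD steps is run for each width, so the final training losses can be compared directly.

```python
import numpy as np


def make_data(n=10, d=20, seed=1):
    """Random unit-norm inputs with random targets (toy data, assumed)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    y = rng.uniform(-1.0, 1.0, size=n)
    return X, y


def train_two_layer(width, X, y, lr=0.3, steps=300, seed=0):
    """Full-batch GD on the hidden weights of a one-hidden-layer ReLU net.

    Model (assumed): f(x) = (1 / sqrt(width)) * sum_r a_r * relu(w_r . x),
    with a_r fixed to random +-1 and only W trained.
    Returns the training loss recorded before each update.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.standard_normal((width, d))       # hidden weights, trained
    a = rng.choice([-1.0, 1.0], size=width)   # output weights, kept fixed
    losses = []
    for _ in range(steps + 1):
        pre = X @ W.T                                    # (n, width)
        pred = np.maximum(pre, 0.0) @ a / np.sqrt(width)  # (n,)
        resid = pred - y
        losses.append(0.5 * float(resid @ resid))
        # Gradient of 0.5 * ||pred - y||^2 with respect to W, shape (width, d).
        grad = ((resid[:, None] * (pre > 0)) * a[None, :]).T @ X / np.sqrt(width)
        W -= lr * grad
    return losses


X, y = make_data()
results = {m: train_two_layer(m, X, y) for m in (256, 1024, 4096)}
for m, losses in results.items():
    print(f"width={m:5d}  initial loss={losses[0]:.3e}  final loss={losses[-1]:.3e}")
```

The parameter count grows 16-fold across the three widths while the step budget stays fixed, which is the behaviour the statement above describes. This is only a toy check on ten points, not evidence for the theorem.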
Next lecture announcement:
Information bottleneck method (https://arxiv.org/abs/physics/0004057).
Phases of learning (https://arxiv.org/abs/1703.00810), criticism (https://openreview.net/forum?id=ry_WPG-A-&noteId=ry_WPG-A-).
Representation learning, cross-entropy decomposition (https://arxiv.org/abs/1706.01350).