http://lamda.nju.edu.cn/weixs/project/CNNTricks/CNNTricks.html
https://en.wikipedia.org/wiki/Oversampling_and_undersampling_in_data_analysis
https://stats.stackexchange.com/questions/268820/gradient-backpropagation-through-resnet-skip-connections
https://stackoverflow.com/questions/44512126/how-to-calculate-gradients-in-resnet-architecture
https://medium.com/@pierre_guillou/understand-how-works-resnet-without-talking-about-residual-64698f157e0c
https://www.researchgate.net/post/How_backpropagation_works_for_learning_filters_in_CNN
http://www.robots.ox.ac.uk/~vgg/practicals/cnn/
https://stackoverflow.com/questions/41990250/what-is-cross-entropy
https://ml-cheatsheet.readthedocs.io/en/latest/loss_functions.html
https://rdipietro.github.io/friendly-intro-to-cross-entropy-loss/
https://stackoverflow.com/questions/34240703/what-is-logits-softmax-and-softmax-cross-entropy-with-logits/39499486#39499486
https://stats.stackexchange.com/questions/253632/why-is-newtons-method-not-widely-used-in-machine-learning
https://stackoverflow.com/questions/12066761/what-is-the-difference-between-gradient-descent-and-newtons-gradient-descent
https://dinh-hung-tu.github.io/gradient-descent-vs-newton-method/
http://sofasofa.io/forum_main_post.php?postid=1001010
https://blog.csdn.net/lsgqjh/article/details/79168095