Contents at a Glance  5
Contents  6
About the Author  11
About the Technical Reviewer  12
Acknowledgments  13

Chapter 1: Introduction to Deep Learning  14
    Historical Context  14
    Advances in Related Fields  16
    Prerequisites  16
    Overview of Subsequent Chapters  17
    Installing the Required Libraries  18

Chapter 2: Machine Learning Fundamentals  19
    Intuition  19
    Binary Classification  19
    Regression  20
    Generalization  21
    Regularization  26
    Summary  28

Chapter 3: Feed Forward Neural Networks  29
    Unit  29
    Overall Structure of a Neural Network  31
    Expressing the Neural Network in Vector Form  32
    Evaluating the Output of the Neural Network  33
    Training the Neural Network  35
    Deriving Cost Functions Using Maximum Likelihood  36
    Binary Cross Entropy  37
    Cross Entropy  37
    Squared Error  38
    Summary of Loss Functions  39
    Types of Units/Activation Functions/Layers  39
    Linear Unit  40
    Sigmoid Unit  40
    Softmax Layer  41
    Rectified Linear Unit (ReLU)  41
    Hyperbolic Tangent  42
    Neural Network Hands-on with AutoGrad  45
    Summary  45

Chapter 4: Introduction to Theano  46
    What Is Theano?  46
    Theano Hands-On  47
    Summary  72

Chapter 5: Convolutional Neural Networks  73
    Convolution Operation  73
    Pooling Operation  80
    Convolution-Detector-Pooling Building Block  82
    Convolution Variants  86
    Intuition Behind CNNs  87
    Summary  88

Chapter 6: Recurrent Neural Networks  89
    RNN Basics  89
    Training RNNs  94
    Bidirectional RNNs  101
    Gradient Explosion and Vanishing  102
    Gradient Clipping  103
    Long Short Term Memory  105
    Summary  106

Chapter 7: Introduction to Keras  107
    Summary  121

Chapter 8: Stochastic Gradient Descent  122
    Optimization Problems  122
    Method of Steepest Descent  123
    Batch, Stochastic (Single and Mini-batch) Descent  124
    Batch  125
    Stochastic Single Example  125
    Stochastic Mini-batch  125
    Batch vs. Stochastic  125
    Challenges with SGD  125
    Local Minima  125
    Saddle Points  126
    Selecting the Learning Rate  127
    Slow Progress in Narrow Valleys  128
    Algorithmic Variations on SGD  128
    Momentum  129
    Nesterov Accelerated Gradient (NAG)  130
    Annealing and Learning Rate Schedules  130
    Adagrad  130
    RMSProp  131
    Adadelta  132
    Adam  132
    Resilient Backpropagation  132
    Equilibrated SGD  133
    Tricks and Tips for Using SGD  133
    Preprocessing Input Data  133
    Choice of Activation Function  133
    Preprocessing Target Value  134
    Initializing Parameters  134
    Shuffling Data  134
    Batch Normalization  134
    Early Stopping  134
    Gradient Noise  134
    Parallel and Distributed SGD  135
    Hogwild  135
    Downpour  135
    Hands-on SGD with Downhill  136
    Summary  141

Chapter 9: Automatic Differentiation  142
    Numerical Differentiation  142
    Symbolic Differentiation  143
    Automatic Differentiation Fundamentals  144
    Forward/Tangent Linear Mode  145
    Reverse/Cotangent/Adjoint Linear Mode  149
    Implementation of Automatic Differentiation  152
    Source Code Transformation  152
    Operator Overloading  153
    Hands-on Automatic Differentiation with Autograd  154
    Summary  157

Chapter 10: Introduction to GPUs  158
    Summary  167

Chapter 11: Introduction to TensorFlow  168
    Summary  203

Chapter 12: Introduction to PyTorch  204
    Summary  217

Chapter 13: Regularization Techniques  218
    Model Capacity, Overfitting, and Underfitting  218
    Regularizing the Model  219
    Early Stopping  219
    Norm Penalties  221
    Dropout  222
    Summary  223

Chapter 14: Training Deep Learning Models  224
    Performance Metrics  224
    Data Procurement  227
    Splitting Data for Training/Validation/Test  228
    Establishing Achievable Limits on the Error Rate  228
    Establishing the Baseline with Standard Choices  229
    Building an Automated, End-to-End Pipeline  229
    Orchestration for Visibility  229
    Analysis of Overfitting and Underfitting  229
    Hyper-Parameter Tuning  231
    Summary  231

Index  232