L1 Regularization
Mechanics
layer = tf.keras.layers.Dense(
    100, activation="elu",
    kernel_initializer="he_normal",
    kernel_regularizer=tf.keras.regularizers.l1(0.01))
Pros
- It tends to be more precise than its L2 counterpart.
- It helps avoid overfitting on large datasets.
Cons
- It usually does not prevent overfitting on its own.
- It is slower than its L2 counterpart.
L2 Regularization
Mechanics
layer = tf.keras.layers.Dense(
    100, activation="elu",
    kernel_initializer="he_normal",
    kernel_regularizer=tf.keras.regularizers.l2(0.01))
Pros
- It is faster than its L1 counterpart.
- It helps prevent overfitting.
Cons
- It tends to be less precise than its L1 counterpart.
If you would like to use both of them, you can use the
tf.keras.regularizers.l1_l2()
function.
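For illustration, a minimal sketch of a layer combining both penalties (the factor values below are example settings, assuming TensorFlow is imported as tf as in the snippets above):

layer = tf.keras.layers.Dense(
    100, activation="elu",
    kernel_initializer="he_normal",
    kernel_regularizer=tf.keras.regularizers.l1_l2(l1=0.01, l2=0.01))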
Dropout
Mechanics
The principle of dropout is to randomly deactivate ("drop") neurons in any layer (except the output layer) during training.
The dropout rate (p) is generally between 20 and 30 percent for recurrent neural networks; it can go up to 40 to 50 percent for convolutional neural networks.
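As an illustration, a minimal sketch of dropout in a Keras model (the 30 percent rate and the layer sizes are example choices, assuming TensorFlow is imported as tf):

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=[28, 28]),
    tf.keras.layers.Dropout(rate=0.3),  # randomly drops 30% of the inputs during training
    tf.keras.layers.Dense(100, activation="elu", kernel_initializer="he_normal"),
    tf.keras.layers.Dropout(rate=0.3),
    tf.keras.layers.Dense(10, activation="softmax")  # no dropout after the output layer
])

At test time no neuron is dropped; Keras compensates by scaling the remaining activations during training (inverted dropout), so the weights do not need to be rescaled by hand.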
Pros
- The network becomes more robust to small fluctuations in its inputs.
Cons
- Overfitting can still occur when a large network is trained on a small dataset.
- Neurons receive stronger excitation (their weights tend to increase).
Data Augmentation
Mechanics
The principle of data augmentation is to artificially augment the dataset by generating many realistic variants of each training instance.
The generated data must be as close as possible to realistic data while still being different: a shift (to the right, down, etc.), for example.
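As an illustration, a minimal sketch using Keras image-preprocessing layers (available as tf.keras.layers in recent TensorFlow versions) to generate shifted and flipped variants on the fly; the factor values are example choices, not from the text:

data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomTranslation(height_factor=0.1, width_factor=0.1),  # small shifts
    tf.keras.layers.RandomFlip("horizontal"),  # mirror images horizontally
    tf.keras.layers.RandomRotation(0.05),  # slight rotations
])
# Placed at the start of a model, it produces new variants of each image at every epoch.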
Pros
- Allows training the model even when little usable data is available
Cons
- Increases training time
Early Stopping
Mechanics
The early stopping principle for gradient descent consists in interrupting the search for the minimum as soon as the validation error stops improving.
In the image above, the validation error decreases until the early stopping point and then starts growing again. Without early stopping, the model begins to overfit from that point on.
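A minimal sketch with the Keras EarlyStopping callback (the patience value and the X_train/X_valid names are example choices, assuming TensorFlow is imported as tf):

early_stopping_cb = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",  # watch the validation error
    patience=10,  # stop after 10 epochs without improvement
    restore_best_weights=True)  # roll back to the best weights seen

history = model.fit(X_train, y_train, epochs=100,
                    validation_data=(X_valid, y_valid),
                    callbacks=[early_stopping_cb])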
Pros
- Helps prevent overfitting
Cons
- If the stopping criterion is not well chosen, this regularization will stop the learning phase too early; if it is too loose, it may be useless.