Regularization in Neural Networks

am2701
Jun 6, 2022

L1 Regularization

Mechanics

(Figures not reproduced: the simple and detailed L1 regularization formulas.)
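
For reference, a common way to write the L1 penalty (notation introduced here: J(θ) is the original cost function, λ the regularization factor, θ_i the model weights) is:

$$J_{\text{reg}}(\boldsymbol{\theta}) = J(\boldsymbol{\theta}) + \lambda \sum_{i} \lvert \theta_i \rvert$$
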
import tensorflow as tf

# Dense layer whose weights are penalized with an L1 term (regularization factor 0.01)
layer = tf.keras.layers.Dense(
    100, activation="elu",
    kernel_initializer="he_normal",
    kernel_regularizer=tf.keras.regularizers.l1(0.01))

Pros

  • It tends to be more precise than its L2 counterpart.
  • It only helps against overfitting on large datasets.

Cons

  • It does not usually prevent overfitting.
  • It is slower than its L2 counterpart.

L2 Regularization

Mechanics

(Figures not reproduced: the simple and detailed L2 regularization formulas.)
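
For reference, the L2 penalty is commonly written as follows (same notation as above; the 1/2 factor is a frequent convention, while Keras's l2 regularizer simply multiplies the sum of squared weights by the factor):

$$J_{\text{reg}}(\boldsymbol{\theta}) = J(\boldsymbol{\theta}) + \frac{\lambda}{2} \sum_{i} \theta_i^{2}$$
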
# Dense layer whose weights are penalized with an L2 term (regularization factor 0.01)
layer = tf.keras.layers.Dense(
    100, activation="elu",
    kernel_initializer="he_normal",
    kernel_regularizer=tf.keras.regularizers.l2(0.01))

Pros

  • It is faster than its L1 counterpart.
  • It helps prevent overfitting.

Cons

  • It tends to be less precise than its L1 counterpart.

If you would like to use both of them at once, you can use the tf.keras.regularizers.l1_l2() function.
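
A minimal sketch, assuming the same layer configuration as above and arbitrary factors of 0.01 for both penalties:

# Combine L1 and L2 penalties on the same layer (elastic-net style)
layer = tf.keras.layers.Dense(
    100, activation="elu",
    kernel_initializer="he_normal",
    kernel_regularizer=tf.keras.regularizers.l1_l2(l1=0.01, l2=0.01))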

Dropout

Mechanics

The principle of dropout is to randomly "switch off" neurons in any layer (except the output layer) during training.

The dropout rate (p) is generally between 20 and 30 percent for recurrent neural networks, and can go up to 40 to 50 percent for convolutional neural networks.

(Figure not reproduced: the Bernoulli formulation of dropout.)
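
For reference, in the usual Bernoulli formulation (notation introduced here: p is the drop probability, y_i a neuron's output, r_i the random mask), each neuron is kept with probability 1 − p:

$$r_i \sim \mathrm{Bernoulli}(1 - p), \qquad \tilde{y}_i = r_i \, y_i$$

In Keras, dropout is applied with a dedicated layer. A minimal sketch, assuming a drop rate of 0.3 and an arbitrary small architecture:

# 30% of the previous layer's outputs are randomly switched off at each
# training step; the Dropout layer is inactive at inference time.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(100, activation="elu", kernel_initializer="he_normal"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(10, activation="softmax"),
])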

Pros

  • The network becomes more robust to fluctuations in the input.

Cons

  • Overfitting can be seen when a large network is used with a small dataset.
  • Stronger excitation of the remaining neurons (the weights increase).

Data Augmentation

Mechanics

The principle of data augmentation is to artificially augment the dataset by generating many realistic variants of each training instance.

The generated data must be as close as possible to realistic data while remaining different, for example a shift (to the right, down, etc.).

In Keras, such random shifts and flips can be applied with preprocessing layers such as tf.keras.layers.RandomTranslation and tf.keras.layers.RandomFlip.
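
A minimal sketch, assuming TensorFlow 2.6+ (where these preprocessing layers are built in) and arbitrary augmentation factors:

# Each training image is randomly flipped, rotated and shifted;
# these layers are only active during training.
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.RandomTranslation(0.1, 0.1),
])

# The augmentation block can then be placed at the start of a model:
model = tf.keras.Sequential([
    data_augmentation,
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(100, activation="elu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])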

Pros

  • Allow to train the model with less usable data

Cons

  • Increase training time

Early Stopping

Mechanics

The early stopping principle for gradient descent consists in interrupting training as soon as the validation error reaches its minimum, i.e. when it stops decreasing and starts to rise again.

Typically, the validation error decreases until the early stopping point before growing again. If we do not apply the early stopping principle, the model begins to overfit from this point on.
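
In Keras this is typically done with a callback. A minimal sketch, assuming hypothetical training data (X_train, y_train, X_valid, y_valid) and a patience of 5 epochs:

# Stop training when the validation loss has not improved for 5 epochs,
# and roll back to the best weights observed so far.
early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True)

# model.fit(X_train, y_train, epochs=100,
#           validation_data=(X_valid, y_valid),
#           callbacks=[early_stopping])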

Pros

  • It helps prevent overfitting.

Cons

  • If the stopping criterion is not well chosen, this regularization will either stop the learning phase too early or stop it too late and be useless.
