C1_W1.pdf
Let’s see this with an example.
Switching from the sigmoid activation to the ReLU activation speeds up gradient descent: the gradient of ReLU is 1 for all positive inputs, whereas the sigmoid’s gradient approaches 0 in its saturated regions (large positive or negative inputs), which slows learning.
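A quick numeric sketch of this point (a minimal illustration using NumPy, with hypothetical helper names): the sigmoid gradient peaks at 0.25 and vanishes for large |z|, while the ReLU gradient stays at 1 for any positive input.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # Derivative of sigmoid: s(z) * (1 - s(z)); at most 0.25, near 0 when |z| is large
    s = sigmoid(z)
    return s * (1.0 - s)

def relu_grad(z):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise
    return (z > 0).astype(float)

z = np.array([-10.0, 0.0, 0.5, 10.0])
print("sigmoid grad:", sigmoid_grad(z))  # tiny at z = -10 and z = 10
print("relu grad:   ", relu_grad(z))     # 1.0 wherever z > 0
```

Because the ReLU gradient does not shrink for positive activations, the error signal propagated backward stays larger, so gradient descent takes more useful steps per iteration.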
The process of training a neural network (NN) is iterative.