Here in this post we'll be implementing popular deep learning activation functions from the ground up using numpy.

So for you to follow this post you need to things:

`numpy`

and- Some free time of yours.

If you have written any deep learning code before you likely have used these activations:

- Softmax
- ReLU, LeakyReLU
- and the good-old Sigmoid activation.

In this post I'll implement all these activation functions with numpy and also the derivative of these for the back-propagation.

## ReLU

Let's get the easy one out first. ReLU is called rectified linear unit, where:

```
class ReLU:
"""Applies the rectified linear unit function element-wise.
ReLU operation is defined as the following
"""
def __call__(self, x):
return np.where(x >= 0, x, 0)
def gradient(self, x):
"""
Computes Gradient of ReLU
Args:
x: input tensor
Returns:
Gradient of X
"""
return np.where(x >= 0, 1, 0)
```

Usage:

```
relu = ReLU()
z = np.array([0.1, -0.4, 0.7, 1])
print(relu(z)) # ---> array([0.1, 0. , 0.7, 1. ])
print(relu.gradient(z)) # ---> array([1, 0, 1, 1])
```

## Sigmoid

Sigmoid function is defined as the following

The main reason why we use sigmoid function is because it exists between (0 to 1). Therefore, it is especially used for models where we have to predict the probability as an output.Since probability of anything exists only between the range of 0 and 1, sigmoid is the right choice. The function is differentiable.That means, we can find the slope of the sigmoid curve at any two points. The function is monotonic but function’s derivative is not. The logistic sigmoid function can cause a neural network to get stuck at the training time. The softmax function is a more generalized logistic activation function which is used for multi-class classification.

The element wise `exp`

can be done like the following, and if you calculate the derivative you can find that `d/dx sigmoid(x) = sigmoid(x) *(1- sigmoid(x))`

.

So let's write up the activation function for sigmoid operation:

```
class Sigmoid:
"""
Applies the element-wise function
Shape:
- Input: :math:`(N, *)` where `*` means, any number of additional
dimensions
- Output: :math:`(N, *)`, same shape as the input
"""
def __call__(self, x):
return 1 / (1 + np.exp(-x))
def gradient(self, x):
r"""Computes Gradient of Sigmoid
.. math::
\frac{\partial}{\partial x} \sigma(x) = \sigma(x)* \left ( 1- \sigma(x)\right)
Args:
x: input tensor
Returns:
Gradient of X
"""
return self.__call__(x) * (1 - self.__call__(x))
```

and the usage would be like this:

```
import numpy as np
z = np.array([0.1, 0.4, 0.7, 1])
sigmoid = Sigmoid()
return_data = sigmoid(z)
print(return_data) # -> array([0.52497919, 0.59868766, 0.66818777, 0.73105858])
print(sigmoid.gradient(z)) # -> array([0.24937604, 0.24026075, 0.22171287, 0.19661193])
```

## LeakyReLU

LeakyReLU operation is similar to the ReLU operation also called the Leaky version of a Rectified Linear Unit. It essentially instead of putting zeros everywhere it sees < 0, it puts an predefined -ve slope like -0.1 or -0.2 etc. You mention an alpha, and it'll put -alpha where X < 0.

The following image shows the difference between ReLU and LeakyReLU

```
class LeakyReLU:
"""Applies the element-wise function:
Args:
- alpha: Negative slope value: controls the angle of the negative slope in the :math:`-x` direction. Default: ``1e-2``
"""
def __init__(self, alpha=0.2):
self.alpha = alpha
def __call__(self, x):
return np.where(x >= 0, x, self.alpha * x)
def gradient(self, x):
"""
Computes Gradient of LeakyReLU
Args:
x: input tensor
Returns:
Gradient of X
"""
return np.where(x >= 0, 1, self.alpha)
```

`tanH`

or hyperbolic tangent activation

The sigmoid maps the output between 0-1 but here `tanH`

maps the output to -1 and 1. The advantage is that the negative inputs will be mapped strongly negative and the zero inputs will be mapped near zero in the tanh graph.

The function is differentiable. And the function is monotonic while its derivative is not monotonic. The `tanH`

function is mainly used classification between two classes.

Let's implement this in code

```
class TanH:
def __call__(self, x):
return 2 / (1 + np.exp(-2 * x)) - 1
def gradient(self, x):
return 1 - np.power(self.__call__(x), 2)
```

## Softmax

Softmax loss is used when multi-class classifications are performed. So here is the code:

```
class Softmax:
def __call__(self, x):
e_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
return e_x / np.sum(e_x, axis=-1, keepdims=True)
def gradient(self, x):
p = self.__call__(x)
return p * (1 - p)
```