**Perceptron (Artificial Neuron)** – a weighted sum of the inputs, `sum(w_i * x_i) + b` (where `b` is the bias), is fed into an **activation function.** The output of that function determines whether the neuron fires or not, so a single perceptron is a binary classifier. During training, the network needs to determine the values of W and b, and it needs a way to quantify the error.
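
For concreteness, here is a minimal sketch of that forward pass in Python (NumPy assumed). The weights and bias are made-up values that happen to implement a logical AND gate:

```python
import numpy as np

def perceptron(x, w, b):
    z = np.dot(w, x) + b          # weighted sum: sum(w_i * x_i) + b
    return 1 if z > 0 else 0      # step activation: fire (1) or not (0)

# Hypothetical weights/bias, chosen by hand to act as an AND gate.
w = np.array([1.0, 1.0])
b = -1.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron(np.array(x), w, b))  # fires only for (1, 1)
```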

### Gradient Descent

The difference between the model's prediction and the actual (target) output is called the loss; to improve the model's prediction capability, we want to minimise the loss. **Gradient Descent is an optimization algorithm** that iteratively adjusts W and b, stepping against the gradient of the loss, to find the combination that will **minimise the loss.**
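
Below is a minimal gradient-descent sketch for a one-input linear model with squared-error loss. The data, starting point, and learning rate are illustrative assumptions, not anything from the original:

```python
import numpy as np

xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = np.array([1.0, 3.0, 5.0, 7.0])   # generated by y = 2x + 1

w, b, lr = 0.0, 0.0, 0.1              # start from zero; lr = learning rate
for step in range(1000):
    y_hat = w * xs + b
    err = y_hat - ys
    # Gradients of the mean squared error with respect to w and b:
    dw = 2 * np.mean(err * xs)
    db = 2 * np.mean(err)
    # Step against the gradient to reduce the loss.
    w -= lr * dw
    b -= lr * db

print(w, b)  # approaches w = 2, b = 1
```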

### The XOR (Exclusive OR) problem

A linear function sometimes cannot separate certain classes of data (XOR is the classic example: no single straight line separates its outputs). Sigmoid and Tanh (hyperbolic tangent) functions have vanishing gradient issues, which hurts performance. They **saturate**: large positive inputs tend towards 1, while large negative inputs tend towards -1 (Tanh) or 0 (sigmoid). They are **only sensitive to inputs near zero.**
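
A quick numerical sketch of that saturation (NumPy assumed): the gradients of both functions collapse towards zero once the input moves away from zero, which is exactly the vanishing gradient issue described above:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for x in [0.0, 2.0, 5.0, 10.0]:
    s = sigmoid(x)
    print(f"x={x:5.1f}  sigmoid'={s * (1 - s):.6f}  tanh'={1 - np.tanh(x)**2:.6f}")
# x=  0.0  sigmoid'=0.250000  tanh'=1.000000
# x= 10.0  sigmoid'=0.000045  tanh'=0.000000
```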

**ReLU (Rectified Linear Unit) is the most popular activation function used in the multi-layer networks that solve the XOR problem.** It is a piecewise linear, non-saturating activation function, made famous by AlexNet.
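
As an illustration (not AlexNet, just a hand-built toy), a two-layer ReLU network with fixed, hand-picked weights can compute XOR exactly, which no single linear unit can do:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

# XOR(x1, x2) = relu(x1 + x2) - 2 * relu(x1 + x2 - 1)
def xor_net(x1, x2):
    h1 = relu(x1 + x2)        # hidden unit 1
    h2 = relu(x1 + x2 - 1)    # hidden unit 2
    return h1 - 2 * h2        # linear output layer

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, xor_net(*x))     # prints 0, 1, 1, 0
```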

## Hyperparameters

Hyperparameters are parameters that are set before the learning process (training) begins, such as: **which activation function to use, how many neurons in each layer, the learning rate, and how many hidden layers.**
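
As a sketch, a training script often collects these choices in one place before training starts. The names and values below are illustrative assumptions, not from any particular library:

```python
hyperparameters = {
    "activation": "relu",        # which activation function to use
    "hidden_layers": 2,          # how many hidden layers
    "neurons_per_layer": 16,     # how many neurons in each layer
    "learning_rate": 0.01,       # step size for gradient descent
    "epochs": 100,               # how many passes over the training data
}
```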

## Visualization tool for ML

This site provides cool visualizations for your models: playground.tensorflow.org