# Machine Learning Glossary

A collection of commonly used machine learning terms and what they mean.

## B #

### Back Propagation #

The algorithm used to find the gradient of the loss function with respect to the individual weights. It works by repeatedly applying the chain rule.
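
A toy illustration (the computation and numbers are made up for this sketch, not from the original): backpropagating through $y = (wx - t)^2$ by applying the chain rule from the output back to the weight.

```python
def loss_and_grad(w, x, t):
    """Forward pass through y = (w*x - t)^2, then a backward pass
    that applies the chain rule from the loss back to the weight w."""
    # Forward pass
    z = w * x              # neuron output
    e = z - t              # error
    loss = e ** 2
    # Backward pass: chain the local derivatives together
    dloss_de = 2 * e       # d(e^2)/de
    de_dz = 1.0            # d(z - t)/dz
    dz_dw = x              # d(w*x)/dw
    dloss_dw = dloss_de * de_dz * dz_dw
    return loss, dloss_dw

loss, grad = loss_and_grad(w=2.0, x=3.0, t=5.0)
# z = 6, e = 1, so loss = 1.0 and dloss/dw = 2 * 1 * 1 * 3 = 6.0
```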

### Bias #

Statistical bias is the systematic difference between a model's hypothesis and the true distribution. In neural networks, bias can also refer to the bias unit, which always takes the value 1 and functions as a threshold for a neuron's activation.

In machine learning, having a high bias is a sign of underfitting.

### Bell Curve (Normal Distribution) #

A common probability distribution that follows a bell-shaped curve. The distribution is often found in nature.

_Synonyms: Normal Distribution, Gaussian Distribution, Laplace-Gauss Distribution_

## C #

### Chain Rule #

A mathematical theorem that lets you find the gradient of composite functions. It is defined as:

$$(f(g(x)))' = f'(g(x)) \cdot g'(x)$$
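
A small sketch (the functions are chosen for illustration): differentiating $\sin(x^2)$ with the chain rule and cross-checking the result against a finite-difference estimate.

```python
import math

def composed(x):
    """f(g(x)) with f = sin and g(x) = x^2."""
    return math.sin(x ** 2)

def chain_rule_derivative(x):
    """By the chain rule: f'(g(x)) * g'(x) = cos(x^2) * 2x."""
    return math.cos(x ** 2) * 2 * x

# Cross-check against a central finite-difference estimate.
x, h = 1.3, 1e-6
numeric = (composed(x + h) - composed(x - h)) / (2 * h)
assert abs(chain_rule_derivative(x) - numeric) < 1e-5
```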

### Classification #

Predicting discrete values, such as the group an object is a member of. An example might be predicting which species an animal belongs to.

### Convolutional Neural Network (CNN) #

A type of neural network modelled after the visual cortex. It has proven especially powerful in image recognition tasks, but also in natural language processing.

_Synonyms: CNN_

## D #

### Deep Learning (DL) #

Neural networks with more than one hidden layer.


_Synonyms: DL_

## F #

### F1 Score #

A single measurement that combines precision and recall. It is defined as:

$$F_1 = 2 \cdot \frac{1}{\frac{1}{\text{precision}}+\frac{1}{\text{recall}}}$$

_Source: David M. W. Powers_
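
A direct translation of the formula into code (the precision and recall values are illustrative):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 / (1 / precision + 1 / recall)

f1_score(0.5, 1.0)  # 2 / (1/0.5 + 1/1.0) = 2/3
```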

## G #

### Gradient #

A vector of the partial derivatives of a multivariate function with respect to each of its inputs. The gradient of a function at a given point points in the direction of steepest ascent.
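
A sketch (the function is illustrative): estimating a gradient numerically, one partial derivative per input, using central differences.

```python
def gradient(f, point, h=1e-6):
    """Estimate the gradient of f at a point with central differences:
    one partial derivative per input dimension."""
    grad = []
    for i in range(len(point)):
        forward, backward = list(point), list(point)
        forward[i] += h
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad

# f(x, y) = x^2 + 3y has gradient (2x, 3); at (2, 1) that is (4, 3).
g = gradient(lambda p: p[0] ** 2 + 3 * p[1], [2.0, 1.0])
```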

### Gradient Descent #

The primary algorithm used to train neural networks. It works by iteratively taking small steps towards a lower error value, using the gradient of the error function with respect to the individual weights.
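
A minimal sketch of the iteration (the error function and step size are made up for this example): each step moves the weight against the gradient.

```python
def gradient_descent(grad, w, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to reach a lower error."""
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w

# Minimize the error function E(w) = (w - 3)^2, whose gradient is 2(w - 3).
w = gradient_descent(lambda w: 2 * (w - 3), w=0.0)
# w converges towards the minimum at 3.0
```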

## H #

### Hadamard Product ($\odot$) #

Elementwise multiplication defined for two matrices of the same dimensions.

The resulting matrix is found by multiplying each value in one matrix by the value at the same index in the other matrix.

$$A \odot B = \begin{bmatrix} a_{(1,1)} b_{(1,1)} & \cdots & a_{(1,c)} b_{(1,c)} \\ \vdots & \ddots & \vdots \\ a_{(r,1)} b_{(r,1)} & \cdots & a_{(r,c)} b_{(r,c)} \end{bmatrix}$$

_Synonyms: $\odot$, Schur product, Entrywise product_
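
In code (a sketch using plain nested lists, so no libraries are needed):

```python
def hadamard(A, B):
    """Elementwise product of two equally-sized matrices (nested lists)."""
    return [[a * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(A, B)]

hadamard([[1, 2], [3, 4]], [[10, 20], [30, 40]])
# [[1*10, 2*20], [3*30, 4*40]] = [[10, 40], [90, 160]]
```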

## L #

### Loss Function (Error function) #

The criterion on which a supervised model is evaluated. The goal of training is to minimize this function.

_Synonyms: Error function_

## M #

### Machine Learning (ML) #

A class of algorithms that learn from experience. Formally: a program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.


_Synonyms: ML, Machine Intelligence_

_Source: Tom M. Mitchell_

## N #

### Neural Network (NN) #

A type of machine learning algorithm broadly inspired by the biological brain. Neural networks are designed around layers of neurons doing simple operations. Fitting a neural network is a matter of finding a set of weights for the connections between the neurons that results in a low error.


_Synonyms: NN, ANN, Artificial Neural Network_

## O #

### Overfitting #

When a model fits a data distribution too closely, and thus fails to generalize to new data.

Overfitting shows up as high variance, and as a large gap in accuracy between the training and test sets.

## R #

### Regression #

Predicting continuous values, such as where on a spectrum an object falls. An example might be predicting the price of a house.

### Reinforcement Learning (RL) #

Learning through reinforcement: punishment and reward. Inspired by behavioural psychology.

_Synonyms: RL_

## S #

### Standard Deviation ($\sigma$) #

A measure of the amount of variation or dispersion in a set of data values.

_Synonyms: $\sigma$_

_Source: J. M. Bland, D. G. Altman_
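
Computed directly from the definition (the data values are illustrative; this is the population standard deviation):

```python
import math

def standard_deviation(values):
    """Population standard deviation: the root of the average
    squared deviation from the mean."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(variance)

standard_deviation([2, 4, 4, 4, 5, 5, 7, 9])  # mean 5, variance 4 -> 2.0
```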

### Stochastic Gradient Descent (SGD) #

The same as gradient descent, but the weights are updated after each training example (or small batch of examples) rather than after a full pass over the training data. This is a very common technique.

_Synonyms: SGD_
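
A minimal sketch (the model, data, and learning rate are made up for this example): fitting a single weight in $y = wx$, updating after every individual example rather than after a full pass.

```python
import random

# Fit y = w * x to data generated with a true weight of 2.0.
random.seed(0)
data = [(x, 2.0 * x) for x in range(1, 6)]

w, lr = 0.0, 0.01
for _ in range(100):              # epochs
    random.shuffle(data)          # "stochastic": visit examples in random order
    for x, y in data:
        error = w * x - y
        grad = 2 * error * x      # gradient of (w*x - y)^2 w.r.t. w
        w -= lr * grad            # update immediately, per example
# w ends up close to the true weight 2.0
```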