# Machine Learning Glossary

A collection of commonly used machine learning terms, and what they mean.

## A #

### Artificial Intelligence (AI) #

Systems that exhibit intelligent behaviour.

Synonyms: AI

## B #

### Back Propagation #

The algorithm used to find the gradient of the loss function with respect to the individual weights. It works by repeatedly applying the chain rule.
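As a minimal sketch of the idea, the example below back-propagates through a single sigmoid neuron with a squared-error loss and checks the result against a finite-difference estimate. The neuron, the loss, and all names (`sigmoid`, `loss`, `backprop`) are assumptions chosen for illustration, not part of the glossary.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, b, x, y):
    """Squared error of a single sigmoid neuron (illustrative choice)."""
    a = sigmoid(w * x + b)
    return 0.5 * (a - y) ** 2


def backprop(w, b, x, y):
    """Return (dL/dw, dL/db) by applying the chain rule step by step."""
    z = w * x + b              # forward pass: pre-activation
    a = sigmoid(z)             # forward pass: activation
    dL_da = a - y              # derivative of 0.5 * (a - y)^2 w.r.t. a
    da_dz = a * (1.0 - a)      # derivative of the sigmoid w.r.t. z
    dL_dz = dL_da * da_dz      # chain rule
    return dL_dz * x, dL_dz    # dz/dw = x and dz/db = 1


w, b, x, y = 0.5, -0.2, 1.5, 1.0
grad_w, grad_b = backprop(w, b, x, y)

# Central finite differences as an independent check of both gradients.
eps = 1e-6
numeric_w = (loss(w + eps, b, x, y) - loss(w - eps, b, x, y)) / (2 * eps)
numeric_b = (loss(w, b + eps, x, y) - loss(w, b - eps, x, y)) / (2 * eps)
```

In a real network the same backward step is repeated layer by layer, reusing the intermediate derivatives.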

### Bias #

Statistical bias is the systematic difference between a model's hypothesis and the true distribution. In neural networks, bias can also refer to the bias unit, which always takes the value 1 and functions as a threshold for a neuron's activation.
In machine learning, having a high bias is a sign of underfitting.

### Binomial Distribution (Normal Distribution) #

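The bell-shaped curve described here is that of the normal distribution, a common continuous probability distribution that is often found in nature. Strictly speaking, the binomial distribution is a different, discrete distribution (the number of successes in a fixed number of independent trials), although it is well approximated by a normal distribution when the number of trials is large.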

Synonyms: Normal Distribution, Gaussian Distribution, Laplace-Gauss Distribution

## C #

### Chain Rule #

A mathematical theorem that lets you find the gradient of composite functions. It is defined as:

/$(f(g(x)))' = f'(g(x)) \cdot g'(x)/$
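A small numerical check can make the rule concrete. The sketch below assumes the outer function is the sine and the inner function is a square, both chosen only for illustration, and compares the chain-rule derivative with a finite-difference estimate.

```python
import math


def g(x):
    return x ** 2          # inner function (illustrative choice)


def f(u):
    return math.sin(u)     # outer function (illustrative choice)


def h(x):
    return f(g(x))         # the composite function


def h_prime(x):
    # Chain rule: f'(g(x)) * g'(x) = cos(x^2) * 2x
    return math.cos(g(x)) * 2 * x


x = 0.7
eps = 1e-6
numeric = (h(x + eps) - h(x - eps)) / (2 * eps)   # central difference
analytic = h_prime(x)
```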

### Classification #

Predicting discrete values, such as which group an object is a member of. An example might be predicting which species an animal belongs to.

### Convolutional Neural Network (CNN) #

A type of neural network modelled after the visual cortex. It has been found especially powerful in image recognition tasks, but also in natural language problems.

Synonyms: CNN

## D #

### Deep Learning (DL) #

Neural networks with more than one hidden layer.

Synonyms: DL

### Derivative #

The slope of a function's tangent line at a given point.

## F #

### F1 Score #

A single measurement that combines precision and recall. It is defined as:

/$F_1 = 2 \cdot \frac{1}{\frac{1}{\text{precision}}+\frac{1}{\text{recall}}}/$
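The formula is the harmonic mean of precision and recall, which simplifies to twice their product divided by their sum. A minimal sketch follows; the function name `f1_score` and the convention of returning 0 when either input is 0 are assumptions made for this example.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall.

    Returning 0.0 when either value is 0 is a convention assumed here
    to avoid dividing by zero.
    """
    if precision == 0 or recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


balanced = f1_score(0.8, 0.8)   # equal inputs give the same value back
skewed = f1_score(0.5, 1.0)     # the lower value pulls the score down
```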

Source: David M W Powers

### False Negative Prediction #

Falsely believing that an event has not occurred.

### False Positive Prediction #

Falsely believing that an event has occurred.

### Feature Engineering #

Computing new features from existing features in a dataset.

### Feature Space #

The /$n/$-dimensional space in which the variables, or features, of a model live.

## G #

### Gradient #

A vector of the partial derivatives of a multivariate function with respect to each of its inputs. The gradient of a function at a given point points in the direction of steepest ascent.
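To illustrate, the sketch below estimates the gradient of a two-variable function numerically, one partial derivative per input. The function `f` and the helper `numerical_gradient` are hypothetical names introduced for this example.

```python
def f(x, y):
    return x ** 2 + 3 * x * y   # illustrative function; gradient is (2x + 3y, 3x)


def numerical_gradient(func, point, eps=1e-6):
    """Approximate each partial derivative with a central difference."""
    grad = []
    for i in range(len(point)):
        up = list(point)
        down = list(point)
        up[i] += eps
        down[i] -= eps
        grad.append((func(*up) - func(*down)) / (2 * eps))
    return grad


gradient = numerical_gradient(f, [1.0, 2.0])   # analytically (8, 3) at this point
```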

### Gradient Descent #

The primary algorithm used to train neural networks. It works by iteratively taking small steps towards a lower error value, using the gradient of the error function with respect to the individual weights.
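A minimal sketch of the update loop, assuming a single weight and a simple quadratic error function with its minimum at 3. The error function, the learning rate, and the step count are illustrative assumptions.

```python
def error(w):
    return (w - 3.0) ** 2        # illustrative error function, minimum at w = 3


def error_gradient(w):
    return 2.0 * (w - 3.0)       # derivative of the error w.r.t. the weight


w = 0.0                          # initial weight
learning_rate = 0.1              # size of each step (assumed value)

for _ in range(100):
    # Step against the gradient, i.e. in the direction of lower error.
    w -= learning_rate * error_gradient(w)
```

Each step here shrinks the distance to the minimum by a constant factor, so `w` ends up very close to 3.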

## H #

### Hadamard Product (/$\odot/$) #

Elementwise multiplication defined for two matrices of the same dimensions.
The resulting matrix is found by multiplying each value in one matrix by the value at the same index in the other matrix.

/$$A \odot B = \begin{bmatrix} a_{(1,1)} b_{(1,1)} & \cdots & a_{(1,c)} b_{(1,c)} \\ \vdots & \ddots & \vdots \\ a_{(r,1)} b_{(r,1)} & \cdots & a_{(r,c)} b_{(r,c)} \end{bmatrix} /$$
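The definition can be sketched in plain Python with nested lists standing in for matrices. The function name `hadamard` is an assumption for this example; numerical libraries usually provide elementwise multiplication directly.

```python
def hadamard(a, b):
    """Elementwise product of two matrices given as lists of rows."""
    same_shape = len(a) == len(b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(a, b)
    )
    if not same_shape:
        raise ValueError("matrices must have the same dimensions")
    return [
        [x * y for x, y in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
product = hadamard(A, B)
```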

Synonyms: /$\odot/$, Schur product, Entrywise product

### Hidden Layer #

A layer in a neural network that is neither the input layer nor the output layer.

## L #

### Loss Function (Error function) #

The criterion by which a supervised model is evaluated. The goal of training is to minimize it.

Synonyms: Error function

## M #

### Machine Learning (ML) #

A class of algorithms that are said to learn from experience E with respect to some class of tasks T and performance measure P if their performance at tasks in T, as measured by P, improves with experience E.

Synonyms: ML, Machine Intelligence

Source: Tom M. Mitchell

## N #

### Negative Prediction #

Predicting that an event has not occurred.

### Neural Network (NN) #

A type of machine learning algorithm broadly inspired by the biological brain. Neural networks are designed around layers of neurons doing simple operations. Fitting a neural network is a matter of finding a set of weights for the connections between the neurons that results in a low error.

Synonyms: NN, ANN, Artificial Neural Network

## O #

### Overfitting #

When a model fits a data distribution too closely, and thus fails to generalize to new data.
Overfitting is measurable as having a high variance, and a large difference in accuracy between the training and test set.

## P #

### Positive Prediction #

Predicting that an event has occurred.

### Precision #

How many of the positive predictions were correct?
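Expressed with counts, precision is the number of true positives divided by the number of all positive predictions. A minimal sketch, where the function name and the zero-division convention are assumptions for this example:

```python
def precision(true_positives, false_positives):
    """Fraction of positive predictions that were correct."""
    predicted_positive = true_positives + false_positives
    if predicted_positive == 0:
        return 0.0   # assumed convention when nothing was predicted positive
    return true_positives / predicted_positive


# 8 correct positive predictions and 2 incorrect ones.
score = precision(true_positives=8, false_positives=2)
```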

Source: David M W Powers

## R #

### Recall #

How many of the positive events were correctly predicted?
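As a sketch, recall can be computed from paired lists of true and predicted labels, counting how many actual positives were found. The label encoding (1 for positive, 0 for negative) and the function name are assumptions for this example.

```python
def recall(y_true, y_pred):
    """Fraction of actual positives that were predicted as positive."""
    pairs = list(zip(y_true, y_pred))
    true_positives = sum(1 for t, p in pairs if t == 1 and p == 1)
    false_negatives = sum(1 for t, p in pairs if t == 1 and p == 0)
    actual_positive = true_positives + false_negatives
    if actual_positive == 0:
        return 0.0   # assumed convention when there are no positive events
    return true_positives / actual_positive


y_true = [1, 1, 1, 0, 0, 1]   # four positive events
y_pred = [1, 0, 1, 0, 1, 1]   # three of them were found
score = recall(y_true, y_pred)
```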

Source: David M W Powers

### Recursion #

A function that calls itself.
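The factorial is a common illustration: the function below calls itself on a smaller input until it reaches a base case that stops the recursion. The example is chosen here for illustration only.

```python
def factorial(n):
    """Return n! for a non-negative integer n."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n <= 1:
        return 1                     # base case: stops the recursion
    return n * factorial(n - 1)      # recursive case: the function calls itself


result = factorial(5)
```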

### Regression #

Predicting continuous values, such as where on a spectrum an object falls. An example might be predicting the price of a house.

### Reinforcement Learning (RL) #

Learning through reinforcement: punishment and reward. Inspired by behavioural psychology.

Synonyms: RL

## S #

### Standard Deviation (/$\sigma/$) #

A measure that is used to quantify the amount of variation or dispersion of a set of data values.
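A short sketch of the computation, assuming the population form (dividing by the number of values). Python's standard `statistics` module provides `pstdev` for this and `stdev` for the sample form, which divides by one less than the number of values.

```python
import math
import statistics

data = [1, 2, 3, 4, 5]

# Population standard deviation, computed by hand.
mean = sum(data) / len(data)
squared_deviations = [(x - mean) ** 2 for x in data]
manual_std = math.sqrt(sum(squared_deviations) / len(data))

# The same quantity from the standard library, plus the sample form.
population_std = statistics.pstdev(data)
sample_std = statistics.stdev(data)
```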

Synonyms: /$\sigma/$

Source: J. M. Bland and D. G. Altman

### Stochastic Gradient Descent (SGD) #

The same as gradient descent, except that the weights are updated after each training example or small batch, rather than after a full pass through the training data. This is a very common technique.
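A minimal sketch, assuming a one-weight linear model fitted to a tiny made-up dataset with one update per training example. The data, the learning rate, the epoch count, and the name `sgd_fit` are all illustrative assumptions.

```python
import random

# Toy data generated from y = 2 * x (made up for this example).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]


def sgd_fit(xs, ys, learning_rate=0.01, epochs=50, seed=0):
    """Fit y = w * x, updating w after every single example."""
    rng = random.Random(seed)        # fixed seed keeps the run reproducible
    w = 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)         # visit the examples in random order
        for i in indices:
            residual = w * xs[i] - ys[i]
            gradient = residual * xs[i]   # d/dw of 0.5 * residual^2
            w -= learning_rate * gradient
    return w


w = sgd_fit(xs, ys)
```

Plain gradient descent would instead average the gradient over all four examples before making a single update.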

Synonyms: SGD

### Supervised Learning #

Learning from labeled data. This is the most common learning method.

## T #

### True Negative Prediction #

Correctly believing that an event has not occurred.

### True Positive Prediction #

Correctly believing that an event has occurred.

## U #

### Underfitting #

When a model fits a data distribution too loosely, and thus fails to capture patterns in the data.
Underfitting is measurable as having a high bias.

### Unsupervised Learning #

Learning from unlabeled data.

## V #

### Variance #

Statistical variance is a measure of how spread out a dataset is. It is formally defined as the mean squared deviation from the mean, written /$\sigma^2/$, which is the square of the standard deviation.
In machine learning, having a high variance is a sign of overfitting.
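The statistical definition can be sketched as follows, assuming the population form and a small made-up dataset. The check against `statistics.pvariance` and `statistics.pstdev` from Python's standard library shows the relationship to the standard deviation.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up values with mean 5

# Mean squared deviation from the mean (population variance).
mean = sum(data) / len(data)
variance = sum((x - mean) ** 2 for x in data) / len(data)

# The standard library gives the same value, and the variance is the
# square of the population standard deviation.
library_variance = statistics.pvariance(data)
std_squared = statistics.pstdev(data) ** 2
```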