Classification Algorithms
Logistic Regression
Description
First, a linear combination of the input features is computed: \(z = w_0+w_1x_1+w_2x_2+\dots+w_nx_n\) Then, the sigmoid function converts \(z\) into a probability $p$: \(p = \frac{1}{1+e^{-z}}\) Thresholding $p$ (e.g. at 0.5) gives the binary class label.
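A minimal NumPy sketch of the two formulas above; the weights and the feature vector are hypothetical values chosen purely for illustration:

```python
import numpy as np

# Hypothetical weights and a single feature vector, only to mirror the formulas above.
w = np.array([0.5, -1.2, 0.8])   # w_1 ... w_n
w0 = 0.1                         # bias term w_0
x = np.array([1.0, 0.3, 2.0])    # features x_1 ... x_n

z = w0 + np.dot(w, x)            # linear combination z
p = 1.0 / (1.0 + np.exp(-z))     # sigmoid turns z into a probability
label = int(p >= 0.5)            # threshold at 0.5 for a binary label
print(p, label)
```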
Pros
- Easy to implement
- Suitable for linearly separable cases
- Results are explainable
Cons
- Hard to solve non-linear problems
- Performance is impaired when dealing with highly correlated features
- Sensitive to noisy data
- Not directly suitable for more than 2 classes (requires extensions such as one-vs-rest or softmax)
Decision Tree
Description
- Feature selection: choose the best feature to split on (e.g. by information gain or Gini impurity)
- Tree construction: recursively split the data until a stopping criterion is met
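A minimal sketch of these two steps, assuming scikit-learn and its bundled iris dataset; `max_depth=3` and the Gini criterion are arbitrary illustrative choices:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Fit a small tree on the iris dataset; depth is capped only to keep the output readable.
X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
clf.fit(X, y)

# The learned splits can be printed as if-then rules, which is where the explainability comes from.
print(export_text(clf))
```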
Pros
- Easy to understand and explain
- Capable of handling both numerical and categorical data
- Little data preparation needed, e.g. no normalization required
Cons
- Easy to overfit, especially on data with high dimensionality
- Sensitive to outliers
- Unstable, small changes of data could result in large changes of the tree structure
- Not suitable for handling complicated relationships between features
Suitable for
- Cases where explainability is essential, e.g. medical diagnosis
- Small to medium data size
- Data that is easy to split on individual features
- Feature selection
Random Forest
Naive Bayes
Description
Assumption:
All features are conditionally independent of each other given the class.
$$P(C_k | x) = \frac{P(x | C_k)P(C_k)}{P(x)}$$
$$P(x | C_k) = P(x_1 | C_k)P(x_2 | C_k)\cdots P(x_n | C_k)$$
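A small sketch of the factorization above, using hand-set hypothetical probabilities for two binary features; the class names and numbers are made up purely for illustration:

```python
import numpy as np

# Toy, hand-set probabilities, not learned from data.
prior = {"spam": 0.4, "ham": 0.6}   # P(C_k)
likelihood = {                      # P(x_i = 1 | C_k) for two binary features
    "spam": [0.8, 0.3],
    "ham":  [0.1, 0.6],
}

def posterior_scores(x):
    # Unnormalized P(C_k | x) = P(C_k) * prod_i P(x_i | C_k), with x_i in {0, 1}.
    scores = {}
    for c in prior:
        probs = [p if xi == 1 else 1 - p for p, xi in zip(likelihood[c], x)]
        scores[c] = prior[c] * np.prod(probs)
    return scores

scores = posterior_scores([1, 0])
total = sum(scores.values())        # P(x), the normalizer
print({c: s / total for c, s in scores.items()})
```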
K-Nearest Neighbors (KNN)
- Classification: majority vote among the k nearest neighbors (see the sketch after the distance list below)
- Regression: weighted average of the neighbors' values
Distance Measurements
- Euclidean Distance
- Manhattan Distance
- Chebyshev Distance
- Minkowski Distance
- Mahalanobis Distance
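A minimal from-scratch sketch of KNN classification, using Euclidean distance and a majority vote on a tiny made-up two-cluster dataset:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    # Euclidean distance from the query point to every training point.
    dists = np.linalg.norm(X_train - x, axis=1)
    # Take the k nearest neighbors, then a majority vote over their labels.
    nearest = np.argsort(dists)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Tiny made-up dataset: two clusters around (0, 0) and (5, 5).
X_train = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([4.5, 5.2])))   # expected: 1
```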
Cons:
- High computation cost: distances to all training points are computed at prediction time
- High space complexity: the whole training set must be stored
- Sensitive to imbalanced classes
- Limited performance on data with high dimensionality
- Sensitive to noisy data
- Hard to decide the correct value of k
Support Vector Machine
Linear SVM (Hard Margin): \(f(x) = \mathrm{sign}(w\cdot x+b)\)
Non-linear SVM (kernel trick)
\(f(x) = \mathrm{sign}(\sum_{i=1}^{n}\alpha_{i}y_iK(x_i, x)+b)\)
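A minimal sketch of the kernel trick, assuming scikit-learn: `make_circles` produces concentric rings that no linear boundary can separate, while an RBF-kernel SVM handles them; the parameter values are arbitrary illustrative choices:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles are not linearly separable in the original feature space,
# but the RBF kernel implicitly maps them into a space where they are.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
clf = SVC(kernel="rbf", C=1.0, gamma="scale")   # C is the soft-margin penalty from the next subsection
clf.fit(X, y)
print(clf.score(X, y))
```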
SVM with Soft Margin (for non-separable data)
The soft-margin optimization problem is:
\[\begin{aligned} \min_{w,b,\xi} \quad & \frac{1}{2}w^{T}w+C\sum_{i=1}^{N}{\xi_{i}}\\ \textrm{s.t.} \quad & y_{i}(w\cdot\phi(x_{i})+b)\geq 1-\xi_{i}\\ &\xi_{i}\geq0 \\ \end{aligned}\]
Pros:
- Can be used in various cases: text classification, image recognition, etc.
- Robust to noisy data points
- Avoids getting stuck at local optima (the optimization problem is convex)
- Still works in high-dimensional spaces (kernel trick)
- Avoids over-fitting (via regularization)
Cons:
- Natively only works for binary classification (multi-class needs one-vs-rest or one-vs-one schemes)
- High computation complexity
- Sensitive to choice of parameter
- Sensitive to missing data
AdaBoost
Gradient Boosting Trees
Multilayer Perceptrons
Artificial Neural Network