Classification Algorithms
Logistic Regression
Description
First, a linear combination of the input features is computed: \(z = w_0+w_1x_1+w_2x_2+\dots+w_nx_n\) Then, the sigmoid function converts \(z\) into a probability $p$: \(p = \frac{1}{1+e^{-z}}\) Thresholding $p$ (e.g. at 0.5) gives the binary class label.
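A minimal NumPy sketch of the two formulas above; the weights and the feature vector are hypothetical values chosen purely for illustration:

```python
import numpy as np

# Hypothetical weights and a single feature vector, only to mirror the formulas above.
w = np.array([0.5, -1.2, 0.8])   # w_1 ... w_n
w0 = 0.1                         # bias term w_0
x = np.array([1.0, 0.3, 2.0])    # features x_1 ... x_n

z = w0 + np.dot(w, x)            # linear combination z
p = 1.0 / (1.0 + np.exp(-z))     # sigmoid turns z into a probability
label = int(p >= 0.5)            # threshold at 0.5 for a binary label
print(p, label)
```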
Pros
- Easy to implement
- Suitable for linearly separable cases
- Results are explainable
Cons
- Hard to solve non-linear problems
- Performance is impaired when dealing with highly correlated features
- Sensitive to noisy data
- Not directly suitable for more than 2 classes (requires extensions such as one-vs-rest or softmax)
Decision Tree
Description
- Feature selection: choose the best feature to split on (e.g. by information gain or Gini impurity)
- Tree construction: recursively split the data until a stopping criterion is met
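A minimal sketch of these two steps, assuming scikit-learn and its bundled iris dataset; `max_depth=3` and the Gini criterion are arbitrary illustrative choices:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Fit a small tree on the iris dataset; depth is capped only to keep the output readable.
X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
clf.fit(X, y)

# The learned splits can be printed as if-then rules, which is where the explainability comes from.
print(export_text(clf))
```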
Pros
- Easy to understand and explain
- Capable of handling both numerical and categorical data
- Little data preparation needed, e.g. no normalization required
Cons
- Easy to overfit, especially on data with high dimensionality
- Sensitive to outliers
- Unstable, small changes of data could result in large changes of the tree structure
- Not suitable for handling complicated relationships between features
Suitable for
- Cases where explainability is essential, e.g. medical diagnosis
- Small to medium data size
- Data that is easy to split on individual features
- Feature selection
Random Forest
Naive Bayes
Description
Assumption:
All features are conditionally independent of each other given the class.
$$P(C_k | x) = \frac{P(x | C_k)P(C_k)}{P(x)}$$
$$P(x | C_k) = P(x_1 | C_k)P(x_2 | C_k)\cdots P(x_n | C_k)$$
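A small sketch of the factorization above, using hand-set hypothetical probabilities for two binary features; the class names and numbers are made up purely for illustration:

```python
import numpy as np

# Toy, hand-set probabilities, not learned from data.
prior = {"spam": 0.4, "ham": 0.6}   # P(C_k)
likelihood = {                      # P(x_i = 1 | C_k) for two binary features
    "spam": [0.8, 0.3],
    "ham":  [0.1, 0.6],
}

def posterior_scores(x):
    # Unnormalized P(C_k | x) = P(C_k) * prod_i P(x_i | C_k), with x_i in {0, 1}.
    scores = {}
    for c in prior:
        probs = [p if xi == 1 else 1 - p for p, xi in zip(likelihood[c], x)]
        scores[c] = prior[c] * np.prod(probs)
    return scores

scores = posterior_scores([1, 0])
total = sum(scores.values())        # P(x), the normalizer
print({c: s / total for c, s in scores.items()})
```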
K-Nearest Neighbors (KNN)
- Classification: majority vote among the k nearest neighbors (see the sketch after the distance list below)
- Regression: weighted average of the neighbors' values
Distance Measurements
- Euclidean Distance
- Manhattan Distance
- Chebyshev Distance
- Minkowski Distance
- Mahalanobis Distance
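A minimal from-scratch sketch of KNN classification, using Euclidean distance and a majority vote on a tiny made-up two-cluster dataset:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    # Euclidean distance from the query point to every training point.
    dists = np.linalg.norm(X_train - x, axis=1)
    # Take the k nearest neighbors, then a majority vote over their labels.
    nearest = np.argsort(dists)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Tiny made-up dataset: two clusters around (0, 0) and (5, 5).
X_train = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([4.5, 5.2])))   # expected: 1
```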
Cons:
- High computation cost: distances to all training points are computed at prediction time
- High space complexity: the whole training set must be stored
- Sensitive to imbalanced classes
- Limited performance on data with high dimensionality
- Sensitive to noisy data
- Hard to decide the correct value of k
Support Vector Machine
Linear SVM (Hard Margin): \(f(x) = \mathrm{sign}(w\cdot x+b)\)
Non-linear SVM (kernel trick)
\(f(x) = \mathrm{sign}(\sum_{i=1}^{n}\alpha_{i}y_iK(x_i, x)+b)\)
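A minimal sketch of the kernel trick, assuming scikit-learn: `make_circles` produces concentric rings that no linear boundary can separate, while an RBF-kernel SVM handles them; the parameter values are arbitrary illustrative choices:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles are not linearly separable in the original feature space,
# but the RBF kernel implicitly maps them into a space where they are.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
clf = SVC(kernel="rbf", C=1.0, gamma="scale")   # C is the soft-margin penalty from the next subsection
clf.fit(X, y)
print(clf.score(X, y))
```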
SVM with Soft Margin (for non-separable data)
The soft-margin optimization problem is:
\[\begin{aligned} \min_{w,b,\xi} \quad & \frac{1}{2}w^{T}w+C\sum_{i=1}^{N}{\xi_{i}}\\ \textrm{s.t.} \quad & y_{i}(w\cdot\phi(x_{i})+b)\geq 1-\xi_{i}\\ &\xi_{i}\geq0 \\ \end{aligned}\]
Pros:
- Can be used in various cases: text classification, image recognition, etc.
- Robust to noisy data points
- Avoids getting stuck at local optima (the optimization problem is convex)
- Still works in high-dimensional spaces (kernel trick)
- Avoids over-fitting (via regularization)
Cons:
- Natively only works for binary classification (multi-class needs one-vs-rest or one-vs-one schemes)
- High computation complexity
- Sensitive to choice of parameter
- Sensitive to missing data
AdaBoost
Gradient Boosting Trees
Multilayer Perceptrons
Artificial Neural Network