ML - Lecture Notes Summary
Lecture Info
Objectives of ML
Example: Text Recognition
The ML Process
Types of Problems
Supervised Learning
Unsupervised Learning
Reinforcement Learning
Lecture Info
Problem Description
Example
Linear Models
Least Squares
Overfitting
Evaluating the Generalization
Regularization
Lecture Info
Discrete Random Variables
Continuous Random Variables
Expectation
Variance
Probability Distributions
Multivariate Distributions
Covariance
Random Vectors
Covariance Matrix
Correlation Matrix
Multinomial Distribution
Dirichlet Distribution
Gaussian Distribution
Univariate
Multivariate
Covariance Matrix
Spectral Properties of \(\Sigma\)
Linear Transformation
Marginal and Conditional
Bayes Formula
Lecture Info
Frequentists vs Bayesians
Bayesian Inference
Conjugate Distributions
Beta-Bernoulli
Beta-Binomial
Dirichlet-Multinomial
Text modeling
Lecture Info
Definition of Entropy
Properties
Conditional Entropy
KL Divergence
Convexity
Jensen's Inequality
Applying KL
Mutual Information
Lecture Info
Model Inference
Subproblems
Bayesian Learning
Over Model Space
Over Parameter Space
Point Estimate
Maximum Likelihood Estimate
Example (Bernoulli)
ML and Overfitting
Maximum a Posteriori Estimate
Example (Beta-Bernoulli)
Problems with MAP
Bayesian Estimate
Model Selection
Validation Process
Test set/Training set
Cross validation
Information Measures
Akaike Information Criterion (AIC)
Bayesian Information Criterion (BIC)
Lecture Info
Fitting in Terms of Probability
Frequentist Approach
Bayesian Approach
Bayesian Fitting
When Everything Is Gaussian
Lecture Info
Linear Models
Base Functions
Examples
Training Set
ML and Least Squares
Maximizing the Likelihood
Geometric Interpretation
Regularized Least Squares
Basic Form
General Form
Gradient Descent
Kernel Equivalent
Dual/Primal Formulation
Kernel as Distance Function
Lecture Info
What is LC?
Difference with Regression
Approaches to LC
Generalized Linear Models
Linear Discriminant Functions
Binary Classification
Multiclass Classification
First Approach
Second Approach
Third Approach
Generalized Discriminant Functions
Lecture Info
LC Using Regression
Why Does It Make Sense?
Learning Functions
Coefficient Matrix
Prediction Matrix
Residual Matrix
Closed Form Solution
Considerations
Fisher Linear Discriminant
Example
Basic Approach
Measuring Separation
Deriving the Direction
Solution
Refinement
Measuring Separation
Formula for Within-Class Variance
Fisher Criterion
Deriving the Direction
Solution
Deriving a Threshold
Perceptron
Problems
Definition
Cost Function
Gradient Optimization
Basic Gradient Descent
Stochastic Gradient Descent
Convergence Theorem
Structure
Lecture Info
Naive Bayes Classifiers
Language Models
Bayesian Classifiers
Computing \(P(C_k)\)
Computing \(P(d|C_k)\)
Generative Models
Binary Case (Sigmoid)
General Case (Softmax)
Gaussian Discriminant Analysis
Same Covariance Matrix
Binary Case
Discriminant Function
Multiple Classes
Decision Boundaries
Different Covariance Matrices
Estimation with ML
Estimating \(\pi\)
Estimating \(\mu_1, \mu_2\)
Estimating \(\mathbf{\Sigma}\)
Lecture Info
With Discrete Features
Exponential Family
Generalized Linear Models
Hypothesis
GLM and Normal
GLM and Bernoulli
GLM and Categorical
Additional Regressions
Poisson
Exponential
Lecture Info
Discriminative Approach
Logistic Regression
Degrees of Freedom
ML Estimation
Gradient Ascent
Newton-Raphson Method
Linear Regression
Logistic Regression
Iterated Reweighted Least Squares
Lecture Info
Logistic Regression and GDA
Softmax Regression
Computing the Gradient
Probit Regression
Stochastic Threshold Model
Probit Activation Function
Bayesian Logistic Regression
Computing Posterior
Managing Intractability
Lecture Info
The Basic Problem
Sampling General Distributions
Easy Case
Example 1: Exponential
Rejection Sampling
Importance Sampling
Markov Chain Monte Carlo
Markov Chains
MCMC Idea
How to Use it
Metropolis Algorithm
Existence of Stationary Distribution
Uniqueness of Stationary Distribution
Why it Works
Metropolis-Hastings Algorithm
Gibbs Sampling
MCMC and Bayesian Models
Sampling the Evidence
Sampling the Predictive Distribution
Lecture Info
Parametric Approach
Estimating Parameters
Maximum Likelihood (ML)
Maximum a Posteriori (MAP)
Bayesian Approach
Non Parametric Approach
Histograms
Kernel Density Estimators
Parzen Windows
Drawbacks
Smooth Kernel Functions
Gaussian Kernel Examples
Classification with Parzen Windows
K-Nearest Neighbors (kNN)
Classification with kNN
Performance
Lecture Info
Nadaraya-Watson Model
Locally Weighted Regression
Local Logistic Regression
Lecture Info
Partitions of Gaussian Distributions
Marginal density
Conditional Density
Distributions over Functions (Finite Domains)
Gaussian Distributions
Gaussian Processes (Infinite Domains)
Sampling from GP
RBF Kernel
Gaussian Process Regression
Estimating Kernel Parameters
Lecture Info
Main Idea
Binary Classifiers
Optimal Margin Classifiers
Functional Margins
Geometric Margin
Maximum Margin Hyperplanes
Classification Details
Computing the Optimal Margin
Duality
Lecture Info
Recap
Lagrangian Method
Karush-Kuhn-Tucker Theorem
Applying Karush-Kuhn-Tucker Theorem
Defining the Dual Problem
Solving the Dual Problem
Classification with SVM
Non Separability Case
Slack Variables
KKT Conditions
Dual Formulation
Item Characterization
Classification
Extensions
Lecture Info
Computational Issues
Loss Functions in Classification
0/1 Loss
Regularization
Problems
Surrogate Smooth Loss Functions
Convex Loss Functions
Convex Surrogate Loss Functions
Hinge Loss
Subgradient
Perceptron
Logistic Loss
Regularization Terms
SVM and Gradient Descent
Lecture Info
Kernel Functions
Why Kernels?
Definition
Verification
Construction
Relevant Kernels
Kernels and SGD in SVM
Lecture Info
Multilayer Networks
Multi-Layered Perceptron
First Layer
Second Layer
Inner Layers
Output Layer
3-Layer Networks
Lecture Info
Approximating Functions
Training with ML
Regression
Binary Classification
Multiclass Classification
Iterative Methods to Minimize Loss
Gradient Descent
On-line (stochastic) Gradient Descent
Batch Gradient Descent
Backpropagation
Example: 3-Layer Network
Computational Efficiency
Lecture Info
Deep Networks
Types
Learning
Loss Functions
Regularization
Vanishing Gradient
Exploding Gradient
Lecture Info
Convolutional Neural Networks
MLP Problems with Images
Convolution Operation
Local Connections
ConvNet Structure
Types of Layers
Example: ConvNet for CIFAR-10
Convolutional Layer
Depth and Stride
Connections Between Layers
Real-World Example
Summary
Pooling Layer
ReLU Layer
Layer Patterns
Case Studies
Lecture Info
Recurrent Neural Networks
Sequential Data
RNN Network Structure
Computing Recurrent States
Computing Output Value
Folded/Unfolded Representations
Learning
LSTM Networks
First Version
Input Layer
Output Layer
With Forget Layer
Variants
Peephole
GRU
Topologies
Lecture Info
Structure
Usage
Example: Iris Dataset
Pros/Cons
Construction
Partitioning
Impurity Measure
Goodness of Split
Gini Index
Entropy as Impurity Measure
Other Measures of Impurity
When to Stop
Pruning
Lecture Info
Ensemble Methods
Bagging
Bootstrap Sample
Usage
Variant
Why Does It Work?
Classification
Regression
Out-of-Bag Error
Random Forest
Boosting
Adaboost
Binary Classification
Example
Additive Models
Fitting Additive Models
Forward Stagewise Additive Modeling
Adaboost as Additive Model
Gradient Boosting
Link to Gradient Descent
Algorithm
Regression
Classification
Lecture Info
Curse of Dimensionality
Dimensionality Reduction
PCA
Case \(d^{'} = 0\)
Case \(d^{'} = 1\)
Best way to project
Best direction
Case \(d^{'} > 1\)
Example
Choosing \(d^{'}\)
Lecture Info
SVD
Why Does It Exist?
PCA and SVD
Co-occurrence Data
Latent Semantic Analysis
Assumption
Model
Problems
Solution (SVD)
Interpretation
Lecture Info
Clustering
Types
Partitional Clustering
Brute Force
Clustering Cost (Sum of Squares)
K-Means
Algorithm
Example
How to Choose \(K\)
Hierarchical Clustering
By Aggregation
Dendrogram
Cluster Similarity
Lecture Info
Mixtures of distributions
Example
Mixture Parameters Estimation
With Respect to \(\pi\)
With Respect to \(\lambda\)
Combining
With Respect to \(\theta\)
Analytical Intractability
Mixtures as Generative Process
Example
Probabilistic Clustering
Distributions with latent variables
Mixtures of Gaussian Distributions
Lecture Info
Complete Dataset
Gaussian Mixtures
With Complete Dataset
Log-Likelihood of Complete Dataset
Dealing with Latent Variables
M-Step
E-Step
Expectation Maximization Algorithm