WMR - Lecture Notes Summary

Lecture Info
Data, Content and Interpretation
What is Web Mining?
What is Machine Learning?
- Supervised Learning
- Unsupervised Learning

Lecture Info
Machine Learning
Classification
- Formalization
- Classifier Functions
Regression
- Formalization
- Regression Functions
Model Selection
- Model Family Selection
- Model Parametrization
- Searching for the Optimal Function
- Linear Models
- Probabilistic Models
  - Graphical models
  - Weighted Grammars
  - Hidden Markov Model
Summary

Lecture Info
Vector Spaces
- Basic Operations
- Linear Independence
- Basis
- Dot Product
- Norm
- Distance
  - Euclidean Distance
  - Cosine Distance
- Orthogonality
Text Classification
- Vector Space Model
- Task Definition
- Rocchio's Algorithm
  - Document Vectors
  - Class Vectors
  - Example
  - Limitations
- Memory Based Learning
- kNN
  - Example
  - Algorithm
Dimensionality Reduction
- Clustering
  - k-Means
  - Distance and Similarity

Lecture Info
Dimensionality Reduction
- Statistical Techniques
- Reconstruction Techniques (w/ Clustering)
- Linear Algebra Techniques
Distance, Similarity and Clustering
- Fuzzy sets
- Pearson Correlation
- Jaccard Similarity
- Dice Coefficient
- Clustering
The Importance of Representation
Questions

Lecture Info
Text Classification
Possible Approaches
- Manual classification
- Automatic Classification
Bayesian Methods
- Bayes' Rule
- Maximum a Posteriori Hypothesis
- Naive Bayes Classifiers
Multivariate Binomial Model
- Learning the Model
- Example
- Applying the Model
- Constructing the Vocabulary
Problems with Naive Bayes
- Independence Assumption
  - Laplace Smoothing
- Underflow Prevention

Lecture Info
Multivariate Multinomial Model
Stochastic Language Models
- Unigram Model
- Bigram Model
- N-Gram Model
Language Models and Naive Bayes
- Learning the Model
- Applying the Model
- Time Complexity
Summary of the two Models
- Example
Feature Selection
- Mutual Information
- How it is Done
Evaluation
- WebKB Experiment
- Problems to be Solved
- Most Common Category
Violation of NB Assumptions
Positive Properties of NB
References

Lecture Info
Evaluation
- Types of Metrics
Evaluating Classification
- Confusion Matrix
- Accuracy and Error Rate
  - Problem with Accuracy
- Precision, Recall, F-Measure
  - Trade-Off Between Precision and Recall
  - Break-Even Point (BEP)
  - Combining Precision and Recall
- Combining Measurements for Multiple Classes
  - Microaverages
Parameter Tuning
- Cross-Validation (Fixed-split)
- N-Fold Cross-Validation
- Tuning a Classifier
The Complete ML Process
Example: Reuters Classification
- Parameter Estimation Procedure
- References

Lecture Info
Natural Language
- What's in a Document?
- Levels of Interpretation
- Ambiguity
Information Retrieval
- IR Models
- IR Tasks
- Learning and IR

Lecture Info
The Problem
Decision Tree (J48)
- Basic Idea
- Entropy
- Construction
  - Example
- Discretization of Features
- Algorithm
Weka
- Format ARFF
- Weka Interface
- Tree Visualization

Lecture Info
Ambiguity
The NLP Process
- Syntactic Analysis
  - Parse Tree
  - Dependency/Constituency Relations
  - Dependency Parsing
  - Ambiguity in Syntactical Parsing
  - Modern Parsers
- Semantic Analysis

Lecture Info
RevNLT
Semantic Analysis (cont.)
- Compositionality
- Towards Lambda-Calculus
- Lambda-Calculus
- World Model
- Meaning
- Lexical Semantics
WordNet
Exercises
Semantic Parsing
- FrameNet
- SRL Pipeline

Lecture Info
Stanford CoreNLP
- How to Use it
- The CoNLL Tabular Format
spaCy
- Anaconda
- Basic Example
Example (Wikipedia words)
- Extracting Triples (Subject, Verb, Object)
- Generalizing
Exercise: Q/A

Lecture Info
Outline
Linguistic Structures
Language Modelling
- N-Gram Models
- Stochastic Taggers/Grammars
- Advantages
Markov Model
- Visible Markov Model
- Hidden Markov Model
  - Problems solved by HMM
PoS Tagging

Lecture Info
HMM for PoS Tagging
- Questions in PoS Tagging
- Advantages of using HMM
Forward Algorithm
- Formal Description
Viterbi Algorithm
- Formal Description
Parameter Estimation
- Supervised Methods
- Unsupervised Methods
- Baum-Welch Method
  - Forward-Backward Algorithm
  - Expectation Step (E-step)
  - Maximization Step (M-step)
  - Example of Baum-Welch
References
Exercise

Lecture Info
Review
- Types of HMM problems
- Viterbi Algorithm
Baum-Welch Method
- Overall Scheme
- Forward/Backward Probabilities
- Updating Step
- Final Considerations
Example of HMM
Use Cases for HMMs in NLP
- HMM Decoding for NLP
Exercise

Lecture Info
On Learning
- Training Set
- Learning Class \(C\)
- Version Space
PAC Learning
- PAC-Learnability
- Example
VC-Dimension
- Axis-Aligned Rectangles
- Lines
- Circles

Lecture Info
Model Selection
- Triple Trade-Off
- Expected and Empirical Error
- Learning and VC-Dimension
- How to Select a Model
  - Example (Structural Risk Minimization)
Learning Machines
Using VC-Dimensionality
Recap
Exam Questions
- MidTerm Topics
- Open Questions
  - Example 1
  - Example 2
  - Example 3
  - Example 4
  - Example 5
- Closed Questions

Lecture Info
Linear Classifiers
Perceptron
- Functional Margin
- Geometric Margin
- On-Line Algorithm
- Novikoff Theorem
- Duality
Limitations of Linear Classifiers
Support Vector Machines
- Maximum Margin Hyperplane
- Support Vectors
- How to Compute the Maximum Margin
- The Lagrangian
Questions

Lecture Info
Recap
Solving the Dual Problem
Kuhn-Tucker Theorem
Dealing with Non-Linearly Separable Data
- Soft-Margin SVM
Soft vs Hard Margin SVMs

Lecture Info
Clustering and Unsupervised Learning
- Hierarchical Clustering
- Direct Clustering
- Aspects of Clustering
- Agglomerative/Divisive Hierarchical Clustering
Hierarchical Agglomerative Clustering (HAC)
- Chaining Effect
- Complete Link Example
- Computational Complexity
Non-Hierarchical Clustering
- K-Means
  - Time Complexity
  - Problems
- QT K-Means
Distance Metrics
Data Standardization
- Interval-Scaled Attributes
Cluster Evaluation
- External Criteria
  - Purity
  - Entropy
Soft Clustering
Subspace Clustering (LAC)
Example Application (Text Clustering)

Lecture Info
SVMs and Scalar Product
Kernel Function
- Example
- Feature Spaces
- Gram-Matrix
- Kernelized Perceptron
Kernel in SVMs
Finding Kernels
Kernel Examples
- The Polynomial Kernel

Lecture Info
Conjunction of Features
String Kernel
- Formal Definition
- Exercise
Tree Kernels

Lecture Info
SVM
- Example
- Multiple Classes
- False Positives vs False Negatives
Tree Kernel
RBF Kernel
KELP

Lecture Info
Intro to Deep Learning
- Types of Neural Networks
- Dimensions of a Task
- Symbols, Rules and Observations
- Connectionism and Deep Learning
- What we want
- History
Vector Spaces, Functions and Learning
- On Representation
- The Role of Depth
Multilayer Perceptron
Neural Networks
- Single Neuron View
- Sigmoid

Lecture Info
Keras
- Batch, Steps and Epochs
- API Examples
  - The Sequential API
  - The Functional API
  - Sequential NN examples
Example 1
Example 2: MNIST Dataset

01 - Introduction to WMR

02 - Machine Learning Methods

03 - Geometrical Models I

04 - Geometrical Models II

05 - Naive Bayes Classifier I

06 - Naive Bayes Classifier II

07 - Machine Learning Metrics

08 - Natural Language Processing I

09 - Lab I

10 - Natural Language Processing II

11 - Natural Language Processing III

12 - Lab II

13 - Hidden Markov Models I

14 - Hidden Markov Models II

15 - Hidden Markov Models III

16 - PAC Learnability

17 - Model Selection

18 - Support Vector Machines I

19 - Support Vector Machines II

20 - Clustering

21 - Kernel Methods I

22 - Kernel Methods II

23 - Lab III

24 - Neural Networks I

25 - Lab IV