IR - Lecture Notes Summary
Lecture Info
Classical Search Model
Precision and Recall
Boolean Retrieval Model
Boolean-Query Problem
Term-Doc Incidence Matrix
Inverted-Index
Stages of text-processing
Index Construction
Query Processing
Final observations
Lecture Info
Phrase Queries
Biword Indexes
Positional Indexes
Combination Schemes
Extended Biwords
Statistical Approach
Index Construction
Dataset Reuters RCV1
Hardware basics
BSBI
SPIMI
Distributed Indexing
Lecture Info
Distributed Indexing
Types of Tasks
TF Computation
Index Transformation
MapReduce
Dynamic Indexing
Aux and Main
Logarithmic Merge
Pseudo-Code
Complexity
Final Observations
Lecture Info
Compression in IR
Lossless vs lossy compression
Empirical laws
Heaps' Law
Zipf's Law
Dictionary Compression
Dictionary-as-a-String
Blocking
Dictionary Search
Front coding
Postings Compression
Gap Compression
Lecture Info
Bit Codes
Unary Code
Gamma Code
VB Codes
VSEncoding
Lecture Info
Problems with Boolean Search
Ranking Retrieval
Guidelines on How to Rank
Jaccard Coefficient
Term-Frequency
tf-matching-score
Document Frequency
\(\text{tf-idf}\) Weighting
Weight Matrix Usage
Vector Space Model
Lecture Info
Cosine Similarity
Normalization
Basic Computation
How is it Actually Computed?
\(\text{tf-idf}\) Variants
Probabilistic Approach
Document Ranking Problem
Probability Ranking Principle (PRP)
Binary Independence Model (BIM)
Ranking Function
Estimation
Lecture Info
Evaluating an IR System
Labeled Document Collections (Gold Standard)
Precision and Recall
Combining Precision and Recall
Rank-Based Measures
Precision@K (P@K)
Mean Average Precision
Beyond Binary Relevance
Discounted Cumulative Gain (DCG)
Mean Reciprocal Rank
User Behavior
Lecture Info
Probability Ranking (Cont.)
Estimating \(u_t\) and \(p_t\)
Ad-Hoc Retrieval
Okapi BM25
Term Frequency
Length Normalization
Term Frequency for Queries
Language Models
Unigram Language Model
Lecture Info
Text Classification
Naïve Bayes Classifier
How to Estimate the Parameters
Why Naïve Bayes?
Smoothing
Evaluating Classification
Feature Selection
Language Models
Smoothing in Language Models
Dirichlet Smoothing
Lecture Info
Unigram Inverted Index
Imports
Data Extraction
Pre-Processing
Remove Header
Convert Lower Case
Convert Numbers
Remove Punctuation
Remove Stop Words
Remove Apostrophe
Remove Single Characters
Stemming
Inverted Index Construction
Implementation of Query Language
Examples
Exercises
Positional Index
Positional Index Construction
Implementation of Query Language
Examples
Exercises
Lecture Info
Recap: Ranking Documents
Selection vs Sorting
Safe vs Non-Safe Ranking
Speeding Cosine Computation
Index Elimination
Champion Lists
Query-Independent Document Scores
Cluster Pruning
Tiered Indexes
Impact-ordered postings
WAND Scoring
Lecture Info
Vector Space Model Construction with TF-IDF
Ranking with Cosine Similarity
Lecture Info
Relevance Feedback
Rocchio Algorithm
Query Expansion
Lecture Info
Anchor Text
Citation Analysis
PageRank
Hubs and Authorities