IR - Lecture Notes Summary

Lecture Info
Classical Search Model
- Precision and Recall
Boolean Retrieval Model
- Boolean-Query Problem
- Term-Doc Incidence Matrix
- Inverted-Index
  - Stages of text-processing
  - Index Construction
  - Query Processing
Final Observations

Lecture Info
Phrase Queries
- Biword Indexes
- Positional Indexes
- Combination Schemes
- Extended Biwords
- Statistical Approach
Index Construction
- Dataset Reuters RCV1
- Hardware basics
- BSBI
- SPIMI
Distributed Indexing

Lecture Info
Distributed Indexing
- Types of Tasks
- TF Computation
- Index Transformation
- MapReduce
Dynamic Indexing
- Aux and Main
- Logarithmic Merge
  - Pseudo-Code
  - Complexity
Final Observations

Lecture Info
Compression in IR
- Lossless vs lossy compression
- Empirical laws
  - Heaps' Law
  - Zipf's Law
Dictionary Compression
- Dictionary-as-a-String
- Blocking
  - Dictionary Search
- Front coding
Postings Compression
- Gap Compression

Lecture Info
Bit Codes
- Unary Code
- Gamma Code
VB Codes
VSEncoding

Lecture Info
Problems with Boolean Search
Ranking Retrieval
- Guidelines on How to Rank
- Jaccard Coefficient
- Term-Frequency
- tf-matching-score
- Document Frequency
- \(\text{tf-idf}\) Weighting
Weight Matrix Usage
- Vector Space Model

Lecture Info
Cosine Similarity
- Normalization
- Basic Computation
- How is it Actually Computed?
\(\text{tf-idf}\) Variants
Probabilistic Approach
- Document Ranking Problem
- Probability Ranking Principle (PRP)
- Binary Independence Model (BIM)
  - Ranking Function
  - Estimation

Lecture Info
Evaluating an IR System
Labeled Document Collections (Gold Standard)
Precision and Recall
- Combining Precision and Recall
Rank-Based Measures
- Precision@K (P@K)
- Mean Average Precision
Beyond Binary Relevance
- Discounted Cumulative Gain (DCG)
- Mean Reciprocal Rank
User Behavior

Lecture Info
Probability Ranking (Cont.)
- Estimating \(u_t\) and \(p_t\)
- Ad-Hoc Retrieval
Okapi BM25
- Term Frequency
- Length Normalization
- Term Frequency for Queries
Language Models
- Unigram Language Model

Lecture Info
Text Classification
- Naïve Bayes Classifier
  - How to Estimate the Parameters
  - Why Naïve Bayes?
  - Smoothing
- Evaluating Classification
- Feature Selection
Language Models
- Smoothing in Language Models
  - Dirichlet Smoothing

Lecture Info
Unigram Inverted Index
- Imports
- Data Extraction
- Pre-Processing
  - Remove Header
  - Convert to Lower Case
  - Convert Numbers
  - Remove Punctuation
  - Remove Stop Words
  - Remove Apostrophe
  - Remove Single Characters
  - Stemming
- Inverted Index Construction
- Implementation of Query Language
  - Examples
- Exercises
Positional Index
- Positional Index Construction
- Implementation of Query Language
  - Examples
- Exercises

Lecture Info
Recap: Ranking Documents
- Selection vs Sorting
- Safe vs Non-Safe Ranking
Speeding Cosine Computation
- Index Elimination
- Champion Lists
- Query-Independent Document Scores
- Cluster Pruning
- Tiered Indexes
- Impact-ordered postings
- WAND Scoring

Lecture Info
Vector Space Model Construction with TF-IDF
Ranking with Cosine Similarity

Lecture Info
Relevance Feedback
- Rocchio Algorithm
Query Expansion

Lecture Info
Anchor Text
Citation Analysis
PageRank
Hubs and Authorities