IR - Lecture Notes Summary


  • Lecture Info

  • Classical Search Model

    • Precision and Recall

  • Boolean Retrieval Model

    • Boolean-Query Problem

    • Term-Doc Incidence Matrix

    • Inverted-Index

      • Stages of text-processing

      • Index Construction

      • Query Processing

  • Final Observations

  • Lecture Info

  • Phrase Queries

    • Biword Indexes

    • Positional Indexes

    • Combination Schemes

    • Extended Biwords

    • Statistical Approach

  • Index Construction

    • Dataset Reuters RCV1

    • Hardware basics

    • BSBI

    • SPIMI

  • Distributed Indexing

  • Lecture Info

  • Distributed Indexing

    • Types of Tasks

    • TF Computation

    • Index Transformation

    • MapReduce

  • Dynamic Indexing

    • Aux and Main

    • Logarithmic Merge

      • Pseudo-Code

      • Complexity

  • Final Observations

  • Lecture Info

  • Compression in IR

    • Lossless vs lossy compression

    • Empirical laws

      • Heaps' Law

      • Zipf's Law

  • Dictionary Compression

    • Dictionary-as-a-String

    • Blocking

      • Dictionary Search

    • Front Coding

  • Postings Compression

    • Gap Compression

  • Lecture Info

  • Bit Codes

    • Unary Code

    • Gamma Code

  • VB Codes

  • VSEncoding

  • Lecture Info

  • Problems with Boolean Search

  • Ranking Retrieval

    • Guidelines on How to Rank

    • Jaccard Coefficient

    • Term-Frequency

    • tf-matching-score

    • Document Frequency

    • \(\text{tf-idf}\) Weighting

  • Weight Matrix Usage

    • Vector Space Model

  • Lecture Info

  • Cosine Similarity

    • Normalization

    • Basic Computation

    • How is it Actually Computed?

  • \(\text{tf-idf}\) Variants

  • Probabilistic Approach

    • Document Ranking Problem

    • Probabilistic Ranking Principle (PRP)

    • Binary Independence Model (BIM)

      • Ranking Function

      • Estimation

  • Lecture Info

  • Evaluating an IR System

  • Labeled Document Collections (Gold Standard)

  • Precision and Recall

    • Combining Precision and Recall

  • Rank-Based Measures

    • Precision@K (P@K)

    • Mean Average Precision

  • Beyond Binary Relevance

    • Discounted Cumulative Gain (DCG)

    • Mean Reciprocal Rank

  • User Behavior

  • Lecture Info

  • Probability Ranking (Cont.)

    • Estimating \(u_t\) and \(p_t\)

    • Ad-Hoc Retrieval

  • Okapi BM25

    • Term Frequency

    • Length Normalization

    • Term Frequency for Queries

  • Language Models

    • Unigram Language Model

  • Lecture Info

  • Text Classification

    • Naïve Bayes Classifier

      • How to Estimate the Parameters

      • Why Naïve Bayes?

      • Smoothing

    • Evaluating Classification

    • Feature Selection

  • Language Models

    • Smoothing in Language Models

      • Dirichlet Smoothing

  • Lecture Info

  • Unigram Inverted Index

    • Imports

    • Data Extraction

    • Pre-Processing

      • Remove Header

      • Convert Lower Case

      • Convert Numbers

      • Remove Punctuation

      • Remove Stop Words

      • Remove Apostrophe

      • Remove Single Characters

      • Stemming

    • Inverted Index Construction

    • Implementation of Query Language

      • Examples

    • Exercises

  • Positional Index

    • Positional Index Construction

    • Implementation of Query Language

      • Examples

    • Exercises

  • Lecture Info

  • Recap: Ranking Documents

    • Selection vs Sorting

    • Safe vs Non-Safe Ranking

  • Speeding Cosine Computation

    • Index Elimination

    • Champion Lists

    • Query-Independent Document Scores

    • Cluster Pruning

    • Tiered Indexes

    • Impact-ordered postings

    • WAND Scoring

  • Lecture Info

  • Vector Space Model Construction with TF-IDF

  • Ranking with Cosine Similarity

  • Lecture Info

  • Relevance Feedback

    • Rocchio Algorithm

  • Query Expansion

  • Lecture Info

  • Anchor Text

  • Citation Analysis

  • Page Rank

  • Hubs and Authorities