# Modern Information Retrieval

## the concepts and technology behind search

### Errata for the first printing

Page xix: "Wed 2.0" --> "Web 2.0"

Page 82: p_ir --> P_iR

Page 92: the computation of "sim(d_1,q)" should be restricted to just the termsets with minimum frequency of 2. Thus, the correct formula is as follows:
* sim(d_1,q) = (W_{d,1} × W_{d,q} + W_{ad,1} × W_{ad,q} + W_{bd,1} × W_{bd,q}) / |vec(d_1)|
* sim(d_1,q) = (2.00 × 1.00 + 3.17 × 1.58 + 2.44 × 1.22) / 7.35 = 1.35
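The corrected computation above is easy to reproduce numerically. A minimal sketch (variable names are ours; the weights and norm are the ones printed in the erratum):

```python
# Weights from the corrected page-92 example (termsets with frequency >= 2).
w_d1 = [2.00, 3.17, 2.44]   # W_{d,1}, W_{ad,1}, W_{bd,1}
w_q  = [1.00, 1.58, 1.22]   # W_{d,q}, W_{ad,q}, W_{bd,q}
norm_d1 = 7.35              # |vec(d_1)|

# Dot product of document and query termset weights, normalized by |vec(d_1)|.
sim = sum(wd * wq for wd, wq in zip(w_d1, w_q)) / norm_d1
print(round(sim, 2))  # 1.36 at this precision; the printed 1.35 differs only by rounding
```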

Page 101: "The matrix D^T is the matrix of eigenvectors derived from the transpose of the document-document matrix given by M^T · M" --> "The matrix D^T is the transpose of the matrix of eigenvectors derived from the document-document matrix given by M^T · M"

Page 107: Section 3.5.2 on Language Models: subsection on "Language Model based on a Bernoulli Process" should come *before* subsection on "Language Model based on a Multinomial Process"

Page 123: the second on(i,k) --> on(j,k)

Page 141: the R-precision value for R2 --> the R-precision value for q2

Page 143: Formula 4.8: instead of $MRR(Q) = \sum_{i=1}^{N_q} \frac{1}{S_{correct}({\cal R}_i)}$ it should be $MRR(Q) = \frac{1}{N_q} \sum_{i=1}^{N_q} \frac{1}{S_{correct}({\cal R}_i)}$
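The corrected formula is straightforward to sanity-check in code. A minimal sketch (function name and inputs are ours), where S_correct(R_i) is the rank position of the first correct answer in the ranking for query i:

```python
def mrr(first_correct_ranks):
    """Mean reciprocal rank over a set of queries.

    first_correct_ranks[i] is S_correct(R_i): the rank position of the
    first correct answer in the ranking for query i.
    """
    n_q = len(first_correct_ranks)
    return sum(1.0 / rank for rank in first_correct_ranks) / n_q

# First correct answers at ranks 1, 2, and 4:
print(mrr([1, 2, 4]))  # (1 + 1/2 + 1/4) / 3 = 0.5833...
```

Without the 1/N_q factor of the corrected version, this example would yield 1.75, a value not bounded by 1, which is exactly what the missing normalization in the misprint allows.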

Page 171: "turkey" --> "turker" (twice)

Page 192: Cv_ym --> since s_u and s_v are of the same length, should it be Cv_yn?

Page 192: s_u = (c_{u,x1} , s_{u, x2} ...s_{u, xn} ) --> s_u = (c_{u,x1} , c_{u, x2} ...c_{u, xn} )

Page 196: defined by equation 5.10 --> defined by equation 5.12

Page 199: only the lower frequency documents --> only the lower frequency terms

Page 202: "used to estimate an initial query using relevance feedback techniques" --> "used to estimate an expanded query using relevance feedback techniques"

Page 223: "Notice that very similar documents will have a similarity value close to 0 while very different documents will have similarity close to 1" --> the two values should be swapped: close to 1 for very similar documents and close to 0 for very different ones

Page 315: Because the number of classifiers increases exponentially with the number of classes --> quadratically
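For context on why the growth is quadratic: the passage presumably refers to the one-versus-one combination scheme, which trains one binary classifier per unordered pair of classes, i.e. n(n-1)/2 classifiers for n classes. A small illustration (the function name is ours):

```python
from math import comb

def num_pairwise_classifiers(n_classes):
    # One binary classifier per unordered pair of classes: n*(n-1)/2.
    return comb(n_classes, 2)

print(num_pairwise_classifiers(10))  # 45, i.e. quadratic rather than exponential growth
```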

Page 318: whose weights are the error estimates of each classifier --> whose weights depend on the error estimates of each classifier

Page 319: The class that receives the highest sum of weights --> the highest score

Page 323: the maximum term information --> the maximum term mutual information

Page 343: The full inversion –> Addressing words

Page 345: "being n the text collection size" --> "being n the vocabulary size"

Page 347: "explained at the end of section 9.2.3" --> "explained before in this subsection".

Page 349, rows 10 and -6: Figure 3.3 --> Table 3.3

Page 353: "next takes O(n/M) I/O time" --> "next takes O(n) I/O time"

Page 356: q=1 and r=1 --> q=1 and r=3

Page 371: "left child" --> "right child"

Page 377: "the i-th bit set" --> "see Figure 9.20 where B is complemented"

Page 381: We do in line(6) --> line(8)

Page 460: "Each search cluster …" --> change "In this figure, we show an index partitioned into n clusters with m replicas." to "In this figure, we show an index partitioned into m servers forming a cluster with n replicas."

Page 461: Fig. 11.7 "n replicas of the whole index" --> "m"

Page 471: H(b) --> H(p)

Page 479: "90% percent" --> "90%"

Page 489: WEST and EAST are switched in Figure 11.12

Page 491: "SiteMonkey" --> "SearchMonkey"

Page 513: "Lui" --> "Liu"

Page 517: "to to" --> "to"

Page 519: "It is critical than…" --> "It is critical that..."

Page 521: "by issuing a query" --> "or by issuing a query"

Page 523: Figure 12.6: "there should be a wait icon beside S3"

Page 525: "of they way Web..." --> "of the way Web..."

Page 530: "A possible predictor are..." --> "A possible predictor is..."

Page 576: "red" --> "red wine"

Page 637: to build tress or forests --> trees

Page 742: where is (g)? --> Disregard label (g)

Page 751: Test D --> A.4.4 Test D