(Solved) W1: Query likelihood with a 6-word vocabulary (Q42764150)

Suppose we have a document collection with an extremely small vocabulary of only 6 words w1, w2, ..., w6. The following table shows the estimated background language model p(w|C) using the whole collection of documents (2nd column) and the word counts for documents d1 (3rd column) and d2 (4th column), where c(w,di) is the count of word w in document di. Let Q = {w1, w2, w3, w4, w5, w6} be a query.

| Word | p(w|C) | c(w,d1) | c(w,d2) |
|------|--------|---------|---------|
| w1   | 0.800  | 2       | ?       |
| w2   | 0.100  | 3       | ?       |
| w3   | 0.025  | 1       | ?       |
| w4   | 0.025  | 2       | ?       |
| w5   | 0.025  | 0       | ?       |
| w6   | 0.025  | 0       | ?       |

(The count columns were transcribed from an image and are only partly legible; "?" marks entries that could not be recovered.)

(1) Suppose we do not smooth the language models for d1 and d2. Compute the likelihood of the query for both d1 and d2, i.e., p(Q|d1) and p(Q|d2). (Do not compute the log-likelihood, and use scientific notation; e.g., 0.0061 should be written 6.1 × 10^-3.) Which document would be ranked higher?

(2) Suppose we now smooth the language models for d1 and d2 using the Jelinek-Mercer smoothing method with λ = 0.8, i.e., p(w|d) = λ · p_mle(w|M_d) + (1 − λ) · p_mle(w|C). Recompute the likelihood of the query for both d1 and d2, i.e., p(Q|d1) and p(Q|d2). (Do not compute the log-likelihood; use scientific notation.) Which document would be ranked higher?
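The two computations above can be sketched in code. This is a minimal sketch, not the graded answer: the c(w,d1) counts below follow one plausible reading of the partly legible table and should be treated as hypothetical placeholders, and the c(w,d2) column could not be recovered at all, so d2 is omitted.

```python
# Minimal sketch of the query-likelihood computation in parts (1) and (2).
# The c(w,d1) counts below are a HYPOTHETICAL reading of the partly legible
# table; substitute the counts from your copy of the problem.

def query_likelihood(counts, p_bg, lam=None):
    """p(Q|d) for a query that contains each vocabulary word exactly once.

    lam=None -> unsmoothed maximum-likelihood estimate (part 1)
    lam=0.8  -> Jelinek-Mercer: p(w|d) = lam*p_mle(w|M_d) + (1-lam)*p(w|C) (part 2)
    """
    doc_len = sum(counts)
    likelihood = 1.0
    for c, p_c in zip(counts, p_bg):
        p_mle = c / doc_len
        p_w = p_mle if lam is None else lam * p_mle + (1 - lam) * p_c
        likelihood *= p_w
    return likelihood

p_bg = [0.800, 0.100, 0.025, 0.025, 0.025, 0.025]  # p(w|C), from the table
c_d1 = [2, 3, 1, 2, 0, 0]                          # hypothetical c(w,d1) reading

print(query_likelihood(c_d1, p_bg))           # 0.0 -- any unseen query word zeroes p(Q|d)
print(query_likelihood(c_d1, p_bg, lam=0.8))  # small but nonzero after smoothing
```

Note how the unsmoothed model assigns p(Q|d1) = 0 as soon as any query word has count 0 in d1; Jelinek-Mercer smoothing mixes in the background model so every query word keeps nonzero probability, which is exactly why part (2) can change the ranking.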

