CS726 Assignment No 1 Solution IDEA Spring 2019

CS726 Assignment No 1 for Spring 2019 consists of two questions. This solution idea shows worked examples of the same kind, which can be adapted to produce the solutions. The first question is about computing TF-IDF, and the second is about computing cosine similarity between given term frequencies and weighted queries.

CS726 Assignment No 1 Questions Spring 2019

Question# 01 (30 marks)

Consider a document given below containing terms and their associated frequencies:

    extracted (350), recited (200), plagiarized (100)

The document belongs to a collection of 50,000 documents, and the document frequencies of these terms are:

    extracted (50), recited (1300), plagiarized (250)

Compute the following:

1. Normalized TF
2. IDF
3. TF-IDF

Note: Please use log base 2 only.

Question# 02 (20 marks)

Consider the term frequencies and weighted query terms for: better late than never (BLN), bite the bullet (BB) and beat around the bush (BAB)

Compute the following:

1. Cosine similarity between BLN-BBWT ; BB-BABWT
2. Inner product between BB-BBWT

Q.No 1 Example (Lecture slides No 5)

Computing TF-IDF — An Example

Given a document containing terms with given frequencies:

    A(3), B(2), C(1)

Assume the collection contains 10,000 documents and the document frequencies of these terms are:

    A(50), B(1300), C(250)


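Each tf is the term frequency divided by the largest term frequency in the document (3, for A), and tf-idf = tf*idf:

A:  tf = 3/3;  idf = log2(10000/50) = 7.6;    tf-idf = tf*idf = 1*7.6 = 7.6

B:  tf = 2/3;  idf = log2(10000/1300) = 2.9;  tf-idf = 2.0

C:  tf = 1/3;  idf = log2(10000/250) = 5.3;   tf-idf = 1.8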
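The same calculation can be checked with a few lines of Python. This is only a minimal sketch: the function name tf_idf and its parameters are made up for illustration, and it assumes tf is normalized by the largest term frequency in the document, as in the example above.

```python
import math

def tf_idf(term_freqs, doc_freqs, n_docs):
    """Return {term: (tf, idf, tf-idf)} using max-normalized tf and log base 2."""
    max_freq = max(term_freqs.values())  # largest raw frequency in the document
    result = {}
    for term, freq in term_freqs.items():
        tf = freq / max_freq                        # normalized term frequency
        idf = math.log2(n_docs / doc_freqs[term])   # inverse document frequency
        result[term] = (tf, idf, tf * idf)
    return result

# Values from the lecture example: A(3), B(2), C(1) in a 10,000-document collection
scores = tf_idf({"A": 3, "B": 2, "C": 1},
                {"A": 50, "B": 1300, "C": 250},
                10_000)

for term, (tf, idf, weight) in scores.items():
    print(f"{term}: tf = {tf:.2f}; idf = {idf:.1f}; tf-idf = {weight:.1f}")
```

Replacing the two dictionaries and the collection size with the values from Question 1 gives the numbers asked for there.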

Q. No 2 Solution IDEA
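The two measures asked for in Question 2 can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, and the vectors doc and query are made-up placeholders, not the values from the assignment. Put the actual term frequencies and query weights in their place.

```python
import math

def inner_product(u, v):
    """Sum of the element-wise products of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine_similarity(u, v):
    """Inner product divided by the product of the two vector lengths."""
    norm_u = math.sqrt(inner_product(u, u))
    norm_v = math.sqrt(inner_product(v, v))
    return inner_product(u, v) / (norm_u * norm_v)

# Placeholder vectors (NOT the assignment data): one entry per term
doc = [2, 0, 3]     # term frequencies of a document
query = [1, 0, 2]   # weights of a query

print("inner product:", inner_product(doc, query))
print("cosine similarity:", round(cosine_similarity(doc, query), 4))
```

The inner product grows with the size of the vectors, while cosine similarity divides that effect out, so it always lies between 0 and 1 for non-negative weights.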

Author: Habibullah Qamar

