# CS726 Assignment No 1 Solution IDEA Spring 2019

CS726 Assignment No 1 for Spring 2019 comprises two questions. This solution idea shows examples of exactly the same form, from which the solutions can easily be worked out. The questions cover computing TF-IDF and computing cosine similarity between given term frequencies and weighted queries.

#### CS726 Assignment No 1 Questions Spring 2019

Question# 01 (30 marks)

Consider a document given below containing terms and their associated frequencies:

extracted (350), recited (200), plagiarized (100)

The document belongs to a collection of 50,000 documents, and the document frequencies of these terms are:

extracted (50), recited (1300), plagiarized (250)

Compute the following:

1. Normalized TF
2. IDF
3. TF-IDF

Note: Please use log base 2 only.

Question# 02 (20 marks)

Consider the term frequencies and weighted query terms: better late than never (BLN), bite the bullet (BB) and beat around the bush (BAB)

Compute the following:

1. Cosine similarity between BLN-BBWT ; BB-BABWT
2. Inner product between BB-BBWT

#### CS726 Assignment No 1 Solution Idea Spring 2019

Q.No 1 Example (Lecture slides No 5)

Computing TF-IDF — An Example

Given a document containing terms with given frequencies:

A(3), B(2), C(1)

Assume the collection contains 10,000 documents and the document frequencies of these terms are:

A(50), B(1300), C(250)

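Then, with tf normalized by the highest term frequency in the document (3) and idf = log2(N/df):

A: tf = 3/3 = 1; idf = log2(10000/50) = 7.6; tf-idf = tf*idf = 1*7.6 = 7.6

B: tf = 2/3; idf = log2(10000/1300) = 2.9; tf-idf = tf*idf ≈ 2.0

C: tf = 1/3; idf = log2(10000/250) = 5.3; tf-idf = tf*idf ≈ 1.8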
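The arithmetic above can be checked with a short script. This is only a sketch: the function name `tf_idf` and the dictionary layout are assumptions made for illustration, and the normalization (dividing by the highest term frequency in the document) is inferred from the example's 3/3, 2/3, 1/3.

```python
import math

def tf_idf(term_freqs, doc_freqs, n_docs):
    """Return {term: (normalized tf, idf, tf-idf)} using log base 2.

    tf is normalized by the highest term frequency in the document,
    which matches the 3/3, 2/3, 1/3 values in the example.
    """
    max_freq = max(term_freqs.values())
    result = {}
    for term, freq in term_freqs.items():
        tf = freq / max_freq
        idf = math.log2(n_docs / doc_freqs[term])
        result[term] = (tf, idf, tf * idf)
    return result

# Values from the lecture example: A(3), B(2), C(1) in 10,000 documents.
example = tf_idf(
    {"A": 3, "B": 2, "C": 1},
    {"A": 50, "B": 1300, "C": 250},
    10_000,
)
for term, (tf, idf, weight) in example.items():
    print(f"{term}: tf={tf:.2f} idf={idf:.1f} tf-idf={weight:.1f}")
```

The same function should work for Question 1 by passing the assignment's term frequencies, document frequencies and collection size instead of the example values.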

Q. No 2 Solution IDEA
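As a rough sketch of the two measures named in Question 2, the inner product of two vectors is the sum of the products of their matching components, and the cosine similarity is that inner product divided by the product of the two vector lengths. The vectors below are hypothetical placeholders, not the assignment's data, and the function names are assumptions made for illustration.

```python
import math

def inner_product(u, v):
    """Sum of the products of matching components."""
    return sum(a * b for a, b in zip(u, v))

def cosine_similarity(u, v):
    """Inner product divided by the product of the vector lengths."""
    norm_u = math.sqrt(inner_product(u, u))
    norm_v = math.sqrt(inner_product(v, v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: treat an all-zero vector as not similar
    return inner_product(u, v) / (norm_u * norm_v)

# Hypothetical term-frequency vector and weighted query vector.
doc_tf = [3, 4, 0]
query_wt = [4, 3, 0]

print(inner_product(doc_tf, query_wt))
print(round(cosine_similarity(doc_tf, query_wt), 2))
```

To apply this to the assignment, replace the placeholder vectors with the term-frequency row and the weighted-query row given in the question.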

#### Author: Habibullah Qamar

I am Habib Ullah Qamar, working as a Lecturer (Computer Sciences) in Pakistan. I have an MS (M.Phil) degree in computer sciences with a specialization in software engineering from Virtual University of Pakistan, Lahore. I have more than 15 years of experience in the field of Computer Science as a teacher. Blog writing is my passion. I have many blogs; this one is special, made with the aim of providing 100% free online coaching and training to students of undergraduate and postgraduate classes. Most students enrolled in computer sciences, information technology, software engineering and related disciplines find it difficult to understand the core concepts of programming and office automation. They also find it difficult to understand and solve their assignments.