Latent semantic indexing


Understanding latent semantic indexing in depth is quite complex, since it relies on linear algebra and usually calls for a solid mathematical background.

There are a few methods that can be used to index pages and retrieve all the pages relevant to a user’s query.

The obvious method of retrieving the relevant pages is to match the words in a search query against the same words found in the text of the available web pages.
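
As a rough illustration, the sketch below shows exact word matching in Python. The tokenizer (lowercase, split on whitespace), the scoring rule (number of distinct query words a page contains), the function names, and the sample pages are all assumptions made for the example, not how any particular search engine works.

```python
def tokenize(text: str) -> set[str]:
    """Lowercase the text and split on whitespace (a deliberately naive tokenizer)."""
    return set(text.lower().split())


def keyword_match(query: str, pages: dict[str, str]) -> list[str]:
    """Return names of pages that share at least one word with the query,
    ranked by how many distinct query words each page contains."""
    query_words = tokenize(query)
    scores = {name: len(query_words & tokenize(text)) for name, text in pages.items()}
    # Only pages with at least one exact word match are returned, best first.
    matches = [name for name, score in scores.items() if score > 0]
    return sorted(matches, key=lambda name: scores[name], reverse=True)


# Hypothetical pages for illustration.
pages = {
    "page_a": "used car prices and car reviews",
    "page_b": "automobile engine repair",
    "page_c": "car insurance quotes",
}
print(keyword_match("car reviews", pages))  # ['page_a', 'page_c']
```

Note that "page_b" is never returned for a "car" query, because it only uses the word "automobile".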

The problem with simple word matching is that it is extremely inaccurate. One reason is that there are many ways for a user to express the concept they are looking for; this is known as synonymy. Another reason is that many words have multiple meanings; this is known as polysemy.

Because of synonymy, a user’s query may not actually match the text on the relevant pages, so those pages will be overlooked. Because of polysemy, the terms in a user’s query will often match terms on irrelevant pages.
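
Both failure modes are easy to reproduce with exact word matching. In this minimal sketch, `shared_words` is a hypothetical helper and the queries and page texts are invented examples: the relevant page is missed entirely, while the irrelevant one matches every query word.

```python
def shared_words(query: str, page: str) -> set[str]:
    """Words that appear in both the query and the page (exact matches only)."""
    return set(query.lower().split()) & set(page.lower().split())


# Synonymy: a relevant page says "automobile" where the user typed "car".
print(sorted(shared_words("car maintenance", "automobile servicing tips")))  # []

# Polysemy: the user means the animal, but a page about the car brand still matches.
print(sorted(shared_words("jaguar speed", "jaguar car top speed review")))  # ['jaguar', 'speed']
```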

Latent semantic indexing, or LSI, is an attempt to overcome these problems by looking at the patterns in which words are distributed across an entire collection of pages, such as the web.
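
LSI is usually implemented by building a term-document matrix (one row per word, one column per page) and keeping only the largest few components of its singular value decomposition (SVD). The NumPy sketch below assumes a four-document toy corpus, raw word counts with no weighting, and k = 2; the function names are hypothetical.

```python
import numpy as np


def build_term_document_matrix(docs: list[str]) -> tuple[list[str], np.ndarray]:
    """Rows are terms (sorted alphabetically), columns are documents, entries are raw counts."""
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    index = {term: i for i, term in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for word in doc.lower().split():
            matrix[index[word], j] += 1
    return vocab, matrix


def truncated_svd(matrix: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Keep only the k largest singular values and their singular vectors."""
    # NumPy returns singular values in descending order.
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :]


docs = ["car engine", "automobile engine", "flower petal", "flower garden"]
vocab, A = build_term_document_matrix(docs)
U_k, s_k, Vt_k = truncated_svd(A, k=2)
print(vocab)  # ['automobile', 'car', 'engine', 'flower', 'garden', 'petal']
print(A.shape, U_k.shape, s_k.shape, Vt_k.shape)  # (6, 4) (6, 2) (2,) (2, 4)
```

Multiplying the truncated factors back together, `U_k @ np.diag(s_k) @ Vt_k`, gives a rank-2 approximation in which the entry for "car" in the document "automobile engine" is 0.5 instead of 0: the word patterns of the two engine documents have been merged.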

Pages that have many words in common are considered semantically close in meaning.

Pages that have few words in common are considered semantically distant. The result is a relatively accurate similarity value, calculated for every content word or phrase.
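
Closeness between pages can be measured as the cosine similarity of their vectors in the reduced space. This sketch uses a small hand-built term-document matrix, and it assumes each document's latent coordinates are its column of S_k V_k^T (one common choice among several); it then compares raw and latent similarities.

```python
import numpy as np

# Toy term-document matrix (rows: terms, columns: documents), raw counts.
# Documents: d0 "car engine", d1 "automobile engine", d2 "flower petal", d3 "flower garden".
terms = ["automobile", "car", "engine", "flower", "garden", "petal"]
A = np.array([
    [0, 1, 0, 0],  # automobile
    [1, 0, 0, 0],  # car
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # flower
    [0, 0, 0, 1],  # garden
    [0, 0, 1, 0],  # petal
], dtype=float)


def cosine(x: np.ndarray, y: np.ndarray) -> float:
    """Cosine of the angle between two nonzero vectors."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
# Row j holds document j's coordinates in the k-dimensional latent space.
doc_vecs = (np.diag(s[:k]) @ Vt[:k, :]).T

print(round(cosine(A[:, 0], A[:, 1]), 3))              # raw d0 vs d1: 0.5
print(round(cosine(doc_vecs[0], doc_vecs[1]), 3))      # latent d0 vs d1: 1.0
print(round(abs(cosine(doc_vecs[0], doc_vecs[2])), 3))  # latent d0 vs d2: 0.0
```

The two engine documents share only one of their two words, so their raw cosine similarity is 0.5; in the two-dimensional latent space they point in the same direction, while the engine and flower documents remain unrelated.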

In response to a query, the LSI database returns the pages it judges to be most relevant to the query.
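
One common way to answer a query, sketched here under stated assumptions, is to "fold" the query into the latent space as if it were a new document (q_hat = S_k^-1 U_k^T q) and rank pages by cosine similarity against each document's row of V_k. The function `lsi_rank`, the toy corpus, raw counts, and k = 2 are hypothetical choices; real systems differ in term weighting and in how query and document vectors are scaled.

```python
import numpy as np


def lsi_rank(query: str, docs: list[str], k: int = 2) -> list[tuple[int, float]]:
    """Rank documents by cosine similarity to the query in a k-dimensional LSI space.
    Assumes the k largest singular values are nonzero."""
    terms = sorted({w for d in docs for w in d.lower().split()})
    # Term-document matrix of raw counts: rows are terms, columns are documents.
    A = np.array([[d.lower().split().count(t) for d in docs] for t in terms], dtype=float)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k, s_k, V_k = U[:, :k], s[:k], Vt[:k, :].T

    # Fold the query in the same way a document column would be: q_hat = S_k^-1 U_k^T q.
    # Query words that are not in the vocabulary are ignored.
    q = np.array([query.lower().split().count(t) for t in terms], dtype=float)
    q_hat = (U_k.T @ q) / s_k

    # Cosine similarity with each document's row of V_k; zero-length vectors score 0.
    num = V_k @ q_hat
    denom = np.linalg.norm(V_k, axis=1) * np.linalg.norm(q_hat)
    scores = np.divide(num, denom, out=np.zeros_like(num), where=denom > 0)
    return sorted(enumerate(scores.tolist()), key=lambda pair: pair[1], reverse=True)


docs = ["car engine", "automobile engine", "flower petal", "flower garden"]
for index, score in lsi_rank("car", docs):
    print(f"{score:+.2f}  {docs[index]}")
```

For the query "car", the page "automobile engine" scores as high as "car engine" (both about 1.0) even though it never contains the word "car", while the two flower pages score about 0.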

The LSI algorithm doesn’t understand anything about word meanings, and it does not require an exact match to return useful web pages.