Tuesday 21 April 2009

FAST ESP Relevancy - partner briefing

Relevancy:

Relevancy is how we tune the search engine to meet end-user expectations and business needs.


Enterprises use search in multiple contexts: commerce sites, intranets, extranets, portals, etc. Each has distinct objectives, and user communities value content differently. A one-size-fits-all relevancy model, à la Google or live.com, will therefore not suffice.

Recall is the ratio of the number of relevant records retrieved to the total number of relevant records in the index. Knowledge discovery or compliance applications will rate recall as more important than precision; in other words, customers do not want to “miss” any important documents.

For example, take a search for “Apple computers”.

Recall: all docs that mention both apple and computers should be returned.

Precision: of all the documents recalled, only docs about Apple computers, and not the fruit, should be returned.

RECALL is about increasing findability: amplifying the target size. Linguistics helps us achieve this. For a knowledge discovery or compliance solution we need to capture all possible information.

PRECISION is about removing spurious results: returning nothing but the truth.
Precision is the ratio of the number of relevant records retrieved to the total number of records retrieved, relevant and irrelevant.
In an e-commerce or e-directory environment, users prefer much more precise search results so that they are not swamped with too many non-specific results.
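
To make the two ratios concrete, here is a minimal sketch in Python. The document IDs and the split between "computer" and "fruit" documents are invented for illustration; nothing here is a FAST ESP API.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: IDs of the documents the engine returned.
    relevant:  IDs of all documents in the index that are truly relevant.
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant records that were actually retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical index: docs 1-4 are about Apple computers, docs 5-6 about the fruit.
relevant = {1, 2, 3, 4}
# Hypothetical result set: three computer docs plus both fruit docs.
retrieved = {1, 2, 3, 5, 6}

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints: precision=0.60 recall=0.75
```

Dropping the two fruit documents from the result set would raise precision to 1.00 without changing recall; returning doc 4 as well would raise recall to 1.00.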

In general, customers need to strike a balance between finding everything related to a query (recall) and returning only the documents that relate to that query (precision).

The truth, the whole truth [Recall] and nothing but the truth [Precision].

Use linguistic tools:

- Apply lemmatization to improve precision and recall.
- Apply synonym expansion to improve recall.
- Apply spell checking to prevent futile (0 hit) queries.
- Activate antiphrasing to remove the “noise” from the query, such as the phrase “how do I”.
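
Two of the tools above, antiphrasing and synonym expansion, can be sketched as simple query rewriting. This is only an illustration of the idea, not how FAST ESP implements it: the antiphrase list, the synonym dictionary and the function names are all hypothetical.

```python
# Hypothetical antiphrases: leading query noise that carries no search intent.
ANTIPHRASES = ["how do i", "what is", "where can i find"]

# Hypothetical synonym dictionary used to widen the query (improves recall).
SYNONYMS = {"laptop": ["notebook"], "tv": ["television"]}


def strip_antiphrases(query):
    """Lower-case the query and remove a leading antiphrase, if present."""
    q = query.lower().strip()
    for phrase in ANTIPHRASES:
        if q.startswith(phrase + " "):
            return q[len(phrase):].strip()
    return q


def expand_synonyms(terms):
    """Turn each term into a list of alternatives: the term plus its synonyms."""
    return [[term] + SYNONYMS.get(term, []) for term in terms]


def rewrite(query):
    """Antiphrase the query, then expand every remaining term."""
    return expand_synonyms(strip_antiphrases(query).split())


print(rewrite("How do I repair laptop"))
# prints: [['repair'], ['laptop', 'notebook']]
```

Each inner list can be read as an OR group, so a document mentioning "notebook" is still found by a query for "laptop".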

Search users do not all share the same business objectives and in turn do not have the same search needs. We need to accommodate this when determining relevancy.

VISUALISE A GRAPHIC EQUALIZER ON AN 80s SOUND SYSTEM: WE CAN INCREASE/DECREASE THE BASS FOR EACH OF THESE.

SIMILARLY WE HAVE PRESETS. THINK ROCK, JAZZ.

So, consider a Google search with a particular query term, let’s say apple. Irrespective of whether you are a CEO, a finance director, a student or a researcher, you will each be returned exactly the same set of results. Your intent and business objectives are ignored. It is a one-size-fits-all solution.

We determine the static relevancy in the index profile using rank profiles. A FAST ESP installation can have multiple rank profiles over the same set of data. Each department, for instance, can apply its own relevancy model to the content.
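
The equalizer-preset idea can be sketched as several weight sets applied to the same documents. The component names (text_match, freshness, authority), the weights and the sample documents below are assumptions made up for this sketch; real rank profiles are configured in the index profile, not in Python.

```python
# Hypothetical "presets": each rank profile weights the same signals differently.
RANK_PROFILES = {
    "news": {"text_match": 1.0, "freshness": 2.0, "authority": 0.5},
    "legal": {"text_match": 2.0, "freshness": 0.2, "authority": 1.5},
}

# Hypothetical per-document signals, each normalised to the range 0..1.
DOCS = {
    "press_release": {"text_match": 0.6, "freshness": 0.9, "authority": 0.3},
    "contract": {"text_match": 0.8, "freshness": 0.1, "authority": 0.9},
}


def score(signals, profile_name):
    """Weighted sum of a document's signals under the chosen rank profile."""
    weights = RANK_PROFILES[profile_name]
    return sum(weight * signals.get(name, 0.0) for name, weight in weights.items())


def rank(profile_name):
    """Return document names ordered best-first under the chosen profile."""
    return sorted(DOCS, key=lambda name: score(DOCS[name], profile_name), reverse=True)


print(rank("news"))   # prints: ['press_release', 'contract']
print(rank("legal"))  # prints: ['contract', 'press_release']
```

The data is identical in both calls; only the preset changes, which is why two departments can see different orderings of the same content.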

From what we have discussed tell me what type of site or application would benefit from a relevancy model like this?

And what site or application would not?

- As I mentioned, there exist TWO TYPES OF RELEVANCY. The first is STATIC RANK, which is DETERMINED AT DOCUMENT PROCESSING TIME.

- The second is DYNAMIC RANK, DETERMINED AT QUERY TIME.

- We can AUGMENT THE STATIC RELEVANCY WITH DYNAMIC RELEVANCY SETTINGS.

- We can choose to INCREMENT OR DECREMENT RELEVANCY POINTS FOR A particular DOCUMENT. For example: ANYTHING FROM THE CEO'S OFFICE, boost by 100 points; ANYTHING FROM THE FT, boost by 1000 points; ANYTHING CONTAINING A SPECIFIC TERM, boost by X points; anything from a particular source, decrease by X points.

- SEASONAL SCHEDULES.
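
As a rough sketch of the points above, a static rank can be adjusted at query time by a list of boost rules, some of which are only active during a seasonal window. The rule format, field names and dates are hypothetical; the 100 and 1000 point boosts follow the examples above, while the 50 and -200 values simply stand in for the unspecified "X".

```python
from datetime import date

# Hypothetical rules: (predicate, points, optional (start, end) seasonal window).
BOOST_RULES = [
    (lambda d: d["source"] == "ceo_office", 100, None),
    (lambda d: d["source"] == "ft", 1000, None),
    # Term boost that only applies in December (arbitrary illustrative value).
    (lambda d: "gift" in d["text"].lower(), 50, (date(2009, 12, 1), date(2009, 12, 31))),
    # Negative boost for a particular source (arbitrary illustrative value).
    (lambda d: d["source"] == "archive", -200, None),
]


def final_rank(doc, static_rank, today):
    """Add every matching, in-season boost to the document's static rank."""
    total = static_rank
    for matches, points, window in BOOST_RULES:
        in_season = window is None or window[0] <= today <= window[1]
        if in_season and matches(doc):
            total += points
    return total


doc = {"source": "ft", "text": "Gift ideas for the office"}
print(final_rank(doc, 500, date(2009, 4, 21)))   # prints: 1500
print(final_rank(doc, 500, date(2009, 12, 15)))  # prints: 1550
```

The static rank of 500 never changes; only the query-time adjustments differ between the two dates.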
