Tuesday 14 April 2009

FAST ESP ARCHITECTURE

I’ll do two iterations of this diagram, so it’ll really sink in – one with my Techie hat on and the other with my Sales hat on.

  1. Capture Content.
    So, how do we capture the content?
    - Using our pre-developed connectors.
    - They connect to and communicate with all known data stores - applications, mainframes, databases, email, CMS, web, SharePoint.
    - Support for 370 different file types – including multimedia.
    - We find most people save their information in Excel, PowerPoint and Word – what we call Unstructured Data.
    - Users resist the extra work required to add metadata, which would help make it more “findable”.


    2. Enrich Content
    So, what can we do to this data to make it more findable then?
    - Pass the content through a Document Processing Pipeline (think assembly line – 170 stages out of the box – and we can customise them and develop our own) where it is Cleansed, Normalised & Enriched, with Entities Extracted for later navigation.
    - EXTRACTED ENTITIES ARE USED AS A TABLE OF CONTENTS OF SORTS.
    - By Cleansed I mean ensuring it is consistent & removing formatting noise.
    - By Normalised I mean that duplicates are removed.
    - By Enriched I mean adding metadata, conducting linguistic analysis (spelling correction, synonym expansion, lemmatization), identifying key entities (people, places), and identifying keywords and concepts that will be used later for navigation.
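
To make the assembly-line idea concrete, here is a minimal sketch in Python. It is not FAST ESP code: the `Document` class, the stage functions and `run_pipeline` are names I have made up for illustration, and the real pipeline has far more stages. It only shows the shape of the thing, where each stage takes a document and hands it on, cleansed, enriched and de-duplicated.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    text: str
    metadata: dict = field(default_factory=dict)


def cleanse(doc):
    """Cleanse: strip markup-like noise and collapse whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", doc.text)
    doc.text = re.sub(r"\s+", " ", no_tags).strip()
    return doc


def extract_entities(doc):
    """Enrich: toy entity extraction, here just e-mail addresses by regex."""
    emails = re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", doc.text)
    doc.metadata["emails"] = sorted(set(emails))
    return doc


def run_pipeline(docs, stages):
    """Run each document through every stage, then drop exact duplicates."""
    seen, output = set(), []
    for doc in docs:
        for stage in stages:
            doc = stage(doc)
        key = doc.text.lower()
        if key in seen:
            continue  # Normalise: skip content we have already seen
        seen.add(key)
        output.append(doc)
    return output


docs = [
    Document("a", "<p>Contact  jane@example.com today</p>"),
    Document("b", "Contact jane@example.com today"),
]
refined = run_pipeline(docs, [cleanse, extract_entities])
```

After cleansing, the two sample documents have identical text, so only the first survives, carrying the extracted e-mail address in its metadata.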

    Why is this important?
    - User queries are typically one or two terms in length and fairly generic.
    - Low-quality content and poor queries = bad hits.
    - We want to AMPLIFY the TARGET SIZE. Grow the famous needle in the haystack.

    3. Create Index
    From our newly refined data we generate our index.
    - This allows for simple search at sub-second speed, irrespective of the document's actual location.
    - We can set up Alerts that are triggered when a particular document enters the index.

- Users could monitor a particular topic or development within the organisation.
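
As a rough illustration of an index with alerts, here is a toy inverted index in Python. `TinyIndex` and its methods are hypothetical names of my own, not the ESP API, and I am assuming an alert is simply a term plus a callback that fires when a matching document is added.

```python
from collections import defaultdict


class TinyIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.alerts = []                  # (term, callback) pairs

    def add_alert(self, term, callback):
        """Register a callback to run when a document containing term is indexed."""
        self.alerts.append((term.lower(), callback))

    def add(self, doc_id, text):
        """Index a document, then fire any alerts it matches."""
        terms = set(text.lower().split())
        for term in terms:
            self.postings[term].add(doc_id)
        for term, callback in self.alerts:
            if term in terms:
                callback(doc_id, term)

    def search(self, query):
        """Return ids of documents containing every query term (AND)."""
        terms = query.lower().split()
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = TinyIndex()
fired = []
index.add_alert("merger", lambda doc_id, term: fired.append((doc_id, term)))
index.add("d1", "Quarterly sales report")
index.add("d2", "Merger announcement and sales impact")
```

The lookup goes term to document ids rather than scanning documents, which is why where the original file lives does not matter at query time.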

4. Query Submission
Now let's take it from the user's perspective.
- Users can dispatch a single search, via a single search box, over all the information in the organisation.

- Our aim is to match users' intentions against content, rather than simply returning documents that contain the keywords they typed.

For example, for the query: “FAST SEARCH SOLUTION”.

- Is FAST more important than search?

- Does the query contain a phrase e.g. "FAST SEARCH"?

- Are the query terms linked by an AND or an OR?

- Is an exact match required, or should we be looking at variants - searches/searching/searched?

For example, for the query: “Where can I find information on FAST?”

- Is it important to distinguish FAST the company from FAST the adjective?

- Should entries in uppercase be interpreted as acronyms or stock symbols?

- What type of information is needed: technical data, product reviews, office locations?

5.a. Query Processing.
In general, we find users tend to submit short one- or two-word queries.
- The right result requires that the right question be asked.

- Users can be subjective. I may say broker and you might say agent. I may say biggest and you may say largest.

- We help inexperienced users effortlessly create intelligent queries.

- Again, to AMPLIFY the TARGET SIZE we Convert, Parse & Expand the query.
- Convert: Spell Checking, Term Weighting.
- Parse: Remove stop words.
- Expand: Adding relevant Synonyms, Lemmatization.
- This is handled automatically by the system, removing the need for complex Boolean queries.
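
Here is a small Python sketch of the Convert, Parse & Expand steps. Everything in it is an assumption for illustration: the vocabulary, stop-word list and synonym table are invented, spell checking is approximated with the standard library's `difflib.get_close_matches`, and term weighting and lemmatization are left out to keep it short.

```python
import difflib

# Hypothetical dictionaries; a real system would derive these from the index.
VOCABULARY = ["broker", "agent", "biggest", "largest", "search", "solution"]
STOP_WORDS = {"the", "a", "an", "of", "for", "in", "on"}
SYNONYMS = {"broker": ["agent"], "biggest": ["largest"]}


def convert(terms):
    """Convert: replace each term with a close vocabulary match, if one exists."""
    corrected = []
    for term in terms:
        match = difflib.get_close_matches(term, VOCABULARY, n=1, cutoff=0.8)
        corrected.append(match[0] if match else term)
    return corrected


def parse(terms):
    """Parse: remove stop words."""
    return [term for term in terms if term not in STOP_WORDS]


def expand(terms):
    """Expand: turn each term into a group of alternatives (term OR synonyms)."""
    return [[term] + SYNONYMS.get(term, []) for term in terms]


def process_query(query):
    return expand(parse(convert(query.lower().split())))


groups = process_query("the bigest brokr")
```

Each inner list is meant to be read as an OR group and the groups are ANDed together, which is the Boolean query the user never had to write.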

5.b. Result Processing.
The results are then processed for relevancy.
- Results are merged from different sources.
- They are ranked according to the relevant ranking model for the user's team, region or individual preference.
- We can also apply Boosting: to ensure a particular document is always returned in the top results.
- And Filtering: to exclude a particular document from the results page.
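
The merge, rank, boost and filter steps could look something like this in Python. `process_results` is a hypothetical function of mine, and I am assuming each source returns `(doc_id, score)` pairs on a comparable scale. A per-team or per-region ranking model is not modelled here, and a boosted document is only pinned if it matched the query at all.

```python
def process_results(result_lists, boosted=(), excluded=()):
    """Merge hit lists, keep the best score per document, then boost and filter."""
    best = {}
    for hits in result_lists:
        for doc_id, score in hits:
            if doc_id not in best or score > best[doc_id]:
                best[doc_id] = score

    # Rank by descending score; ties are broken by document id.
    ranked = sorted(best, key=lambda doc_id: (-best[doc_id], doc_id))

    # Filtering: excluded documents never reach the results page.
    ranked = [doc_id for doc_id in ranked if doc_id not in excluded]

    # Boosting: matching boosted documents are pinned to the top.
    pinned = [d for d in boosted if d in best and d not in excluded]
    return pinned + [d for d in ranked if d not in pinned]


intranet_hits = [("policy", 0.9), ("faq", 0.4)]
email_hits = [("memo", 0.7), ("faq", 0.6)]

plain = process_results([intranet_hits, email_hits])
tuned = process_results([intranet_hits, email_hits],
                        boosted=["faq"], excluded=["memo"])
```

With the sample hits, the plain ranking is policy, memo, faq, while the tuned call pins faq first and drops memo altogether.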

6. Displaying Results.
- Results are then displayed along with Navigators, Taxonomies & Drill Down Options specific to the result set.

ENTITY EXTRACTION.

- During refinement we extract recognisable data from the content, such as e-mail addresses, companies, people and locations, and offer them as drill-down options.

- This allows users to discern relevant from irrelevant, and useful from useless, information.

- It allows users to take immediate action.
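
To show how extracted entities can become drill-down options, here is a last Python sketch. The field names (`people`, `companies`, `locations`), the sample documents and both functions are made up for illustration. The assumption is that each result already carries the entities extracted during refinement.

```python
from collections import Counter


def build_navigators(results, fields=("people", "companies", "locations")):
    """Count how often each entity value occurs across the result set."""
    navigators = {name: Counter() for name in fields}
    for doc in results:
        for name in fields:
            navigators[name].update(doc.get(name, []))
    return navigators


def drill_down(results, field, value):
    """Keep only the results tagged with the chosen entity value."""
    return [doc for doc in results if value in doc.get(field, [])]


results = [
    {"id": "d1", "companies": ["FAST"], "locations": ["Oslo"]},
    {"id": "d2", "companies": ["FAST"], "locations": ["London"]},
    {"id": "d3", "companies": ["Acme"], "locations": ["Oslo"]},
]
navigators = build_navigators(results)
oslo_only = drill_down(results, "locations", "Oslo")
```

The counts are what you would show next to each navigator entry, and clicking an entry is just the drill-down filter applied to the current result set.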
