Saturday 31 January 2009

Google vs. FAST relevancy

Relevancy Introduction:

Relevancy is the measure of how well a set of results answers or addresses the intent of a given query.

Relevancy is the balancing of the truth, the whole truth [Recall - all documents related to the query terms] and nothing but the truth [Precision - only those docs related to the query terms].

In general, customers need to strike a balance between finding everything related to a query and only the documents that relate to a given query.

Precision is the ratio of the number of relevant records retrieved to the total number of irrelevant and relevant records retrieved.

In an e-commerce or e-directory environment, precision matters more: customers should not be swamped with too many non-specific results.

Recall is the ratio of the number of relevant records retrieved to the total number of relevant records in the index. Knowledge discovery or compliance applications will rate recall as more important than precision – in other words, customers do not want to “miss” any important documents.
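The two ratios above can be sketched in a few lines of Python. This is only an illustration: the function name and the document ids are made up for the example and are not part of any FAST or Google API.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: document ids the engine returned
    relevant:  document ids in the index judged relevant to the query
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant records that were retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 documents returned, 3 of them relevant,
# out of 6 relevant documents in the whole index.
p, r = precision_recall(["d1", "d2", "d3", "d9"],
                        ["d1", "d2", "d3", "d4", "d5", "d6"])
print(f"precision={p:.2f} recall={r:.2f}")
```

Returning more results tends to raise recall and lower precision, which is the balance described above.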

GOOGLE vs. FAST Relevancy

The mechanism underpinning Google's relevancy model is called PageRank. This is a formula developed by Google to determine a web page's "inbound link ranking", that is, the number of web pages that link to that page or, to put it another way, the number of times that page has been cited. The purpose of PageRank is to measure a page's relative importance within a set of pages.

In essence, Google interprets a link from page A to page B as a vote, by page A, for page B. But Google looks at more than the sheer volume of votes, or links, a page receives; it also analyzes the page that casts the vote.

Votes cast by pages that are themselves "important" weigh more heavily and help to make other pages "important".
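The voting idea can be illustrated with the simplified textbook form of PageRank. The sketch below is not Google's production algorithm; the damping factor of 0.85, the iteration count and the toy link graph are assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. `links` maps each page to the pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote equally among the pages it links to,
                # so a vote from an important page is worth more.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its vote evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: B is linked to by A, C and D.
toy_web = {"A": ["B"], "B": ["C"], "C": ["A", "B"], "D": ["B"]}
ranks = pagerank(toy_web)
for page, value in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(value, 3))
```

In this toy graph page D has no inbound links, so it keeps only the baseline score, while B collects the most votes.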

This algorithm works well within a network of hyperlinked documents, such as the web. However, within the enterprise we do not solely have web documents, and the documents are rarely if ever linked. Therefore, within the firewall Google cannot lean on the PageRank algorithm that set it apart from other search engines.

FAST's enterprise search relevancy has needed to evolve beyond Web relevancy techniques. The ranking models are now based on multi-faceted quality measurements such as context, freshness, completeness, authority, statistics, quality and geography. Each of these dimensions can be augmented by the search business manager. For example, for company news search we may want to give a greater relative weighting to the freshness of a document, whereas with the IT intranet search we may give a greater weighting to the authority of an article: how many times it has been cited or viewed.

Google offers a closed, black-box approach to relevancy. However, relevancy is not one-size-fits-all; it needs to be contextualised.

Who are the users? What are they trying to achieve? What are they interested in? What are they not interested in?

FAST provides an open, flexible relevancy model. Out of the box, a set of pre-defined relevancy model profiles is available that align with specific uses or audiences – site search, news, shopping, self-service, market intelligence, surveillance, etc. From these starting sets the relevancy profiles can be tuned based on user feedback or query log reporting.

A convenient way to understand the importance of relevancy models is to visualize a graphic equalizer on an audio system, which has pre-sets for audio environments such as concert hall, car, home, classical, and rock, for example.

It also allows for individual adjustment to meet the needs of the listener.

Similarly, FAST provides pre-set relevancy models where each of the parameters can be independently adjusted, and a change in one does not affect the others.
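One way to picture the equalizer analogy is a weighted sum over independent dimensions. The sketch below is an illustration only: the profile names, weights and signal values are invented, and this is not the FAST ESP rank profile format.

```python
# Hypothetical rank profiles: each dimension has its own "slider".
RANK_PROFILES = {
    "news": {"freshness": 0.5, "authority": 0.2, "context": 0.3},
    "it_intranet": {"freshness": 0.1, "authority": 0.6, "context": 0.3},
}


def score(signals, profile):
    """Weighted sum of per-dimension signals (each assumed to lie in 0..1)."""
    weights = RANK_PROFILES[profile]
    return sum(weight * signals.get(dimension, 0.0)
               for dimension, weight in weights.items())


def rank(documents, profile):
    """Order documents by descending score under the chosen profile."""
    return sorted(documents,
                  key=lambda doc: score(doc["signals"], profile),
                  reverse=True)


docs = [
    {"id": "press-release",
     "signals": {"freshness": 0.9, "authority": 0.2, "context": 0.5}},
    {"id": "reference-guide",
     "signals": {"freshness": 0.1, "authority": 0.9, "context": 0.5}},
]

print([doc["id"] for doc in rank(docs, "news")])
print([doc["id"] for doc in rank(docs, "it_intranet")])
```

Changing one weight leaves the contribution of the other dimensions untouched, which is the point of the analogy.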

Four Tips for Improving Relevancy:

  1. Understand the rank profiles by understanding the user base. What are their objectives? What are their interests? What sources do they use most often? Which of the relevancy dimensions do they value most?

  2. Augment the rank models – the search business manager can alter the rank calculation assigned to documents for given queries. This can be based on user feedback, query log reporting or seasonal changes.

  3. Use linguistic tools
    1. Apply lemmatization to improve precision and recall. "Go" is expanded to "going", "gone" and also "went", unlike stemming, which fails to capture "went".
    2. Synonym expansion to improve recall. "Flat" is expanded to "apartment", "studio", "condo".
    3. Abbreviation expansion to improve recall. "ST" is expanded to "street", "RD" is expanded to "road".
    4. Acronym expansion to improve recall. "U.S.A." is expanded to "United States of America".
    5. Spell checking to prevent futile queries, i.e. zero hits.
    6. Activate antiphrasing to remove the “noise” from the query, such as the text of the phrase “how do I”.
    7. Custom vocabularies to increase recall and allow for shortcuts. A list of company-specific terms is used to generate a dictionary.

  4. Relative query boosting allows the promotion of a ranking score to ensure a particular document is always displayed.

  5. Test, measure and refine – use a “golden set” of well-known documents and queries to test and tune relevancy. Providers should use at least 2,000 documents and more than 50 queries.

  6. Apply entity extraction to unstructured data and dynamic drill-down to structured content.

This allows users to determine what is relevant to them and gives them hints, e.g. price, rating, availability.
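Some of the linguistic tools in the list above (antiphrasing, synonym expansion and abbreviation expansion) can be sketched as simple dictionary lookups. The dictionaries and the function name below are hypothetical and reuse the examples from the list; a real engine would use far larger, language-aware dictionaries.

```python
# Hypothetical dictionaries built from the examples in the list above.
ANTIPHRASES = ["how do i", "where can i find"]
SYNONYMS = {"flat": ["apartment", "studio", "condo"]}
ABBREVIATIONS = {"st": ["street"], "rd": ["road"]}


def expand_query(query):
    """Return one list of alternatives per remaining query term."""
    text = query.lower().strip()

    # Antiphrasing: drop a leading "noise" phrase such as "how do I".
    for phrase in ANTIPHRASES:
        if text.startswith(phrase):
            text = text[len(phrase):].strip()
            break

    expanded = []
    for term in text.split():
        # Keep the original term and add any synonyms or expansions (recall).
        alternatives = [term]
        alternatives += SYNONYMS.get(term, [])
        alternatives += ABBREVIATIONS.get(term, [])
        expanded.append(alternatives)
    return expanded


print(expand_query("How do I rent a flat on Baker St"))
```

Each inner list can then be treated as an OR group when the query is sent to the engine.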

Friday 30 January 2009

FAST & Google - The Property Agent

Comparing FAST and the Google Search Appliance is like comparing Oracle Database to Microsoft Access, and I will tell you why. FAST does not just return black-box relevancy; we provide an open platform that gives you the full flexibility to connect to limitless repositories, enrich content as it is fed into the engine, and manage and define the entire search experience for multiple groups of users. Furthermore, because FAST ESP is a true platform, we can create multi-purpose applications and not just plug in a point solution.

That said, it is possible to keep it as simple as Google. But on top of this, we provide richer and more comprehensive management tools and a richer and more effective user experience.

So we have determined and implemented what is required to provide a Google-like experience. Because, let's face it, we all use Google every day and it works well. However, we have innovated beyond this point to what we call Search 2.0: how can we improve on simply dumping a list of text links in front of the user?

To illustrate with an analogy, let's pretend there is a property agency called Google properties. They are a brick and mortar property agency, but they operate in a similar way to the Google search engine.

So, we call into this agency in search of information on properties. We go in and ask them for information on all the properties in a particular area, let's say Hampstead. Now, rather than asking any questions to disambiguate what exactly you mean by properties in Hampstead, they say, "I have just what you are looking for", pull out a several-thousand-page directory listing everything they know about properties in Hampstead, and dump it on the table in front of you. They then ask you to flick through the pages to find what you are looking for.

This is the search box, keyword, text link paradigm made famous by our good friends at Google.

Now, apply the FAST ESP model to this scenario. In this case the agent is the FAST search engine:

The agent firstly disambiguates the customer's query by breaking it into different categories.

Is it property for purchase, rent or to sell? By asking this question the agent/engine can quickly narrow the possible results by discarding irrelevant categories. This enables the customer to start understanding what exactly it is she requires because we do not always know that as we commence the query. By suggesting information through tag clouds and category dividers we can guide the user down the path to answers.

Let us now say that the customer elects to purchase property. We can now narrow the result set and utilize dynamically generated facets to further guide the customer. In this example, we could ask the customer questions such as:

What is your budget?

What type of property - apartment, bungalow, house?

How many bedrooms?

How many bathrooms?

And so on.

These questions are essentially the guided navigators that the FAST engine provides alongside the results set. They act as questions that further refine the result set as the user steps through.
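The guided navigators can be thought of as counts over the current result set plus a filter for each choice the user makes. The following sketch uses invented property listings and helper names; it illustrates the idea and is not the FAST navigator API.

```python
from collections import Counter

# Hypothetical result set for "properties in Hampstead, for purchase".
LISTINGS = [
    {"id": 1, "type": "apartment", "bedrooms": 2, "price": 350000},
    {"id": 2, "type": "house", "bedrooms": 4, "price": 900000},
    {"id": 3, "type": "apartment", "bedrooms": 1, "price": 250000},
    {"id": 4, "type": "bungalow", "bedrooms": 3, "price": 600000},
]


def facet_counts(results, field):
    """Count how many results fall under each value of a field."""
    return Counter(item[field] for item in results)


def refine(results, field, value):
    """Keep only the results matching the facet value the user selected."""
    return [item for item in results if item[field] == value]


# The "questions" shown next to the results are the facet counts.
print(facet_counts(LISTINGS, "type"))

# The user answers "apartment"; the facets are recomputed on what is left.
apartments = refine(LISTINGS, "type", "apartment")
print(facet_counts(apartments, "bedrooms"))
```

Because the counts are recomputed after every selection, the user only ever sees choices that still lead to results.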

Furthermore, because FAST can interact with the content as it is being ingested, we can begin to extract the key concepts, words and entities mentioned in the document. At results processing time we can present these items, extracted from the underlying result set, alongside the list of links. They essentially act as a "table of contents" over the result set. They mention the key elements of the underlying results, and selecting them directs consumers to the relevant content. This is useful when the user is 'Discovering' an area for the first time. The extracted entities and words give a quick overview of what areas pertain to the query term.

Thursday 29 January 2009

German Jokes

Knock, knock.
Who's there?
The police. I'm afraid there's been an accident. Your husband is in hospital.

A man walks into a pub.
He is an alcoholic whose drink problem is destroying his family.

Did you hear about the blonde who jumped off a bridge?
She was clinically depressed and took her own life because of her terribly low self-esteem.

What do you call a cat with no tail?
A Manx cat.

Why do undertakers wear ties?
Because their profession is very serious, and it is important that their appearance has a degree of gravitas.

How many electricians does it take to change a light bulb?
One.

Why do women fake orgasms?
Because they want to give men the impression that they have climaxed.

Two men are sitting in a pub.
One man turns to the other and says: 'Last night I saw lots of strange men coming in and out of your wife's house.'
The other man replies: 'Yes, she has become a prostitute to subsidise her drug habit.'

Two cows are in a field. Suddenly, from behind a bush, a rabbit leaps out and runs away.
One cow looks round a bit, eats some grass and then wanders off. (PARAMOUNT COMEDY AD)

Why are there no aspirin in the jungle?
Because it would not be financially viable to attempt to sell pharmaceuticals in the largely unpopulated rainforest.

What do you get when you cross a chicken with a centipede?
A media circus about the debate over the morals and ethics of genetic engineering.

So, there were an Irishman, an Englishman and an American wrecked on an island. One day, they found a bottle, and when they opened it, a ghost came out and offered them each a wish. However, even though they wished for different stuff, nothing happened, as the three guys of varying nationalities were just having shared hallucinations from hunger.

How do you drown a blonde?
Hold her head underwater until she can no longer breathe and stops struggling.

Why did the blonde get fired from the M&M factory?
Repeated absences and stealing.

A black man is going to get a vasectomy. He shows up to the doctor's office wearing a suit. The doctor says, "Why are you wearing a suit?" The black man says, "I just got back from a funeral"

What do you say to a woman with two black eyes?
"Would you like an ice pack?"

A duck walks into a bar.
Animal control is promptly called, and the duck is then taken to a nearby park and released.

Why did the deaf man take his parrot to work?
He was weird.

A Blonde and a Brunette jump off a tall building at the same time. Who hits the ground first?
Both of them hit the ground at the same time. Hair colour doesn't affect acceleration due to gravity.

What's worse than finding a worm in your apple?
The Holocaust.

A man walks into a whorehouse and pays a prostitute for sex. He contracts an STD and passes it onto his pregnant wife. Their child is born deformed and has a difficult life.
When asked if he could see the humour in the situation, the child replied, "No. No, I don't."

A man called a lawyer and asked, "How much will you charge me to answer three questions?"
The lawyer said "$400."
"Wow," said the man. "Isn't that a lot?"
"I guess so," said the lawyer. "When are you going to ask your questions?"

How do you know when a Frenchman has been near your house?
You don't, really, unless you were there to see him or if one of your neighbors saw him. I wouldn't worry about it, really.

Three men are at the FBI Building for a job interview. The interviewing FBI agent tells the first man, 'To be in the FBI you must be loyal, dedicated, and give us your all. Your wife is in the next room. I want you to go in there and shoot her with this gun.'
The man takes the gun, hesitates, and says, 'Sorry, I can't do it.'
The next interviewee enters the office and the agent tells him the same thing he told the first guy. The second man takes the gun, walks into the room, and walks out. 'Sorry, I can't.' he says.
The last man enters the office and the interviewer yet again explains the test. The man says, "I'm sorry, I love my wife too much to do such a harmful thing. I guess the FBI is not for me after all."

What's sad about 4 black people in a Cadillac going over a cliff?
They were my friends.

Why did the chicken cross the road?
Earlier that morning the farmer's daughter had inadvertently left the gate to the yard open as she was preoccupied by her worry over a maths test set for that day. She hadn't studied for the test as she was still deeply distressed over her father's recent heart attack. This, coupled with the added burden of household chores now delegated to her because her mother was out trying to get the west field prepared for sowing, had made her quite forgetful and distracted of late.
Whilst several chickens escaped, only one strayed so far that it actually encountered the road facing the farm. After crossing the road and gorging itself in a soybean crop, the chicken was struck by a furniture remover's van as it attempted to make its way home.
Several hours later the dead chicken was spotted by a Community Mental Health Worker who was doing his bi-weekly rural clinic run. The chicken, being a bantam caught the eye of the Mental Health worker, who was a keen trout fisherman.
"Cool" thought the mental health worker- "those feathers will make for excellent trout flies". He stopped and plucked a handful of the most iridescent blue, green and orange feathers and placed them in an envelope. He rolled himself a cigarette, sat on the trunk of his car and admired the clouds. "God, I love this job", he muttered to no one in particular.

Satan takes the form of Jesus and appears to three priests saying that if they do something evil, he'll let them drink of the holy water.
The three priests discuss the offer and come to the conclusion that Satan must be tricking them into committing sin. When confronted with this accusation, Satan reveals his dastardly plot and salutes the priests on their cunning and steadfast faith.

Why couldn't Helen Keller drive?
Because she was blind and deaf.

The Pope walks into a bar. The bartender says, what'll ya have, Pope? But the Pope's grasp of English is tenuous at best, so he mumbles something in Latin. The bartender doesn't know any Latin. The Pope gets frustrated and leaves.

Have you seen Stevie Wonder's new house?
No.
Well, it's really nice.

Where did Hitler keep his armies?
The brunt of his forces were applied to the Eastern front, but throughout different periods of the war, a sizable chunk were used to protect the Atlantic Wall and a handful of divisions were used in Africa, to secure shipping routes.

A kid is riding down the street when his chain pops off his bicycle. The kid yells "God damn!" as he begins to fix it. A priest walking nearby overhears the boy taking God's name in vain and says, "Don't say 'God damn', say 'God help us'".
The kid says, "I am an atheist, get away from me".

What's the difference between a Jew and a pizza?
A Jew is a person adhering to the Jewish faith and a pizza is an oven-baked, flat, usually circular bread covered with tomato sauce and cheese with optional garnishes.

Monday 26 January 2009

Personal Brand

As Tom Peters from Fast Company puts it:

We are CEOs of our own companies: Me Inc. To be in business today, our most important job is to be chief marketer for the brand called "You".


So what is your brand slogan?


Start by identifying the qualities or characteristics that make you distinctive from your competitors.

What have you done lately to make yourself stand out?

What would your colleagues or your customers say is your greatest and clearest strength?

Your most noteworthy (as in, worthy of note) personal trait?

So what is the "feature-benefit model" that the brand called You offers?

What do I do that I am most proud of?

What have I accomplished that I can unabashedly brag about?


If you're going to be a brand, you've got to become relentlessly focused on what you do that adds value, that you're proud of, and most important, that you can shamelessly take credit for.


The key to any personal branding campaign is "word-of-mouth marketing."

Your network of friends, colleagues, clients, and customers is the most important marketing vehicle you've got.


You need to think: how can I leave an impression at every point of contact?


It's influence power.


The way I like to illustrate it is as Personal Stocks.


Stock of personal brand increases/decreases depending on the following:


Thinking outside the box: Exhibiting creativity and solution focus to challenging problems.


Organisational Skills: Functional skills and preparing effectively. For example, for a meeting: Who are the attendees? What is their lens? How do I avoid calendar conflicts? Use to-do lists. Organise notes.


Communication: Using power phrases, using examples, painting analogies, utilising similes, using personification, using control phrases - Right. OK? You See? Exactly Right? Correct!


Knowledge: Demonstrate that you are well-read and knowledgeable on topics.

Monday 19 January 2009

How do you bring in customers?


  1. Busy gas stations do it by selling petrol at a loss, so that customers will buy items at their convenience stores.
  2. Busy supermarkets do it by selling milk at a loss, so that customers will buy their more profitable items.

The strategy:

  1. Sell an enticing item at a ridiculously low price (either where you're selling at a loss or no gross margin).
  2. Bring in customers to sell more profitable items.

Just copy how the American electronics firm Fry's or IKEA promotes its stores.

Frequent Fry's Electronics / IKEA promotions:

  1. Hot dogs and Coke for £1.
  2. Ice-cream for 25p.

End Result:

Greater exposure to their customers = Customers have greater exposure to their products = Customers are happy after getting a cheap hot dog, feel they have saved money and so are willing to spend.

The Golden Belief of Marketing:

  • The more exposure you have to your target market the more you will sell.

How should we do it?

Creativity thrives here. Examples:

  1. Sell at a loss, to upsell more profitable items.
  2. Sell at a loss, to generate a recurring customer.
  3. Offer a "basic" version at a loss, to sell a more premium version.

Wednesday 14 January 2009

What do I need to know after meeting with every prospect? Qualification

[A Shop Of ]
  • 1: Their History. Where are they coming from? How did they get here? What do they know about you and your firm? What dealings have taken place in the past?

  • 2: Frames of Reference. What ideologies and situations might affect their decision-making? Do they have a certain way of viewing your offering? How do they feel about their own firm?

  • 3: Pains and Objectives. What is the compelling reason to act? Why is the pain so great that they cannot choose to wait? Where do they want to go? How do they expect to feel when they get there? How do they think they’re going to get there? What do they think will prevent it?

  • 4: Likely Objections. What is going to cause them to balk? How fervently do they believe in that objection? How real is it? Might it block the deal, no matter what you say or do?

  • 5: Capacity to Act. Are you communicating with decision-makers or seat-warmers? If decision-makers, what decision do you want them to make? If not, why are you talking to them?

  • 6: Decision-making Style. If they’re decision-makers, how do they make decisions? Are they all about facts and figures? Or do they decide according to a gut feeling?


"Who is the actual final decision maker."
"What is the decision making process"
"Who will be involved in making the decision"
"What one or two critical factors (or People) will play a role in making the final decision?"

Why do new leaders fail?

New leaders know they must prove themselves right out of the gate. But in pursuing quick wins, they often fall into traps that undermine success, say Van Buren and Safferstone. For example, a new leader might:
  • Focus too much on details
  • React negatively to criticism
  • Intimidate others
  • Jump to conclusions about how best to solve particular problems
  • Micromanage employees

One new call center supervisor began micromanaging employees in a bid to improve their first-call-issue-resolution rate. Her style made them feel stifled and underappreciated. Within five months, the rate dropped 15 percent.

Tuesday 13 January 2009

The Indexing Subsystem.

The first indexing dispatcher to register with the Name Service becomes active. The Name Service guarantees that only one indexing dispatcher succeeds. The backup dispatchers monitor the active one, stepping in to take over if it dies. Periodically, the backup indexers connect to the master and ping it.
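The "first to register becomes active" rule can be sketched as follows. This is a single-process illustration with hypothetical class and method names; it is not the FAST ESP Name Service API, and a real implementation would need network calls, timeouts and heartbeats.

```python
class NameService:
    """Toy registry that allows exactly one active dispatcher."""

    def __init__(self):
        self.active = None
        self.backups = []

    def register(self, dispatcher):
        """Return True if this dispatcher became the active one."""
        if self.active is None:
            self.active = dispatcher
            return True
        # Later registrations succeed only as backups.
        self.backups.append(dispatcher)
        return False

    def report_failure(self, dispatcher):
        """Called when a backup's ping of the given dispatcher goes unanswered."""
        if dispatcher == self.active:
            # The longest-waiting backup steps in, if there is one.
            self.active = self.backups.pop(0) if self.backups else None
        elif dispatcher in self.backups:
            self.backups.remove(dispatcher)


names = NameService()
print(names.register("dispatcher-1"))  # first to register becomes active
print(names.register("dispatcher-2"))  # registered as a backup
names.report_failure("dispatcher-1")
print(names.active)
```

Promoting the longest-waiting backup is an assumption made for the sketch; the text above does not say how the successor is chosen.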

Incoming operations are rewritten to basic I/O operations

e.g., invalidate file X @ position Y.

e.g., blacklist doc X in index Y_Z.

Index and Search nodes are arranged in a matrix. Each column holds a subset of the content to distribute the load. This allows scale for volume and indexing performance.

Every row holds a replica of the full content of the index which enables an increased number of queries per second.
Rows are able to replay operations internally to re-establish synchronization after downtime. Multiple indexing rows add indexer resilience. Rows are in sync both with respect to content and indices.
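The row and column matrix can be pictured with a small in-memory model. Everything here is an assumption made for illustration: the class name, the hash-based assignment of documents to columns and the round-robin choice of row are not taken from FAST ESP.

```python
import itertools
import zlib


class SearchMatrix:
    """Toy matrix: columns partition the content, rows replicate it."""

    def __init__(self, rows, columns):
        self.rows = rows
        self.columns = columns
        # nodes[row][column] holds the document ids stored on that node.
        self.nodes = [[set() for _ in range(columns)] for _ in range(rows)]
        self._next_row = itertools.cycle(range(rows))

    def column_for(self, doc_id):
        # Assumed routing rule: a stable hash spreads documents over columns.
        return zlib.crc32(doc_id.encode("utf-8")) % self.columns

    def index(self, doc_id):
        column = self.column_for(doc_id)
        # Every row receives the document, so each row is a full replica.
        for row in range(self.rows):
            self.nodes[row][column].add(doc_id)

    def search_row(self):
        # Queries are spread over the rows to raise queries per second.
        return next(self._next_row)

    def documents_in_row(self, row):
        return set().union(*self.nodes[row])


matrix = SearchMatrix(rows=2, columns=3)
for doc in ["doc-a", "doc-b", "doc-c", "doc-d", "doc-e"]:
    matrix.index(doc)

print(matrix.documents_in_row(0) == matrix.documents_in_row(1))
print([matrix.search_row() for _ in range(4)])
```

Adding a column lowers the share of content per node, while adding a row adds another full copy that can answer queries.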

The column master is elected at run-time. If the master fails, a new master is elected.

The column master synchronizes content operations and indexing to all of its backups.

During indexing of new content there will be two indices:

One that houses the active index, against which the search service searches.

One that houses the incremental index that is being built from the newly added content. This is added to the active index in batches.

This needs to be as large as the active index to allow for scenarios when we need to reset the entire index.

The active index will be divided into 3 partitions: 0, 1 and 2. These vary in ascending size from 25% to 50% to 100%. As content is added, the indexer service will send it to the smallest of the partitions, partition 0. When this reaches maximum capacity, its content is copied to the next largest partition, partition 1, and so on.

The advantage of this, given that all new content is sent to partition 0, is that we can quickly re-build the smaller index partition.
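The partition scheme can be sketched as a cascade. This reads the description above as "when a partition fills up, its content moves into the next larger one", which is an interpretation; the capacities of 25, 50 and 100 documents stand in for the 25%, 50% and 100% sizes, and the class is hypothetical.

```python
class PartitionedIndex:
    """Toy index with three partitions of increasing size."""

    def __init__(self, capacities=(25, 50, 100)):
        self.capacities = capacities
        self.partitions = [[] for _ in capacities]

    def add(self, doc_id):
        # All new content goes to the smallest partition, partition 0.
        self.partitions[0].append(doc_id)
        self._spill(0)

    def _spill(self, level):
        is_last = level == len(self.partitions) - 1
        if is_last or len(self.partitions[level]) < self.capacities[level]:
            return
        # The full partition is moved into the next larger one and emptied.
        self.partitions[level + 1].extend(self.partitions[level])
        self.partitions[level] = []
        self._spill(level + 1)


index = PartitionedIndex()
for number in range(60):
    index.add(f"doc-{number}")

print([len(partition) for partition in index.partitions])
```

Only partition 0 changes on most additions, so the frequent rebuilds touch the smallest partition. The sketch does not enforce a limit on the last partition.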

The column master ensures that column contents are always in sync by first copying the FIXML to all backup nodes and then copying the index once it has been created on the master.

A new index is not activated until the index is copied to all rows. That is, the new index is not activated until all columns have the same content.

The master indexer is the only one to receive operations and to initiate indexing and synchronization.

Currently, all search controllers connect to the master indexer for guidance. Only the master indexer builds indices. Backups only store the FIXML. If the master fails the crown is passed to a backup, which assumes the master role.

A failover will require the index to be rebuilt from FIXML. This may take several minutes to a couple of hours depending on the volume of content. During this time indexing of new content is not possible. However, search is uninterrupted.

Processing subsystem.

Multiple processor servers provide resilience and throughput.
Multiple content distributors provide for resilience and throughput.

FAST ESP Fault Tolerance

FAST can provide a proprietary active – semi-active fault tolerance within a single cluster with respect to search, using a 2 row architecture.

In this scenario, we will have two nodes – A & B – node A will host all services while node B will hold a search service, indexer service and Query/Results service. Node B can also hold additional document processing services to balance the load and increase performance.

Because both nodes hold search, indexer and Query/Results services, search is available on both nodes. This is managed by a built-in software load balancer.

This model allows us to provide active – active search capabilities on both nodes in case of node failure. That is, there is search failover. This is what is referred to as a 2 row architecture.

A 2 column architecture would split the content across the nodes with 50% in each. This would be beneficial if there were very high volumes of data but would not provide redundancy.

With respect to content and indexing.

Node A gathers content, processes this content and uses its own indexer services to build the index.

Content dispatchers write the post processed content called FIXML (FAST Index XML) to the Node A indexer services from which the Node A Index is generated. Concurrently, this FIXML is dispatched to node B. Here node B holds the FIXML but does not yet create its own replica of the index unless the master fails.

This process is continuous to ensure both hosts' FIXML are kept in sync. If Node A fails, we can then generate the index from Node B's FIXML and vice versa.

This process can take anywhere from minutes to several hours depending on the index size. During re-generation of Node A's index, we can still continue to serve searches from the index on Node B.

N.B. The crawler is a single point of failure. If the node containing the crawler fails, we will not be able to add new content to the index until it is brought back up.


Scalability.


What is your projected increase in data volume and user volume?

How scalable is your current search solution?

Rather than, Unlike:

Autonomy or Endeca.

FAST is the most linearly scalable search solution. It scales in 3 different dimensions.

Volume of data – Append to the matrix, new servers to partition the index and increase the volume capacity as content grows.

Queries per second – Append to the matrix, new servers to house additional query – response services as the number of users grows.

Freshness - Append to the matrix, new servers to house additional document processing services to increase the throughput capacity.

Scalability

Business objectives, Business Challenges:

Reduce the number of servers.

Reduced total cost of ownership.

I CAN GIVE YOU AN EXAMPLE:

Associated Press was a former Autonomy customer that transitioned to FAST. When they wanted to increase their query processing rate due to growth in traffic, they were advised by Autonomy to “keep adding servers”. For 5 million documents they already had 125 servers giving only 4 queries per second. FAST reduced the number of servers from 125 to 25, delivering not the measly 4 QPS but 140 QPS for complex queries on 5 million documents.

This was an 80% saving in hardware costs.
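The arithmetic behind the Associated Press figures can be checked directly. The sketch below uses only the numbers quoted above; the per-server throughput is a derived figure, not one stated in the source.

```python
servers_before, servers_after = 125, 25
qps_before, qps_after = 4, 140

# Share of servers removed: (125 - 25) / 125.
hardware_saving = (servers_before - servers_after) / servers_before

# Throughput each server contributes before and after the migration.
qps_per_server_before = qps_before / servers_before
qps_per_server_after = qps_after / servers_after

print(f"hardware saving: {hardware_saving:.0%}")
print(f"total QPS increase: {qps_after / qps_before:.0f}x")
print(f"QPS per server: {qps_per_server_before:.3f} -> {qps_per_server_after:.1f}")
```

The 80% figure assumes hardware cost is proportional to the number of servers.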

Similarly, Autotrader.com had previously employed a DB solution. FAST reduced the number of required servers to 20 from 32. The complexity reduction meant Autotrader could reduce the number of full-time equivalents from 11 to one part-time employee. Query response time plummeted from several seconds to sub-second.

To summarise, this was a saving of roughly 38% in hardware costs (32 servers down to 20) and about 95% in employee costs.

Auto Suggests.


How are employees made aware of the availability of new information?

How do we assist employees in finding that useful document that they cannot fully remember the name of?

How do we facilitate the search process reducing the path to results?

FAST can suggest relevant content to employees to enable them to ask better questions: Auto Suggests.

Rather than, Unlike:

Allow users to not only recover, but also discover what information is available. Reduce the time spent searching for information. More time spent on high-value tasks and actioning that information.

I CAN GIVE YOU AN EXAMPLE:

Cisco has a broad array of products and services. To avoid overwhelming the user, they need to make their product information as consumable as possible. Cisco, powered by FAST, uses auto suggest to direct users to content. This means users can find the product faster without being bogged down by information.
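A minimal form of auto suggest is a prefix lookup over a sorted list of known phrases. The class and the sample phrases below are hypothetical; this shows the general technique, not how FAST or Cisco implement it.

```python
import bisect


class AutoSuggest:
    """Suggest known phrases that start with what the user has typed."""

    def __init__(self, phrases):
        # Sorting keeps phrases with a common prefix next to each other.
        self.phrases = sorted(phrase.lower() for phrase in phrases)

    def suggest(self, prefix, limit=5):
        prefix = prefix.lower()
        # Binary search for the first phrase that is >= the prefix.
        start = bisect.bisect_left(self.phrases, prefix)
        matches = []
        for phrase in self.phrases[start:]:
            if not phrase.startswith(prefix) or len(matches) == limit:
                break
            matches.append(phrase)
        return matches


suggester = AutoSuggest(["Router", "Routing protocols", "Switch", "Wireless router"])
print(suggester.suggest("rou"))
```

In practice the phrase list could be built from product names or frequent queries in the query log, and suggestions could be ordered by popularity rather than alphabetically.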