Thursday 30 April 2009

Innovation Powered By Search

Slide 1.

Before we kick off, just so I can get a feel for the audience’s experience and knowledge of FAST, let me ask you some questions.

So, how many of you have FAST installations in your organisation?

.. OK, quite a few then, no need for a sales pitch then. :)

.. OK, not so many. Just yet. :)

And of those of you who do not have FAST installations, hands up if you have ever used FAST?

Well, you may have, unknowingly, would you believe.

If you have ever performed a search at FT.com, or perhaps purchased equipment at Dell.com, then you have used FAST. You just didn’t know it.

Just as Rolls Royce is the unnamed engine behind the world’s largest and most powerful aircraft, FAST too is the unnamed engine behind some of the world’s most mission-critical applications.

Slide 2.

And as you will see later on in the presentation, the solutions we power here, at these market-leading companies, have evolved far beyond simple intranet search solutions.

Slide 3.

I have 4 key messages that I want to share with you today. They are:

1. The evolution of Information Management.

Moving from a Data Centric approach to a more User Centric approach.

Then we will look at:

2. Why Search 1.0 is outdated, and how upgrading to Search 2.0 will enable us to address a consumer-centric information management strategy.

Next we will look at:

3. How Microsoft’s Search will enable this end-user empowerment and allow organisations to truly unlock their tangle of data.

And finally:

4. We will look at some examples of where FAST is enabling organisations to carve out competitive advantage.

.. So let's get started.

Slide 4.

The first thing I want to share with you is the shift in focus from the owners of the content to the consumers of the content.

Slide 5.

What we see here in the graph is a transition of information control.

Away from centralized containment and protectionism of information.

To a diffusion of information through a collaborative network of empowered employees and individuals, each of whom acts as both a consumer and a producer of content.

An example that illustrates this shift is the movement of news media from the printing press, then to online newspapers, and today to individual journalists using the likes of Blogger and Twitter to report news.

And this brings enormous value in terms of speed, flexibility and reduced cost.

This was most recently demonstrated during the Mumbai terrorist attacks in India, where the breaking news was first revealed on Twitter before television. And subsequently the most comprehensive news was obtained from Twitter and not from traditional media structures.

This transition of information control requires that IT moves with it and moves FAST.

We need to shift from the old mindset: I have all this data, how do I best store it and lock it away in a structured and manageable manner?

To the new mindset: I have all these ‘Information Assets’. How do I ensure I get the right intelligence, to the right people, at the right time, to make the right decisions?

Slide 6.

So that was the paradigm shift in Information Management, and that was the first thing I wanted to share with you today.

The second thing I want to share with you today is the paradigm shift required in Search in order to satisfy this new Information Management model.

Slide 7.

I’m sure we are all familiar with this search layout above, popularised by the web's premier search engines.

The original search engines, if you can remember them, were Lycos, Excite and AltaVista. These were surpassed by Google, which had the novel idea of using inbound links to calculate the authority weighting of a web page.

The premise held that, for 2 pages containing similar content, the page with the greatest number of links to it from other sites should have the more relevant content.

Using an analogy from the book world: for any two books on a similar topic, the one with the greatest number of citations (references to it) should be the more trusted.
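The link-counting premise can be sketched in a few lines. This is a deliberately simplified illustration that only counts inbound links; the site and page names are made up, and it is not a description of Google's actual algorithm.

```python
from collections import Counter

# Hypothetical link graph: each source maps to the pages it links to.
links = {
    "site-a": ["page-1", "page-2"],
    "site-b": ["page-1"],
    "site-c": ["page-1", "page-2"],
    "site-d": ["page-1"],
}

def inbound_link_counts(link_graph):
    """Count how many distinct sources link to each target page."""
    counts = Counter()
    for source, targets in link_graph.items():
        for target in set(targets):      # ignore repeated links from one source
            if target != source:         # ignore self-links
                counts[target] += 1
    return counts

counts = inbound_link_counts(links)
# Pages with more inbound links are treated as more authoritative.
ranked = sorted(counts, key=lambda page: counts[page], reverse=True)
```

With two pages of similar content, the one that collects more inbound links sorts first, which is the "most cited book" idea in code form.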

However, CRM data, Office documents and emails within the enterprise are not linked. But that’s a whole other challenge for Search 1.0 and out of scope here.

Google’s model is very easy to use. You simply pop keywords in that box up there and it regurgitates millions of links to matching documents via a one-size-fits-all relevancy.

I like to call it the McDonald's of search engines.

Many people turn to it for answers because, let’s face it, it's quick. It's easy. And it satisfies.

But just as not every taste is satisfied by McDonald's, equally not all information requests can be satisfied by Google.

There are many unique, specialist and diverse tastes.

Especially within the enterprise, where different departments, roles, offices and geographies have their own interests and needs that a McDonald's-type search will simply not satisfy!

We therefore need a more effective model to accommodate these diverse tastes and requirements.

Slide 8.

So, just in summary of what Search 1.0 actually is.

It is a monologue interaction with a system that takes a user's keywords and responds with a directory of millions of references to where an answer may be found. Or may not!

There is little or no use of insight, no understanding of or care for the user's intent, the user's role or the user's context.

Users are required to rejig their queries and yo-yo in and out across the surface of the results corpus, in the hope of finding more accurate and better information.

In the previous example a search for project management brought back over 23 million results. So if we were to look at only the first 4 pages, at a typical ten results per page that would still be less than 0.0002% of the available information. How can we be certain the information we are overlooking is not the most relevant?

Slide 9.

With FAST we have developed a system that plugs the holes left by Search 1.0 to provide a richer and more effective consumer-centric user experience. We don’t see search as an exercise in getting the best matching documents for the user's keywords. We see it as a process in which we match a user's intent to the underlying content we know exists.

This starts with the enrichment of content. Unlike other search vendors, which simply capture the content and dump it into the index, we enrich the content by passing it through a series of content processing stages.

We see returning the documents as the easy part. We go one step further and, from those documents, extract intelligence, facts and important people, places and companies, so the user is not required to.

Unlike other solutions, we provide a fully open and flexible relevancy model, as opposed to our competitors' one-size-fits-all, black box approach to relevancy.

We look at: Who are the users? What are they trying to achieve? What are they interested in? What are they not interested in? And based on this we architect the most suitable relevancy model.

And we engage users in dialogue. We provide methods that allow them to quickly slice open a result set in various ways to extract only relevant intelligence.

Slide 10.

So in summary.

Rather than users having to yo-yo in and out of the top 10 results, re-jigging their query to find the relevant information, FAST provides an effective way of slicing open the results corpus to allow users to EXPLORE the information as they desire.

And, as opposed to identifying documents that contain the keywords and dumping out millions of links to these documents, FAST engages users in dialogue with the data to help them sharpen their query and thus provide the most accurate results.

To illustrate with an example. Let's say I have come to a new hotel in a new city and I want to go to a nice restaurant. There are two concierges at the counter, one wearing a Search 1.0 badge and one wearing a Search 2.0 badge. I ask them both for a nice restaurant to dine at tonight, but they respond differently.

The Search 1.0 concierge picks up a yellow pages directory of all restaurants in town, slides it over to me and tells me to have a look through it for a restaurant.

The Search 2.0 concierge engages me in conversation. What cuisine would you like? How much would you like to pay? How far would you like to travel?

Slide 11.

What you will see here is an example of search 2.0 as powered by FAST.

As we cannot show you internal applications for non-disclosure reasons, I have chosen Globrix, a property search engine, because it is something that we can all relate to.

ACCESS TO AGGREGATED CONTENT FROM 90% OF THE PROPERTY SITES VIA A SINGLE LOCATION SEARCHBOX.

Transforms UNSTRUCTURED queries into STRUCTURED queries (Advanced Query Processing).

EXTRACTS keywords and entities from UNSTRUCTURED LISTINGS (Advanced Query Processing, Entity Extraction).

Navigation ADAPTS to users' interests (Contextual Navigation).

Ability to narrow/refine the results set through INTUITIVE VISUAL NAVIGATION (Advanced Results Processing, Geo-Targeting).

Ability to save/hide properties and SAVE SEARCHES and be ALERTED of new listings (Monitoring and Alerting).

Revenue via adverts powered by AdMomentum.
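To make the "unstructured query into structured query" point concrete, here is a minimal sketch. It assumes a simple pattern-matching approach; the function name, the field names and the query wording are all hypothetical, and it is not how Globrix or FAST actually implement query processing.

```python
import re

def parse_property_query(query):
    """Turn a free-text property query into a dict of structured fields."""
    q = query.lower()
    structured = {}

    beds = re.search(r"(\d+)\s*bed", q)
    if beds:
        structured["bedrooms"] = int(beds.group(1))

    price = re.search(r"under\s*(\d+)k", q)
    if price:
        structured["max_price"] = int(price.group(1)) * 1000

    # Location: the words after "in", up to "under" or the end of the query.
    place = re.search(r"\bin\s+([a-z ]+?)(?:\s+under\b|$)", q)
    if place:
        structured["location"] = place.group(1).strip()

    return structured

parsed = parse_property_query("2 bed flat in Camden under 400k")
```

The structured form can then drive filters and navigation instead of a plain keyword match.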

Slide 12.

So that was Search 1.0 versus 2.0 and that was the second thing I wanted to share with you today.

The third thing I want to share with you is Microsoft’s Vision for Search 2.0.

Slide 13.

So, this slide illustrates Microsoft’s vision for Search 2.0 within the enterprise.

Enterprise Search is about connecting the right people. To the right information. At the right time. To make the right decisions.

There are 3 pillars that encapsulate this vision of the Microsoft search experience:

- Firstly, visual engagement, to identify trends and insights from your data.

- Secondly, conversation handles that drive dialogues with the data in order to guide users to answers.

- Finally, search should be actionable. Search is simply the first step in performing a task. We want to assist in that task as best we can. Search does not become useful until we do something with the newly attained information.

An example of actionable search would be the ability to email a document URL to a colleague, save a document to your search briefcase, or create a PowerPoint deck directly from your search results.

Slide 14.

In order to achieve this vision, search innovation will focus on 3 core areas:

User Interaction Management – Engaging users in interactive dialogues with a personalised relevancy that addresses your tasks. Your role. Your context.

Contextual Matching – Continuing to architect scalable and easy to manage open & flexible platforms.

Content Analytics – The continued development of extensible frameworks that enable organisations to cleanse, normalise and enrich data before it enters the index, as well as extracting intelligence, trends and entities.

Slide 15.

So that was Microsoft’s vision for search and that was the third thing I wanted to share with you today.

The fourth and final thing I want to share with you is some examples of where FAST is using search to drive innovation within market leading organisations.

Slide 16.

As in the first slide, where I described the different organisations where you may not have known that FAST was behind the scenes, there exist, too, many business areas where you may not have suspected FAST was providing solutions.

FAST is more than a Search Engine, as it provides capabilities to build solutions that support various business initiatives and/or solve business problems.

An initiative can be as simple as conventional site search, or it could be as innovative and complex as mining transaction information to identify fraudulent activity.

If you are to take anything from this presentation, it is this.

As much as Google may want you to believe it, search is not simply about typing queries into a search box to get back millions of links to documents.

At FAST Microsoft, we use search technology to SOLVE CRITICAL BUSINESS CHALLENGES.

It is about solving business problems that require:

Finding Documents.

Matching Content.

Cleansing Data.

Classifying Content.

Aggregating Information.

Extracting Intelligence.

Identifying Trends.

Here, the use of other technology such as databases is typically extremely costly or impossible.

Slide 17.

An example of search beyond the search box, keywords and text-link paradigm can be seen in the New York Times “Topic Pages”.

These are SELF-POPULATING AND SELF-ORGANISING PAGES that are automatically constructed using search.

Each of the frames is what we call a searchlet. These searchlets carry some keywords, in this case “climate change”. They point to different content repositories in the back end of the New York Times. These may be editorial content, user-generated content, multimedia repositories or picture databases.

From each repository they extract all content that relates to “climate change”.

This content is presented together to form a “Topic Page”.

They exist for countries, politicians, celebrities et cetera.

Applying this to the enterprise, we could have pages populating for different projects, different departments, different initiatives.

Benefits:

  • Hundreds of topic pages – dynamically-generated pages that give users an overview of content on a certain topic
  • Up-to-date information – refreshed with each viewing.
  • Minimum editorial and site design workload – search as the portal.
  • Increase stickiness through contextually related content.

Slide 18.

At Dell.com, a FAST customer, they utilised our Search Business Centre to analyse logs and report on search activity. They found that generic searches for 'laptop' were quite prominent. This is a very generic term, so it is difficult for a user to determine the relevancy of one result over another, but interestingly those laptops at the top of the stack yielded more sales. Armed with this information, Dell were able to boost those products with the greatest margins to the top of the stack. This has led to higher conversions on high-margin items and increased profits.

Dell have also linked the FAST flexible relevancy API to their ERP system. Here a rules engine promotes items based on availability and profit margin. If surplus, promote for increased conversions. If out of stock, temporarily block from results. For a generic search like laptop or server, Dell will promote to the top of the results the items with the greatest profit margins for Dell. Smart, eh?
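The availability and margin rules described above could look something like the following. This is only a sketch of the idea: the `Product` fields, the surplus threshold and the product names are assumptions made for illustration, not Dell's rules engine or the FAST relevancy API.

```python
from dataclasses import dataclass

@dataclass
class Product:
    name: str
    stock: int      # units available, as an ERP feed might report them
    margin: float   # profit margin as a fraction of price

SURPLUS_THRESHOLD = 100  # hypothetical stock level that counts as surplus

def rank_for_generic_query(products):
    """Block out-of-stock items, then order by surplus first and margin second."""
    available = [p for p in products if p.stock > 0]
    return sorted(
        available,
        key=lambda p: (p.stock >= SURPLUS_THRESHOLD, p.margin),
        reverse=True,
    )

catalogue = [
    Product("laptop-a", stock=20, margin=0.10),
    Product("laptop-b", stock=0, margin=0.30),    # out of stock: blocked
    Product("laptop-c", stock=150, margin=0.15),  # surplus: promoted
    Product("laptop-d", stock=30, margin=0.25),
]
results = [p.name for p in rank_for_generic_query(catalogue)]
```

The surplus item comes first, the remaining in-stock items follow in margin order, and the out-of-stock item never appears.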

At NASA, FAST has provided a measurable return on investment with major improvements in data access and retrieval, reducing days of research and retrieval time to minutes.

FAST also provides the invaluable benefit of capturing and preserving engineering decisions, best practices and lessons learned, which prior to this capability were in large part lost as a consequence of workforce attrition.

NASA was initially using a competitor until they realised that they were haemorrhaging money trying to configure and tune the black box. Every time they needed to tune the engine, they had to bring in an integrator to do so.

They also had several other search engines in use like Verity K2 and Google and wanted to standardize on one platform.

They now use FAST to provide recommendations to researchers in similar domains.

Person to Person recommendations. You and your peer are working on Shuttle cooling systems, your peer enjoys these documents. You may also enjoy these documents.

Item to Person recommendations. You have profile X,Y,Z. These documents may appeal to you.

Item to Item recommendations. You search for this item often. These items are similar to that item. You may be interested.

Slide 19.

Telstra voice-to-text search. Globrix maps. Contoso sliding search.

Slide 20.

FAST is used by the Department of Agriculture in the United States to help fight against illegal plants being sold over the Internet.

FAST is also used by an Asian enforcement agency to monitor child pornography, and by the Norwegian Toll and Excise department to monitor transactions in and out of the country. A German police force uses FAST to ensure they can match incidents to people across a myriad of different storage systems located in multiple police stations.

How can I give myself a good introduction before presenting?

The introduction's purpose is to help establish the speaker's credibility.
It should create interest in the speaker and what the speaker has to say.

A good introduction to the delivery of your presentation is extremely important. The first minute or so sets the stage for the rest of your talk and the audience's perception of you.

It is very important to start strong.


Some questions that should be addressed:
  • Why this topic?
  • Why this topic for this audience?
  • Why this topic for this audience at this time?
  • Your experience with the topic.
Answer enough to let people know why they should listen to you.
  • Who are you?
  • What is your topic?
  • Why is it important?
  • Don't give away the secret of your talk, but whet their appetite.
  • What will they have gained by the time the talk is finished? Don't feel shy to promise that they'll learn something useful; they really want to know that.
Here are some other ideas for openers:
1. Ask your audience a question and ask them to raise hands in reply.
"How many of you have previously used our technology?..."
2. Begin with an interesting, relevant fact or quote. Then use that quote to launch your talk.
"IDC determined that the world generated 161 billion gigabytes – 161 exabytes – of digital information last year...That's like 12 stacks of books that each reach from the Earth to the sun. Or you might think of it as three million times the information in all the books ever written, according to IDC."
3. Mention something another speaker said, or a current event, that is related to your presentation.
4. Start with a short, relevant personal story or experience.

Wednesday 22 April 2009

How do I speak and present like Obama?

http://www.bnet.com/2403-13074_23-290100.html?promo=808&tag=nl.e808

1. Talk About the Audience’s Concerns

Tell OUR story before telling YOUR own.

Start your talk by broadly defining the situation that your listeners face. Then, once you’ve got them nodding their heads in agreement, move on to describe the problems or challenges that are on their minds.

Start where the audience is, not where you are.

Use real life GRASS ROOTS EXAMPLES - with emotion, visuals, real people:

- "The parents who lie awake at night when their children are asleep, worrying about how to secure the next mortgage payment or pay their doctor's bills"

- "The young boys and girls that spent sleepless nights in the icy cold desert of Iraq"

Once you have their attention, you can lead your listeners wherever you want to take them.

2. Keep It Simple and Catchy

“change you can believe in” — simple and easy to remember.

"Increasing the findability" - simple and informative

Chisel away at your topic until you can reduce your presentation to a core message. Once you achieve this, all your complex ideas can march behind it.

All audiences, no matter how sophisticated, have limited attention spans and a limited ability to retain detailed spoken information. Don’t fear that you’re leaving details out; you must be selective. After all, what good is a thorough and detailed argument if it is inaccessible?

3. Anticipate What Your Audience Is Thinking

When you express one view, the odds are high that people will reflexively think about other, unmentioned aspects of the topic.

A presentation that does not deal with this “evoking of opposites” loses the audience’s attention because it fails to address the questions and concerns that come up in people’s minds.

So anticipate it - Show your audience that you understand the contrary view better than they do, and explain why your proposal or argument is still superior.

"It is a better solution than X because of A,B, C ... however, I do understand that solution X provides 1,2,3 but..."

4. Learn to Pause

Obama has mastered the art of pausing while holding the audience's attention.

- Pauses to let the audience catch up with him.

- Pauses to let his words resonate.

- He pauses, in a sense, to let us rest and absorb.

- Pauses also give the impression of composure and thoughtfulness.

Here’s an exercise to help you learn to pause.

  • Mark UP your PARAGRAPHS / in THIS manner / INTO / the SHORTEST possible PHRASE_. / First,_ / whisper it, / BREATHING / at all the BREATH marks. / THEN,_ / speak it / in the same way. / DO this / WITH / a DIFFERENT paragraph / every day.

Here’s what the opening paragraph of Obama’s remarks would look like:

  • “If there is anyone out there / who still doubts / that America is a place / where all things are possible, / who still wonders / if the dream of our founders / is alive in our time, / who still questions / the power of our democracy, / tonight / is your answer.”

Where you pause is up to you; there are no hard and fast rules. But try it. Slowly inhale to the count of three at each breath mark. Speak as though you had plenty of time. The goal / of this exercise / is to teach your body / to slow down.

5. Master the Body Language of Leadership

Obama’s body language is relaxed and fluid.

It does not display tension or fear.

He’s calm and assertive — which is exactly what you need to be to get people to comply with your requests.

To achieve the body language that’s effective for you, focus on a single attribute — for example, calm — and practice implementing it in the basic motions of your day, from getting dressed in the morning, to leaving your home for work, to greeting your friends and colleagues. Research in the Scientific American suggests that focusing on one word is the most effective way to learn a new behavior. It will probably feel forced at first, but don’t worry. It will soon become natural, and eventually your body language will communicate the right mix of calm and assertiveness.

Finally, you’ll need to rehearse. Practice calmly walking up to the lectern or the front of the room. Arrange your papers calmly.

Look out to the audience with a sense of command, with assertiveness. Let the silence hang for a moment, and only then deliver your opening remarks.

Calmness begets a sense of authority. Behave as if you are in control, and you will in fact gain control and command attention.

Tuesday 21 April 2009

FAST ESP Index Profile - partner briefing

INDEX PROFILE:

Prior to indexing a document, the FAST Search Engine maps the document's elements to fields.

Fields are defined document elements that are to be searchable.

Defining fields allows the end-user or external query application to specify searches that cover only individual parts of a document such as the title or body part.

FAST ESP supports text, signed and unsigned integer, float, double, and datetime fields.

Integer, float, and double fields contain numerical values that can be matched against a query by using numerical comparisons such as less than, greater than, and equal to.
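As a rough illustration of fielded search and numeric comparison, here is a small sketch over an in-memory list of documents. The field names (`title`, `body`, `pages`) and helper functions are hypothetical and are not the FAST ESP query API.

```python
import operator

# Hypothetical documents whose elements have already been mapped to fields.
docs = [
    {"id": 1, "title": "Project management handbook", "body": "Planning and budgets", "pages": 120},
    {"id": 2, "title": "Budget review", "body": "Project management costs", "pages": 8},
    {"id": 3, "title": "Travel policy", "body": "Booking rules", "pages": 15},
]

COMPARISONS = {"<": operator.lt, ">": operator.gt, "=": operator.eq}

def match_text(documents, field, term):
    """Return documents whose given text field contains the term."""
    term = term.lower()
    return [d for d in documents if term in d[field].lower()]

def match_number(documents, field, op, value):
    """Return documents whose numeric field satisfies the comparison."""
    compare = COMPARISONS[op]
    return [d for d in documents if compare(d[field], value)]

# Search the title only, ignoring matches in the body.
title_hits = [d["id"] for d in match_text(docs, "title", "project management")]
# Numeric comparison on an integer field.
short_docs = [d["id"] for d in match_number(docs, "pages", "<", 20)]
```

Document 2 mentions the phrase in its body but is not returned by the title search, which is the point of restricting a query to one field.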

WHAT: Analogous to the process of defining a database schema or database table structure.

FIELD ATTRIBUTES: Define how document processing elements map to searchable fields in the index.

Dynamic Teaser: a summary of the field that presents the sections within that field with the most relevant query match, and highlights the query terms. Users can quickly identify the snippets of text that contain the keywords entered.

Categorization: supervised clustering. Uses FAST’s RBC or a 3rd party module to classify documents in the pipeline according to rules, placing the classified documents into a taxonomy structure.

Unsupervised clustering: Clustering is the automatic detection of groups or clusters of documents that have similar content. Unsupervised because it is more dynamic than categorization in that the clusters are generated automatically based on the actual results. In the index profile we can set the similarity threshold, from 0.1 to 1.

Proximity Rank Boost: Proximity relevance implies that the distance between query terms in matching documents impacts the query results. The proximity computation contributes a portion of the overall rank of a document within a result set. Proximity affects only queries with multiple words. We can determine the weight contribution of the proximity to overall ranking. In documents with large

Field Collapsing: Allows a folding of results with the identical value for a given result field. Determine which fields to collapse on.
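Field collapsing can be pictured as follows. This is a minimal sketch on a ranked list of result dictionaries; the `site` field and the sample results are invented for the example.

```python
def collapse(results, field):
    """Keep only the first (highest ranked) result for each value of `field`."""
    seen = set()
    collapsed = []
    for result in results:      # results are assumed to be in rank order
        key = result[field]
        if key not in seen:
            seen.add(key)
            collapsed.append(result)
    return collapsed

ranked_results = [
    {"title": "Pricing overview", "site": "intranet"},
    {"title": "Pricing FAQ", "site": "intranet"},
    {"title": "Pricing press release", "site": "public-web"},
    {"title": "Old pricing page", "site": "intranet"},
]
collapsed_titles = [r["title"] for r in collapse(ranked_results, "site")]
```

Collapsing on `site` folds the three intranet hits into one, so a single source cannot crowd out the rest of the result page.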

Text Sorting: Using the fullsort attribute of fields we can specify whether a field should be configured for full-string sorting or not. Full-string sorting takes up more index storage space and is memory intensive.

COMPOSITE FIELDS: used to group several fields together - allows a query to be executed on several fields at the same time. We can stipulate which reference fields to combine into a single composite field.

SCOPE FIELDS: special field types that support dynamic indexing and searching in hierarchical content, such as XML. We can narrow a search to specific sections within a document, focusing the scope of a search on specific XML nodes, specific paragraphs or specific sentences. We can define this in the index profile.

RANK PROFILE: We can generate different rank profiles that apply different ranking to the result set based on a number of factors – Authority, Quality, Freshness, Composite Rank. This way we are not limited to a one size fits all relevancy model. We can present different results to different groups depending on their tasks, their objectives, their context.

Rank Profiles are linked one-to-one to a composite field. The fields that make up the composite field are mirrored in the rank profile and assigned different weightings.
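The effect of per-field weightings can be shown with a toy scoring function. The profile names, weights and the term-count scoring below are assumptions for illustration only; real rank profiles combine many more signals.

```python
# Hypothetical rank profiles: per-field weights over one composite field.
RANK_PROFILES = {
    "sales": {"title": 3.0, "body": 1.0},
    "research": {"title": 1.0, "body": 3.0},
}

docs = [
    {"id": "A", "title": "Laptop deals", "body": "Discounts this week"},
    {"id": "B", "title": "Cooling study", "body": "Laptop battery and laptop cooling"},
]

def score(doc, term, profile_name):
    """Weighted count of term occurrences across the profile's fields."""
    weights = RANK_PROFILES[profile_name]
    term = term.lower()
    return sum(w * doc[field].lower().count(term) for field, w in weights.items())

def search(term, profile_name):
    """Return ids of matching documents, best score first."""
    hits = [d for d in docs if score(d, term, profile_name) > 0]
    hits.sort(key=lambda d: score(d, term, profile_name), reverse=True)
    return [d["id"] for d in hits]

sales_order = search("laptop", "sales")
research_order = search("laptop", "research")
```

The same query over the same two documents comes back in a different order depending on which profile is applied.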

GEO Specification: sorts and/or filters query results by geographical distance from a defined geographical location. Using geo search requires that the documents are tagged with geographical position information.
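Distance-based sorting and filtering can be sketched with the haversine great-circle formula. The office documents and coordinates are sample data, and the haversine choice is an assumption; the notes do not say how FAST ESP computes distance.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

# Documents tagged with geographical position information (sample data).
docs = [
    {"name": "Oslo office", "lat": 59.9139, "lon": 10.7522},
    {"name": "London office", "lat": 51.5074, "lon": -0.1278},
    {"name": "Paris office", "lat": 48.8566, "lon": 2.3522},
]

origin = (51.5074, -0.1278)  # the defined location: central London

def distance_from_origin(doc):
    return haversine_km(origin[0], origin[1], doc["lat"], doc["lon"])

# Sort by distance, then filter to a 500 km radius.
by_distance = sorted(docs, key=distance_from_origin)
sorted_names = [d["name"] for d in by_distance]
within_500_km = [d["name"] for d in by_distance if distance_from_origin(d) <= 500]
```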

HOT versus COLD Updates:

Certain index profile changes require a full re-index of the content within the affected search engine nodes, while others only require that the new index profile be installed.

An index profile update can either be a Hot update or Cold update, depending on the changes you have applied to the index profile.

• If you perform a Hot update, then existing collections are not affected.

• If you perform a Cold update, the contents of all collections are automatically emptied when the update is performed.

Be sure to refer to the Configuration guide before making index profile changes. There are too many exceptions to the rules to list them here. Best rule of thumb: determine the configuration required for the index profile before indexing content, and don’t change it after indexing content.

FAST ESP Relevancy - partner briefing

Relevancy:

Relevancy is how we tune the search engine to meet end-user expectations and business needs.


Enterprises use search in multiple contexts: commerce sites, intranets, extranets, portals, etc. Each has distinct objectives, and user communities value content differently. Therefore a one size fits all relevancy a la Google or live.com will not suffice.

Recall is the ratio of the number of relevant records retrieved to the total number of relevant records in the index. Knowledge discovery or compliance applications will rate recall as being more important than precision – in other words, customers do not want to “miss” any important documents.

For example, take a search for “Apple computers”.

With respect to recall: all docs that mention both apple and computers should be returned.

Precision: of all the documents recalled, only docs about Apple computers, and not the fruit, should be returned.

RECALL is about Increasing the findability. Amplifying the target size. Linguistics help us achieve this. For a knowledge discovery or compliance solution we need to capture all possible information.

PRECISION is about Removing spurious results. Returning nothing but the truth.
Precision is the ratio of the number of relevant records retrieved to the total number of irrelevant and relevant records retrieved.
In an e-commerce or e-directory environment, users prefer much more precise search results so that they are not swamped with too many non-specific results.

In general, customers need to strike a balance between finding everything related to a query and only the documents that relate to a given query.

The truth, The whole truth [Recall] and nothing but the truth [Precision].
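The two ratios defined above can be worked through on a small example. The document ids are made up; the formulas are the standard definitions of precision and recall as stated in these notes.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved and relevant ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant records that were retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Four documents in the index are truly about Apple computers.
relevant_docs = ["d1", "d2", "d3", "d4"]
# The engine returned three documents, one of which is about the fruit.
retrieved_docs = ["d1", "d2", "d5"]

precision, recall = precision_recall(retrieved_docs, relevant_docs)
```

Here two of the three retrieved documents are relevant (precision of two thirds), and two of the four relevant documents were found (recall of one half).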

Use linguistic tools

- Apply lemmatization to improve precision and recall.
- Apply synonym expansion to improve recall.
- Apply spell checking to prevent futile (0 hit) queries.
- Activate antiphrasing to remove the “noise” from the query, such as the text of the phrase “how do I”.
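Two of these tools, antiphrasing and synonym expansion, can be sketched very simply. The phrase list and synonym dictionary below are hypothetical stand-ins; a real deployment would rely on the engine's linguistic dictionaries.

```python
# Hypothetical dictionaries, for illustration only.
ANTIPHRASES = ["how do i", "where can i find"]
SYNONYMS = {"laptop": ["notebook"]}

def remove_antiphrases(query):
    """Strip noise phrases such as 'how do I' from the query."""
    q = query.lower()
    for phrase in ANTIPHRASES:
        q = q.replace(phrase, " ")
    return " ".join(q.split())  # normalise whitespace

def expand_synonyms(query):
    """Add known synonyms after each query term to improve recall."""
    terms = []
    for word in query.split():
        terms.append(word)
        terms.extend(SYNONYMS.get(word, []))
    return terms

cleaned = remove_antiphrases("How do I reset my laptop")
expanded = expand_synonyms(cleaned)
```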

Search users do not all share the same business objectives and in turn do not have the same search needs. We need to accommodate this when determining relevancy.

VISUALISE A GRAPHIC EQUALIZER ON AN 80s SOUND SYSTEM. WE CAN INCREASE/DECREASE THE BASS FOR EACH OF THESE.

SIMILARLY WE HAVE PRESETS. THINK ROCK, JAZZ.

So, consider a Google search with a particular query term, let’s say apple. Irrespective of whether you are a CEO, a finance director, a student or a researcher, you will each be returned exactly the same set of results. Your intent and business objectives are ignored. It is a one-size-fits-all solution.

We determine the static relevancy in the index profile using rank profiles. A FAST ESP installation can have multiple rank profiles over the same set of data. Each department for instance can apply their own relevancy model to the content.

From what we have discussed, tell me what type of site or application would benefit from a relevancy model like this?

And what site or application would not?

- As I mentioned, there exist TWO TYPES OF RELEVANCY. STATIC RANK is DETERMINED AT DOCUMENT PROCESSING TIME.

- AND DYNAMIC RANK is DETERMINED AT QUERY TIME.

- We can AUGMENT THE STATIC RELEVANCE WITH DYNAMIC RELEVANCY SETTINGS.

- We can choose to INCREMENT OR DECREMENT RELEVANCY POINTS FOR A particular DOCUMENT. For example, ANYTHING FROM THE CEO's OFFICE boost 100pts. ANYTHING FROM FT boost 1000pts. ANYTHING CONTAINING A SPECIFIC TERM BOOST BY Xpts. Anything from a particular source decrease by X points.

- SEASONAL SCHEDULES.
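The boost and decrement rules in the list above can be sketched as a query-time adjustment on top of a static rank. The source labels, the boosted term and the point values other than the 100 and 1000 from the notes are invented for the example; this is not the FAST ESP boosting API.

```python
# Each rule: (description, condition, points). Values beyond the 100 and 1000
# mentioned in the notes are hypothetical.
BOOST_RULES = [
    ("CEO's office", lambda d: d["source"] == "ceo-office", 100),
    ("FT", lambda d: d["source"] == "ft", 1000),
    ("low-trust source", lambda d: d["source"] == "old-archive", -50),
    ("contains 'merger'", lambda d: "merger" in d["text"].lower(), 25),
]

def dynamic_rank(doc):
    """Static rank plus every query-time boost whose condition applies."""
    boost = sum(points for _, applies, points in BOOST_RULES if applies(doc))
    return doc["static_rank"] + boost

docs = [
    {"id": "memo", "source": "ceo-office", "text": "Strategy memo", "static_rank": 150},
    {"id": "news", "source": "ft", "text": "Merger talks", "static_rank": 200},
    {"id": "minutes", "source": "old-archive", "text": "Minutes", "static_rank": 300},
]

ranks = {d["id"]: dynamic_rank(d) for d in docs}
order = [d["id"] for d in sorted(docs, key=dynamic_rank, reverse=True)]
```

The archive document has the highest static rank but is demoted at query time, while the FT article jumps to the top.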