Tuesday 21 April 2009

FAST ESP Index Profile - partner briefing

INDEX PROFILE:

Prior to indexing a document, the FAST Search Engine maps the document's elements to fields.

Fields are defined document elements that are to be searchable.

Defining fields allows the end-user or external query application to specify searches that cover only individual parts of a document such as the title or body part.

FAST ESP supports text, signed and unsigned integer, float, double, and datetime fields.

Integer, float, and double fields contain numerical values that can be matched against a query by using numerical comparisons such as less than, greater than, and equal to.
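To make the numerical comparisons concrete, here is a minimal Python sketch of matching documents on a numeric field. It is an illustration only, not FAST query syntax: the comparator names (`lt`, `gt`, `eq`), the `match_numeric` helper and the sample `price` field are all assumptions made up for this example.

```python
import operator

# Hypothetical comparator names for illustration; not FAST query operators.
COMPARATORS = {
    "lt": operator.lt,  # less than
    "gt": operator.gt,  # greater than
    "eq": operator.eq,  # equal to
}


def match_numeric(documents, field, comparator, value):
    """Return the documents whose numeric field satisfies the comparison."""
    compare = COMPARATORS[comparator]
    return [
        doc for doc in documents
        if field in doc and compare(doc[field], value)
    ]


docs = [
    {"title": "A", "price": 10},
    {"title": "B", "price": 25},
    {"title": "C", "price": 40},
]

# Documents with a price below 30.
cheap = match_numeric(docs, "price", "lt", 30)
```

A text field cannot be compared this way, which is why the field type has to be decided in the index profile up front.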

WHAT: Defining an index profile is analogous to defining a database schema or database table structure.

FIELD ATTRIBUTES: Define which document processing elements map to searchable fields in the index.

Dynamic Teaser: A summary of the field that presents the sections within that field with the most relevant query match, and highlights the query terms. It quickly identifies the snippets of text that contain the keywords entered.

Categorization: Supervised clustering. FAST’s RBC or a third-party module classifies documents in the pipeline according to rules and places the classified documents into a taxonomy structure.

Unsupervised clustering: Clustering is the automatic detection of groups or clusters of documents that have similar content. It is unsupervised because it is more dynamic than categorization, in that the clusters are generated automatically based on the actual results. In the index profile we can set the similarity threshold, from 0.1 to 1.
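To show what a similarity threshold does, here is a minimal Python sketch that groups result texts. It is not FAST's clustering algorithm: it assumes Jaccard similarity over word sets and a greedy single pass, and the names `jaccard` and `cluster_results` are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two token sets, between 0.0 and 1.0."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_results(results, threshold):
    """Greedy single-pass clustering (an assumed, simplified technique).

    A result joins the first cluster whose seed document is at least
    `threshold` similar; otherwise it starts a new cluster.
    """
    clusters = []  # list of (seed_tokens, members)
    for text in results:
        tokens = set(text.lower().split())
        for seed_tokens, members in clusters:
            if jaccard(tokens, seed_tokens) >= threshold:
                members.append(text)
                break
        else:
            # No existing cluster was similar enough.
            clusters.append((tokens, [text]))
    return [members for _, members in clusters]


results = [
    "enterprise search index profile",
    "index profile for enterprise search",
    "geo search by distance",
]

# A mid-range threshold groups the two similar results together.
clusters = cluster_results(results, 0.5)
```

In this sketch a threshold near 0.1 merges loosely related results into a few broad clusters, while a threshold near 1 only groups near-identical results.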

Proximity Rank Boost: Proximity relevance implies that the distance between query terms in matching documents impacts the query results. The proximity computation contributes a portion of the overall rank of a document within a result set. Proximity affects only queries with multiple words. We can determine the weight contribution of the proximity to overall ranking. In documents with large

Field Collapsing: Folds together results that have an identical value for a given result field. We determine which fields to collapse on.
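The effect of collapsing can be sketched in a few lines of Python. This is a conceptual illustration, not how FAST implements it: the `collapse` function and the sample `site` field are assumptions, and the sketch simply keeps the first (highest-ranked) result for each distinct value.

```python
def collapse(results, field):
    """Keep only the first result for each distinct value of `field`.

    `results` is assumed to be in rank order already. Results that lack
    the field are passed through without being collapsed.
    """
    seen = set()
    collapsed = []
    for hit in results:
        if field not in hit:
            collapsed.append(hit)
            continue
        key = hit[field]
        if key not in seen:
            seen.add(key)
            collapsed.append(hit)
    return collapsed


hits = [
    {"title": "Home", "site": "a.example"},
    {"title": "About", "site": "a.example"},
    {"title": "News", "site": "b.example"},
    {"title": "Orphan"},
]

# Collapse on the hypothetical "site" field: one result per site.
folded = collapse(hits, "site")
```

Here "About" is folded under "Home" because both share the same `site` value, so one site cannot crowd out the rest of the result list.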

Text Sorting: Using the fullsort attribute of a field we can specify whether the field should be configured for full-string sorting or not. Full-string sorting takes up more index storage space and is memory intensive.

COMPOSITE FIELDS: used to group several fields together - allows a query to be executed on several fields at the same time. We can stipulate which reference fields to combine into a single composite field.

SCOPE FIELDS: Special field types that support dynamic indexing and searching in hierarchical content, such as XML. We can narrow a search to specific sections within a document, focusing its scope on specific XML nodes, specific paragraphs or specific sentences. We can define this in the index profile.

RANK PROFILE: We can generate different rank profiles that apply different ranking to the result set based on a number of factors – Authority, Quality, Freshness, Composite Rank. This way we are not limited to a one size fits all relevancy model. We can present different results to different groups depending on their tasks, their objectives, their context.

Rank Profiles are linked one-to-one to a composite field. The fields that make up the composite field are mirrored in the rank profile and assigned different weightings.
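As a rough illustration of per-field weightings, the Python sketch below scores the same documents under two rank profiles. It is not FAST's ranking formula: the profile names, the weights, the field names and the simple term-count scoring are all assumptions chosen for the example.

```python
# Hypothetical rank profiles: each maps the fields of one composite field
# to a weight. Names and numbers are illustrative only.
RANK_PROFILES = {
    "default": {"title": 50, "body": 30, "keywords": 20},
    "title_heavy": {"title": 80, "body": 10, "keywords": 10},
}


def term_hits(text, term):
    """Count whole-word occurrences of `term` in `text`, case-insensitively."""
    return text.lower().split().count(term.lower())


def score(doc, term, profile):
    """Weighted sum of per-field term counts under the given rank profile."""
    weights = RANK_PROFILES[profile]
    return sum(
        weight * term_hits(doc.get(field, ""), term)
        for field, weight in weights.items()
    )


def rank(docs, term, profile):
    """Return docs ordered by descending score for the chosen profile."""
    return sorted(docs, key=lambda d: score(d, term, profile), reverse=True)


docs = [
    {"id": 1, "title": "search basics", "body": "intro text", "keywords": ""},
    {"id": 2, "title": "overview", "body": "search search search", "keywords": "search"},
]

by_default = [d["id"] for d in rank(docs, "search", "default")]
by_title = [d["id"] for d in rank(docs, "search", "title_heavy")]
```

The same query returns the documents in a different order under each profile, which is the point of keeping several rank profiles for different groups of users.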

GEO Specification: Sorts and/or filters query results by geographical distance from a defined geographical location. Using geo search requires that the documents are tagged with geographical position information.
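The idea behind geo sorting and filtering can be sketched in Python as follows. This is an illustration under assumptions, not FAST's implementation: it uses the haversine great-circle formula, and the `lat`/`lon` field names and the `geo_search` helper are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def geo_search(docs, lat, lon, max_km=None):
    """Sort docs by distance from (lat, lon), optionally filtering by radius.

    Documents without position tags are skipped, mirroring the requirement
    that documents carry geographical position information.
    """
    scored = []
    for doc in docs:
        if "lat" not in doc or "lon" not in doc:
            continue
        dist = haversine_km(lat, lon, doc["lat"], doc["lon"])
        if max_km is None or dist <= max_km:
            scored.append((dist, doc))
    scored.sort(key=lambda pair: pair[0])
    return [doc for _, doc in scored]


docs = [
    {"name": "London", "lat": 51.51, "lon": -0.13},
    {"name": "Bergen", "lat": 60.39, "lon": 5.32},
    {"name": "Oslo", "lat": 59.91, "lon": 10.75},
    {"name": "Untagged"},
]

# Results within 500 km of Oslo, nearest first.
nearby = geo_search(docs, 59.91, 10.75, max_km=500)
```

Leaving `max_km` unset gives pure distance sorting, while setting it adds the filtering behaviour.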

HOT versus COLD Updates:

Certain index profile changes require a full re-index of the content within the affected search engine nodes, while others only require that the new index profile be installed.

An index profile update can either be a Hot update or Cold update, depending on the changes you have applied to the index profile.

• If you perform a Hot update, then existing collections are not affected.

• If you perform a Cold update, the contents of all collections are automatically emptied when the update is performed.

Be sure to refer to the Configuration guide before making index profile changes. There are too many exceptions to the rules to list them here. The best rule of thumb is to determine the configuration required for the index profile before indexing content, and not to change it after content has been indexed.

2 comments:

Michael Farag (aka the wolf) said...

Does FAST provide any Schema (as in XSD) for indexing profiles?

luisalves00 said...

Is it possible to give different weighting for different collections on the same profile?

something like:
I have 2 collections - coll1 and coll2

and on My_Default_Profile define that
results on coll1 have a weight of 60
and results on coll2 have a weight of 40