Tuesday 7 April 2009

ESP INDEXING & SEARCH SUBSYSTEM

INDEXING & SEARCH SUBSYSTEM:

The indexing and search subsystem, often referred to in short as the indexing subsystem, is responsible for:

- receiving processed content in the form of FiXML, persisting it to disk, and indexing the new content.

- serving search results from the index.

These tasks are performed by a number of interacting components:

Indexing dispatcher:

Dispatches content to the potentially multiple columns of indexers in the installation, collects callbacks from each column, and sends the appropriate callbacks to the previous subsystem (usually, but not necessarily, the Content Distributor) when it is correct to do so.

The indexing dispatcher also translates external document ids (content ids) to internal ids, which are MD5-based (MD5 is a cryptographic hash function with a 128-bit hash value).
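The id translation described above can be illustrated with a minimal sketch. It assumes the internal id is simply the MD5 digest of the UTF-8 encoded content id; the exact internal id format used by ESP is not stated here, and the function name internal_id is hypothetical.

```python
import hashlib


def internal_id(content_id: str) -> str:
    """Map an external content id to an MD5-based id (assumed format).

    MD5 yields a 128-bit value, shown here as 32 hexadecimal characters.
    """
    return hashlib.md5(content_id.encode("utf-8")).hexdigest()


# Hypothetical content id, for illustration only.
doc_id = internal_id("http://example.com/doc/1")
print(len(doc_id))  # 32 hex characters, i.e. 128 bits
```

Because the digest depends only on the content id, the same external id always maps to the same internal id, which is what makes a stable translation possible.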

The indexing dispatcher is fault-tolerant, and also supports load-balancing.

Indexer:

An executable that performs the indexing activity and controls index-related processes.

fixmlindex:

The fixmlindex process is owned and invoked by the indexer. This process produces new indexes for a specific partition.

fdispatch:

Dispatches queries to a set of configured engines (fsearch) for each of the rows and columns.

Merges results from different rows and columns.

Performs the QPS license check.
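The merge step of fdispatch can be pictured with a small sketch. It assumes each engine returns its hits already sorted by descending relevance score; the (score, document id) tuple layout and the name merge_hits are hypothetical, and how ESP actually ranks and merges hits is not described here.

```python
import heapq
from itertools import islice


def merge_hits(per_engine_hits, limit):
    """Merge already-sorted hit lists into one list of the best `limit` hits.

    Each input list holds (score, doc_id) tuples sorted by descending score.
    """
    merged = heapq.merge(*per_engine_hits, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, limit))


# Hypothetical result lists from two engines.
column_0 = [(0.92, "doc-a"), (0.40, "doc-c")]
column_1 = [(0.75, "doc-b"), (0.10, "doc-d")]

top = merge_hits([column_0, column_1], limit=3)
print(top)  # [(0.92, 'doc-a'), (0.75, 'doc-b'), (0.4, 'doc-c')]
```

A lazy merge such as this only consumes as many hits from each engine as are needed to fill the requested result page.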

fsearch:

Matches a query against a pre-built index.

Performs document summary retrieval and hit highlighting.

Returns a single list of matching documents up to fdispatch.

The data flow inside the indexing and search subsystem can be briefly described as follows:

The indexing dispatcher handles the distribution of content across columns: it splits up a batch of operations according to its routing policy, passes the pieces to the indexers, and collects the callbacks from the indexers.
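The splitting step can be sketched as follows. The routing policy shown (MD5 of the content id, modulo the number of columns) is only an assumption for illustration; the actual routing policy is configurable and is not specified here. The names column_for and split_batch and the operation layout are hypothetical.

```python
import hashlib
from collections import defaultdict


def column_for(content_id: str, num_columns: int) -> int:
    """Pick a column for a document (assumed hash-modulo routing policy)."""
    digest = hashlib.md5(content_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_columns


def split_batch(batch, num_columns):
    """Group the operations of one batch by target column."""
    per_column = defaultdict(list)
    for op in batch:
        per_column[column_for(op["id"], num_columns)].append(op)
    return dict(per_column)


# Hypothetical batch of operations.
batch = [{"id": f"doc-{n}", "op": "add"} for n in range(10)]
chunks = split_batch(batch, num_columns=3)
for column, ops in sorted(chunks.items()):
    print(column, [op["id"] for op in ops])
```

Each chunk would then be sent to the indexers of its column, and the dispatcher would wait for the callbacks of every column that received a chunk before reporting back upstream.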

The indexer then receives its chunk of data, persists the data to disk, synchronizes it across the column if there is more than one indexer row, and sends the "secured" callback.

When it is time to build an index, the indexer determines what content needs to be indexed and calls fixmlindex to build the index.

Then, the index is distributed to remote search nodes if needed, and the search nodes are informed about the new index set.
