Monday 12 January 2009

Indexer Fault Tolerance

At the end of the document processing pipeline, the indexer dispatcher needs to determine which column to dispatch the new content to. The location for the new document is determined by calculating:

Hash of the docid, modulo the number of columns.

Because the calculation is deterministic, every document has exactly one location in the index, and the same docid always maps to the same column.
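The calculation above can be sketched in a few lines of Python. This is only an illustration: the function name `column_for` is made up, and SHA-256 is an assumption, since the hash function the dispatcher really uses is not stated here. A stable digest is used instead of Python's built-in `hash()`, which is randomized per process for strings.

```python
import hashlib


def column_for(docid: str, num_columns: int) -> int:
    """Return the column index for a docid: hash(docid) modulo the column count.

    SHA-256 is an assumption for illustration; the real dispatcher's
    hash function is not specified in this post.
    """
    if num_columns <= 0:
        raise ValueError("num_columns must be positive")
    digest = hashlib.sha256(docid.encode("utf-8")).digest()
    # Interpret the digest as a big integer, then reduce it to a column index.
    return int.from_bytes(digest, "big") % num_columns


if __name__ == "__main__":
    for docid in ("http://example.com/a", "http://example.com/b"):
        print(docid, "-> column", column_for(docid, 3))
```

Note that changing the number of columns changes the result for most docids, so the column count has to stay fixed for the mapping to remain valid.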

The name service determines which one of the equivalent indexers is automatically elected as the master indexer. The other equivalent indexers are elected as backup indexers.

In a multi-node installation we will have a single master indexer and several equivalent backup indexers.

After document processing, the indexing dispatcher will dispatch the processed documents (FIXML) to all equivalent indexers on each of the rows.

Only the master indexer will actually create an index for the given document. The other indexers will receive the FIXML but will not generate an index until they are required to, for example when the master indexer node fails.

If the master indexer fails, search will continue being served by the search node which has a full copy of the index.

The name service will then designate a backup indexer to be master, and a new index will be generated from its FIXML.

When the original master comes back up again, it will be set up as a backup indexer.
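The failover sequence described above can be modelled as a toy simulation. Everything in it is hypothetical: the class names `Indexer` and `NameService`, the `dispatch` helper, and the "first live backup wins" election rule are assumptions made for illustration, not how the product is implemented. It also does not model how a returning node catches up on documents it missed while it was down.

```python
from dataclasses import dataclass, field


@dataclass
class Indexer:
    name: str
    role: str = "backup"
    alive: bool = True
    fixml: dict = field(default_factory=dict)  # docid -> processed document
    index: dict = field(default_factory=dict)  # docid -> indexed document

    def receive(self, docid, doc):
        # Every indexer stores the FIXML; only the master indexes it.
        self.fixml[docid] = doc
        if self.role == "master":
            self.index[docid] = doc

    def reset_index(self):
        # Rebuild the whole index from the stored FIXML.
        self.index = dict(self.fixml)


class NameService:
    def __init__(self, indexers):
        self.indexers = indexers
        self.elect()

    def elect(self):
        """Return the current master, promoting a backup if there is none."""
        for ix in self.indexers:
            if ix.alive and ix.role == "master":
                return ix
        # No live master: demote everyone, then promote the first live node
        # (an assumed election rule).
        for ix in self.indexers:
            ix.role = "backup"
        candidates = [ix for ix in self.indexers if ix.alive]
        if not candidates:
            raise RuntimeError("no live indexer available")
        master = candidates[0]
        master.role = "master"
        master.reset_index()
        return master


def dispatch(indexers, docid, doc):
    # The dispatcher sends the processed document to all live indexers.
    for ix in indexers:
        if ix.alive:
            ix.receive(docid, doc)


if __name__ == "__main__":
    a, b = Indexer("node-a"), Indexer("node-b")
    ns = NameService([a, b])
    dispatch([a, b], "doc-1", "<fixml/>")
    a.alive = False                # master fails
    print(ns.elect().name)         # backup is promoted and rebuilds its index
    a.alive = True                 # original master returns as a backup
    print(a.role)
```

In this model the promotion step is where the cost of a reset index shows up: the new master rebuilds from all the FIXML it has stored before it indexes anything new.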

The catch:

When indexer fault tolerance is set up, a small performance degradation can be experienced because of the additional effort to copy and secure the processed documents on the backup indexers. To reduce this effect, it is important that the traffic between the nodes is sent over a dedicated fast (1 Gbit) network.

When a backup indexer takes over, it needs to perform a reset index operation, which on large indexes may take several hours to complete before the indexer is ready to index anything new.

Search will, however, remain available throughout.
