Monday 12 January 2009

FAST ESP Fault Tolerance.


FAST ESP can provide a proprietary active/semi-active fault tolerance model for search within a single cluster, using a 2 row architecture.

In this scenario we have two nodes, A and B. Node A hosts all services, while node B hosts a search service, an indexer service and a Query/Results service. Node B can also host additional document processing services to balance the load and increase performance.

Because both nodes host search, indexer and Query/Results services, search is available on both nodes. This is managed by a built-in software load balancer.

This model allows us to provide active-active search capabilities on both nodes in case of node failure; that is, search fails over. This is what is referred to as a 2 row architecture.
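As a rough illustration, here is what that fail over amounts to from a client's point of view, a minimal Python sketch in which the host names, port and search helper are hypothetical stand-ins rather than the real ESP API:

    import urllib.parse
    import urllib.request

    # Hypothetical Query/Results endpoints on the two rows (nodes A and B).
    QR_SERVERS = ["http://node-a:15100/search", "http://node-b:15100/search"]

    def search(query, timeout=5):
        """Try each Query/Results server in turn, failing over on error."""
        last_error = None
        for base in QR_SERVERS:
            url = base + "?" + urllib.parse.urlencode({"query": query})
            try:
                with urllib.request.urlopen(url, timeout=timeout) as resp:
                    return resp.read()
            except OSError as err:  # connection refused, timeout, etc.
                last_error = err    # this row is down; try the other one
        raise RuntimeError(f"both rows failed: {last_error}")

In practice the built-in load balancer does this transparently; the point is simply that either row can answer any query.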

A 2 column architecture would instead split the content across the nodes, 50% on each. This is beneficial for very high volumes of data, but it provides no redundancy.

With respect to content and indexing.

Node A gathers content, processes this content and uses its own indexer services to build the index.

Content dispatchers write the post-processed content, called FIXML (FAST Index XML), to the node A indexer services, from which the node A index is generated. Concurrently, this FIXML is dispatched to node B. Node B holds the FIXML but does not build its own replica of the index unless the master fails.

This process is continuous, ensuring both hosts' FIXML is kept in sync. If node A fails, we can then generate the index from node B's FIXML, and vice versa.

This regeneration can take anywhere from minutes to several hours depending on the index size. While node A's index is being rebuilt, we can still serve searches from node B's index.
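A minimal sketch of this dual-write idea, with hypothetical spool directories standing in for ESP's proprietary dispatcher protocol:

    import pathlib
    import shutil

    # Hypothetical FIXML spool directories for the two rows.
    NODE_A_FIXML = pathlib.Path("/data/node-a/fixml")
    NODE_B_FIXML = pathlib.Path("/data/node-b/fixml")

    def dispatch_fixml(batch: pathlib.Path) -> None:
        """Write a post-processed FIXML batch to both rows so either can rebuild."""
        for spool in (NODE_A_FIXML, NODE_B_FIXML):
            spool.mkdir(parents=True, exist_ok=True)
            shutil.copy2(batch, spool / batch.name)

    def rebuild_index(spool: pathlib.Path) -> None:
        """On fail over, replay the surviving row's FIXML into a fresh index."""
        for fixml in sorted(spool.glob("*.fixml")):
            print(f"indexing {fixml.name}")  # placeholder for the indexer's build step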

N.B. The crawler is a single point of failure. If the node containing the crawler fails, we will not be able to add new content to the index until it is brought back up.

The Indexing Subsystem.

The first indexing dispatcher to register with the Name Service becomes active; the Name Service guarantees that only one indexing dispatcher succeeds. The backup dispatchers monitor the active one by periodically connecting to it and pinging it, stepping in to take over if it dies.
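A toy sketch of this election scheme; the NameService class and its methods are invented for illustration, not ESP's real interface:

    import threading
    import time

    class NameService:
        """Toy stand-in: the first dispatcher to register wins the active role."""
        def __init__(self):
            self._lock = threading.Lock()
            self._active = None

        def register(self, dispatcher_id):
            with self._lock:           # guarantees exactly one winner
                if self._active is None:
                    self._active = dispatcher_id
                    return True        # this dispatcher is now active
                return False           # a master already exists

        def active(self):
            return self._active

        def unregister(self, dispatcher_id):
            with self._lock:
                if self._active == dispatcher_id:
                    self._active = None

    def backup_loop(ns, me, ping_master, interval=2.0):
        """Backups periodically ping the master and take over if it dies."""
        while not ns.register(me):
            time.sleep(interval)
            if not ping_master():           # master stopped answering
                ns.unregister(ns.active())  # clear the stale registration
        print(f"{me} is now the active indexing dispatcher")

    ns = NameService()
    ns.register("dispatcher-1")                                   # first registrant becomes active
    backup_loop(ns, "dispatcher-2", lambda: False, interval=0.1)  # master "dies", backup takes over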

Incoming operations are rewritten to basic I/O operations, for example:

e.g., invalidate file X @ position Y

e.g., blacklist doc X in index Y_Z
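Sketched as data, with record and field names invented for illustration:

    from dataclasses import dataclass

    @dataclass
    class InvalidateFile:   # "invalidate file X @ position Y"
        path: str
        position: int

    @dataclass
    class BlacklistDoc:     # "blacklist doc X in index Y_Z"
        doc_id: str
        index: str

    def apply(op):
        """Keeping a log of such records is what lets a row replay them later."""
        if isinstance(op, InvalidateFile):
            print(f"invalidate {op.path} @ {op.position}")
        elif isinstance(op, BlacklistDoc):
            print(f"blacklist {op.doc_id} in {op.index}")

    apply(InvalidateFile("part0/index.dat", 4096))
    apply(BlacklistDoc("doc-17", "index_0_1"))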

Index and Search nodes are arranged in a matrix. Each column holds a subset of the content to distribute the load, which allows scaling for both volume and indexing performance.

Every row holds a replica of the full content of the index, which enables an increased number of queries per second. Rows are able to replay operations internally to re-establish synchronization after downtime. Multiple indexing rows add indexer resilience. Rows are kept in sync with respect to both content and indices.
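The matrix arithmetic, sketched with arbitrary example counts:

    import hashlib

    COLUMNS = 4  # each column holds a subset of the content (scales volume)
    ROWS = 2     # each row holds a full replica (scales queries per second)

    def column_for(doc_id):
        """Documents are routed to exactly one column, spreading the indexing load."""
        return hashlib.md5(doc_id.encode()).digest()[0] % COLUMNS

    def nodes_for_query(row):
        """A query touches every column, but only one row, to see all content."""
        return [(row, col) for col in range(COLUMNS)]

    print(column_for("doc-42"))  # the single column that indexes this document
    print(nodes_for_query(0))    # the nodes one query fans out across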

The column master is elected at run-time. If the master fails, a new master is elected.

The column master synchronizes content operations and indexing to all of its backups.

During indexing of new content there will be 2 indices:

1 that houses the active index, against which the search service runs queries, and

1 that houses the incremental index being built from the newly added content. This is added to the active index in batches.

The incremental index needs to be as large as the active index to allow for scenarios in which we need to reset the entire index.
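In outline, assuming an invented batch threshold and plain dictionaries for brevity:

    active_index = {}       # the index the search service queries
    incremental_index = {}  # built from newly added content

    BATCH_SIZE = 100        # hypothetical merge threshold

    def index_document(doc_id, terms):
        """New content lands in the incremental index; the active index only
        changes when a whole batch is folded in."""
        incremental_index[doc_id] = terms
        if len(incremental_index) >= BATCH_SIZE:
            active_index.update(incremental_index)  # merge in one batch
            incremental_index.clear()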

The active index will be divided into 3 partitions: 0, 1 and 2. These vary in ascending size from 25% to 50% to 100%. As content is added, the indexer service sends it to the smallest of the partitions, partition 0. When this reaches maximum capacity, its contents are copied to the next larger partition, partition 1, and so on.

The advantage of this, given that all new content is sent to partition 0, is that we can quickly re-build the smallest index partition.
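A runnable sketch of the cascade; the capacities are illustrative units, not ESP's real sizing:

    # Partition capacities in ascending size: 25%, 50%, 100% of some unit.
    CAPACITY = {0: 25, 1: 50, 2: 100}
    partitions = {0: [], 1: [], 2: []}

    def add_document(doc):
        """All new content goes to partition 0, the smallest and quickest to rebuild."""
        partitions[0].append(doc)
        cascade(0)

    def cascade(p):
        """When a partition fills, its contents move into the next larger one."""
        if p < 2 and len(partitions[p]) >= CAPACITY[p]:
            partitions[p + 1].extend(partitions[p])
            partitions[p].clear()
            cascade(p + 1)

    for i in range(60):
        add_document(f"doc-{i}")
    print({p: len(docs) for p, docs in partitions.items()})  # {0: 10, 1: 0, 2: 50}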

The column master ensures that column contents are always in sync.

A new index is not activated until the index is ready on all rows. That is, the new index is not activated until all rows hold the same content.
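The activation rule in miniature:

    def ready_to_activate(row_states):
        """A new index generation goes live only once every row reports ready,
        so all replicas serve identical content."""
        return all(state == "ready" for state in row_states.values())

    rows = {"row-0": "ready", "row-1": "indexing"}
    print(ready_to_activate(rows))  # False: row-1 is still building
    rows["row-1"] = "ready"
    print(ready_to_activate(rows))  # True: safe to switch generations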

The master indexer is the only one to receive operations, initiate indexing and synchronize the backups.

Currently, all search controllers connect to the master indexer for guidance. Only the master indexer builds indices; backups only store the FIXML. If the master fails, the crown is passed to a backup, which assumes the master role.

A failover will require the index to be rebuilt from FIXML. This may take from several minutes to a couple of hours depending on the volume of content. During this time, indexing of new content is not possible; search, however, is uninterrupted.

The Processing Subsystem.

Multiple processor servers provide resilience and throughput.

Multiple content distributors likewise provide resilience and throughput.
