Tuesday, 13 January 2009

FAST ESP Fault Tolerance

FAST can provide a proprietary active/semi-active form of fault tolerance for search within a single cluster, using a 2 row architecture.

In this scenario we have two nodes, A and B. Node A hosts all services, while Node B hosts a search service, an indexer service and a Query/Results service. Node B can also host additional document processing services to balance the load and increase performance.

Because both nodes hold search, indexer and Query/Results services, search is available on both nodes. This is managed by a built-in software load balancer.

This model gives us active-active search across both nodes, so if one node fails, search fails over to the other. This is what is referred to as a 2 row architecture.
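To make the failover behaviour concrete, here is a minimal sketch of a round-robin balancer that skips a failed row. It is not FAST ESP code: `SearchRow` and `RowBalancer` are hypothetical names invented for illustration, and the real built-in load balancer may behave differently.

```python
class SearchRow:
    """Hypothetical stand-in for one search row (node); not a FAST ESP class."""

    def __init__(self, name):
        self.name = name
        self.up = True

    def search(self, query):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} answered '{query}'"


class RowBalancer:
    """Round-robin over rows, skipping any row that fails (illustrative only)."""

    def __init__(self, rows):
        self.rows = rows
        self._next = 0  # position of the row to try first on the next query

    def search(self, query):
        # Try each row at most once, starting from the next one in rotation.
        for attempt in range(len(self.rows)):
            position = (self._next + attempt) % len(self.rows)
            try:
                result = self.rows[position].search(query)
            except ConnectionError:
                continue  # this row is down, fail over to the next one
            self._next = (position + 1) % len(self.rows)
            return result
        raise RuntimeError("no search row available")
```

With two rows, queries alternate between Node A and Node B while both are up. If Node A is marked down, every query is answered by Node B.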

A 2 column architecture would split the content across the nodes with 50% in each. This would be beneficial if there were very high volumes of data but would not provide redundancy.
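The difference between rows and columns can be illustrated with a small calculation. This is a sketch under a simplifying assumption: each node is identified by a hypothetical (row, column) pair, and content is split evenly across columns. The function name `searchable_fraction` is made up for this example.

```python
def searchable_fraction(rows, columns, failed_nodes):
    """Return the fraction of content still searchable after node failures.

    Nodes are (row, column) pairs. A column (one partition of the content)
    stays searchable while at least one row holding it is still up.
    Assumes content is split evenly across columns.
    """
    failed = set(failed_nodes)
    live_columns = sum(
        1
        for c in range(columns)
        if any((r, c) not in failed for r in range(rows))
    )
    return live_columns / columns


# 2 rows x 1 column: losing one node leaves all content searchable.
two_row = searchable_fraction(rows=2, columns=1, failed_nodes=[(0, 0)])

# 1 row x 2 columns: losing one node loses that node's half of the content.
two_column = searchable_fraction(rows=1, columns=2, failed_nodes=[(0, 0)])
```

Under these assumptions `two_row` is 1.0 and `two_column` is 0.5, which matches the trade-off described above: rows buy redundancy, columns buy capacity.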

With respect to content and indexing:

Node A gathers content, processes this content and uses its own indexer services to build the index.

Content dispatchers write the post-processed content, called FIXML (FAST Index XML), to the Node A indexer services, from which the Node A index is generated. Concurrently, this FIXML is dispatched to Node B. Node B holds the FIXML but does not create its own replica of the index unless the master fails.

This process is continuous, so the FIXML on both hosts is kept in sync. If Node A fails, we can generate the index from Node B's FIXML, and vice versa.

Index generation can take anywhere from minutes to several hours, depending on the index size. While Node A's index is being regenerated, we can continue to serve searches from the index on Node B.
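The mirroring and rebuild steps can be sketched as a toy model. Everything here is an assumption made for illustration: `Node`, `dispatch` and `build_index` are hypothetical names, FIXML is treated as plain text rather than real XML, and the "index" is a simple term-to-document map, not the FAST index format.

```python
class Node:
    """Toy model of a node holding FIXML and, optionally, a built index."""

    def __init__(self, name):
        self.name = name
        self.fixml = {}    # doc_id -> FIXML content (plain text in this sketch)
        self.index = None  # term -> set of doc_ids; None until built

    def build_index(self):
        # Regenerate the index purely from the locally held FIXML.
        index = {}
        for doc_id, text in self.fixml.items():
            for term in text.lower().split():
                index.setdefault(term, set()).add(doc_id)
        self.index = index


def dispatch(doc_id, fixml, nodes):
    """Write the same FIXML to every node so the copies stay in sync."""
    for node in nodes:
        node.fixml[doc_id] = fixml


node_a, node_b = Node("A"), Node("B")
dispatch("d1", "fault tolerance", [node_a, node_b])
dispatch("d2", "search fault", [node_a, node_b])

node_a.build_index()  # the master indexes as content arrives
# Node B only holds FIXML at this point; its index is still None.

node_b.build_index()  # after a Node A failure, rebuild from Node B's FIXML
```

Because both nodes received identical FIXML, the index rebuilt on Node B has the same contents as the one Node A had built. In a real deployment the rebuild time is dominated by index size, as noted above.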

N.B. The crawler is a single point of failure. If the node containing the crawler fails, we will not be able to add new content to the index until it is brought back up.

