Indexing
No. of indexers – The more indexers we have, the more content we can process in parallel. This prevents an indexing bottleneck.
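To see why the indexer count matters, here is a rough back-of-the-envelope model. It assumes, hypothetically, that content arrives at a steady rate and that each indexer handles a fixed number of documents per hour. `backlog_after` and the rates are made up for illustration; they are not FAST ESP settings.

```python
def backlog_after(hours: float, docs_per_hour: float,
                  docs_per_indexer_hour: float, indexers: int) -> float:
    """Documents still waiting to be indexed after `hours` (hypothetical model)."""
    capacity = indexers * docs_per_indexer_hour  # combined indexing throughput
    # If content arrives faster than we can index it, the queue grows linearly.
    return max(0.0, float(docs_per_hour - capacity) * hours)


# Hypothetical feed: 100,000 docs/hour; each indexer handles 30,000 docs/hour.
for n in (2, 3, 4):
    print(f"{n} indexers: {backlog_after(8, 100_000, 30_000, n):,.0f} docs queued after 8 hours")
# 2 indexers: 320,000 docs queued after 8 hours
# 3 indexers: 80,000 docs queued after 8 hours
# 4 indexers: 0 docs queued after 8 hours
```

Once the combined throughput of the indexers exceeds the arrival rate, the backlog stays at zero; below that point it grows every hour, which is exactly the bottleneck we want to avoid.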
Smallest Partition Size – The smaller the initial partition, the more quickly its index can be built. If we can place this small partition in RAM, we can greatly expedite indexing.
Trigger – Since indexing new content requires re-indexing the smallest partition, any content already sitting in that partition has to be shuffled again in the re-index. Think of a deck of cards: every card already in the deck is reshuffled each time a new one is added. If we keep this partition as empty as possible, we reduce the shuffle load. There is a balance to strike, though, as triggering too often will itself create a lot of indexing work.
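The deck-of-cards trade-off can be sketched with a toy two-partition model. This is a simplification under stated assumptions, not how any particular engine schedules its merges: each new batch is added to the small partition, which is then rebuilt in full, and once the small partition reaches a hypothetical `trigger` size its content is merged into the large partition, which is also rebuilt in full. The "shuffle load" here is simply the number of documents re-indexed.

```python
def shuffle_load(batches: int, batch_size: int, trigger: int) -> int:
    """Total documents re-indexed in a toy two-partition model (hypothetical)."""
    small = large = load = 0
    for _ in range(batches):
        small += batch_size
        load += small              # rebuild the small partition with the new batch
        if small >= trigger:       # trigger fires: move everything to the large partition
            large += small
            load += large          # rebuild the large partition
            small = 0
    return load


# 100 batches of 10 docs each, with three hypothetical trigger sizes.
for trigger in (10, 100, 10**9):
    print(trigger, shuffle_load(100, 10, trigger))
# 10 51500          (fires every batch: the large partition is rebuilt 100 times)
# 100 11000
# 1000000000 50500  (never fires: every batch reshuffles the whole "small" partition)
```

In this toy run, triggering on every batch and never triggering both cost roughly 4.5 times more than the middle setting, which is the balance the trigger has to strike.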
No. of partitions – The more partitions we have, the less frequently we need to re-index the larger partitions, and thus the smaller the shuffle load.
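Extending the same kind of toy model to any number of partitions shows the effect of adding tiers. Again this is a hypothetical simplification: `thresholds[i]` is the trigger size for partition `i`, the last partition has no limit, and a partition that reaches its trigger is merged into the next one, which is then rebuilt in full. The function name and thresholds are illustrative only.

```python
from typing import Sequence


def tiered_shuffle_load(batches: int, batch_size: int, thresholds: Sequence[int]) -> int:
    """Documents re-indexed in a toy tiered-partition model (hypothetical)."""
    sizes = [0] * (len(thresholds) + 1)  # sizes[0] is the smallest partition
    load = 0
    for _ in range(batches):
        sizes[0] += batch_size
        load += sizes[0]                 # rebuild the smallest partition
        # Cascade: a partition that hits its trigger is merged into the next one.
        for i, limit in enumerate(thresholds):
            if sizes[i] >= limit:
                sizes[i + 1] += sizes[i]
                load += sizes[i + 1]     # rebuild the receiving partition
                sizes[i] = 0
    return load


# 100 batches of 10 docs with 1, 2 and 3 partitions (hypothetical triggers).
for thresholds in ([], [100], [100, 300]):
    print(f"{len(thresholds) + 1} partition(s):", tiered_shuffle_load(100, 10, thresholds))
# 1 partition(s): 50500
# 2 partition(s): 11000
# 3 partition(s): 9200
```

With two partitions the largest one is rebuilt ten times; with three it is rebuilt only three times (at 300, 600 and 900 documents), so the total shuffle load drops even though an extra tier has been added.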
Dedicated Server – Indexing is resource-intensive, so give it a dedicated server if possible. At the very least, avoid running search and Doc Processing on the indexing node, as they compete for the same resources.
Separate Index & Fix ML – To minimise the number of writes hitting the same disk, place the index and Fix ML on separate physical disks.
Tune Index Profile – Unused fields in the index profile still cost time for every document indexed; multiplied by large data volumes, they add significantly to the time it takes to index content.
Multicast or SAN – If the index is distributed over several servers, copying it to each search node in turn is expensive and adds to the latency. Use multicasting to write to all of them concurrently instead. Even better, use a SAN with shared access so that the dedicated search rows read the index from the same location as the indexer.
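The latency argument is easy to put into rough numbers. The sketch below is a hypothetical model only: it assumes copy time is limited by a single network link, that multicast delivers one stream to every node at once, and that with a shared SAN the search rows need no copy at all (it ignores the time needed to open the new index). `distribution_seconds` and the figures are illustrative, not measured.

```python
def distribution_seconds(index_gb: float, search_nodes: int,
                         link_gb_per_s: float, method: str) -> float:
    """Rough time to get a new index onto every search node (hypothetical model)."""
    one_copy = index_gb / link_gb_per_s
    if method == "sequential":   # copy to one node after another
        return search_nodes * one_copy
    if method == "multicast":    # one stream received by all nodes concurrently
        return one_copy
    if method == "san":          # nodes read the indexer's shared storage directly
        return 0.0
    raise ValueError(f"unknown method: {method!r}")


# Hypothetical: 20 GB index, 4 search nodes, 0.125 GB/s (about a 1 Gbit/s link).
for method in ("sequential", "multicast", "san"):
    print(method, distribution_seconds(20, 4, 0.125, method))
# sequential 640.0
# multicast 160.0
# san 0.0
```

Sequential copying scales with the number of search nodes, multicast removes that multiplier, and shared storage removes the copy step altogether.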