Setting olap_compression_threads > 0 can significantly decrease the time needed to merge large OLAP shards. However, the value used must be balanced with
olap_merge_threads and the number of cores available on the system.
One parameter worth highlighting is the OLAP ColumnFamily’s gc_grace_seconds. This value controls how long deleted rows called “tombstones” are retained with data tables (called
sstables) before they are truly deleted during a compaction cycle. The default is 864,000 seconds or 10 days, which provides lots of time for a failed node to recover and learn about deletions it may have missed. However, the OLAP service deletes many rows when it merges a shard, and these rows consume disk space. Moreover, they consume memory because active sstables are memory-mapped (mmap-ed) by Cassandra. This can cause excessive memory usage.
A much smaller gc_grace_seconds value is recommended for the OLAP CF, somewhere between 3600 (a hour) and 86400 (1 day). This causes tombstones to be deleted more quickly, freeing-up disk space and reducing memory usage.