OLAP Parameters
These parameters control the operation of the OLAP service:
olap_cache_size_mb: 100
This parameter sets the size, in megabytes, of the most-recently-used field cache. When the cache exceeds this size, the least recently used fields are unloaded from memory.
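The eviction behavior described above can be pictured as a size-bounded least-recently-used cache. The Python sketch below is hypothetical: the FieldCache class, its method names, and the assumption that 1 MB means 2**20 bytes are illustrations, not Doradus’s actual field cache.

```python
from collections import OrderedDict


class FieldCache:
    """Hypothetical size-bounded cache that keeps the most recently used fields.

    capacity_mb plays the role of olap_cache_size_mb. Illustration only,
    not Doradus's implementation.
    """

    def __init__(self, capacity_mb):
        self.capacity = capacity_mb * 1024 * 1024  # assumption: 1 MB = 2**20 bytes
        self.used = 0
        self.fields = OrderedDict()  # field name -> bytes, least recently used first

    def get(self, name):
        data = self.fields.get(name)
        if data is not None:
            self.fields.move_to_end(name)  # mark as most recently used
        return data

    def put(self, name, data):
        if name in self.fields:
            self.used -= len(self.fields.pop(name))
        self.fields[name] = data
        self.used += len(data)
        # Unload least recently used fields until the budget is met again,
        # always keeping the field that was just loaded.
        while self.used > self.capacity and len(self.fields) > 1:
            _, evicted = self.fields.popitem(last=False)
            self.used -= len(evicted)
```

With a 1 MB budget, loading a third 400,000-byte field unloads whichever of the first two was used least recently.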
olap_file_cache_size_mb: 100
When this value is non-zero, it enables OLAP “file” caching and sets the total size, in megabytes, of the cached data. OLAP uses virtual files to hold raw scalar values such as text values. Caching this data prevents round-trips to the database for certain types of queries. This value does not affect shard caching defined by other parameters.
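To illustrate why a non-zero olap_file_cache_size_mb saves database round-trips, here is a minimal read-through cache sketch in Python. The FileCache class and the fetch_from_db callable are hypothetical, and because the text does not describe an eviction policy, the sketch simply stops caching new entries once the budget is full.

```python
class FileCache:
    """Hypothetical read-through cache for raw scalar "file" data.

    capacity_mb mirrors olap_file_cache_size_mb: 0 disables caching.
    Simplification: once the budget is full, new entries are not cached.
    """

    def __init__(self, capacity_mb, fetch_from_db):
        self.capacity = capacity_mb * 1024 * 1024  # assumption: 1 MB = 2**20 bytes
        self.fetch_from_db = fetch_from_db  # callable: file name -> bytes
        self.cache = {}
        self.used = 0
        self.db_reads = 0  # counts round-trips to the database

    def read(self, name):
        if name in self.cache:
            return self.cache[name]  # served from memory, no round-trip
        self.db_reads += 1
        data = self.fetch_from_db(name)
        if self.capacity > 0 and self.used + len(data) <= self.capacity:
            self.cache[name] = data
            self.used += len(data)
        return data


def fake_db(name):
    return name.encode() * 100  # stand-in for a Cassandra read


cached = FileCache(100, fake_db)
uncached = FileCache(0, fake_db)  # 0 disables file caching
for _ in range(5):
    cached.read("Subject")
    uncached.read("Subject")
```

After five identical reads, the cached instance has gone to the database once and the uncached instance five times.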
olap_query_cache_size_mb: 100
When this value is non-zero, it enables caching of recent query results and sets the total size, in megabytes, of the cached search results. Each cached result takes 1 bit per object in the table.
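The 1-bit-per-object cost makes the capacity of the query cache easy to estimate. The helper below is a hypothetical sketch that assumes 1 MB = 2**20 bytes and ignores any per-entry bookkeeping Doradus may add.

```python
def cached_results_capacity(cache_size_mb, objects_in_table):
    """Estimate how many query results fit in olap_query_cache_size_mb.

    Uses the stated cost of 1 bit per table object per cached result.
    Assumes 1 MB = 2**20 bytes and ignores bookkeeping overhead.
    """
    bits_available = cache_size_mb * 1024 * 1024 * 8
    bits_per_result = objects_in_table  # one bit per object
    return bits_available // bits_per_result
```

For a 10-million-object table, each cached result occupies 10,000,000 bits (1,250,000 bytes), so a 100 MB cache holds about 83 results; for a 1-million-object table it holds about 838.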
olap_cf_defaults:
    - compression_options:
        - sstable_compression: "" # use empty string for "none"
    - gc_grace_seconds: 3600
This parameter defines the options used to create the OLAP ColumnFamily. All parameters recognized by Cassandra are accepted and passed “as is”. Parameters must be indented and begin with a “-”; sub-parameters must be further indented. The OLAP ColumnFamily is created when the server is first started for a new database. If the OLAP ColumnFamily already exists, it is not modified to match these parameters. See the Cassandra documentation for details about these values. Note that Doradus automatically compresses data before storing it in Cassandra, so compression should be disabled for the OLAP ColumnFamily.
olap_merge_threads: 0
This parameter controls the number of threads used to merge data within a shard. The default value of 0 means that a single thread is used to merge all data. When this value is > 0, multiple threads are used to merge shard data in parallel. Increasing this value can significantly decrease the time needed to merge shards. However, it must be balanced with the number of processors available on the machine.
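A minimal Python sketch of the difference between olap_merge_threads = 0 and a value > 0 follows. It assumes, purely for illustration, that a shard merge can be split into independent per-field merges of sorted batches; merge_shard and merge_field are hypothetical names, not Doradus APIs.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def merge_field(batches):
    """Merge already-sorted batches of values for one field into one sorted list."""
    return list(heapq.merge(*batches))


def merge_shard(fields, merge_threads=0):
    """Hypothetical shard merge: fields maps field name -> list of sorted batches.

    merge_threads mirrors olap_merge_threads: 0 merges every field on the
    calling thread; > 0 merges fields in parallel on that many worker threads.
    """
    if merge_threads <= 0:
        return {name: merge_field(b) for name, b in fields.items()}
    with ThreadPoolExecutor(max_workers=merge_threads) as pool:
        futures = {name: pool.submit(merge_field, b) for name, b in fields.items()}
        return {name: f.result() for name, f in futures.items()}
```

In CPython, threads running pure-Python code such as heapq.merge do not execute in parallel because of the global interpreter lock, so this shows the structure of the setting rather than its speedup. Balancing the value against the processor count could be as simple as min(configured_threads, os.cpu_count() or 1).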
olap_compression_threads: 0
This parameter controls the number of threads used to compress data before storing in Cassandra. This parameter interacts with olap_merge_threads as follows:
If both olap_merge_threads and olap_compression_threads = 0, then a single thread is used to merge and store all data for each shard.
If olap_merge_threads > 0 and olap_compression_threads = 0, then each merge thread both merges data and compresses all segments before writing to Cassandra.
If olap_compression_threads > 0, then the merge thread(s) create data segments that are then queued for compression by the specified number of asynchronous compression threads.
Setting olap_compression_threads > 0 can significantly decrease the time needed to merge large OLAP shards. However, the value used must be balanced with olap_merge_threads and the number of cores available on the system.
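The three cases above can be sketched as a producer/consumer pipeline. In this hypothetical Python version, the merge step is reduced to handing pre-built byte segments to an emit function, zlib stands in for Doradus’s compressor, and the returned list stands in for writes to Cassandra; merge_and_store is an illustrative name only.

```python
import queue
import threading
import zlib
from concurrent.futures import ThreadPoolExecutor


def merge_and_store(segments, merge_threads=0, compression_threads=0, level=6):
    """Hypothetical sketch of how merge and compression threads interact."""
    stored = []  # stands in for compressed blobs written to Cassandra
    lock = threading.Lock()

    def compress_and_store(segment):
        blob = zlib.compress(segment, level)
        with lock:
            stored.append(blob)

    if compression_threads > 0:
        # Merge side only queues segments; compression happens asynchronously.
        work = queue.Queue()

        def compressor():
            while True:
                segment = work.get()
                if segment is None:  # stop signal
                    break
                compress_and_store(segment)

        workers = [threading.Thread(target=compressor) for _ in range(compression_threads)]
        for w in workers:
            w.start()
        emit = work.put
    else:
        # The merging thread compresses each segment itself before storing it.
        emit = compress_and_store

    if merge_threads > 0:
        with ThreadPoolExecutor(max_workers=merge_threads) as pool:
            list(pool.map(emit, segments))
    else:
        for segment in segments:
            emit(segment)

    if compression_threads > 0:
        for _ in workers:
            work.put(None)  # one stop signal per compression thread
        for w in workers:
            w.join()
    return stored
```

With compression_threads = 0 the emitting thread does the compression itself, matching the first two cases; with compression_threads > 0 the merge side only enqueues segments, and one None sentinel per worker shuts the compression threads down once merging is finished.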
olap_compression_level: -1
This parameter controls the GZIP compression level used to compress OLAP data before storage. The default value, -1, is equivalent to level 6, which is a good balance between speed and space usage. Value 0 means “no compression”, which speeds up shard merging but requires the most disk space. Value 9 means “best compression”, which requires the most CPU but uses the least disk space. Tests suggest that levels 2, 4, and 6 are good choices.
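The level semantics can be checked with zlib, the DEFLATE library behind Java’s GZIP streams and Python’s gzip module. This Python sketch assumes Doradus’s levels follow the standard zlib convention, in which -1 (Z_DEFAULT_COMPRESSION) maps to level 6; gzip_bytes is a hypothetical helper, and the sample data is synthetic, so its sizes say nothing about real OLAP segments.

```python
import gzip
import zlib


def gzip_bytes(data, level):
    """GZIP-compress data at the given level (wbits=31 selects the gzip container)."""
    compressor = zlib.compressobj(level, zlib.DEFLATED, 31)
    return compressor.compress(data) + compressor.flush()


sample = b"Doradus OLAP segment " * 5000  # highly repetitive synthetic data

default = gzip_bytes(sample, -1)
restored = gzip.decompress(default)  # standard gzip readers accept the output
sizes = {level: len(gzip_bytes(sample, level)) for level in (0, 2, 4, 6, 9)}
```

Level 0 produces stored (uncompressed) blocks, so its output is slightly larger than the input; higher levels trade CPU time for smaller output.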
olap_search_threads: 0
This parameter controls the number of threads used to perform multi-shard queries. When this value is > 0, up to the configured number of threads can be used by a single query to search different shards in parallel. Increasing this value can significantly improve performance for certain queries. However, it must be balanced with the number of processors available on the machine.
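A per-query fan-out across shards, as olap_search_threads describes, can be sketched with a thread pool. The shard layout (a dict of object ID to field values), the predicate, and the function names are hypothetical; the point is only that 0 searches shards sequentially while a positive value lets one query search several shards at once.

```python
from concurrent.futures import ThreadPoolExecutor


def search_shard(shard, predicate):
    """Return the IDs of objects in one shard that match the predicate."""
    return {obj_id for obj_id, obj in shard.items() if predicate(obj)}


def search_shards(shards, predicate, search_threads=0):
    """Hypothetical multi-shard query; search_threads mirrors olap_search_threads.

    0 searches shards one after another on the calling thread; > 0 lets up to
    that many threads search different shards in parallel for this one query.
    """
    if search_threads <= 0:
        results = [search_shard(s, predicate) for s in shards]
    else:
        with ThreadPoolExecutor(max_workers=search_threads) as pool:
            results = list(pool.map(lambda s: search_shard(s, predicate), shards))
    return set().union(*results)
```

Both paths return the same result set; only the number of shards searched concurrently differs.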
One parameter worth highlighting is the OLAP ColumnFamily’s gc_grace_seconds. This value controls how long deletion markers, called “tombstones”, are retained in Cassandra’s data files (called sstables) before they are purged during a compaction cycle. The default is 864,000 seconds (10 days), which gives a failed node ample time to recover and learn about deletions it may have missed. However, the OLAP service deletes many rows when it merges a shard, and the resulting tombstones consume disk space. Moreover, they consume memory because active sstables are memory-mapped (mmap-ed) by Cassandra, which can cause excessive memory usage.
A much smaller gc_grace_seconds value is recommended for the OLAP CF, somewhere between 3600 (1 hour) and 86400 (1 day). This causes tombstones to be purged more quickly, freeing up disk space and reducing memory usage.
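The retention arithmetic above can be made concrete with a small Python helper. It assumes only the Cassandra rule that a tombstone may be purged by a compaction that runs after gc_grace_seconds have elapsed since the deletion; the function names and the encoded 1-hour-to-1-day range are illustrative.

```python
from datetime import datetime, timedelta

DEFAULT_GC_GRACE = 864_000                # Cassandra default: 10 days
RECOMMENDED_OLAP_RANGE = (3_600, 86_400)  # 1 hour to 1 day


def earliest_purge(deleted_at: datetime, gc_grace_seconds: int) -> datetime:
    """Earliest time a tombstone becomes eligible for removal by compaction.

    The tombstone is only dropped when a later compaction processes the
    sstable holding it, so the actual purge can happen later than this.
    """
    return deleted_at + timedelta(seconds=gc_grace_seconds)


def is_recommended_for_olap(gc_grace_seconds: int) -> bool:
    low, high = RECOMMENDED_OLAP_RANGE
    return low <= gc_grace_seconds <= high
```

A row deleted during a shard merge at midnight on January 1 stays on disk as a tombstone until at least January 11 with the default setting, but only until 1 AM with gc_grace_seconds set to 3600.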