• Embed Doradus: Loading through the REST API can roughly double load time (possibly more) because of the overhead of marshaling, network transfer, parsing, and so on. For bulk loading, you can treat the Doradus server as a “driver” and embed it in the same JVM as your application. (This only works with JVM-based applications.) The Doradus Administration documentation describes embedding in more detail; the highlights are:
• Call DoradusServer.startEmbedded(String[] args, String[] services) or DoradusServer.startServerUnblocked(String[] args) to start the server.
• Load only the services you need for your embedded application, possibly only the OLAP service. For example, if you use startEmbedded(), pass com.dell.doradus.service.olap.OLAPService in the second parameter.
• Find your application’s ApplicationDefinition by calling SchemaService.instance().getApplication(String appName), then add a batch of data by creating an OlapBatch object and passing it to one of the OLAPService.instance().addBatch() methods.
• Use HTTP compression: If you use the REST API, compress input batches using GZIP and add the HTTP header Content-Encoding: gzip.
• Optimize batch size: OLAP works best with update batches containing hundreds to thousands of objects, depending on your schema and data size. Shards containing many small batches take longer to merge, so experiment with batch size and merge frequency to find the best combination for your application.
• Use multiple threads: Whether you use the REST API or embed Doradus in your application, OLAP benefits from loading data in concurrent threads. More load threads require more memory and CPU, so balance the number of load threads against available resources.
• Enable olap_merge_threads: This doradus.yaml file parameter can be used to merge data segments in parallel even for a single load thread. It defaults to 0 (single-threaded merging). You can set this value up to the number of CPUs to decrease merge time.
• Enable olap_compression_threads: This doradus.yaml file parameter can be used to compress data segments in parallel even for a single load thread. It defaults to 0 (single-threaded compression). You can set this value up to the number of CPUs to decrease merge time.
• Adjust olap_compression_level: This doradus.yaml file parameter controls the GZIP compression level used when data segments are compressed. The default value (-1, which is the same as 6) selects a good balance between CPU time and compression ratio. If you select a smaller value such as 2 or 4, compression speed will improve, which reduces merge time, at a cost of using more disk space. The highest value (9) employs maximum compression, using more CPU and therefore lengthening merge time.
• Use background merging: Instead of explicitly requesting merges via REST commands, you can schedule automatic background merges by setting the application option auto-merge to a value such as 1 HOUR. This simplifies application logic, and on clusters it distributes load tasks among Doradus instances. Background merging can also reduce the need to increase olap_merge_threads or olap_compression_threads, freeing up CPU and memory resources.
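The client-side tips above (GZIP-compressed input, tunable batch size, concurrent load threads) can be combined in a small loader. The following is a minimal sketch, not part of Doradus: the class BatchLoader, the BatchSender interface, and the methods gzip, partition, and loadAll are hypothetical names introduced here for illustration. The sender is assumed to deliver each compressed body to the server, for example with an HTTP POST that carries the header Content-Encoding: gzip.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.GZIPOutputStream;

public class BatchLoader {
    /**
     * Hypothetical hook (not a Doradus API): delivers one gzip-compressed
     * batch body, e.g. an HTTP POST with "Content-Encoding: gzip".
     */
    public interface BatchSender {
        void send(byte[] gzippedBody) throws Exception;
    }

    /** Compresses a batch body with GZIP, encoding the text as UTF-8. */
    public static byte[] gzip(String body) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(body.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }

    /** Splits items into consecutive batches of at most batchSize elements. */
    public static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            int end = Math.min(i + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(i, end)));
        }
        return batches;
    }

    /**
     * Compresses and sends every batch body on a fixed-size thread pool.
     * Blocks until all batches are sent and returns the number sent;
     * a failure in any batch is rethrown from Future.get().
     */
    public static int loadAll(List<String> batchBodies, int threads,
                              BatchSender sender) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String body : batchBodies) {
                pending.add(pool.submit(() -> {
                    sender.send(gzip(body));
                    return null;
                }));
            }
            for (Future<?> f : pending) {
                f.get();
            }
            return pending.size();
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the batch size (via partition) and the thread count (via loadAll) are plain parameters, both can be varied between test runs while measuring load and merge times, which is the kind of experimentation the tips above recommend.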