A shard is a data partition. It is analogous to an OLAP cube, containing data that is organized into queryable
dimensions. A shard is an application-level partition: every object is assigned to a shard, and a shard holds objects from all tables. Data is stored in arrays on a per-shard/per-field basis. Queries define one or more shards as their query scope. When a field is accessed by a query for a specific shard, the corresponding array is loaded into memory, typically in one I/O. Accessing field values is very fast: typically millions of values/second. Arrays are cached on an LRU basis.
Objects are loaded into a shard in batches. A batch contains new, modified, and deleted objects in any order for any/all tables. Batches are intended to be large: typically thousands of objects. When one of more batches are loaded, the shard is
merged to apply all updates. Once merged, a shard’s updates are visible to queries. After a merge, a shard can receive more batches and be merged again.
In this example, the database holds multiple applications (Magellan,
Galileo.) Each application holds its own shards: The
Magellan application’s shards are named with dates (“
2012-12-20”, “
2012-12-19”, …). Each shard holds objects from all tables and are interrelated using links. Note that it is not necessary for an application to use multiple tables or links in order to take advantage of OLAP: a single table with only scalar fields is all some applications need.