When Spider Works Best

Doradus Spider databases are designed to support applications with one or more of the following needs:

• Fine-grained searching: Spider provides rich any-field and field-specific searches for terms, phrases, and wildcards. Efficient equality and range searches are provided for all scalar fields, even with large object populations. Indexing can be disabled for stored-only fields. With these features, Spider is ideal for applications that require fine-grained multi-field searching.

• Rich relationships: Links support bi-directional relationships between objects in the same or different tables. DQL path expressions make it easy to navigate relationships with filtering, quantifiers, and transitive searching. This makes Spider ideal for applications whose data uses rich relationships that must be easily queryable.

• Variable structure: Data can vary from structured to highly unstructured. For example, an application that harvests data from web pages, emails, or file servers may dynamically discover fields it wants to store and index. Even for structured data, in which all required fields are predefined in the schema, Spider only consumes space for those fields that actually have values. Both predefined and dynamically defined fields can be queried via DQL.

• Immediate indexing: Fields are indexed as they are stored, making them immediately visible to queries.

• Document management: Compared to Doradus OLAP, Spider is better suited for storing and indexing large content objects such as documents, files, and messages. Text and binary fields of about 10 MB, or somewhat more, should work well.

• Fine-grained updates: Both batch and single-object updates are efficient. Frequent updates to single objects and even single fields are quick and immediately reflected in indexes. Data aging allows expiration of each object based on its own schedule.

• Complex aggregate queries: DQL extensions such as compound/composite grouping support a wide range of aggregate queries.

Conversely, Doradus Spider is not a good choice in the following scenarios:

• Very large databases: Due to its verbose index structures, Spider uses a lot of disk space: up to 10x more than the input data, depending on which fields are indexed. Spider is designed to support databases with millions of objects, but not billions.

• Large aggregate queries: Spider is optimized for fine-grained, highly selective queries (so-called "needle in the haystack" queries). Spider will also do well with queries that select a lot of data that can be returned in pages without sorting. But for large query results that require sorting, or aggregate queries that scan large data sets, the Doradus OLAP and Logging services are much faster.

• Immutable, structured data: Spider supports these applications, but Doradus OLAP and Logging provide faster queries and denser storage for this scenario. Unless immediate indexing or fine-grained updates are required, OLAP is a better choice for semi-mutable, structured data, and Logging is a better choice for immutable, unstructured data.

• NoSQL anti-patterns: Spider is not a good choice when a simpler database such as a persistent hash table will suffice, nor for NoSQL anti-patterns such as applications that need ACID transactions. Applications that want a persistent queue are not a good fit, since objects are not intended to be short-lived. Very large object (BLOB) storage is also not a good fit: each field is stored in a column, and Cassandra loads whole column values into memory; streaming is not supported.
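
The disk-space caveat under "Very large databases" can be turned into a quick back-of-the-envelope check when sizing a deployment. The sketch below is only illustrative: the function name and parameters are hypothetical, the factor of 10 is the upper bound quoted above treated as a simple multiplier, and replication, compression, and compaction overhead are ignored.

```python
def worst_case_disk_bytes(input_bytes: int, expansion_factor: float = 10.0) -> int:
    """Rough upper bound on Spider disk usage for a given amount of input data.

    expansion_factor is an assumption: 10.0 mirrors the "up to 10x" figure
    quoted for heavily indexed data; fewer indexed fields should need less.
    """
    if input_bytes < 0:
        raise ValueError("input_bytes must be non-negative")
    if expansion_factor <= 0:
        raise ValueError("expansion_factor must be positive")
    return int(input_bytes * expansion_factor)


GIB = 1024 ** 3

# Example: 50 GiB of raw input with most fields indexed.
raw_input = 50 * GIB
estimate = worst_case_disk_bytes(raw_input)
print(f"Plan for up to {estimate / GIB:.0f} GiB of disk")
```

If the estimate is far beyond the available storage, that is a hint to disable indexing on stored-only fields or to consider OLAP or Logging instead.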