Logging Data Model : Choosing Logging Object Fields

Other than Timestamp, which fields should you assign to objects? The example batch below takes a minimal approach, using as few fields as possible. Suppose we want to create an object for the following log record:
2015-03-24 14:05:42.893 INFO MBeanProvider: Unregistering the StorageManager MXBean
Within a batch, this record could be loaded with the following URI and JSON message:
POST /LogDepot/AppLogs
{"batch": {
"docs": [
{"doc": {
"Timestamp": "2015-03-24 14:05:42.893",
"Message": "INFO MBeanProvider: Unregistering the StorageManager MXBean"
}},
{"doc": {
...
}}
]
}}
Other than the Timestamp, the rest of the log record is stored in a single field called Message. This allows us to find, for example, all log records that contain the term “info” with a query such as the following:
GET /LogDepot/AppLogs/_query?q=Message:info
However, this query returns every object whose Message contains the term info, not only INFO-level records. To allow more granular queries, we could separate the log Level and Module into separate fields:
...
{"doc": {
"Timestamp": "2015-03-24 14:05:42.893",
"Level": "INFO",
"Module": "MBeanProvider",
"Message": "Unregistering the StorageManager MXBean"
}}
...
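The split shown above can be automated when records are loaded. The following Python sketch is one possible way to do it, not part of the logging service itself: the regular expression assumes every record follows the "timestamp LEVEL Module: text" layout of the example, and the names LOG_LINE, record_to_doc, and build_batch are hypothetical helpers introduced for illustration.

```python
import json
import re

# Assumed layout: "<yyyy-MM-dd HH:mm:ss.SSS> <LEVEL> <Module>: <text>"
LOG_LINE = re.compile(
    r"^(?P<Timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<Level>[A-Z]+) "
    r"(?P<Module>[^:]+): "
    r"(?P<Message>.*)$"
)


def record_to_doc(line):
    """Turn one log record into a {"doc": {...}} object with separate fields."""
    match = LOG_LINE.match(line)
    if match is None:
        raise ValueError("unrecognized log record: " + line)
    return {"doc": match.groupdict()}


def build_batch(lines):
    """Wrap the parsed records in the batch structure used in the examples."""
    return {"batch": {"docs": [record_to_doc(line) for line in lines]}}


if __name__ == "__main__":
    record = ("2015-03-24 14:05:42.893 INFO MBeanProvider: "
              "Unregistering the StorageManager MXBean")
    # The printed JSON is the body that would be POSTed to /LogDepot/AppLogs.
    print(json.dumps(build_batch([record]), indent=2))
```

Records that do not match the assumed layout raise an error here; a loader could instead fall back to the Timestamp-plus-Message form shown earlier.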
Now we can find all INFO-level records with the query:
GET /LogDepot/AppLogs/_query?q=Level=info
To search for INFO-level records created by a module whose name begins with MBean, we can add a wildcard clause:
GET /LogDepot/AppLogs/_query?q=Level=info AND Module="mbean*"
Note that field names such as Level and Module are case-sensitive, whereas text field values are not. This is why info matches INFO and mbean* matches MBeanProvider in the queries above.
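The query URIs above are written unencoded for readability. When a query is sent from a program, the spaces, quotes, and other reserved characters in the expression normally need percent-encoding. The sketch below uses Python's standard urllib.parse.quote for this; query_uri is a hypothetical helper, and it assumes the server decodes percent-encoded query strings in the usual way.

```python
from urllib.parse import quote


def query_uri(application, table, expression):
    """Build a _query URI with the query expression percent-encoded."""
    # safe="" encodes every reserved character, including "=", quotes, and spaces.
    return "/{}/{}/_query?q={}".format(application, table, quote(expression, safe=""))


if __name__ == "__main__":
    print(query_uri("LogDepot", "AppLogs", 'Level=info AND Module="mbean*"'))
```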
Sometimes it makes sense to store both the original, raw log record and individual values extracted into their own fields. Then we can use fine-grained queries but still see the full, original log record if needed. For example, here’s a typical Apache log record with the raw record stored in the message field and many other fields extracted from it:
{"doc": {
"Timestamp": "2011-05-18 19:40:18.000",
"agent": "\"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.2; .NET4.0C; .NET4.0E)\"",
"auth": "-",
"bytes": "1189",
"clientip": "134.39.72.245",
"datetime": "2011-05-18T19:40:18.000Z",
"enccred": "U3VwZXJNYW46TG9pcwo=",
"filename": "access",
"host": "node2.origin.dev.us.platform.dell.com",
"httpversion": "1.1",
"ident": "-",
"message": "134.39.72.245 - - [18/May/2011:12:40:18 -0700] \"GET /favicon.ico HTTP/1.1\" 200 1189 \"-\" \"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.2; .NET4.0C; .NET4.0E)\"",
"path": "/var/lib/openshift/55b6908dd229e7d004000008/app-root/logs/access.log",
"referrer": "\"-\"",
"request": "/favicon.ico",
"response": "200",
"timestamp": "18/May/2011:12:40:18 -0700",
"type": "apache",
"verb": "GET"
}}
Notice that message contains values that are also extracted into separate fields such as clientip, request, and verb.
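A loader can produce this kind of object by keeping the raw line and adding whatever it manages to extract. The Python sketch below illustrates the idea under stated assumptions: the regular expression targets the common Apache "combined" layout seen in the example, apache_doc is a hypothetical helper, and fields that do not come from the line itself (host, path, filename, and so on) are left out.

```python
import re
from datetime import datetime, timezone

# Assumed Apache "combined" layout, as in the example record.
APACHE_COMBINED = re.compile(
    r'^(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[^"]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\S+) '
    r'(?P<referrer>"[^"]*") (?P<agent>"[^"]*")$'
)


def apache_doc(line):
    """Keep the raw record in "message" and add the extracted fields beside it."""
    match = APACHE_COMBINED.match(line)
    if match is None:
        raise ValueError("unrecognized Apache log record")
    fields = {"message": line, "type": "apache"}
    fields.update(match.groupdict())
    # Convert the Apache timestamp to UTC in the "yyyy-MM-dd HH:mm:ss.SSS" form
    # used by the Timestamp field in the examples.
    local = datetime.strptime(fields["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    utc = local.astimezone(timezone.utc)
    fields["Timestamp"] = utc.strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]
    return {"doc": fields}
```

Storing the line twice costs space, so whether to keep message alongside the extracted fields depends on how often the original record needs to be seen.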