This installment of our series on datastores addresses one of the most-used databases in business today: the log store. Log stores are highly specialized and serve a very specific purpose. Using one for anything outside that purpose will make the store inefficient and cause you headaches down the road.
Log stores are meant for quick inserts and for streaming data in. The data is generally time-series data, and it is ingested in one of two ways. Either another system consumes the data first and then pushes it somewhere to be logged, which means it’s not live, or it streams in, with the system reading from a stream or receiving posts. Either way, log files fill up on a system and get pulled into a SaaS system or other outside log store. As they’re pulled in, regular expressions parse the data and give it context, on the expectation that a given log prints or aligns its data in a certain format. You know where the date is going to be and where a specific type of data sits in a printed error statement, so you can surface the data and sort it out of what would otherwise be a never-ending log file.
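To make the parsing step concrete, here is a minimal sketch in Python. The line format (ISO timestamp, level, service name, message), the `LOG_PATTERN` expression, and the `parse_line` and `filter_records` helpers are all assumptions made for illustration; a real log store applies its own parsers to whatever format your systems actually print.

```python
import re
from datetime import datetime

# Hypothetical line format: "<ISO timestamp> <LEVEL> <service>: <message>"
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<service>[\w-]+):\s+"
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not fit the format."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    # Convert the timestamp text into a datetime so records can be sorted by time.
    record["timestamp"] = datetime.strptime(record["timestamp"], "%Y-%m-%dT%H:%M:%S")
    return record


def filter_records(lines, level):
    """Parse every line and keep only the records with the requested level."""
    parsed = (parse_line(line) for line in lines)
    return [r for r in parsed if r is not None and r["level"] == level]
```

Because the parser knows where the date and the level sit in each line, a call such as `filter_records(lines, "ERROR")` can pull the errors out of an otherwise endless file. Lines that do not fit the expected format are simply skipped here.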
This matters because a log store is meant to take simple data, usually line-delimited, and put it somewhere it can be filtered so someone can get eyes on it. Log stores are not meant for complicated queries and cross-sections, and they aren’t meant to delete data, let alone handle a data life cycle. They are literally meant for logs: to take a quick snapshot of what is happening so that someone can view it and know what the data is doing. If you want to look at the data from two days ago, the log store is the way to do it. You generally do not look at log store data in real time unless you put programmatic spies on the data so that it’s compared to criteria as it comes in. You can then create alerts, alarm points, or have automated actions take place based on specific matches.
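A programmatic spy of this kind can be sketched in a few lines of Python. The rule names, the patterns in `RULES`, and the `watch` function are hypothetical and chosen only for illustration; hosted log stores usually expose this as a built-in alerting feature rather than code you write yourself.

```python
import re

# Hypothetical rules: each pairs an alert name with a pattern to watch for.
RULES = [
    ("disk_full", re.compile(r"No space left on device")),
    ("auth_failure", re.compile(r"authentication failed", re.IGNORECASE)),
]


def watch(lines, rules=RULES):
    """Yield (rule_name, line) for every incoming line that matches a rule."""
    for line in lines:
        for name, pattern in rules:
            if pattern.search(line):
                yield name, line.rstrip("\n")
```

Since `watch` is a generator, it can sit on any iterable of lines, including a stream such as `sys.stdin`, and each yielded match can then drive an alert or an automated action.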
The other major advantage of log stores is centralization and aggregation. Often you will take data from multiple systems that keep their logs in different places. Without a log store, you’d have to go into each system to look at its logs and start pulling down files. With a log store, you can see a snapshot of time across multiple log sources to determine where something started, which systems are affected, and what problems might be happening. It centralizes what you are doing.
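The idea of a single timeline across sources can be illustrated with a short Python sketch. It assumes each source has already been parsed into `(timestamp, message)` pairs sorted by time; the `tag`, `merged_timeline`, and `window` helpers are hypothetical names, not part of any particular log store.

```python
import heapq
from datetime import datetime


def tag(source, records):
    """Attach the source name to each (timestamp, message) record."""
    for timestamp, message in records:
        yield timestamp, source, message


def merged_timeline(sources):
    """Merge per-source records, each already sorted by time, into one timeline."""
    streams = [tag(name, records) for name, records in sources.items()]
    return list(heapq.merge(*streams))


def window(timeline, start, end):
    """Keep only the entries whose timestamp falls between start and end, inclusive."""
    return [entry for entry in timeline if start <= entry[0] <= end]
```

Given records from, say, a web server and a database, `merged_timeline` interleaves them in time order, and `window` narrows the view to the minutes around an incident so you can see which system reported trouble first.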
Log stores can handle massive amounts of data and can filter a lot of it very rapidly (remember, you’re meant to be looking at data from two days ago, so rapid may mean 15 minutes to get a result), which is why people tend to misuse them. They want to use a log store or an event log for relational purposes, but a log store simply doesn’t work that way. You don’t want to look for patterns in a log store, because all you can do is filter for a specific event; no other relationship can be actively derived. Spying on live data and reacting to it is one thing, but that doesn’t mean a log store can do other things efficiently. The more data you put into it, the longer queries take to return.
Log stores are limited in their functionality, but they are very good and very efficient at what they are designed to do. Every business should have a log store of some kind to track events and help locate when and where a problem occurred. Just don’t try to do more with them, because they will get hung up and not work properly.