Datastores can be complicated and frustrating if you don’t know how to use them properly. The different types of databases have different purposes, different ways of completing tasks and different sets of drawbacks. Knowing which datastore to use for a specific purpose is the key to working with your data efficiently and effectively. In this installment of our datastores series, we will discuss caches and queue stores: what they are good for and what drawbacks they present.
Caches are a different beast from other databases and datastores. The data in a cache is meant to be ephemeral. A cache takes data from any other datastore and gives it a place to live, usually in memory, where you can reference it quickly when needed. Caches are built to move data around and respond to requests for data faster than the datastore behind them. Depending on the datastore and the cache, each response is stored under a hash of either the query that was requested or the result of that query. The benefit is that once a response has been hashed and stored, it’s in the cache. So, the next time you run that same query, it returns faster.
A cache functions like a key lookup in an object store. You have a key, whether it’s a hash of the query or a hash of the response. Once you have the key, the cache returns the same stored response over and over, and as long as the entry is still in the cache you don’t have to run the query again because you can look the result up directly. The response is static and unchanging; nothing is recomputed when you ask for it.
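As a minimal sketch of that key lookup, the Python below hashes the query text into a fixed-length key and stores a response under it. The function name `cache_key`, the in-memory dictionary and the sample rows are all made up for illustration, and collapsing whitespace before hashing is a simplifying assumption rather than something every cache does.

```python
import hashlib

def cache_key(query: str) -> str:
    """Return a fixed-length key for a query string (SHA-256 hex digest)."""
    # Assumption: runs of whitespace are collapsed so the same query typed
    # with different spacing maps to one key. A real system would need a
    # safer rule, since this also alters whitespace inside string literals.
    normalized = " ".join(query.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

cache = {}  # key -> stored response (stands in for an in-memory cache)

key = cache_key("SELECT name FROM users WHERE active = 1")
cache[key] = [("ada",), ("grace",)]  # made-up rows, stored once

# The same query with different spacing produces the same key,
# so the stored response is found without running anything again.
lookup = cache_key("SELECT name   FROM users WHERE active = 1")
response = cache.get(lookup)
```

Hashing the query means the key can be computed before the database is ever touched, which is what makes the lookup cheap.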
But again, caches are meant to be ephemeral. They are meant to do things quickly, and they live in memory. Their use is for data needed on a temporary basis. The same queries tend to be run over and over, so if you have a cache at your fingertips, a query will take less time to return once its response has been hashed and stored. If there’s a miss in the cache, you just re-run the query. The actual data is stored in your main database or datastore, so you still have it; the cache just returns the response faster once the query has been run and the result placed there.
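The hit-or-miss flow described above might look something like the following sketch. Here `make_cached` and `fake_database` are hypothetical names, the plain dictionary stands in for a real cache, and the counter exists only to show how often the "database" is actually reached.

```python
def make_cached(run_query):
    """Wrap a query function so repeated queries are answered from memory."""
    cache = {}  # ephemeral: losing it only costs time, not data

    def cached(query):
        if query in cache:
            return cache[query]       # hit: no trip to the main datastore
        result = run_query(query)     # miss: re-run the query
        cache[query] = result         # remember the response for next time
        return result

    return cached, cache

db_calls = []  # records every query that reaches the stand-in database

def fake_database(query):
    # Hypothetical stand-in for the main datastore.
    db_calls.append(query)
    return f"rows for {query}"

cached_query, cache = make_cached(fake_database)
first = cached_query("SELECT 1")   # miss, reaches the database
second = cached_query("SELECT 1")  # hit, served from the cache
cache.clear()                      # simulate the cache disappearing
third = cached_query("SELECT 1")   # miss again, data still comes back
```

Clearing the cache changes how long the third call takes, not what it returns, because the data itself never lived only in the cache.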
The biggest problem with cache databases lies with the user. Often people will shove data into a cache when they shouldn’t, or put data there that isn’t stored anywhere else, which becomes a problem when the machine fails. For example, you need an extremely rapid response for some data, so you put it in the cache on a short-term basis. But then you forget to put that data in your long-term storage, whatever your main database is or wherever you planned to put that information. Some people treat a cache as rapid, long-term storage, which it isn’t. It’s ephemeral and will disappear. If the machine breaks, you should be able to turn it off, turn it back on and put up a new cache without disrupting anything. Sure, queries might take a little longer to return for a short time, but nothing is lost. Unless you forgot to put it where it belongs.
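One way to avoid that mistake is to write to the long-term store first and treat the cache as a copy, an approach usually called write-through. The sketch below is an illustration under assumptions, not a prescribed design: the class name `Store` is invented, and both dictionaries are stand-ins, one for a durable main database and one for an in-memory cache.

```python
class Store:
    """Write-through sketch: the database is the source of truth."""

    def __init__(self):
        self.database = {}  # stand-in for the durable main datastore
        self.cache = {}     # stand-in for the ephemeral cache

    def save(self, key, value):
        self.database[key] = value  # long-term copy is written first
        self.cache[key] = value     # cache only ever holds a duplicate

    def load(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.database[key]  # cache miss: fall back to the database
        self.cache[key] = value     # refill the cache for next time
        return value

    def restart_cache(self):
        # Simulates turning the cache machine off and on again.
        self.cache = {}

store = Store()
store.save("session:42", {"user": "ada"})
store.restart_cache()
recovered = store.load("session:42")  # slower path, but nothing is lost
```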
Queue stores have a very specific purpose: holding a task or piece of information that can be handled asynchronously, apart from what the user or the system is doing right now. You don’t care about the data at that specific moment; it doesn’t need to be handled in real time. You just need to do something or get something with that data later, so you drop it in a waiting line. Then, when your system has the resources, it goes back and works with the next item waiting in line. This process allows you to rapidly drop in a lot of data that just sits there until you can get to it, essentially in a holding pattern.
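To make the waiting line concrete, here is a small sketch using Python’s standard `queue` and `threading` modules in place of a real queue store. The function `process_later` and its `handle` callback are hypothetical names; a single worker thread plays the part of the system getting to each item when it can.

```python
import queue
import threading

def process_later(items, handle):
    """Enqueue every item right away, then let one worker handle them in order."""
    tasks = queue.Queue()
    results = []

    def worker():
        while True:
            item = tasks.get()      # blocks until something is waiting in line
            if item is None:        # sentinel: no more work is coming
                tasks.task_done()
                break
            results.append(handle(item))
            tasks.task_done()       # tell the queue this item is finished

    thread = threading.Thread(target=worker)
    thread.start()

    for item in items:
        tasks.put(item)             # dropping work in is quick and non-blocking here
    tasks.put(None)                 # None is reserved as the stop signal

    tasks.join()                    # wait until every queued item was handled
    thread.join()
    return results

# Made-up example: "process" two file names by measuring their length.
sizes = process_later(["a.png", "bb.png"], len)
```

Because there is only one worker, items are handled in the order they were queued; with several workers the order of results would no longer be guaranteed.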
The problem with queue stores is that people often do not understand what should go in a queue. Many times, people will put everything in a queue, even tasks that take just a few seconds and that a user would simply wait for. People also create queues without thinking about how the queue is going to be consumed, what is going to delete messages from it, or whether it will flood if data comes in too rapidly, leaving more data in the queue than you can get out of it. That creates a backlog. In most cases, queues are semi-ephemeral, standing somewhere between a standard database and a cache, so a backlog is a problem.
Essentially, when you put data into a queue, the intent is that the data will run through a process. Once the process has run and the data has been worked with, the queue is told to get rid of the data. A queue is not meant to hold data for a long time, which is something people do not always understand. They let data sit in the queue and build up. Because a queue works similarly to a cache, it’s very efficient with a certain amount of data in its memory. But when that buildup happens and the data can’t be emptied fast enough, or there isn’t enough memory allocated to the queue, it becomes extremely inefficient. Then it becomes unstable, which can become detrimental very quickly. Queues are not infinitely elastic; if you stretch one too far, it will break.
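One common guard against that buildup is to give the queue a hard size limit and decide up front what happens when it is full. The sketch below uses Python’s standard `queue.Queue` with `maxsize` as a stand-in for a real queue store; `try_enqueue` and `drain` are hypothetical helpers, and refusing new items is only one of several possible policies.

```python
import queue

def try_enqueue(q, item):
    """Add an item if there is room; return False instead of growing without limit."""
    try:
        q.put_nowait(item)
        return True
    except queue.Full:
        return False  # caller can retry later, slow down, or report the overload

def drain(q, handle):
    """Process everything currently waiting and return how many items were handled."""
    handled = 0
    while True:
        try:
            item = q.get_nowait()  # taking an item removes it from this queue
        except queue.Empty:
            return handled
        handle(item)
        q.task_done()              # mark the item as fully processed
        handled += 1

inbox = queue.Queue(maxsize=2)     # the limit is an arbitrary example value
accepted = [try_enqueue(inbox, n) for n in range(3)]  # third item is refused
seen = []
drained = drain(inbox, seen.append)
```

In this in-process queue an item is gone as soon as it is taken; some queue stores instead keep a message until the consumer explicitly deletes or acknowledges it, which is the "told to get rid of the data" step mentioned above.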
Caches and queue stores are datastores. They are not, however, long-term datastores. Caches are highly ephemeral, needing to be emptied and refilled regularly. Queues are semi-ephemeral, so they can hold data and information a little bit longer, but they still need to be emptied, with the data moved elsewhere once it has been worked with. Both caches and queue stores are very efficient at what they do and can aid in a business’s data storage system, but they are best used when combined with other databases.