Let’s start this journey with DocumentDB!
Okay, to be really honest, this title is clickbait.
I could definitely have written something like “how I optimized the costs of our AWS infrastructure by respecting some common guidelines provided in the documentation”, but that’s way less catchy, no?
Maybe some of you guys will already know these tricks and good practices.
Wait, what, $0.20 per million I/Os?
“Once the data has been read from the storage volume and continues to reside in memory, subsequent reads of the same data do not incur additional I/Os.” This phrase is key to understanding what’s behind I/Os.
Queries that use an index will likely use fewer I/Os, as you’re not scanning the whole storage of your collection. They’ll certainly consume I/Os, but way fewer than scanning an entire collection.
Furthermore, the RAM of your instance needs to cover your index size, so that index reads are served from memory instead of incurring additional I/Os.
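To put the $0.20 per million I/Os figure in perspective, here is a tiny back-of-the-envelope calculation. It’s only a sketch: the price is the one quoted above (check the current pricing for your region), and the workload of 2,000 I/Os per second is a made-up number.

```javascript
// Price taken from the figure quoted above; it may differ per region or over time.
const PRICE_PER_MILLION_IOS_USD = 0.20;

// Estimate the monthly I/O bill from an average I/O rate per second.
function monthlyIoCostUsd(iosPerSecond, daysInMonth = 30) {
  const totalIos = iosPerSecond * 60 * 60 * 24 * daysInMonth;
  return (totalIos / 1e6) * PRICE_PER_MILLION_IOS_USD;
}

// Hypothetical workload: 2,000 I/Os per second on average.
console.log(monthlyIoCostUsd(2000).toFixed(2)); // prints 1036.80
```

Halve the I/Os and you halve that line of the bill, which is why the rest of this post is all about avoiding unnecessary I/Os.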
🧠 First, remember this: fewer I/Os = cheaper = better performance. It’s not all about cost, nor all about performance; the two are linked.
❌ Remove unused indexes: you have no idea how expensive an unused index is on a busy collection, since every write still has to maintain it. I saved my company $2,000/month just like that 🤌, by deleting unused indexes. And it’s very easy to track unused indexes with this query:
The query will output the field `ops`, which corresponds to the number of times your index has been hit. Depending on the load of your application, please consider removing any index that is never used.
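As an illustration, here is how such a check could look using the `$indexStats` aggregation stage. It’s a sketch, not necessarily the exact query I ran: the collection name `orders` and the helper `findUnusedIndexes` are hypothetical, and I’m assuming each result document exposes `name` and `ops` at the top level.

```javascript
// Keep only the indexes whose usage counter is at or below a threshold.
// The mandatory _id_ index is skipped because it cannot be dropped.
function findUnusedIndexes(indexStats, maxOps = 0) {
  return indexStats
    .filter((s) => s.name !== '_id_' && Number(s.ops) <= maxOps)
    .map((s) => s.name);
}

// In the mongo shell, `db` exists and the real stats can be fetched.
// `orders` is a hypothetical collection name.
if (typeof db !== 'undefined') {
  const stats = db.orders.aggregate([{ $indexStats: {} }]).toArray();
  printjson(findUnusedIndexes(stats));
}
```

Keep in mind that the counter only reflects the period over which it has been collected, so look at it over a representative window (and on every instance that serves reads) before dropping anything.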
🧐Activate Performance Insights and profiling operations: if you use RDS, you might already know Performance Insights. It gives you very helpful metrics and information about the queries that are hurting your DocumentDB performance, and you can quickly see which queries consume I/Os (and how many), so it’s a very good way to track down a bottleneck. Another way to monitor slow queries or COLLSCAN queries is to activate profiling operations. As the name suggests, it profiles some operations for you (see the AWS documentation on profiling for more info): you set a threshold, and any operation that takes more than n ms is logged to CloudWatch. Very useful to track the number of queries that are performing a COLLSCAN, for example. Please activate both of these options, as they’re very valuable!
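Once the profiler logs land in CloudWatch, you can crunch them however you like. Here is a minimal sketch that counts COLLSCAN operations per collection from exported log lines. Big assumption here: I’m treating each log line as a JSON document with `ns`, `millis` and `planSummary` fields, and the sample lines are made up, so adapt the field names to what your logs actually contain.

```javascript
// Count profiled operations that did a full collection scan, grouped by namespace.
// Assumes one JSON document per line with `ns` and `planSummary` fields.
function countCollscans(logLines) {
  const counts = {};
  for (const line of logLines) {
    let entry;
    try {
      entry = JSON.parse(line);
    } catch (e) {
      continue; // skip lines that are not valid JSON
    }
    if (entry.planSummary === 'COLLSCAN') {
      counts[entry.ns] = (counts[entry.ns] || 0) + 1;
    }
  }
  return counts;
}

// Made-up sample lines, only to show the expected shape.
const sampleLines = [
  '{"op":"query","ns":"shop.orders","millis":180,"planSummary":"COLLSCAN"}',
  '{"op":"query","ns":"shop.orders","millis":95,"planSummary":"IXSCAN"}',
  '{"op":"query","ns":"shop.orders","millis":240,"planSummary":"COLLSCAN"}',
  'not a json line',
];

console.log(countCollscans(sampleLines));
```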
💾 Always look at your data first: you’ll need to identify the best high-cardinality field to index. If you’re not used to the concept of index cardinality, the AWS DocumentDB documentation explains it well :)
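If you want a quick feeling for the cardinality of a field, you can compute the ratio of distinct values over a sample of documents. This is just a rough sketch: the helper `cardinalityRatio`, the `users` collection and the sample size are all hypothetical.

```javascript
// Ratio of distinct values to documents for one field:
// close to 1 means high cardinality, close to 0 means low cardinality.
function cardinalityRatio(docs, field) {
  if (docs.length === 0) return 0;
  const distinct = new Set(docs.map((d) => JSON.stringify(d[field])));
  return distinct.size / docs.length;
}

// Made-up sample to illustrate the difference between two fields.
const sampleUsers = [
  { email: 'a@example.com', isActive: true },
  { email: 'b@example.com', isActive: false },
  { email: 'c@example.com', isActive: true },
  { email: 'd@example.com', isActive: true },
];
console.log(cardinalityRatio(sampleUsers, 'email'));    // 1
console.log(cardinalityRatio(sampleUsers, 'isActive')); // 0.5

// In the mongo shell, the same helper can run on a slice of real data.
if (typeof db !== 'undefined') {
  const docs = db.users.find().limit(1000).toArray();
  print(cardinalityRatio(docs, 'email'));
}
```

On a real collection the ratio for a boolean field drops towards zero as the sample grows, which is exactly why such a field makes a poor index candidate.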
🫠Avoid small tricky collections: if you plan to have a collection with three fields, one of them carrying a unique key, and you’re planning to perform a lot of updates/inserts on it, please reconsider the data model of your collection, because the I/O ops will pile up like hell, and so will your I/O bill.
⏱️Avoid TTL, aka time-to-live, indexes: (most of the time) you can handle expiration without setting a time-to-live index, so please check that the TTL parameter is not enabled on the instance or cluster.
💡Explain! A very simple way to check which index the query planner selects for a query (new or not) is to run an explain operation with the `executionStats` parameter. You’ll be surprised that some queries you think hit an index just don’t hit any index…
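To make this less of a squinting exercise, you can walk the explain output and list the stages of the winning plan. A minimal sketch, assuming the output has a `queryPlanner.winningPlan` tree with `stage` and `inputStage` fields; the `orders` collection and the `customerId` filter are hypothetical.

```javascript
// Collect every stage name found in a plan tree.
function collectStages(plan, stages = []) {
  if (!plan || typeof plan !== 'object') return stages;
  if (plan.stage) stages.push(plan.stage);
  if (plan.inputStage) collectStages(plan.inputStage, stages);
  (plan.inputStages || []).forEach((p) => collectStages(p, stages));
  return stages;
}

// True when the winning plan contains a full collection scan.
function usesCollscan(explainOutput) {
  const plan = explainOutput.queryPlanner && explainOutput.queryPlanner.winningPlan;
  return collectStages(plan).includes('COLLSCAN');
}

// In the mongo shell, run it against a real query.
if (typeof db !== 'undefined') {
  const out = db.orders.find({ customerId: 42 }).explain('executionStats');
  printjson(usesCollscan(out));
}
```

If `usesCollscan` returns true for a query you thought was indexed, that’s your cue to look at the filter and the index definition again.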
☯️Don’t create an index for a boolean field. Just don’t. Remember cardinality.
⚖️Monitor the average size of an object for each collection that you have with this command: db.<mycollection>.stats(1024)
An extreme average size can quickly slow down your queries and increase I/O ops, because the RAM of your instance is no longer enough to hold the data. Please monitor your objects closely and don’t store unnecessary fields. If you need to store many fields, consider optimizing queries by not selecting all the fields.
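To avoid checking collections one by one, you can loop over them and flag the fat ones. A sketch under assumptions: `stats()` is called without a scale here and I’m assuming `avgObjSize` is reported in bytes; the helper `oversizedCollections` and the 16 KB threshold are arbitrary choices for illustration.

```javascript
// Return the namespaces whose average object size exceeds a threshold in bytes.
function oversizedCollections(statsList, maxAvgBytes) {
  return statsList
    .filter((s) => s.avgObjSize > maxAvgBytes)
    .map((s) => s.ns);
}

// In the mongo shell, gather the stats of every collection in the current database.
if (typeof db !== 'undefined') {
  const all = db.getCollectionNames().map((name) => db.getCollection(name).stats());
  printjson(oversizedCollections(all, 16 * 1024)); // arbitrary 16 KB threshold
}
```

For the collections that show up, a projection such as `find({ ... }, { status: 1 })` lets a query return only the fields it really needs.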
⚠️Be aware that DocumentDB is not MongoDB. It’s mostly compatible with MongoDB, but it’s not MongoDB, and there are some shitty specific behaviors. For example, if you want a query with the `$regex` operator to use an index, you’ll need to `hint()` your index, as it is mandatory. The exclusion operators will never use any index, so please keep these behaviors in mind when creating or optimizing your indexes!
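Here is what a hinted `$regex` query could look like for a prefix search. It’s a sketch: the `products` collection, the `sku` field and the helper names are hypothetical, and I’m assuming a single-field index exists on the hinted field.

```javascript
// Escape regex metacharacters so the input is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build the filter and the matching hint for a prefix search on one field.
function prefixQuery(field, prefix) {
  return {
    filter: { [field]: { $regex: '^' + escapeRegex(prefix) } },
    hint: { [field]: 1 },
  };
}

// In the mongo shell, pass both parts to find() and hint().
if (typeof db !== 'undefined') {
  const q = prefixQuery('sku', 'AB-1');
  printjson(db.products.find(q.filter).hint(q.hint).limit(5).toArray());
}
```

Building the filter and the hint from the same field name keeps the two in sync, so the hint always targets the field the `$regex` is applied to.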
👉Never hint. Except for the very specific use cases mentioned above, you should avoid using `hint`. Keep in mind that if the query planner doesn’t elect your index, it’s for a good reason. Most of the time, it’s because scanning the index would take as long as, or longer than, scanning all the documents in the collection.