visit
The phrase “better safe than sorry” gets thrown around whenever people talk about monitoring or getting observability into your AWS resources but the truth is that you can’t sit around and wait until a problem arises, you need to proactively look for opportunities to improve your application in order to stay one step ahead of the competition. Setting up alerts that go off whenever a particular event happens is a great way to keep tabs on what’s going on behind the scenes of your serverless applications and this is exactly what I’d like to tackle in this article.
And while Cloudwatch is a good tool to get the metrics of your functions, takes it up a notch by providing that missing link that you’d need in order to properly debug those pesky Lambda issues. It allows you to detect any kinds of failures within all programming languages supported by the platform. This includes crashes, configuration errors, timeouts, early exits, etc. Another quite valuable thing that Dashbird offers is Error Aggregation which allows you to see immediate metrics about errors, memory utilization, duration, invocations as well as code execution.
Invocations will calculate the number of times a function has been invoked in response to invocation API call or to an event which substitutes the RequestCount metric. All of this includes the successful and failed invocations, but it doesn’t include the throttled attempts. You should note that AWS Lambda will send mentioned metrics to CloudWatch only if their value is at the point of nonzero.
Errors will primarily measure the number of failed invocations that happened because of the errors in the function itself which is a substitution for ErrorCount metric. Failed invocations are able to start a retry attempt which can be successful.
There are limitations we must mention:
DeadLetterErrors can start a discrete increase in numbers when Lambda is not able to write the failed payload event to your pre-configured DeadLetter lines. This incursion could happen due to permission errors, misconfigured resources, timeouts or even because of the throttles from downstream services.
Duration will measure the real-time beginning when the function code starts performing as a result of an invocation up until it stops executing. The duration will be rounded up to closest 100 milliseconds for billing. It’s notable that AWS Lambda sends these metrics to CloudWatch only if the value is nonzero.
Throttles will calculate the number of times a Lambda function has attempted an invocation and were throttled by the invocation rates that exceed the users’ concurrent limit (429 error code). You should also be aware that the failed invocations may trigger retry attempts automatically which can be successful.
Iterator Age is used for stream-based invocations only. These functions are triggered by one of the two streams: Amazon’s DynamoDB or Kinesis stream. Measuring the age of the last record for every batch of record processed. Age is the sole difference from the time Lambda receives the batch and the time the last record from the batch was written into the stream.
Concurrent Executions are basically an aggregate metric system for all functions inside the account, as well as for all other functions with a custom concurrent pre-set limit. Concurrent executions are not applicable for different forms and versions of functions. Basically, this means that it measures the sum of concurrent executions in a particular function from a certain point in time. It is crucial for it to be viewed as an average metric considering its aggregated across the time period.
Unreserved Concurrent Executions are almost the same as Concurrent Executions, but they represent the sum of the concurrency of the functions that don’t have custom concurrent limits specified. They apply only to the user’s account, and they need to be viewed as an average metric if they’re aggregated across the period of time.
Tough question. While Cloudwatch is a great tool, the second you have more lambdas in your system you’ll find it very hard to debug or even understand your errors due to the large volume of information. Dashbird, on the other hand, offers details about your invocations and errors that are simple and concise and have a lot more flexibility when it comes to customization. My colleague
I’d be remiss not to make an observation: with AWS CloudWatch whenever a function is invoked, they spin up a micro-container to serve the request and open a log stream in CloudWatch for it. They re-use the same log stream as long as this container remains alive. This means the same log stream gets logs from multiple invocations in one place.