How to use AWS Lambda Provisioned Concurrency against cold starts

The biggest advantage of functions is their pay-per-use model. But this also means that the underlying infrastructure scales down to zero when a function is not used. A function that is called in this state is known in the community as a cold function (as opposed to a warm function). The benefit of lower costs comes with the downside of a longer startup on the first call. This can be bothersome, especially for runtimes that take longer to start (think Java).
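To make the cold/warm distinction concrete, here is a minimal sketch of a handler that reports whether it is running in a fresh execution environment. It relies on the fact that module-level state survives between invocations of the same environment. The handler name and the `coldStart` field are hypothetical and only serve as illustration.

```python
# Module-level code runs once per execution environment, i.e. on a cold start.
_is_cold = True


def handler(event, context):
    """Hypothetical Lambda handler that reports whether this call was a cold start."""
    global _is_cold
    was_cold = _is_cold
    # Every later invocation in the same environment is a warm one.
    _is_cold = False
    return {"coldStart": was_cold}
```

Logging this flag is one simple way to see how often your users actually hit a cold function.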

In the past you could configure different kinds of warmer mechanisms to work around this, from a basic curl/ping to more advanced warm-up orchestration with Lambda functions and CloudWatch Events as clients. The Serverless Framework also has multiple plugins, like this one, to help you automate this. Judging by their download numbers, there is real demand for warm Lambda functions.

AWS probably noticed the same, and during re:Invent 2019 they announced Provisioned Concurrency for AWS Lambda. As they state themselves, functions using Provisioned Concurrency execute with consistent start-up latency, making them ideal for building interactive mobile or web backends, latency-sensitive microservices, and synchronously invoked APIs. Basically, you pay a bit more and they keep your function warm for you.

Since a downside of running AWS Lambda functions within a VPC is a slower startup, it was quite interesting for us to implement this on some of our private, API Gateway hosted serverless applications and get rid of the custom tooling. Less complexity is always good.

So let's look at how this is configured in the Serverless Framework:

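    # Fragment of a single function definition under `functions:` in serverless.yml
    handler: src/handlers/customer/get-handler.get
    # Number of execution environments AWS keeps initialized for this function
    provisionedConcurrency: 1
    events:
      - http:
          method: get
          path: /customer/{id}
          private: true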

After deployment, this results in a Lambda alias with Provisioned Concurrency configured to the value you set.

The image below, from the AWS blog, shows an exaggerated example, but the point is clear: with Provisioned Concurrency enabled, the user experience is much more stable. This can be very interesting for public-facing portals that focus on customer experience.

The costs

Provisioned Concurrency adds a pricing dimension to the existing dimensions of Duration and Requests. You pay for the amount of concurrency that you configure and for the period of time that you configure it. When Provisioned Concurrency is enabled for your function and you execute it, you also pay for Requests and Duration based on the normal Lambda rates. If the concurrency for your function exceeds the configured concurrency, you will be billed for executing the excess functions at the normal Lambda rates. More information here.
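The pricing model described above can be sketched as a small calculation. This is a rough estimate under stated assumptions, not an official formula: the `Rates` class, the function name and all parameter names are hypothetical, and the rate values in the usage example are placeholders rather than real AWS prices. Actual prices differ per region and may differ between provisioned and on-demand execution, so check the AWS pricing page before relying on the outcome.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical per-unit prices; fill in the real values for your region."""
    provisioned_gb_second: float  # keeping 1 GB of concurrency configured for 1 second
    duration_gb_second: float     # 1 GB-second of actual execution
    per_request: float            # a single invocation


def estimate_cost(rates, memory_gb, provisioned, hours_enabled, requests, avg_duration_s):
    """Estimate cost as: configured concurrency + Duration + Requests."""
    # Paid for the whole period the concurrency is configured, used or not.
    configured = (provisioned * memory_gb * hours_enabled * 3600
                  * rates.provisioned_gb_second)
    # Paid only for the invocations that actually happen.
    duration = requests * avg_duration_s * memory_gb * rates.duration_gb_second
    request_fee = requests * rates.per_request
    return configured + duration + request_fee


# Placeholder rates, chosen only to make the arithmetic easy to follow.
example_rates = Rates(provisioned_gb_second=0.000004,
                      duration_gb_second=0.00001,
                      per_request=0.0000002)
example_cost = estimate_cost(example_rates, memory_gb=1, provisioned=1,
                             hours_enabled=1, requests=1000, avg_duration_s=0.1)
```

The first term is what makes the difference: it is charged even when no requests come in, so a function that is rarely called can become more expensive than it was with plain pay-per-use.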


As always, the devil is in the details. To properly tune your serverless application, you can use the following new metrics to get a good understanding of how your app behaves at runtime.

  • ProvisionedConcurrencyInvocations: Number of invocations as part of the Provisioned Concurrency
  • ProvisionedConcurrentExecutions: Number of simultaneous execution environments in use as part of the Provisioned Concurrency
  • ProvisionedConcurrencyUtilization: Utilization percentage of the Provisioned Concurrency
  • ProvisionedConcurrencySpilloverInvocations: Number of invocations that are above Provisioned Concurrency
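As an illustration of how these metrics could drive tuning, here is a minimal sketch of a sizing heuristic. It is not an AWS algorithm. It assumes you have already fetched the metric values yourself, that utilization is given as a fraction between 0 and 1, and that a target utilization of 0.7 is acceptable for your workload. The function name and parameters are hypothetical.

```python
import math


def recommend_provisioned(current, utilization_samples, spillover_invocations, target=0.7):
    """Suggest a Provisioned Concurrency value from observed metrics.

    current: currently configured Provisioned Concurrency (must be > 0)
    utilization_samples: ProvisionedConcurrencyUtilization values as fractions (0..1)
    spillover_invocations: sum of ProvisionedConcurrencySpilloverInvocations
    target: utilization we aim for at peak (an assumption, tune to taste)
    """
    if current <= 0 or not utilization_samples:
        raise ValueError("need a positive current value and at least one sample")
    peak = max(utilization_samples)
    # Scale so that the observed peak would land on the target utilization.
    # Rounding first avoids ceil() jumping up because of floating-point noise.
    needed = math.ceil(round(current * peak / target, 9))
    if spillover_invocations > 0:
        # Spillover means some calls ran outside Provisioned Concurrency: grow at least by one.
        needed = max(needed, current + 1)
    return max(needed, 1)
```

For example, with 10 configured, a peak utilization of 0.25 and a target of 0.5, the sketch suggests scaling down to 5. Any spillover forces the suggestion above the current value.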


  • At this moment Provisioned Concurrency is not supported with Lambda@Edge.