Knative Serving: Streamlining Microservice Deployment on Kubernetes

A short peek into Simplified API Management and Autoscaling

Kubernetes has revolutionized the way organizations deploy and manage applications at scale. However, its complexity can be daunting for developers who may not be familiar with container orchestration concepts. Enter Knative Serving, a Kubernetes-based platform that simplifies the deployment and scaling of serverless applications.

Knative Serving: Making Kubernetes Accessible to All Developers

Knative Serving builds on Kubernetes to support deploying and serving applications and functions as serverless, autoscaling services. It abstracts away much of the Kubernetes-specific workflow, allowing developers to focus on writing code. This simplifies the deployment model: a single configuration file can replace a myriad of Kubernetes objects and commands.

Scaling Microservices with Knative Serving

One of the standout features of Knative Serving is its ability to automatically manage the scale of your applications, including scaling down to zero when services are not in use. This feature is particularly useful for cost-saving and efficient resource utilization.

Knative Serving supports various scaling metrics and parameters, allowing for fine-tuned control over how your applications respond to traffic demands. Developers can specify the number of concurrent requests per pod and control the ramp-up and cool-down behavior of their services.

Example: Knative Serving Configuration

Here’s a look at a Knative Service configuration that showcases the simplicity of getting a service up and running:

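apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: example-service
  namespace: default
spec:
  template:
    spec:
      containers:
      - image: # your container image reference
        ports:
        - containerPort: 8080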

Note that the configuration file is significantly shorter than the equivalent plain Kubernetes manifests, which would require additional objects such as a Deployment, Service, and Ingress. Knative has sensible defaults for many of its parameters, allowing developers to get started quickly. Knative will take care of the rest, including creating the necessary Kubernetes objects and managing the scaling of your service. In particular:

  • The default health check probes the container port.
  • Deploying the service will create a new revision of the application.
  • The revision will be scaled to zero if there are no requests for a specified period of time.
  • A new revision will be created when the service is updated, allowing for seamless rollouts and rollbacks.
  • Traffic splitting can be configured to allow for canary rollouts and A/B testing.
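As a rough sketch of the last point, a canary rollout could be expressed with a traffic block on the same Service. The revision names, the image reference, and the 90/10 split are assumptions made up for illustration. The sketch also assumes a revision named example-service-v1 already exists, since Knative otherwise generates revision names itself.

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: example-service
  namespace: default
spec:
  template:
    metadata:
      # Hypothetical explicit revision name; it must start with the Service name.
      name: example-service-v2
    spec:
      containers:
        - image: registry.example.com/example-service:2.0.0  # hypothetical image
  traffic:
    # Keep most requests on the previous revision.
    - revisionName: example-service-v1
      percent: 90
    # Send a small share to the new revision; the tag also gives it its own URL.
    - revisionName: example-service-v2
      percent: 10
      tag: canary
```

Shifting the percentages step by step completes the rollout, and setting the old revision back to 100 acts as a rollback.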

However, Knative Serving also provides the flexibility to customize many parameters to suit your needs.


For example, the autoscaling configuration can be modified to specify the minimum and maximum number of pods, the maximum number of concurrent requests per pod, and the target CPU utilization percentage. The default autoscaler in vanilla Kubernetes is the Horizontal Pod Autoscaler (HPA), which typically scales based on CPU utilization. Knative Serving uses its own autoscaler, the Knative Pod Autoscaler (KPA), which supports scaling based on concurrency and is more suitable for serverless applications.

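The default in Knative Serving is equivalent to setting the following annotations on the service's revision template:

      annotations:
        autoscaling.knative.dev/metric: "concurrency"
        autoscaling.knative.dev/target-utilization-percentage: "70"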

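To switch to plain CPU-based autoscaling, you can use the following annotations:

      annotations:
        autoscaling.knative.dev/class: "hpa.autoscaling.knative.dev"
        autoscaling.knative.dev/metric: "cpu"
        autoscaling.knative.dev/target: "100"

which would scale up another pod once the average CPU utilization of the current pods reaches 100%.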
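The other knobs mentioned earlier (pod bounds, per-pod concurrency, and ramp-up and cool-down behavior) are set in the same place. The following is a minimal sketch with made-up values, not a recommended configuration. The image reference is hypothetical, and the annotation names assume a reasonably recent Knative release.

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: example-service
  namespace: default
spec:
  template:
    metadata:
      annotations:
        # Soft target: aim for roughly 10 in-flight requests per pod.
        autoscaling.knative.dev/target: "10"
        # Lower and upper bounds on the number of pods; "0" allows scale to zero.
        autoscaling.knative.dev/min-scale: "0"
        autoscaling.knative.dev/max-scale: "5"
        # Window over which the metric is averaged; a longer window reacts more slowly.
        autoscaling.knative.dev/window: "60s"
        # How long to wait after load drops before pods are removed.
        autoscaling.knative.dev/scale-down-delay: "2m"
    spec:
      # Hard limit: no pod receives more than 20 concurrent requests.
      containerConcurrency: 20
      containers:
        - image: registry.example.com/example-service:1.0.0  # hypothetical image
```

The soft target steers the autoscaler, while containerConcurrency is enforced per pod, so the hard limit is usually set at or above the target.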

More information on Knative Serving configuration can be found in the official documentation.

Effortless Deployment Pipelines with ArgoCD

ArgoCD can integrate with Knative Serving to create a seamless deployment pipeline. This GitOps tool allows developers to simply merge changes into specific branches, such as the main branch for integration or deployment branches for staging and production environments, to initiate automated deployment processes.
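To make this concrete, an ArgoCD Application that follows a staging branch might look roughly like the sketch below. The repository URL, the deploy path, the branch name, and the argocd namespace are all assumptions for illustration. The directory at that path is assumed to contain the Knative Service manifest.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: example-service-staging
  namespace: argocd              # assumes ArgoCD runs in its usual namespace
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/example-service.git  # hypothetical repo
    targetRevision: staging      # the deployment branch ArgoCD tracks
    path: deploy                 # hypothetical directory holding the manifests
  destination:
    server: https://kubernetes.default.svc   # the cluster ArgoCD itself runs in
    namespace: default
  syncPolicy:
    automated:
      prune: true      # delete resources that were removed from Git
      selfHeal: true   # revert manual changes made in the cluster
```

With automated sync enabled, a merge into the tracked branch is enough to trigger a rollout; no one has to run a deploy command.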

A Continuous Integration (CI) process like GitHub Actions can be triggered by a merge into the main branch, which will build the container image and tag it with a version. A subsequent merge into a deployment branch can prompt ArgoCD to deploy the tagged image to the respective environment.
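The CI half of this flow could be sketched as a GitHub Actions workflow along these lines. It assumes a Dockerfile in the repository root, the GitHub Container Registry as the image registry, and the commit SHA as the version tag; your registry and versioning scheme may well differ.

```yaml
name: build-image
on:
  push:
    branches: [main]             # runs after a merge into main
permissions:
  contents: read
  packages: write                # needed to push to the GitHub Container Registry
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          # Assumption: tag with the commit SHA. The registry expects a lowercase
          # repository name, so adjust this if your org or repo uses capitals.
          tags: ghcr.io/${{ github.repository }}:${{ github.sha }}
```

The resulting tag is what the image field of the Knative Service in the deployment branch would then reference.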

Branching Strategy

To visualize the workflow, imagine a branching strategy resembling the following:

[main] ---- [development] ---- [feature branches]
   \                                  /
    \-- [staging] -- [QA] -- [production]

The only interface for developers is the GitHub UI; no special tools or knowledge of Kubernetes are required. This allows for a clear separation of concerns, where developers can focus on writing code and leave the deployment and scaling to Knative Serving and ArgoCD.

Knative Serving vs. AWS Lambda

Knative Serving offers a similar proposition to AWS Lambda in that it removes the need for developers to manage the underlying infrastructure. However, unlike the closed AWS Lambda environment, Knative operates on the open-source Kubernetes system, allowing for use across multiple cloud providers or on-premises environments. It also hooks into the Kubernetes ecosystem, allowing for seamless integration with other tools and services.

In Conclusion

Knative Serving stands as a robust solution for teams seeking the benefits of serverless architectures without the intricate knowledge of Kubernetes. It simplifies application deployment, automates scaling to match demand, and integrates easily with modern development workflows. By providing developers with tools that are easy to use and manage, Knative ensures that the focus remains on creating value through application functionality, not infrastructure complexity.

For organizations already invested in Kubernetes, Knative Serving offers a way to streamline and enhance their deployment strategies without the need for extensive Kubernetes expertise, thus further democratizing the power of container orchestration.

PS: Knative Eventing

Knative not only offers the Serving component but also an event mesh and primitives to control delivery of async events. This allows for a more complete serverless experience, where events can trigger serverless functions and services. This is a topic for another post, but I wanted to mention it here as it is a powerful feature of Knative.

Lukas Pfannschmidt
Sr. Machine Learning Engineer

My expertise now encompasses advanced areas in machine learning and backend system optimization. My work in backend optimization, particularly in managing large-scale data efficiently, aligns with key optimization metrics like cost, quality, and speed. I also contribute to improving overall system reliability and observability, significantly reducing error rates and establishing critical technical metrics. These endeavors complement my previous research interests in high-performance computing.