- If we use a server, it costs a fixed amount every month, but the problem is capacity: a server can hold only up to 1000 requests (say) at once, so extra requests coming in have to wait for their turn
- But in serverless, each request gets a dedicated, separate instance handling it; the instance stays alive until the request completes and dies afterwards, and here we are charged in GB-hours: how many gigabytes of memory we used for how many hours
- Let’s look at the user request cycle on serverless
- The user makes a `GET` request to the database to fetch his profile data
- The total time taken to serve the request is `100ms` (say)
- Of this, `15ms` goes into making the request to the database, `15ms` goes into structuring the response that comes back from the database, and the remaining large chunk of `70ms` just goes into waiting for the database to respond
- That means only `30ms` of CPU is actually used; all the rest is idle, and traditional serverless will charge you for this idle time too
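- A minimal TypeScript sketch of that billing gap (the per-ms rate is a made-up placeholder; the 15/15/70ms split is the example above):

```ts
// The example request above: 100ms wall clock, only 30ms of real CPU.
const requestMs = 100;
const cpuMs = 15 + 15;            // request prep + response structuring
const idleMs = requestMs - cpuMs; // 70ms spent waiting on the database

// Assumed illustrative rate; real providers bill per GB-second of
// allocated memory, not a flat per-ms price.
const pricePerMs = 0.000002;

const billedWallClock = requestMs * pricePerMs; // traditional serverless
const billedCpuOnly = cpuMs * pricePerMs;       // "net CPU" billing

console.log(`idle share: ${(idleMs / requestMs) * 100}%`);                       // 70%
console.log(`overpay factor: ${(billedWallClock / billedCpuOnly).toFixed(2)}x`); // ~3.33x
```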
- AWS Lambda - Here each function of your code runs inside a Firecracker microVM. Firecracker is a lightweight virtual machine monitor built on KVM (Kernel-based Virtual Machine) and written in Rust. Different applications run on the same server, but the kernel is not shared: the host kernel runs `Firecracker`, which creates microVMs, and those microVMs contain separate kernels, so each application has a kernel of its own via `KVM`
- On invocation, Lambda either reuses a `warm` microVM or spins up a `cold` one. Cold starts incur microVM boot time (~10–100ms) plus language/runtime startup. AWS’s SnapStart (Java) and proactive initialization occasionally pre-warm environments to reduce cold-start frequency (see the handler sketch after this list)
- Lambda supports up to 1000 concurrent executions per region by default. When concurrency exceeds available execution environments, AWS launches new microVMs until the account limit is reached. Reserved concurrency dedicates a portion of your limit to a function, while Provisioned Concurrency pre-creates execution environments to achieve consistently low-latency starts
- Firecracker enforces hardware-level isolation between microVMs using KVM and Linux cgroups. Each function runs in its own microVM with separate memory, CPU scheduling, and network namespace, preventing noisy-neighbor interference and privilege escalation. Extensions use the Lambda Extensions API within the same microVM, sharing only defined IPC (Inter-Process Communication: mechanisms that allow different processes on the same computer to communicate with each other, sharing data and coordinating their actions) channels
- Firecracker exposes only `five paravirtual devices` to each microVM: `Virtio network`, `Virtio block storage`, `Virtio-vsock` (for host-guest communication), a `serial console`, and a `one-button keyboard device` (used to shut down the microVM)
- AWS reports being able to create up to `150 microVMs per second per host`. That’s the kind of scale needed to serve trillions of Lambda invocations per month
- Services like AWS Lambda don’t have a domain name, as they are not backends running on a port; they are just functions which get created and die after the function call ends. Instead, the endpoint we get when a Lambda is created is an HTTP endpoint of the API Gateway, through which we can programmatically invoke Lambda functions
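- A minimal TypeScript sketch of a Lambda handler (Node.js-runtime handler shape; the counter is illustrative) showing what warm reuse of a microVM looks like from inside the function:

```ts
// Runs once per execution environment (i.e., per Firecracker microVM)
// at cold start; everything here is reused by later invocations that
// land on the same warm environment.
const bootedAt = Date.now();
let invocations = 0;

export const handler = async (event: { userId?: string }) => {
  invocations++; // > 1 means this microVM was reused (a warm start)
  return {
    statusCode: 200,
    body: JSON.stringify({
      coldStart: invocations === 1,
      environmentAgeMs: Date.now() - bootedAt,
      userId: event.userId ?? null,
    }),
  };
};
```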
- AWS Fargate - AWS Fargate runs each containerized task inside its own Firecracker microVM orchestrated by a managed `control plane`, delivering serverless container execution with hardware-virtualization-level isolation, automatic `data-plane scaling`, and `pay-per-use billing`
- `AWS Lambda` is a functions-as-a-service platform that runs short-lived, event-driven functions billed per invocation and GB-second, whereas AWS Fargate is a serverless container engine that runs long-running containerized applications billed per second for vCPU and memory usage
- The Control Plane is the fully managed orchestration layer, part of Amazon ECS/EKS, that receives RunTask or Pod API calls, evaluates task definitions (vCPU, memory, networking), places tasks on suitable compute, and issues asynchronous instructions to launch and monitor containers
- The Data Plane is the invisible fleet of managed compute resources (either EC2 instances running a slim Amazon Linux 2 guest OS with the Fargate agent and containerd, or Firecracker microVMs) that boots minimal guest kernels, provisions ENIs (Elastic Network Interface: a virtual network interface within a Virtual Private Cloud (VPC) in AWS), pulls and starts containers in isolated environments, and reports runtime status back to the control plane
- Service autoscaling adjusts the number of Fargate tasks based on real-time metrics; each task serves as the atomic unit of concurrency, and the data plane transparently grows or shrinks the underlying compute fleet to meet task launch demand. But each Fargate “task” is the smallest thing that can scale: you can’t auto-scale individual containers inside a task
- Fargate charges for the Compute, Memory, and Ephemeral storage (temporary, non-persistent storage that is associated with a compute instance or container and is deleted when the instance or container is stopped or terminated) used, as sketched below
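- A hedged TypeScript sketch of how that adds up; the per-hour rates below are placeholders, not quoted AWS prices (which vary by region):

```ts
// Assumed illustrative rates (check the AWS pricing page for real ones).
const VCPU_PER_HOUR = 0.04048;         // $ per vCPU-hour (assumed)
const GB_MEM_PER_HOUR = 0.004445;      // $ per GB-hour of memory (assumed)
const GB_STORAGE_PER_HOUR = 0.000111;  // $ per GB-hour beyond included storage (assumed)

function taskCost(vcpu: number, memGb: number, extraStorageGb: number, seconds: number): number {
  const hours = seconds / 3600; // Fargate bills per second of task runtime
  return (
    vcpu * VCPU_PER_HOUR * hours +
    memGb * GB_MEM_PER_HOUR * hours +
    extraStorageGb * GB_STORAGE_PER_HOUR * hours
  );
}

// A 1 vCPU / 2GB task running for 10 minutes:
console.log(taskCost(1, 2, 0, 600).toFixed(5));
```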
- Vercel Fluid - Here your code runs in an instance with its own separate kernel; the instance is independent and is not shared with other people’s code. It intakes requests in parallel without wasting unused CPU time
- For example, while your serverless server is waiting on the database (that `70ms`), we can start intaking another request and let it use the CPU while the first request is not using it (waiting), and we are charged for the total time the requests ran. That is fair, as Vercel will batch the incoming requests into the unused CPU time, and that’s a good optimization (see the sketch after this list)
- But there is one problem: a `Fluid Instance` can’t be used by multiple users, it can only be used by a single user. It means when I send a request to get profile data, another user’s request to get profile data can’t be batched onto this Fluid instance
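- A small TypeScript sketch of why overlapping two requests (of the same user, per the note above) on one instance recovers the idle time; timings mirror the example:

```ts
// Simulated database call: pure waiting, no CPU work.
const dbCall = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function handleRequest(name: string): Promise<void> {
  // The ~30ms of real CPU work would happen around this await;
  // the 70ms inside it is idle time another request can use.
  await dbCall(70);
  console.log(`${name} finished at ${Date.now()}`);
}

async function main() {
  // One request per instance (traditional): ~140ms of wall clock for two.
  // Overlapped on one instance (Fluid-style): ~70ms, because both
  // requests wait on the database at the same time.
  const start = Date.now();
  await Promise.all([handleRequest("A"), handleRequest("B")]);
  console.log(`both done in ~${Date.now() - start}ms`);
}

main();
```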
- Cloudflare Workers - Here the code runs in an instance at the edge, but it is an instance shared with other people’s code, with good enough security in place. You don’t get a separate kernel or machine here; instead a separate isolation layer (think of Docker) is created on top of the existing hardware, and that isolation is the V8 isolate
- V8 isolates are instances of the V8 JavaScript engine, each with its own memory space and JavaScript context, providing strong isolation between different workloads
- Cloudflare bills on net CPU used (see the Worker sketch below). Let’s say you run a chatbot company, and when a user sends a request to the server it takes `1 sec` to resolve. Of this, about `50ms` goes into sending the request to the backend, `50ms` goes into structuring the output coming back from the backend, and since the chatbot streams the response there will be tiny `10ms` response chunks coming from the backend, say about 10 of them. Now the total billed time is `50ms` (initial request) + `50ms` (structuring the response) + 10 x `10ms` (streams) = `200ms`. We will be charged for `200ms` of CPU time
- This is possible as they don’t need to spin up a separate kernel, as `Vercel Fluid` does, for every single application; they run multiple applications on a single instance, separated by creating multiple V8 isolates. Every request coming in shares the event loop of the V8 isolate that was spun up. The request sits in the queue of the `event loop` of the `V8 isolate` and is processed accordingly, even across different users and even different applications, unlike in `Vercel Fluid`
- On Cloudflare Workers, code in languages other than JavaScript is compiled to WASM (WebAssembly); if we want to run code without converting it into `WASM`, we can use Cloudflare Containers, which take code in other languages and run it in a sandbox
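- A minimal Cloudflare Worker in TypeScript; `api.example.com` is a hypothetical upstream. The point is that the awaited fetch costs wall-clock time but almost no CPU time, which is what Workers bills:

```ts
export default {
  async fetch(request: Request): Promise<Response> {
    // CPU time is billed here (parsing, routing, structuring)...
    const url = new URL(request.url);

    // ...but the wait inside this await is not CPU time: while the
    // upstream responds, the isolate's event loop serves other requests.
    const upstream = await fetch("https://api.example.com" + url.pathname);

    // Streaming the body back costs only the tiny CPU slices spent
    // forwarding each chunk, as in the 200ms example above.
    return new Response(upstream.body, {
      headers: {
        "content-type": upstream.headers.get("content-type") ?? "text/plain",
      },
    });
  },
};
```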
- Railway - Here the code runs in a separate, dedicated containerized (Docker) environment; the application has its own filesystem, network stack, and process space, with strong isolation between multiple applications. Unlike Cloudflare Workers and Vercel Fluid, it provides persistent storage through attached volumes
- Railway operates on a multi-tenant architecture where multiple applications from different users run on shared physical servers, but each application is isolated using dedicated Docker containers
- Railway charges for `actual CPU` and `memory utilization` used
- Each Railway container can handle `multiple concurrent requests` within the same instance, similar to traditional server applications
- Railway containers can maintain `persistent connections` to databases and external services, reducing the overhead of establishing new connections for each request
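- A TypeScript sketch of that long-lived, connection-reusing shape (plain Node.js HTTP server; the `pool` object is a stand-in for a real database client):

```ts
import { createServer } from "node:http";

// Hypothetical pooled client created once at boot and reused across
// requests: possible because the container stays alive, unlike a
// per-request serverless function. Swap in a real client (pg, mysql2).
const pool = {
  query: async (sql: string) => [{ ok: true, sql }],
};

createServer(async (_req, res) => {
  // The same long-lived connection pool serves every concurrent request.
  const rows = await pool.query("SELECT * FROM profiles LIMIT 1");
  res.setHeader("content-type", "application/json");
  res.end(JSON.stringify(rows));
}).listen(Number(process.env.PORT ?? 3000));
```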
- Fly.io - It delivers global, edge-deployed applications by running each workload inside Firecracker microVMs on shared hardware, with a Rust-based proxy mesh handling networking and a fully usage-based billing model that charges per second for compute, storage, and bandwidth.
- Fly.io’s default operation runs multiple users’ applications on shared physical servers, each inside isolated Firecracker microVMs (sharing the host kernel via KVM), while also providing an optional Dedicated Hosts feature for per-customer hardware isolation
- The Machines API and `flyctl` CLI form the control plane, allowing RESTful creation, configuration, scaling, and lifecycle management of microVMs (“Machines”); see the sketch after this list
- Corrosion, Fly.io’s gossip-based service catalog, propagates state (apps, Machines, services) across all edge and worker nodes, enabling distributed service discovery and autoscaling
- BGP Anycast advertises customer IPv4/IPv6 ranges from every datacenter. Incoming connections hit the nearest edge
- Fly Proxy, a Rust-based daemon on every edge and worker, performs TLS termination, PROXY-protocol insertion, load balancing, and AutoStart/AutoStop of Machines based on traffic patterns
- WireGuard tunnels backhaul (a backhaul is like a bridge, carrying data traffic from access points such as cell towers to the central network infrastructure) the proxied traffic from edge servers to the worker hosting the target microVM, minimizing additional latency
- The Edge Proxy matches incoming requests to the closest healthy Machine, forwarding over WireGuard; the worker-side Proxy then delivers to the microVM’s virtual network interface. All inter-server traffic flows over WireGuard, ensuring end-to-end encryption within the Fly.io network. Apps within the same organization communicate over a private IPv6 network (6PN), bypassing the public proxy when needed
- Fly.io charges for Compute, Storage, and Network Egress (the outgoing flow of data from a network to an external destination). Machines scale to zero when idle (configurable via `min_machines_running`), and Fly Proxy wakes them on new traffic, charging only for storage while stopped
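- A hedged TypeScript sketch of driving that control plane over the Machines API; the endpoint shape, body fields, and app/image names are assumptions for illustration, so check Fly.io’s API docs for the authoritative shape:

```ts
// Hedged sketch: creating a Machine through Fly.io's Machines API.
const FLY_API = "https://api.machines.dev/v1";
const app = "my-app"; // hypothetical app name

async function createMachine() {
  const res = await fetch(`${FLY_API}/apps/${app}/machines`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.FLY_API_TOKEN}`, // flyctl auth token
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      config: {
        image: "registry.fly.io/my-app:latest", // hypothetical image
        guest: { cpu_kind: "shared", cpus: 1, memory_mb: 256 },
      },
    }),
  });
  console.log(res.status, await res.json());
}

createMachine();
```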
- Unikernels - A single-application operating system compiled together with user code into a minimal disk image and booted directly in its own Firecracker microVM, delivering hardware-virtualization-level isolation, near-instant boot times, reduced attack surface, and granular resource billing. NanoVMs is a leading vendor of this technology, much like AWS provides AWS Fargate
- `Linux VM/Container` vs `Unikernel`: a traditional VM runs a full Linux kernel hosting multiple processes, and containers share namespaces atop Linux; a unikernel embeds only the required kernel subsystems and one user process, eliminating context-switch overhead and userland bloat
- The Build Pipeline compiles the application and selected POSIX-compatible libraries into a standalone ELF disk image via NanoVMs’ ELF loader; it patches out multiuser features (no setuid, no shell), enforces strict page permissions (read/write/execute), and integrates drivers (network, GPU)
- Control Plane can be any orchestration layer (e.g., Kubernetes with Firecracker operator or a serverless scheduler) issuing “boot unikernel” API calls with vCPU, memory, and image URI; NanoVMs provides CLI and SDK to define resource limits, networking (VPC ENI), and custom metadata
- Data Plane is the fleet of Firecracker microVMs launching minimal guest kernels built by NanoVMs, provisioning virtual NICs, mounting ephemeral rootfs, and reporting health via a lightweight agent; each microVM contains exactly one unikernel image and no extra userland
- Image size directly impacts boot latency: a 30 MB `Go unikernel boots` in tens of milliseconds; stateful applications leverage NanoVMs’ pause/resume and snapshot features for sub-second warm starts
- Autoscaling reacts to real-time metrics by launching or terminating unikernel microVMs; each unikernel instance is the atomic concurrency unit, enabling fine-grained scaling down to single-threaded workloads
- The Billing Model aligns with per-vCPU, per-GB memory, and per-GB ephemeral storage usage billed by the millisecond, eliminating the idle overhead of full Linux guests and container daemons (a hedged cost sketch follows after this list)
- Performance Characteristics - unikernels eliminate user-to-user process context switches, reducing CPU overhead by up to 30%; they support full multithreading within a single process and avoid Linux scheduler noise
- Security Guarantees - these include immutable image boots, no shell or SSH access, enforced memory page permissions, null page mapping prevention, and an exclusive address space that thwarts container and hypervisor escapes alike
- It supports 16+ cloud providers (AWS, GCP GVNIC, Azure ARM Gen2), commodity and cloud GPUs via native NVIDIA driver ports, and standard debugging via GDB-compatible exports
- Limitations encompass the lack of multiuser utilities (no interactive in-guest debugging), VM-only deployment (no bare-metal flashing), and a still-evolving ecosystem of third-party unikernel-ready packages
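- A back-of-the-envelope TypeScript sketch of the billing claim above; all rates and boot times are assumed for illustration only:

```ts
// Assumed illustrative rates; real per-ms pricing varies by provider.
const ratePerVcpuMs = 2e-8; // assumed $ per vCPU-millisecond
const ratePerGbMs = 2e-9;   // assumed $ per GB-millisecond of memory

function cost(bootMs: number, workMs: number, vcpu: number, memGb: number): number {
  const billedMs = bootMs + workMs; // boot time is billed too
  return billedMs * (vcpu * ratePerVcpuMs + memGb * ratePerGbMs);
}

// A ~30MB Go unikernel booting in ~10ms vs a full Linux guest booting
// in ~2s, both doing 50ms of useful work per invocation:
console.log("unikernel:", cost(10, 50, 1, 0.25).toFixed(9));
console.log("full VM:  ", cost(2000, 50, 1, 0.25).toFixed(9));
```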
Note:
- Containers: Share the host’s Linux kernel and isolate workloads using namespaces and cgroups. They’re fast and lightweight, but security boundaries are weaker, especially in multi-tenant setups
- Virtual Machines: Use a hypervisor to emulate hardware and run entire OS instances. They offer strong isolation but are slower to boot and consume more memory
- provided.al2: It’s a runtime environment on AWS Lambda that allows us to run code written in any language by uploading a compiled binary or custom runtime. It is based on `Amazon Linux 2 (AL2)` and offers superior performance, especially on `AWS Graviton2 processors`, and a more streamlined, faster execution environment compared to older Lambda runtimes. It is the recommended runtime for languages like Go and Rust
- Firecracker also helped give rise to the rust-vmm project, a modular set of virtualization components in Rust. Instead of building a monolithic VMM, AWS collaborated with Intel, Red Hat, and others to break Firecracker into reusable building blocks. Projects using these components include: `Cloud Hypervisor` (Intel - runs cloud-native VMs with high performance and a low footprint), `crosvm` (Chromium OS), `Enarx` (Red Hat - runs WebAssembly workloads inside confidential compute environments), and `Firecracker` itself
- QEMU (Quick Emulator) is a free and open-source machine emulator and virtualizer. It can be used to run different operating systems and programs on a variety of hardware architectures; it is an alternative to Firecracker
- Weaveworks Ignite: `Ignite` is a developer tool that lets you run containers inside Firecracker microVMs, using Docker-like commands. Under the hood:
- The OCI image becomes the root filesystem for a microVM
- Ignite supplies the kernel
- Firecracker launches the workload in an isolated VM
- Kata Containers: It’s an open-source project that runs each container (or pod) inside a lightweight VM
- OpenNebula: It’s a cloud Orchestrator
- Bare-metal clusters can spin up Firecracker microVMs almost instantly
- Firecracker instances replace full VMs for faster provisioning
- Ideal for `edge computing` and `hybrid cloud` use cases
- With OpenNebula + Firecracker, organizations can build `private or edge clouds` that launch secure workloads quickly and at high density, without requiring full hypervisors
- Koyeb: Serverless deployment platform that isolates functions with Firecracker
- Northflank: PaaS offering container-like UX but with secure VM isolation
- Qovery: DevOps automation with Firecracker as a secure runtime
- AWS Nitro Enclaves: Secure enclaves inside EC2 instances are actually implemented as Firecracker microVMs that don’t have network access, designed for processing sensitive data securely. e.g., ML model inference with PII(Personally Identifiable Information)
- If there is a situation where we need to run isolated, unsafe code in a serverless instance, like running preview code (a `v0` project maybe) or AI-generated code, then on Vercel we use Vercel Sandbox and on Cloudflare we use Cloudflare Sandbox
- Open Router - It’s a company which routes your AI API calls to the specific best available provider. For example, if you’re using ChatGPT, then based on the request Open Router will route it to the best available place, maybe to OpenAI servers or to a Microsoft Azure server. The same service can also be used with Vercel AI Gateway. It is similar to what `Jupiter` does in web3, routing to the perfect AMM to give us the best price
- Cloudflare KV vs Cloudflare Durable Objects:
- Cloudflare KV operates on an eventually consistent model, meaning data changes propagate across the global network over time. This design prioritizes speed and global availability over immediate consistency
- Globally replicated to all 330+ Cloudflare edge locations
- Data is cached everywhere simultaneously
- Reads serve from nearest location
- Writes propagate eventually
- Durable Objects provide strongly consistent storage with a single, globally unique instance that maintains state. There’s just a single instance of it and it moves around data centers based on where the requests are coming from
- Single instance per object (not replicated)
- Dynamically relocates based on request patterns
- Gets “ping-ponged” all over Cloudflare’s network as request patterns shift
- Provides global coordination through uniqueness
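- A TypeScript sketch contrasting the two APIs inside one Worker (types as in `@cloudflare/workers-types`; the binding names `PROFILE_KV` and `COUNTER_DO` are hypothetical):

```ts
export interface Env {
  PROFILE_KV: KVNamespace;             // hypothetical KV binding
  COUNTER_DO: DurableObjectNamespace;  // hypothetical DO binding
}

export default {
  async fetch(_req: Request, env: Env): Promise<Response> {
    // KV: eventually consistent. A read at another edge location may
    // briefly return a stale value while this write propagates.
    await env.PROFILE_KV.put("user:1", JSON.stringify({ name: "Ada" }));
    const profile = await env.PROFILE_KV.get("user:1");

    // Durable Object: idFromName always maps to the same single global
    // instance, so every request sees strongly consistent state.
    const stub = env.COUNTER_DO.get(env.COUNTER_DO.idFromName("global"));
    const count = await (await stub.fetch("https://do/increment")).text();

    return Response.json({ profile: profile && JSON.parse(profile), count });
  },
};

export class Counter {
  constructor(private state: DurableObjectState) {}

  async fetch(_req: Request): Promise<Response> {
    // Storage here is scoped to this one instance and strongly consistent.
    const n = ((await this.state.storage.get<number>("n")) ?? 0) + 1;
    await this.state.storage.put("n", n);
    return new Response(String(n));
  }
}
```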