What is Compute?

Learn about the different models for compute and how they can be used with Vercel.

"Compute" is a term that encompasses the actions taken by a computer. When we talk about it with regards to web development and at Vercel, we use compute to describe actions such as:

  • Building and rendering your app. These are essential operations needed to turn your code into a site that appears for users. Server rendering or partial prerendering are some common methods that could be used to render your site.
  • Working with Vercel Functions. Compute through functions can happen on demand, either when a user makes a request to your site or when they interact with something on it. This could look like client-side data fetching that hydrates the site or provides additional functionality to your app. It could also look like interaction with AI, which requires real-time interaction, compute-heavy workflows, and fast streaming of data. Alternatively, features such as Cron Jobs or background ISR revalidation may run at specific intervals.

Traditionally with web applications, we talk about two main locations:

  • Client: This is the browser on your user's device that sends a request to a server for your application code. It then turns the response it receives from the server into an interface the user can interact with. The term "client" could also be used for any device, including another server, that is making a request to a server.
  • Server: This is the computer in a data center that stores your application code. It receives requests from a client, does some computation, and sends back an appropriate response. This server does not sit in complete isolation; it is usually part of a bigger network designed to deliver your application to users around the world.
    • Origin Server: The server that stores and runs the original version of your app code. When the origin server receives a request, it does some computation before sending a response. The result of this computation work may be cached by a CDN.
    • CDN (Content Delivery Network): This stores static content, such as HTML, in multiple locations around the globe, placed between the client that is making the request and the origin server that is responding. When a user sends a request, the closest CDN location will respond with its cached response.
    • The Edge: The edge refers to the edge of the network, closest to the user. CDNs, which are distributed around the world, can be part of the edge, but some edge servers can also run code. This means that both caching and code execution can be done at the edge, closer to the user. Vercel has its own Edge Network, which runs middleware and stores build output assets globally.
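
The relationship between a CDN location and the origin server can be sketched in a few lines of TypeScript. This is only an illustrative model: `CdnNode`, `renderAtOrigin`, and `originHits` are hypothetical names, and a real CDN also handles cache expiry and invalidation, which are left out here.

```typescript
// Hypothetical model of one CDN location in front of an origin server.
// It illustrates the idea only and is not how any real CDN is implemented.
type CachedResponse = { body: string; servedBy: "cdn" | "origin" };

// Counts how often the origin has to do computation work.
let originHits = 0;

// The origin server runs the application code to produce a response.
function renderAtOrigin(path: string): string {
  originHits += 1;
  return `<html><body>Page for ${path}</body></html>`;
}

// One CDN location with its own cache, keyed by request path.
class CdnNode {
  private cache = new Map<string, string>();

  handle(path: string): CachedResponse {
    const cached = this.cache.get(path);
    if (cached !== undefined) {
      // Cache hit: respond without contacting the origin.
      return { body: cached, servedBy: "cdn" };
    }
    // Cache miss: ask the origin, then store the result for later requests.
    const body = renderAtOrigin(path);
    this.cache.set(path, body);
    return { body, servedBy: "origin" };
  }
}

const nearestNode = new CdnNode();
const first = nearestNode.handle("/about");
const second = nearestNode.handle("/about");
console.log(first.servedBy, second.servedBy, originHits); // origin cdn 1
```

The second request for the same path never reaches the origin, which is the latency and load benefit that caching close to the user provides.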
The request-response cycle between client and server.

To show what this looks like in practice, we'll use the example of a Next.js app deployed to Vercel.

When you start a deployment of your Next.js app to Vercel, Vercel's build process creates a build output that contains artifacts such as bundled Vercel Functions and static assets. Vercel then deploys these artifacts either to its Edge Network or, in the case of a function, to a specified region.

Now that the deployment is ready to serve traffic, a user can visit your site. When they do, the request is sent to the closest Edge Network region, which either serves the static assets directly or triggers the function. If a function is triggered, it runs and its response is sent back to the user. At a very high level, this looks like:

  1. User Action: The user interacts with a website by clicking a link, submitting a form, or entering a URL.
  2. HTTP Request: The user's browser sends a request to the server, asking for the resources needed to display the webpage.
  3. Server Processing: The server receives the request, processes it, and prepares the necessary resources. For Vercel Functions, Vercel's gateway triggers a function execution in the region where the function was deployed.
  4. HTTP Response: The server sends back a response to the browser, which includes the requested resources and a status code indicating whether the request was successful. The browser then receives the response, interprets the resources, and displays the webpage to the user.
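
The "Server Processing" step above can be sketched as a small routing function. Everything here is a simplified assumption for illustration: the `Artifact` type, the `buildOutput` map, and `handleRequest` are hypothetical names, and the region identifiers are only example values.

```typescript
// Hypothetical model of a build output: static assets and functions.
type Artifact =
  | { kind: "static"; body: string }
  | { kind: "function"; region: string; run: (url: string) => string };

type HttpResponse = { status: number; body: string; servedFrom: string };

// Example artifacts (assumed for illustration, not real build output).
const buildOutput = new Map<string, Artifact>([
  ["/logo.svg", { kind: "static", body: "<svg></svg>" }],
  [
    "/api/hello",
    { kind: "function", region: "iad1", run: (url: string) => `Hello from ${url}` },
  ],
]);

// Step 3 of the lifecycle: decide how to process an incoming request.
function handleRequest(url: string, nearestEdgeRegion: string): HttpResponse {
  const artifact = buildOutput.get(url);
  if (artifact === undefined) {
    return { status: 404, body: "Not Found", servedFrom: nearestEdgeRegion };
  }
  if (artifact.kind === "static") {
    // Static assets are served from the edge region closest to the user.
    return { status: 200, body: artifact.body, servedFrom: nearestEdgeRegion };
  }
  // Functions execute in the region they were deployed to.
  return { status: 200, body: artifact.run(url), servedFrom: artifact.region };
}

console.log(handleRequest("/logo.svg", "fra1").servedFrom); // fra1
console.log(handleRequest("/api/hello", "fra1").servedFrom); // iad1
```

In this sketch a user near the example region `fra1` gets static assets from nearby, while the function still runs in its deployment region.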

In this lifecycle, the "Server Processing" step can look very different depending on your needs, the artifacts being requested, and the model of compute that you use. In the next section we'll explore these models, each of which has its own tradeoffs.

Dedicated servers provide a specific environment and resources for your applications. This means that you have control over the environment, but you also have to manage the infrastructure, provision servers, and upgrade hardware. How much control you have depends on the dedicated server option you choose. Some options are Amazon EC2, Azure Virtual Machines, and Google Compute Engine. Each of these services provides you with a virtual machine that you configure through the provider's site, are responsible for provisioning, and pay for during the entire duration of the server's uptime. Other options such as Virtual Private Servers (VPS), dedicated physical servers in a data center, or your own on-premises servers are also considered traditional servers.

Dedicated servers suit a workload that is predictable and static: you don't need to scale up or down, and you have a consistent amount of traffic. If you have peaks of traffic, you'll need to anticipate them and provision resources in advance, which often means over-provisioning and overpaying. The upside is predictable performance and cost, with complete control over the environment and security. Because the resource is always available, you can run long-running processes, and cold starts are non-existent as nothing needs to be started.

Serverless is a cloud computing model that allows you to build and run applications and services without having to manage infrastructure or your own dedicated servers. It addresses some of the disadvantages of traditional servers, and enables teams to have an infrastructure that is more elastic: resources that are scaled and available based on demand, and have a pricing structure that reflects that. Despite the name, servers are still used.

AWS popularized the term "serverless" to describe the compute used for its Lambda functions, and other cloud providers have adopted similar terminology for their own offerings, such as Google Cloud Functions, Azure Functions, and Vercel Functions.

The difference between serverless and dedicated servers is that there is no single server dedicated to your application. Instead, when a request is made, a virtual machine on a server is spun up to handle the request, and then spun down when the request is complete. This allows your app to handle unpredictable traffic and use only the resources it needs, so you pay only for what you use. You do not need to manage the infrastructure, provision servers, or upgrade hardware.
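
To make the pricing difference concrete, here is a rough sketch comparing the two billing models. All prices and traffic figures are made-up placeholders, not real provider rates, and the function names are hypothetical. Amounts are kept in whole cents to avoid floating-point rounding.

```typescript
// All numbers below are illustrative assumptions, not real pricing.
const HOURS_PER_MONTH = 730;

// Dedicated: you pay for every provisioned server for its entire uptime,
// sized for peak traffic whether or not the peak occurs.
function dedicatedMonthlyCostCents(
  serversForPeak: number,
  centsPerServerHour: number
): number {
  return serversForPeak * centsPerServerHour * HOURS_PER_MONTH;
}

// Serverless: you pay for the compute time your invocations actually use.
function serverlessMonthlyCostCents(
  invocations: number,
  avgDurationMs: number,
  centsPerThousandComputeSeconds: number
): number {
  const computeSeconds = (invocations * avgDurationMs) / 1000;
  return (computeSeconds / 1000) * centsPerThousandComputeSeconds;
}

// 4 servers provisioned for peak, at 10 cents per server-hour.
const dedicatedCents = dedicatedMonthlyCostCents(4, 10);

// 500,000 invocations at 200 ms each, at 5 cents per 1,000 compute-seconds.
const serverlessCents = serverlessMonthlyCostCents(500_000, 200, 5);

console.log(dedicatedCents, serverlessCents); // 29200 500
```

With these particular placeholder figures the usage-based model is cheaper, but the result depends entirely on the inputs: sustained heavy traffic can shift the comparison the other way.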

When a function runs on serverless compute and boots up from scratch, this is called a cold start (or cold boot). When it is reused, we refer to the function as warm.

Reusing a function means the underlying container that hosts it does not get discarded. State, such as temporary files, memory caches, and sub-processes, is preserved. This lets developers not only minimize the time spent booting, but also take advantage of cached data (in memory or on the filesystem) and memoize expensive computations.
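
A minimal sketch of how a warm function can benefit from preserved state, assuming the runtime keeps module-scope variables alive between invocations on the same instance. The `handler`, `priceCache`, and `expensiveLookup` names are hypothetical, and the "expensive" work is a stand-in.

```typescript
// Module scope: initialized once when the instance boots (a cold start),
// then kept in memory for as long as the instance stays warm.
const priceCache = new Map<string, number>();
let invocationCount = 0;

// Stand-in for an expensive computation or a slow upstream call.
function expensiveLookup(sku: string): number {
  let total = 0;
  for (let i = 0; i < sku.length; i++) {
    total += sku.charCodeAt(i);
  }
  return total;
}

type HandlerResult = { price: number; cacheHit: boolean; invocation: number };

// Each call represents one invocation handled by the same instance.
function handler(sku: string): HandlerResult {
  invocationCount += 1;
  const cached = priceCache.get(sku);
  if (cached !== undefined) {
    // Warm path: reuse the memoized result from an earlier invocation.
    return { price: cached, cacheHit: true, invocation: invocationCount };
  }
  const price = expensiveLookup(sku);
  priceCache.set(sku, price);
  return { price, cacheHit: false, invocation: invocationCount };
}

const coldCall = handler("sku-123");
const warmCall = handler("sku-123");
console.log(coldCall.cacheHit, warmCall.cacheHit); // false true
```

Because a warm instance is never guaranteed, in-memory state like this is best treated as an optimization rather than as durable storage.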

Because serverless compute is on-demand by nature, serverless architectures will always have the notion of cold starts.

Traditional serverless compute happens in one specified location (or region). This is because once your application is built, it's deployed to a specific region. Having a single region (or a small number of regions) increases the likelihood of a warm start and the benefits of concurrency, as all of your users will be hitting the same instances. You'll likely also have your data store in only a single region, so for latency reasons it makes sense to keep the trip between your compute and your data as short as possible.

However, a single region can be a disadvantage if you have user requests coming from different regions, as the response latency for distant users will be high.

All of this means that it's left up to teams to determine which region (or regions) they want Vercel to deploy their functions to. This requires taking into account latency between your compute and your data source, as well as latency to your users. In addition, region failover is not automatic, and requires manual intervention.
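
One way to reason about this tradeoff is to estimate end-to-end latency per candidate region. The sketch below is a simplification under stated assumptions: the latency figures are invented, the region identifiers are example values, and `pickRegion` and `estimatedLatencyMs` are hypothetical helpers rather than part of any Vercel API.

```typescript
// Hypothetical measurements for each candidate region.
type RegionLatency = {
  region: string;
  toUsersMs: number; // round trip between typical users and the compute
  toDataMs: number; // round trip between the compute and the data store
};

// A request pays the user round trip once, plus one data round trip
// for every query the function makes while handling it.
function estimatedLatencyMs(r: RegionLatency, dataRoundTrips: number): number {
  return r.toUsersMs + dataRoundTrips * r.toDataMs;
}

function pickRegion(
  candidates: RegionLatency[],
  dataRoundTrips: number
): RegionLatency {
  if (candidates.length === 0) {
    throw new Error("At least one candidate region is required");
  }
  return candidates.reduce((best, current) =>
    estimatedLatencyMs(current, dataRoundTrips) <
    estimatedLatencyMs(best, dataRoundTrips)
      ? current
      : best
  );
}

// Invented figures: users are near "fra1", the data store is near "iad1".
const candidates: RegionLatency[] = [
  { region: "iad1", toUsersMs: 90, toDataMs: 2 },
  { region: "fra1", toUsersMs: 20, toDataMs: 85 },
];

const dataHeavy = pickRegion(candidates, 3); // 96 ms vs 275 ms
const noData = pickRegion(candidates, 0); // 90 ms vs 20 ms
console.log(dataHeavy.region, noData.region); // iad1 fra1
```

In this model, a function that makes several queries per request is better placed near the data, while one that makes none is better placed near the users.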

AI-driven workloads have stretched the limits of serverless compute: they require long-running processes, data-intensive tasks, streaming data, and real-time interaction.

The maximum duration of a function is a key limitation of serverless computing since it operates on a shared infrastructure. It describes the maximum amount of time that a function can run before it is terminated.

As a user, you have to understand and configure the maximum duration, which is a balance between the cost of running the function and the time it takes to complete the task. This can be a challenge, as you may not know how long a task will take to complete, and if you set the duration too low, the function will be terminated before it completes. If you set it too high, you may be paying for resources unnecessarily.
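
The balance described above can be illustrated with a small simulation. It assumes, for simplicity, that you are billed for the time a function actually runs, up to the configured maximum. `runWithMaxDuration` is a hypothetical helper, not a platform API, and the durations are arbitrary examples.

```typescript
type RunResult = { completed: boolean; billedSeconds: number };

// Simulates a task that needs `taskSeconds` to finish, running in a function
// that is terminated after `maxDurationSeconds`.
// Assumption: billing covers actual run time, capped at the maximum duration.
function runWithMaxDuration(
  taskSeconds: number,
  maxDurationSeconds: number
): RunResult {
  return {
    completed: taskSeconds <= maxDurationSeconds,
    billedSeconds: Math.min(taskSeconds, maxDurationSeconds),
  };
}

// Limit too low: the 45-second task is cut off and its work is lost.
const tooLow = runWithMaxDuration(45, 30);

// Limit with headroom: the task completes and only 45 seconds are billed.
const enough = runWithMaxDuration(45, 60);

// A task that hangs (never finishes) runs until the limit is reached,
// so a very high limit means paying for more wasted time.
const hungShortLimit = runWithMaxDuration(Infinity, 60);
const hungLongLimit = runWithMaxDuration(Infinity, 300);

console.log(tooLow, enough, hungShortLimit, hungLongLimit);
```

Under these assumptions, a higher limit costs nothing extra for tasks that finish on time; the unnecessary cost appears when a task stalls and runs all the way to the limit.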

Last updated on January 18, 2025