The need for speed - TTFB
April 1, 2024

In the fast-paced digital landscape, website and app performance is no longer a luxury—it's an absolute necessity. Numerous studies have shown that a sluggish website or mobile app directly translates into lost revenue and skyrocketing bounce rates. So, when we talk about performance, what exactly are we referring to?

Traditionally, web performance has been defined as the speed at which web pages load onto a user's browser. However, this definition doesn't paint the full picture. I've never heard a user complain about technical metrics. Instead, they express frustration with issues like "I can't see the images," "Why can't I click the checkout button," or "I don't know, but I can't scroll down." It's crucial to understand that, at the end of the day, the user is our main character—the one who truly matters.

In the past, web page performance was evaluated using metrics like Time to First Byte (TTFB) and Time to Interactive (TTI). TTFB quantifies the time from making an HTTP request to receiving the first byte of data, while TTI measures how long it takes for a webpage to become fully interactive. However, these metrics don't always align with the user's perception of performance. You can have impressive numbers for both metrics, yet users may still feel that the website or app is slow.

To address this discrepancy, Google introduced Core Web Vitals, a set of user-centric metrics that represent critical aspects of the user experience. These metrics—Largest Contentful Paint (LCP), Interaction to Next Paint (INP), and Cumulative Layout Shift (CLS)—are measurable in the field and reflect real-world user experiences. LCP assesses loading performance, INP evaluates interactivity, and CLS gauges visual stability. By focusing on these metrics, we can establish a stronger correlation between good performance scores and reduced bounce rates or increased conversions.

However, achieving optimal performance requires a holistic approach. It's not just about excelling in one metric; you need to hit the mark across all of them to truly be in the top 1% of websites and apps. And today, we'll explore how to do just that.

It's important to note that improving performance is a collaborative effort between the Content Delivery Network (CDN) and the backend/web server. Some metrics are more influenced by the CDN, while others are primarily impacted by the backend server.

Let's dive into TTFB, a metric heavily influenced by the CDN. In simple terms, TTFB represents the time it takes for a request to travel from point A to point B, get processed by the server, and for the first byte of the response to travel back to point A. Excluding the server processing time, the remaining duration is all about delivering traffic efficiently.

This is where a CDN comes into play. Imagine a user in Rome, Italy, trying to access a website hosted on a server in Virginia, USA. Without a CDN, the request would have to travel from Rome to Virginia and back, resulting in a suboptimal experience compared to a user in Dallas, who is much closer to the server. The goal is to provide a consistent experience for all users, regardless of their location.
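The round-trip arithmetic above can be sketched in a few lines of Python. The RTT and server-time figures are illustrative, not measured:

```python
def estimate_ttfb(rtt_ms: float, server_ms: float) -> float:
    """TTFB is roughly one network round trip plus server processing time."""
    return rtt_ms + server_ms

# Hypothetical figures: a user in Rome reaching an origin in Virginia
# (~120 ms round trip) vs. a nearby CDN edge (~15 ms round trip),
# with the same 50 ms of server processing in both cases.
print(estimate_ttfb(120.0, 50.0))  # → 170.0
print(estimate_ttfb(15.0, 50.0))   # → 65.0
```

Shrinking the round trip is the only lever on the network half of the equation; the server-processing term is what caching and the Server-Timing techniques later in this post address.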

A CDN solves this problem by storing website content on servers distributed around the world. When the Italian user attempts to access the website, they will receive most of the content from a CDN server much closer to Rome than the origin server in Virginia. Cloudflare, for example, operates data centers in more than 300 cities, ensuring proximity to large populations worldwide.

But what about personalized user information? How can a CDN store website content without compromising user-specific data? These are excellent questions, and the answer lies in caching.

Caching refers to the practice of storing website content on CDN servers for faster delivery to users. The main rule of CDN caching is to cache static content while avoiding caching dynamic content. For instance, images should be cached, but dynamically generated HTML should not. Anything that is consistent across all users can be safely stored in the cache.

Caching behavior is controlled through the Cache-Control header, which the web server sends with each response. This header specifies whether an object can be cached and, if so, for how long. Most modern web frameworks automatically inject an appropriate Cache-Control header based on the file type, path, and other parameters.

Example of a Cache-Control header indicating that content cannot be cached:

Cache-Control: no-cache, no-store, max-age=0, must-revalidate

Example of a Cache-Control header indicating that content can be cached and for how long:

Cache-Control: public, max-age=604800
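A simplified decision rule for the two example headers can be sketched in Python. Real shared-cache behavior (RFC 9111) has many more cases, so treat this as a rough sketch, not a spec-complete parser:

```python
def is_cacheable(cache_control: str) -> bool:
    """Simplified shared-cache check: refuse anything explicitly forbidden,
    then require a positive freshness lifetime. (Strictly, no-cache allows
    storing but forces revalidation; we treat it as uncacheable for brevity.)"""
    directives = {}
    for part in cache_control.split(","):
        name, _, value = part.strip().lower().partition("=")
        directives[name] = value
    if {"no-store", "no-cache", "private"} & directives.keys():
        return False
    # s-maxage takes precedence over max-age for shared caches like a CDN.
    lifetime = directives.get("s-maxage") or directives.get("max-age") or "0"
    return int(lifetime) > 0

print(is_cacheable("no-cache, no-store, max-age=0, must-revalidate"))  # → False
print(is_cacheable("public, max-age=604800"))                          # → True
```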

To identify cacheable assets, you can use browser developer tools. Open your website in an incognito window, load the site, and navigate to the network tab. Sort the assets by time and examine each one from top to bottom. Ask yourself: Can this asset be cached? If yes, why isn't it being cached?
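As a starting point for that audit, a rough heuristic can flag which requests deserve a closer look. The extension list and asset paths below are illustrative, not a definitive rule:

```python
from pathlib import PurePosixPath

# Extensions that are typically identical for every user and safe to cache.
STATIC_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".jpeg", ".svg", ".webp", ".woff2"}

def likely_cacheable(url_path: str) -> bool:
    return PurePosixPath(url_path).suffix.lower() in STATIC_EXTENSIONS

for asset in ("/static/app.js", "/images/hero.webp", "/api/cart", "/checkout"):
    verdict = "should be cached" if likely_cacheable(asset) else "needs review"
    print(f"{asset}: {verdict}")
```

Anything this flags as static but that the network tab shows being fetched from the origin on every load is a candidate for a missing or incorrect Cache-Control header.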

If the backend/web server isn't sending the correct Cache-Control header, Cloudflare allows you to override it and force caching even when the header suggests otherwise. However, user-specific content, such as personalized HTML, should never be cached.

What about cached assets that are frequently evicted from the CDN cache? This is known as cache eviction and occurs when an object is not accessed regularly, and the CDN needs to free up space for newer objects. Most CDNs use algorithms like Least Recently Used (LRU) for cache eviction.
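LRU itself is simple enough to sketch in a few lines of Python. This toy version (asset names are illustrative) shows why rarely requested objects fall out of the cache:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal in-memory cache with Least Recently Used eviction,
    the policy most CDNs approximate for their edge caches."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._store = OrderedDict()  # insertion order doubles as recency order

    def get(self, key):
        if key not in self._store:
            return None
        self._store.move_to_end(key)  # a hit makes the key most recently used
        return self._store[key]

    def put(self, key, value):
        if key in self._store:
            self._store.move_to_end(key)
        self._store[key] = value
        if len(self._store) > self.capacity:
            # Drop the least recently used entry to make room.
            self._store.popitem(last=False)

cache = LRUCache(capacity=2)
cache.put("logo.png", b"<png bytes>")
cache.put("app.js", b"<js bytes>")
cache.get("logo.png")             # touching logo.png makes it most recent
cache.put("style.css", b"<css>")  # evicts app.js, the least recently used
```

The same dynamic plays out at CDN scale: an asset that users keep requesting stays warm at the edge, while a long-tail asset quietly ages out and must be refetched from the origin.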

If an uncached asset has a high response time, it's expected because the CDN cannot deliver the object, and it must be fetched directly from the backend/web server. Subsequent requests should be served from the CDN cache.

But what if the backend/web server is in the same city, and the response time is still high? This is where server processing time comes into play. Requests involving complex database queries, content fetching from microservices, or external API calls can take longer to process.

To determine whether the issue lies with the server or the request path, you can use browser developer tools to examine the timing breakdown of a specific request. The "Waiting for server response" segment covers server processing time (plus the latency before the first byte arrives), while "Request sent" represents the time spent sending the request and "Content Download" the time spent receiving the response body.

For a more developer-friendly approach, you can utilize the Server-Timing header. This header accepts string values, allowing you to set timers for each operation and write the results to the header. This way, you can identify whether databases, disks, or other elements are consuming significant time within your server operations.

Server-Timing: db;desc="Database";dur=121.3, ssr;desc="Server-side Rendering";dur=212.2
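One way to produce such a header can be sketched in Python. The operation names mirror the example above, and the sleeps stand in for real work:

```python
import time
from contextlib import contextmanager

class ServerTiming:
    """Collects named timers and renders them as a Server-Timing header value."""

    def __init__(self):
        self.entries = []

    @contextmanager
    def measure(self, name, desc):
        start = time.perf_counter()
        try:
            yield
        finally:
            dur_ms = (time.perf_counter() - start) * 1000
            self.entries.append(f'{name};desc="{desc}";dur={dur_ms:.1f}')

    def header(self):
        return ", ".join(self.entries)

timing = ServerTiming()
with timing.measure("db", "Database"):
    time.sleep(0.01)   # stand-in for a real query
with timing.measure("ssr", "Server-side Rendering"):
    time.sleep(0.02)   # stand-in for template rendering
print("Server-Timing:", timing.header())
```

Attach the resulting value to the response's Server-Timing header and the per-operation durations show up in the browser's network panel alongside the request timing.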

Another crucial factor in optimizing TTFB is hosting. For websites with low traffic, shared and virtualized hosting may suffice. However, mission-critical applications handling high volumes of concurrent users, database queries, or other intensive operations require properly sized hosting. It's essential to monitor the consumption of RAM, CPU, disk, I/O, and other resources to ensure optimal performance.

In conclusion, TTFB is a critical metric that can be greatly improved by selecting the right hosting solution and implementing a tailored Cloudflare configuration. And because nothing can render before the first byte arrives, TTFB directly affects the user-centric metrics discussed earlier, particularly LCP and First Contentful Paint (FCP).

At zeroteam.dev, we have helped numerous customers customize their caching strategies, resulting in faster website speeds and nearly perfect Web Vitals scores. By understanding the intricacies of performance optimization and leveraging the power of CDNs like Cloudflare, you can deliver exceptional user experiences and stay ahead in the competitive digital landscape.

In the next chapter, we'll talk about Time to Interactive (TTI) and how we can improve it.