Designing APIs For Mobile Performance Best Practices

Kate Matsudaira


Designing APIs to Work Well for Mobile


While there are lots of informative articles on mobile performance, and just as many on general API design, you’ll find few discussing the design considerations needed to optimize the performance of backend APIs for mobile client usage (if you know of any, please add them to the comments).

Certainly, optimizing the on-mobile performance of the application is critical. But we, as infrastructure engineers, can do a lot to ensure that mobile clients are served both data and application resources reliably and quickly, ultimately enabling and preserving a positive mobile application user experience.

These are special considerations to take into account when engineering mobile-based applications:

  • Limited screen size – less space for data and smaller images
  • Smaller number of simultaneous connections – this one is important because, unlike desktop web browsers that can run many concurrent asynchronous requests, mobile browsers allow only a limited number of connections per domain at any given moment
  • The network is slower – network performance is heavily affected by poor signal reception and multiple cellular handovers, which can require additional lookups when a user changes cell towers (and even when clients are on Wi-Fi, some networks are congested)
  • Less processing power – extensive client-side computations, 3D graphics rendering, and heavy JavaScript usage can greatly affect performance
  • Smaller caches – mobile clients are generally memory-restricted, so it is best not to rely heavily on cached content for performance
  • "Special" browsers – in many ways the mobile browser ecosystem is reminiscent of the fragmented desktop browser scene of several years ago, with mobile vendors producing versions with fatal deficiencies and incompatibilities

While there are many ways of tackling these unique obstacles of mobile performance, this article is largely focused on things that can be done from an API or backend service to improve the performance (or the perception thereof) of mobile clients. The article has two main parts:

  1. Minimizing network connections and the need to transmit data – efficient media handling, effective caching, and employing longer data-oriented operations with fewer connections
  2. Sending the "right" data across the network – designing APIs to return only the data that is needed/requested, and optimizing for the various types and form factors of mobile devices

Although this article is focused on mobile, many of the following lessons and ideas can be readily applied to other API client forms as well.


Minimize connections & data across the network

Minimizing the number of HTTP requests required to render a web page is undoubtedly one of the biggest things that can be done to improve mobile performance. There are lots of ways to do this, and the exact approach may depend on your data.



On the desktop, making a separate request for each image on the page can result in speed improvements, and allows one to take advantage of caching for each individual image. The browser is able to execute each request quickly and in parallel, so there isn’t a big performance hit for making many requests (and with the caching benefits there can even be performance gains). However, this can be a killer on mobile.

Consolidating images reduces the number of requests and, in some cases, the amount of data that needs to be sent (which can also help mobile performance). Here are some strategies to consider:


Image Sprites

Using image sprites can reduce the number of individual images that need to be downloaded from the server. But this approach has downsides because sprites can be cumbersome to maintain, and difficult to generate in some circumstances (such as on product search results where you are showing thumbnail images for many products).


Use CSS instead of images

Avoiding images where possible and using CSS rendering for shadows, gradients, and other effects can reduce the number of bytes that need to be transmitted and downloaded.


Supporting Responsive Images

A popular way of delivering the right image to the right device is using responsive images.  Apple does this by loading regular images, and then replacing them with the high resolution ones using JavaScript (source).  There are several other ways of approaching this problem as well, but the issue is far from solved.

In these cases you should make sure that the server side and its APIs can serve different versions of the same image; the exact way to do that will depend on the approach taken by the clients.


Use Data URIs for images inline to minimize extra requests

An alternative to sprites is to use Data URIs to embed images inline within the HTML itself.  This makes the images part of the overall page and while the URI encoded images can be larger in terms of bytes, they compress better with Gzip compression which helps minimize the effect of transmitting additional data.

TIP: If using data URIs make sure to:

  • Resize images to the appropriate size before encoding into the URI payload
  • Gzip compress responses (to take advantage of compression)
  • Note that URI encoded images become part of the page (or its CSS), and as a result caching of individual images is more difficult, so don’t use this approach if there are good reasons to cache the image locally (e.g. it is reused on several pages)
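
As a rough illustration of the encoding step, here is a minimal Python sketch that turns already-resized image bytes into a data URI and an inline image tag. The function names are hypothetical and only the standard library is used; treat it as a sketch under those assumptions rather than a recommended implementation. Keep in mind that base64 encoding makes the payload roughly a third larger before Gzip is applied.

```python
import base64
from html import escape


def to_data_uri(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw (already resized) image bytes as a data URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def inline_img_tag(image_bytes: bytes, mime_type: str = "image/png", alt: str = "") -> str:
    """Build an <img> tag that carries the image inline, so no extra request is made."""
    src = to_data_uri(image_bytes, mime_type)
    return f'<img src="{src}" alt="{escape(alt, quote=True)}">'
```

The resulting tag can be written straight into the HTML response, which is then Gzip compressed as a whole.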


Leverage localStorage and caching

Since mobile networks can be slow, HTML, CSS and images can be stored in localStorage to make the mobile experience faster.  Here is a great case study on Bing’s improvements using localStorage for mobile to reduce the size of their HTML document from ~200 kB to ~30 kB.

One great way to improve perceived performance is to pre-fetch data that will be used throughout the mobile experience, such as paginated results, popular queries, or user data, so it can be loaded directly on the device without additional requests. Thinking about this use case and factoring it into your API design will allow you to create APIs designed for prefetching and caching data before the user interacts with it, increasing the perception of responsiveness.

TIP: For data that is not likely to change between app updates (like categories or main navigation) it is worth shipping inside the app so it never requires a trip across the network.

Ideally you want to transfer data when needed, and preload data when advantageous to do so. If an image or content will not be seen by the end-user then don’t send it (this is particularly important for responsive sites since some just "hide" elements). One great use-case for pre-fetching images is a gallery of image results: it is worth downloading the previous and next image to speed up the UI, but be careful not to go overboard and fetch too many that may never be seen.
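
The gallery case can be sketched in a few lines of Python. The helper below is hypothetical: it assumes images are addressed by a zero-based index and that a small `radius` controls how far ahead and behind to fetch, which is one way to avoid going overboard.

```python
from __future__ import annotations


def images_to_prefetch(current_index: int, total: int, radius: int = 1) -> list[int]:
    """Return the indices of neighbouring gallery images worth prefetching.

    Only images within `radius` positions of the one on screen are returned,
    and the image currently being viewed is excluded.
    """
    if total <= 0 or not 0 <= current_index < total:
        return []
    start = max(0, current_index - radius)
    end = min(total - 1, current_index + radius)
    return [i for i in range(start, end + 1) if i != current_index]
```

For example, while the third of five images is on screen, only the second and fourth would be fetched in the background.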


Pulling data out of local storage has a performance cost of its own, but it is typically much smaller than the cost of going across the network. In addition to localStorage, some apps are using other features in HTML5, such as appCache, to improve performance and startup time.


TIP: Embedding CSS and JavaScript directly within a single web request, then storing a reference to those files and passing the references to the server via a cookie, can prevent the need for the client to download those assets again (so only new files are sent over the network). This can save a lot of time and is a great trick to leverage local caching. This article has more details on how to directly embed and then reference these files.
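
A server-side sketch of that decision might look like the following. The cookie format (`name:version` pairs separated by `|`) and the function names are assumptions made for illustration; the article referenced above may use a different scheme.

```python
from __future__ import annotations


def parse_cached_assets(cookie_value: str) -> dict[str, str]:
    """Parse a hypothetical cookie such as 'app.js:v3|site.css:v7' into {name: version}."""
    cached: dict[str, str] = {}
    for entry in cookie_value.split("|"):
        name, sep, version = entry.partition(":")
        if sep and name and version:
            cached[name] = version
    return cached


def assets_to_inline(cookie_value: str, current_assets: dict[str, str]) -> list[str]:
    """Return the assets the client is missing or holds at a stale version.

    Only these need to be embedded in the response; everything else can be
    loaded by the client from its local storage.
    """
    cached = parse_cached_assets(cookie_value)
    return sorted(
        name for name, version in current_assets.items() if cached.get(name) != version
    )
```

A first-time visitor sends no cookie and receives every asset inline, while a returning visitor receives only the files whose version has changed.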



Non-blocking IO

When it comes to client optimizations, it is well known that blocking JavaScript execution can have a big impact on the perception of performance. This is even more important when it comes to APIs. If there is a longer API call, such as one that relies on a 3rd party and might time out, it is important to implement it as non-blocking rather than holding the connection open (or even long waiting), and to choose a polling or triggering model instead.

  • Polling (pull-based model): In a polling API the client makes a request and then periodically checks for the results of that request, backing off between checks if required.
  • Triggering (push-based model): In a trigger API the caller makes the request and then listens for a response from the server. The server is provided a callback so it can trigger an event letting the caller know the results are available.

Triggering APIs are typically harder to implement because mobile clients are unreliable; as a result, polling is a much better option in most cases.

For example, for the Decide mobile app we had local prices on product pages that would show where each product was available locally. Since those results were delivered by a 3rd party, implementing a polling API allowed us to make a request for results and then poll for them instead of halting and keeping the connection open while we waited for the 3rd party results.
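
To make the polling model concrete, here is a minimal client-side sketch in Python. It is not the Decide implementation: the `fetch` callable, the delays, and the attempt limit are all assumptions, and `fetch` is expected to return `None` until the server has results ready.

```python
import time
from typing import Callable, Optional


def poll_for_result(
    fetch: Callable[[], Optional[dict]],
    initial_delay: float = 0.5,
    max_delay: float = 8.0,
    max_attempts: int = 6,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Check for a result repeatedly, doubling the wait between checks.

    Each call to `fetch` is a short request that returns immediately, so no
    connection is held open while the slow backend work completes.
    Returns None if the result is still not ready after `max_attempts` checks.
    """
    delay = initial_delay
    for attempt in range(max_attempts):
        result = fetch()
        if result is not None:
            return result
        if attempt < max_attempts - 1:
            sleep(delay)
            delay = min(delay * 2, max_delay)
    return None
```

The `sleep` parameter is injectable only so the back-off schedule can be tested without waiting; a real client would typically schedule the next check on a timer instead of sleeping.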

In general you want to make sure that APIs return quickly and don’t block while waiting for results since mobile clients have a limited number of connections. 

TIP: Avoid chatty APIs. In slow network situations it is important to avoid making many separate API calls. A good rule of thumb is to have all the data needed to render a page returned in a single API call.

In cases where some components are significantly slower than others on the server side, it can be worth breaking the API into separate calls using typical response time as a factor. That way the client can start rendering the page from the initial fast responses while waiting for the slower ones. Aim to minimize the time until text is rendered on the screen.


Avoid redirects & minimize DNS lookups

When it comes to requests, redirects can negatively impact performance, especially if they cross domains and require a DNS lookup.

For example, many sites handle their mobile site using a client-side redirect: when a mobile client goes to the main site URL, it is redirected to the mobile site (this is especially common when the sites are built on different technology stacks). Here is an example of how this works:


  1. A user googles "yahoo" and clicks on the first link in the results
  2. Google captures the click using their own tracking URL, and then redirects the phone to the destination site
  3. Google's redirect response goes through the cell tower and then back to the phone
  4. Then there is a DNS lookup for the destination site's domain
  5. The IP resulting from the DNS lookup is sent through the cell tower and then back to the phone
  6. When the phone hits the site, it is recognized as a mobile client and redirected to the mobile site
  7. The phone then has to do another DNS lookup for the mobile site's subdomain
  8. The IP resulting from the DNS lookup is sent through the cell tower and then back to the phone
  9. Finally the resulting HTML and assets are sent back through the cell tower and then to the phone
  10. Some of the images on pages of the mobile site are served via a CDN referencing yet another domain
  11. The phone then has to do another DNS lookup for that CDN domain
  12. The IP resulting from the DNS lookup is sent through the cell tower and then back to the phone
  13. Finally the images are rendered, completing the page


As you can see from this example there is a lot of overhead in these requests. It can be avoided by handling redirects on the server side (routing via the server and keeping DNS lookups and redirects to a minimum on the client), or by using responsive techniques.

TIP: If DNS lookups are unavoidable, try using DNS prefetching for known domains to save time.
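
DNS prefetching is requested with `<link rel="dns-prefetch">` hints in the page head. As a small sketch, a server could generate those hints from the list of domains it knows a page will reference; the function name and the example hostnames are hypothetical.

```python
from __future__ import annotations

from html import escape


def dns_prefetch_tags(hostnames: list[str]) -> list[str]:
    """Build one dns-prefetch hint per distinct hostname, preserving order."""
    seen: list[str] = []
    for host in hostnames:
        if host not in seen:
            seen.append(host)
    return [
        f'<link rel="dns-prefetch" href="//{escape(host, quote=True)}">'
        for host in seen
    ]
```

Calling `dns_prefetch_tags(["cdn.example.com", "img.example.com"])` yields two tags to place in the `<head>`, letting the browser resolve those names while the rest of the page is still loading. Browser support for the hint varies, so it is best treated as an optimization rather than a guarantee.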


HTTP Pipelining & SPDY

Another technique that can be useful is HTTP pipelining, which allows a client to send multiple requests over a single connection without waiting for each response. That said, if I were to implement an optimization translation layer I would opt for SPDY, which essentially optimizes HTTP requests to make them much more efficient and is getting traction in places such as Amazon’s Kindle browser, Twitter and Google.


Send the "right" data

Depending on the client, the experience may require different files, CSS, JavaScript, or even the number of results.  Creating APIs in a way that supports different permutations and versions of results and files will give the most flexibility to create amazing client experiences.


Use a limit and offset to get results

As with regular APIs, fetching results using a limit and offset allows clients to request ranges of the data that make sense for the client’s use case (so fewer results for mobile). I prefer the limit and offset notation, as it is more common (than, say, start and next), well understood in most databases, and therefore easy to build on.


Choose a default limit that caters either to the lowest or the highest common denominator, depending on which clients are more important to your business (smaller if mobile clients are your biggest users, or bigger if users are most likely to be on their desktops, such as for a B2B website or service).
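
A minimal sketch of limit and offset handling is shown below. The default of 10 and the cap of 50 are illustrative assumptions, not recommendations from any particular API, and the in-memory list stands in for a database query.

```python
from __future__ import annotations

from typing import Optional

DEFAULT_LIMIT = 10  # assumed default, sized here for mobile clients
MAX_LIMIT = 50      # assumed cap so one request cannot ask for everything


def paginate(items: list, limit: Optional[int] = None, offset: int = 0) -> dict:
    """Return one page of results plus the paging values actually applied."""
    if limit is None:
        limit = DEFAULT_LIMIT
    limit = max(1, min(limit, MAX_LIMIT))
    offset = max(0, offset)
    return {
        "results": items[offset:offset + limit],
        "limit": limit,
        "offset": offset,
        "total": len(items),
    }
```

Echoing `limit`, `offset`, and `total` back in the response lets a client work out whether another page exists without making an extra call.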


Support partial response and partial update

Design your APIs to allow clients to request just the information that they need. This means that APIs should let the caller specify a set of fields, instead of returning the full resource representation each time. Sparing clients from collecting and parsing unnecessary data simplifies the requests and improves performance.

Partial update allows clients to do the same thing with data they are writing to the API (thereby avoiding the need to specify all elements within the resource taxonomy).
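
As a sketch of what partial update means on the server, the hypothetical helper below applies only the fields the client sent and leaves the rest of the resource untouched. Rejecting unknown fields is a design choice assumed here, not something the approach requires.

```python
def apply_partial_update(resource: dict, changes: dict) -> dict:
    """Return a copy of `resource` with only the supplied fields replaced."""
    unknown = set(changes) - set(resource)
    if unknown:
        raise KeyError(f"unknown fields: {sorted(unknown)}")
    updated = dict(resource)
    updated.update(changes)
    return updated
```

A client changing a product's price would send just `{"price": 17.5}` rather than re-sending the whole product.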


Google supports partial response by adding optional fields in a comma-delimited list as follows: ...,gd:when)

For each call, specifying entry indicates that the caller is requesting only a partial set of fields.
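
On the server, a simple version of this idea can be sketched as a filter over the full resource. The `fields` parameter name and the function are assumptions for illustration, and the sketch handles only a flat comma-delimited list, not nested selections such as those Google's syntax allows.

```python
from __future__ import annotations

from typing import Optional


def select_fields(resource: dict, fields_param: Optional[str]) -> dict:
    """Return only the requested top-level fields of a resource.

    `fields_param` is the raw value of a hypothetical `fields` query
    parameter, e.g. "title,price". A missing or empty value returns the
    full resource; unknown field names are ignored.
    """
    if not fields_param:
        return dict(resource)
    wanted = [name.strip() for name in fields_param.split(",") if name.strip()]
    return {name: resource[name] for name in wanted if name in resource}
```

A mobile client listing products might ask for `title,price` and skip long descriptions and review data entirely.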


Avoid/minimize cookies

Every time a client sends a request to a domain, it includes all of the cookies that it has from that domain, even duplicated entries or extraneous values. This means that keeping cookies small (and not requiring them if they aren’t needed) is another way to keep payloads down and performance up. Don’t use or require cookies unless necessary. Serve static content that doesn’t require permissions from a cookieless domain (such as images off a static domain or CDN). For more information here are some best practices for cookies and performance.


Establish device profiles for APIs

With the many different screen sizes and resolutions on desktops, tablets, and mobile phones it is helpful to establish a set of profiles you plan to support. For each profile you can deliver different images, data, and files to suit each device; on the client, you can do this using media queries.

The more profiles you support, the better the experience can be on each device, but the more functions and scenarios you support, the harder they will be to maintain (since devices are constantly changing and evolving). As a result it is smart to support only as many profiles as absolutely necessary. This is a great reference when thinking about some of the tradeoffs and options for creating great experiences on different devices.

For most applications three profiles may be sufficient:

  1. Mobile – smaller images, touch enabled and low bandwidth
  2. Tablet – larger images designed for lower bandwidth, touch enabled, more data per request
  3. Desktop – larger, high-resolution images designed for desktop browsers or for high-resolution tablets on Wi-Fi


Choosing the right profile can be done on the client.  On the server side APIs should be designed to take these profiles as input and send different information based on the device making the request.  This may mean sending smaller images, fewer results, or inline CSS and JavaScript.

For example, if one of your APIs returns search results to the client, each profile might behave differently as follows:


  • A request with no profile specified would use the default profile (desktop) and would serve up the standard page, making a request for each image so subsequent product views could be loaded from cache
  • A request with the mobile profile would return 10 product results and use the low-resolution images encoded as data URIs within the same HTTP request
  • A request with the tablet profile would return 20 product results using the larger size low-resolution images encoded as data URIs within the same HTTP request
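
One way to express these behaviours on the server is a small lookup table keyed by profile name, sketched below. The result counts of 10 and 20 come from the examples above; which profile each belongs to, the desktop count of 40, and the setting names are all assumptions for illustration.

```python
from __future__ import annotations

from typing import Optional

# Per-profile response settings. The desktop count and key names are assumed.
PROFILES = {
    "mobile": {"results": 10, "images": "low-res", "inline_images": True},
    "tablet": {"results": 20, "images": "low-res-large", "inline_images": True},
    "desktop": {"results": 40, "images": "standard", "inline_images": False},
}
DEFAULT_PROFILE = "desktop"


def settings_for(profile_name: Optional[str]) -> dict:
    """Look up the response settings for a profile.

    A missing or unrecognized profile name falls back to the default, so
    older clients that do not send a profile keep working.
    """
    return PROFILES.get(profile_name or DEFAULT_PROFILE, PROFILES[DEFAULT_PROFILE])
```

The search endpoint would call `settings_for` with whatever profile the client sent, then use the returned values to size the result set and decide whether to inline images as data URIs.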


You can even create special profiles for things like feature phones. Unlike smartphones, feature phones can only cache files on a per-page basis, so it is better to send CSS and JavaScript with each request for these clients. Using profiles is an easy way to support that functionality server side.

One reason to use profiles instead of partial responses is when the response from the server is drastically different per profile, for example if the response has inline URI images and a compact layout in one case but not the other. Of course profiles could be specified using a "partial response," although that is typically used to specify a part (or portion) of a standard schema (like a subset of a larger taxonomy), not a whole different set of data, format, etc.


In conclusion


There are a lot of ways to make the web faster, including mobile.  Hopefully this will be a useful reference for the API developers that are designing the backend systems to be leveraged by mobile clients.


If you have other ideas, suggestions, or resources please leave them in the comments.

And a big thank you to Bryce Howard, Leon Stein, and Ian Ma for reading drafts of this post.

