
The Problem with Serverless Functions

Why Serverless Functions Are Popular 

Cloud-based computing using serverless functions has recently gained widespread popularity. Its appeal for implementing new functionality derives from the simplicity of serverless computing: just write a code snippet and let the cloud infrastructure deploy it and trigger it when an event of interest occurs. For example, use a serverless function to analyze an incoming photo or process an event from an IoT device. Don’t bother allocating servers; just pay when a function runs. It’s fast, simple, and scalable. The major cloud vendors, including AWS, Microsoft, and Google, all offer serverless functions.

For simple or ad hoc workflows, serverless functions make a lot of sense. But are they appropriate for complex workflows that read and update persisted, mission-critical data sets? For example, consider an airline that manages thousands of flights and hundreds of thousands of passengers on a typical day. Modern, scalable NoSQL data stores (like Amazon DynamoDB or Azure Cosmos DB) can hold data describing flights, passengers, bags, gate assignments, pilot scheduling, and much more, keeping it continuously available even after data center outages. Serverless functions can process events, such as flight cancellations and passenger rebookings, by retrieving persisted data and running proprietary algorithms. But how well do they handle complex workflows like this? Is there a better way to tackle this challenge?

Issues and Limitations 

The very strength of serverless functions, namely that they are serverless, creates a built-in limitation. Because they are just code snippets, they must retrieve data from external data stores and then update those stores with their results. Two-way data motion on every invocation adds overhead that limits performance, and serverless functions can’t take advantage of in-memory caching to avoid data motion. 
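To make the data motion concrete, here is a minimal sketch of a Python Lambda handler (the table name, key, and event fields are hypothetical) that must read an item from DynamoDB and write it back on every single invocation:

```python
import boto3

# Every invocation pays for a read from and a write to the external store.
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Passengers")  # hypothetical table name

def lambda_handler(event, context):
    passenger_id = event["passengerId"]  # hypothetical event shape

    # Inbound data motion: fetch the object's state from DynamoDB.
    item = table.get_item(Key={"passengerId": passenger_id})["Item"]

    # Run the business logic on the retrieved copy.
    item["status"] = "REBOOKED"
    item["flightId"] = event["newFlightId"]

    # Outbound data motion: write the updated state back to DynamoDB.
    table.put_item(Item=item)
    return {"ok": True}
```

Because the function's state disappears when it exits, this round trip cannot be amortized across invocations the way an in-memory cache would allow.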

Data Motion with Serverless Functions

Serverless functions also fail to provide a coherent software architecture for building large systems that use them as components. Developers must informally enforce a clean “separation of concerns” when defining the code that each function runs. Consider a function that rebooks an airline passenger. Does this function also update the flights that the passenger moves between, or does it invoke separate functions to update the flights? It’s easy to see how the code within serverless functions can implement overlapping actions and evolve into a complex, unmanageable code base. And that’s before we even consider taking locks on stored data to synchronize access by multiple updating functions while avoiding deadlocks.
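To make the design question concrete, here is a rough sketch (in Python, with hypothetical function and table names, not code from any real system) of the two ways a rebooking function could be written; nothing in the platform prevents the first, overlapping style from creeping into a code base:

```python
import json
import boto3

dynamodb = boto3.resource("dynamodb")
lambda_client = boto3.client("lambda")

def rebook_passenger_inline(event, context):
    # Variant 1: the passenger function also rewrites the flight item,
    # duplicating flight logic that other functions may implement differently.
    dynamodb.Table("Passengers").update_item(
        Key={"passengerId": event["passengerId"]},
        UpdateExpression="SET flightId = :f",
        ExpressionAttributeValues={":f": event["newFlightId"]})
    dynamodb.Table("Flights").update_item(  # flight update done "in passing"
        Key={"flightId": event["oldFlightId"]},
        UpdateExpression="ADD passengerCount :d",
        ExpressionAttributeValues={":d": -1})

def rebook_passenger_delegating(event, context):
    # Variant 2: the passenger function updates only the passenger and
    # delegates flight changes to dedicated flight functions.
    dynamodb.Table("Passengers").update_item(
        Key={"passengerId": event["passengerId"]},
        UpdateExpression="SET flightId = :f",
        ExpressionAttributeValues={":f": event["newFlightId"]})
    for fn in ("remove-from-flight", "add-to-flight"):  # hypothetical names
        lambda_client.invoke(FunctionName=fn, InvocationType="Event",
                             Payload=json.dumps(event))
```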

A third issue surfaces when implementing serverless functions: cloud platforms expect application code to handle exceptions, such as timeouts and quota limits, and to perform retries when problems occur. This complicates application logic, which must assume that it runs in an unstable environment.
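In practice, this burden looks like retry scaffolding wrapped around otherwise simple logic. The sketch below (with a hypothetical table name and retry limits) shows the kind of backoff-and-retry code that each function ends up carrying:

```python
import time
import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("Flights")  # hypothetical table

def update_with_retries(key, update_expr, values, max_attempts=5):
    # Application code must assume throttling, timeouts, and quota errors
    # can occur on any call and retry with exponential backoff.
    for attempt in range(max_attempts):
        try:
            return table.update_item(Key=key,
                                     UpdateExpression=update_expr,
                                     ExpressionAttributeValues=values)
        except ClientError as err:
            code = err.response["Error"]["Code"]
            if code not in ("ProvisionedThroughputExceededException",
                            "ThrottlingException", "RequestLimitExceeded"):
                raise                       # not a transient error
            time.sleep(2 ** attempt * 0.1)  # back off and try again
    raise RuntimeError("update failed after retries")
```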

Move the Code to the Data

We can avoid the limitations of serverless functions by doing the opposite: moving the code to the data. Instead of defining a function that operates on remotely stored data, we can just send a message to an object held in memory and invoke one of the object-oriented APIs for its data type. If needed, the in-memory platform retrieves the object from a persistent cloud store (like DynamoDB), and it writes changes back to the store after the API completes. It holds the object in memory and keeps it highly available for as long as the object remains active.
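The following sketch is purely illustrative of this model (it is not the ScaleOut API; the table name, class, and dispatcher are invented for the example). It shows an object being faulted into an in-memory cache from DynamoDB on first access, after which a user-defined method runs directly against the cached instance and the change is written back to the store:

```python
import boto3

table = boto3.resource("dynamodb").Table("Passengers")  # hypothetical table
_cache = {}  # in-memory objects, keyed by id; stays populated across calls

class Passenger:
    def __init__(self, item):
        self.item = item

    def rebook(self, new_flight_id):
        # The method runs where the data lives; no per-call round trip.
        self.item["flightId"] = new_flight_id
        self.item["status"] = "REBOOKED"

def send(passenger_id, method, *args):
    # Fault the object in from the persistent store only on first access.
    if passenger_id not in _cache:
        _cache[passenger_id] = Passenger(
            table.get_item(Key={"passengerId": passenger_id})["Item"])
    obj = _cache[passenger_id]
    getattr(obj, method)(*args)
    # The platform can persist the change back lazily; shown inline here.
    table.put_item(Item=obj.item)
```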

Computing Where the Data Lives

This approach to function execution has similarities to data-structure stores, like Redis, which provide pre-defined functions for stored objects. However, our store is now fully extensible and can run any user-defined methods on typed data.
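For comparison, the sketch below (with hypothetical keys, assuming a local Redis server) shows the fixed, pre-defined command style of a data-structure store, in contrast to running a user-defined method on a typed object:

```python
import redis

r = redis.Redis()  # assumes a local Redis server

# A data-structure store ships a fixed menu of per-type operations; the
# application composes these commands from the client side.
r.rpush("flight:AA100:manifest", "p-101", "p-102")  # hypothetical keys
r.lrem("flight:AA100:manifest", 1, "p-101")         # remove a passenger
passenger_count = r.llen("flight:AA100:manifest")

# In the extensible store described above, the operation itself would be a
# user-defined method (e.g., a flight object's cancel()) executed against
# the typed, in-memory object rather than a pre-defined command.
```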

By connecting multiple objects together to implement a message flow, such as cancelling a flight in the airline example, we can create a computing graph that looks like an actor model. However, now the nodes of the graph are user objects and not unstructured functional components. This allows us to focus on changes to the data instead of an abstract computing model.

Inspired by the concept of in-memory digital twins, this software architecture has several advantages over serverless functions. The use of an in-memory store avoids repeated round trips to and from a cloud store when processing active objects. Built-in high availability lets us persist updates to the cloud store asynchronously, off the critical path, which further boosts performance. Highly available message processing avoids the need for application code to handle exceptions from the environment. Running application-defined methods on typed objects lets developers build complex workflows with structured access to data and no locking. Lastly, performance transparently scales to process workflows with very large numbers of objects.

Benchmarking an Example

To evaluate the benefits of moving code to the data, we compared a simple workflow in AWS implemented with serverless functions (AWS Lambda) to the same workflow implemented with ScaleOut Digital Twins™, a scalable, in-memory computing architecture. We designed a workflow that an airline might use to cancel a flight and rebook all of its passengers on other flights. The workflow used two data types, flight objects and passenger objects, with all instances stored in DynamoDB. An event controller triggered cancellation for a group of flights and measured the time required to complete all rebookings.

In the serverless implementation, the event controller triggered a Lambda function to cancel each flight. This function in turn triggered Lambda functions for all passengers in the flight’s manifest. Each “passenger lambda” rebooked its passenger by selecting a different flight and updating the passenger’s information. It then triggered functions that confirmed removal from the original flight and added the passenger to the new flight. Note that these functions required locking to synchronize access to DynamoDB objects.
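A rough sketch of this fan-out (in Python, with hypothetical function names, table names, and item shapes rather than the exact benchmark code): the flight-cancellation Lambda reads the flight’s manifest and asynchronously invokes one Lambda per passenger:

```python
import json
import boto3

dynamodb = boto3.resource("dynamodb")
lambda_client = boto3.client("lambda")

def cancel_flight(event, context):
    # Read the cancelled flight's manifest from DynamoDB.
    flight = dynamodb.Table("Flights").get_item(
        Key={"flightId": event["flightId"]})["Item"]

    # Fan out: one asynchronous "passenger lambda" per passenger, each of
    # which rebooks its passenger and updates both the old and new flights.
    for passenger_id in flight["manifest"]:
        lambda_client.invoke(
            FunctionName="rebook-passenger",  # hypothetical function name
            InvocationType="Event",           # asynchronous invocation
            Payload=json.dumps({"passengerId": passenger_id,
                                "cancelledFlightId": event["flightId"]}))
```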

Workflow Implemented Using Serverless Functions

The digital twin implementation dynamically created in-memory objects for all flights and passengers as these objects were accessed from DynamoDB. The event controller sent cancellation messages to the flight objects to be cancelled, and, using each flight’s manifest, these digital twin objects sent messages to the corresponding passenger digital twins. The passenger digital twins rebooked themselves by selecting a different flight and sending messages to both the old and new flights. Application code did not need to use locking, and the in-memory platform automatically persisted updates to the digital twin objects back to DynamoDB.
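Here is an illustrative sketch of the same flow in the message-passing style (again, not the actual ScaleOut Digital Twins API; the class names, message names, and in-process dispatcher are invented for the example):

```python
# Placeholder for the platform's message routing; a real deployment would
# locate the target twin (loading it from DynamoDB if needed), invoke the
# handler, and persist the updated twin asynchronously.
REGISTRY = {}

def send(target_id, message, *args):
    getattr(REGISTRY[target_id], message)(*args)

class FlightTwin:
    def __init__(self, flight_id, manifest):
        self.flight_id = flight_id
        self.manifest = list(manifest)

    def cancel(self):
        # Tell every passenger on the manifest to rebook itself.
        for pid in list(self.manifest):
            send(pid, "rebook", self.flight_id)

    def remove_passenger(self, pid):
        self.manifest.remove(pid)

    def add_passenger(self, pid):
        self.manifest.append(pid)

class PassengerTwin:
    def __init__(self, pid, flight_id, choose_flight):
        self.pid = pid
        self.flight_id = flight_id
        self.choose_flight = choose_flight  # app-supplied rebooking policy

    def rebook(self, cancelled_flight_id):
        # Pick a replacement flight and notify both flight twins; each twin
        # handles its messages serially, so no explicit locking is required.
        new_flight_id = self.choose_flight(self.pid)
        send(cancelled_flight_id, "remove_passenger", self.pid)
        send(new_flight_id, "add_passenger", self.pid)
        self.flight_id = new_flight_id
```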

Workflow Implemented Using Digital Twins

Performance measurements showed that the digital twins processed 25 flight cancellations with 100 passengers per flight more than 11X faster than serverless functions. We could not scale serverless functions to run our target workload of cancelling 250 flights with 250 passengers each, but ScaleOut Digital Twins had no difficulty processing even double this target workload (500 flights).

Performance Measurements

We ran into numerous issues when implementing this workflow with serverless functions. We had to make careful design choices when creating the functions so that we preserved an object-oriented approach (e.g., passenger lambdas should not update flight objects). We had to add code to handle timeout exceptions, and we repeatedly hit resource limits. We also needed to implement locking using the low-level mechanisms that DynamoDB exposes. We couldn’t use DynamoDB’s in-memory cache (DAX) to boost performance because it is eventually consistent and does not accelerate writes. Lastly, debugging was a challenge because configuration errors often caused the lambdas to fail silently.
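Since DynamoDB offers no mutex primitive, for example, a lock has to be synthesized from conditional writes. Below is a minimal sketch (with hypothetical table and attribute names) of the kind of low-level code this required:

```python
import time
import uuid
import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("Flights")  # hypothetical table

def lock_item(flight_id, ttl_seconds=10):
    # Acquire a lock by writing an owner token only if no unexpired lock exists.
    owner = str(uuid.uuid4())
    now = int(time.time())
    try:
        table.update_item(
            Key={"flightId": flight_id},
            UpdateExpression="SET lockOwner = :o, lockExpires = :e",
            ConditionExpression="attribute_not_exists(lockOwner) OR lockExpires < :n",
            ExpressionAttributeValues={":o": owner, ":e": now + ttl_seconds,
                                       ":n": now})
        return owner
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return None  # someone else holds the lock; caller must retry
        raise

def unlock_item(flight_id, owner):
    # Release the lock only if we still own it.
    table.update_item(
        Key={"flightId": flight_id},
        UpdateExpression="REMOVE lockOwner, lockExpires",
        ConditionExpression="lockOwner = :o",
        ExpressionAttributeValues={":o": owner})
```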

Summing Up

While serverless functions are highly suitable for small, often ad hoc functionality, they may not be the best choice when building complex workflows that manage many data objects. The simple but powerful notion of moving the code to the data proves its worth in multiple ways. It simplifies design by preserving structured access to data, and it significantly boosts performance by reducing data motion. A software architecture that creates in-memory “twins” of typed, persistent data objects and uses messages to invoke application APIs offers a compelling approach to building complex workflows in the cloud.