Partitioning Behavior of DynamoDB



This is the third part of a three-part series on working with DynamoDB. The previous article, Querying and Pagination with DynamoDB, focuses on different ways you can query in DynamoDB, when to choose which operation, the importance of choosing the right indexes for query flexibility, and the proper way to handle errors and pagination.

As discussed in the first article, Working with DynamoDB, the reason I chose to work with DynamoDB was primarily its ability to handle massive amounts of data with single-digit millisecond latency. Scaling, throughput, architecture, and hardware provisioning are all handled by DynamoDB.

While it all sounds well and good to ignore all the complexities involved in the process, it is fascinating to understand the parts that you can control to make better use of DynamoDB.

This article focuses on how DynamoDB handles partitioning and what effects it can have on performance.

What Are Partitions?

A partition is an allocation of storage for a table, backed by solid-state drives (SSDs) and automatically replicated across multiple Availability Zones within an AWS region.

Data in DynamoDB is spread across multiple DynamoDB partitions. As the data grows and throughput requirements increase, the number of partitions is increased automatically. DynamoDB handles this process in the background.

When we create an item, the value of that item's partition key (or hash key) is passed to DynamoDB's internal hash function. This hash function determines the partition in which the item will be stored. When you ask for that item, only the partition determined by the item's partition key needs to be searched.

The internal hash function of DynamoDB ensures data is spread evenly across available partitions. This simple mechanism is the magic behind DynamoDB’s performance.
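DynamoDB's actual hash function is internal and undocumented, but the idea can be sketched in a few lines of Python. MD5 here is purely a stand-in for the real (unknown) function, and the function name is my own:

```python
import hashlib

def partition_for(partition_key: str, num_partitions: int) -> int:
    # Hash the partition key, then map the digest onto one of the
    # available partitions. DynamoDB's real hash function is not public;
    # MD5 is only a stand-in to illustrate the mechanism.
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

# The same key always lands on the same partition, which is why a
# lookup only needs to search one partition.
partition_for("parth_modi", 3) == partition_for("parth_modi", 3)  # True
```

Because the output is deterministic per key and roughly uniform across keys, data spreads evenly across partitions without any coordination.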

Partitioning in DynamoDB

Limits of a partition

A partition can contain a maximum of 10 GB of data. With the maximum size of an item being 400 KB, one partition can hold more than 25,000 (= 10 GB / 400 KB) items.

Regardless of the size of the data, a partition can support a maximum of 3,000 Read Capacity Units (RCUs) or 1,000 Write Capacity Units (WCUs).

When and How Partitions Are Created

Taking a more in-depth look at the circumstances for creating a partition, let’s first explore how DynamoDB allocates partitions.

Initial allocation of partitions

When a table is first created, the provisioned throughput capacity of the table determines how many partitions will be created. The following equation from the DynamoDB Developer Guide helps you calculate how many partitions are created initially.

( readCapacityUnits / 3,000 ) + ( writeCapacityUnits / 1,000 ) = initialPartitions (rounded up)

This means that if you specify RCUs and WCUs of 3,000 and 1,000 respectively, the number of initial partitions will be ( 3,000 / 3,000 ) + ( 1,000 / 1,000 ) = 1 + 1 = 2.

Suppose you are launching a read-heavy service like Medium, where a few hundred authors generate content and many more users are interested in simply reading the content. So you specify RCUs as 1,500 and WCUs as 500, which results in one initial partition: ( 1,500 / 3,000 ) + ( 500 / 1,000 ) = 0.5 + 0.5 = 1.
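The Developer Guide formula above is easy to capture as a small helper (the function name is mine, not part of any AWS SDK):

```python
import math

def initial_partitions(rcu: int, wcu: int) -> int:
    # Formula from the DynamoDB Developer Guide: the sum of the two
    # ratios, rounded up to the next whole partition.
    return math.ceil(rcu / 3000 + wcu / 1000)

initial_partitions(3000, 1000)  # 2  (1 + 1)
initial_partitions(1500, 500)   # 1  (0.5 + 0.5)
```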

Subsequent allocation of partitions

Let’s go on to suppose that within a few months, the blogging service becomes very popular and lots of authors are publishing their content to reach a larger audience. This increases both write and read operations in DynamoDB tables.

As a result, you scale provisioned RCUs from the initial 1,500 units to 2,500 and WCUs from 500 units to 1,000 units.

( 2,500 / 3,000 ) + ( 1,000 / 1,000 ) = 1.83, rounded up to 2

The single partition splits into two partitions to handle this increased throughput capacity. All existing data is spread evenly across partitions.

Partitions created when Throughput Capacity Units increased

Another important thing to notice here is that the increased capacity units are also spread evenly across the newly created partitions. This means that each partition will have 2,500 / 2 = 1,250 RCUs and 1,000 / 2 = 500 WCUs.
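The even split is worth internalizing, because it drives the Hot Key Problem discussed later. As a quick sketch (helper name is mine):

```python
def per_partition_capacity(total_rcu: int, total_wcu: int, partitions: int):
    # Provisioned throughput is divided evenly among partitions; no
    # partition can borrow unused capacity from another.
    return total_rcu / partitions, total_wcu / partitions

per_partition_capacity(2500, 1000, 2)  # (1250.0, 500.0)
```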

When partition size exceeds storage limit of DynamoDB partition

Of course, the data requirements of the blogging service also increase. With time, the partitions fill with new items, and as soon as the data size exceeds the 10 GB maximum for a partition, DynamoDB splits the partition into two partitions.

Partitions created when Partition Size exceeds Storage Limit

The splitting process is the same as shown in the previous section; the data and throughput capacity of an existing partition are spread evenly across the newly created partitions.


How Items Are Distributed Across New Partitions

Each item has a partition key, and depending on the table structure, a range key might or might not be present. In any case, items with the same partition key are always stored together in the same partition. A range key ensures that items with the same partition key are stored in order.

There is one caveat here:

Items with the same partition key are stored within the same partition, and a partition can hold items with different partition keys.

This means partitions and partition keys are not mapped on a one-to-one basis. Therefore, when a partition split occurs, the items in the existing partition are moved to one of the new partitions according to DynamoDB's mysterious internal hash function.

Exploring the Hot Key Problem

For me, the real reason behind understanding partitioning behavior was to tackle the Hot Key Problem.

The provisioned throughput can be thought of as performance bandwidth. The recurring pattern with partitioning is that the total provisioned throughput is allocated evenly among the partitions. Bandwidth is not shared among partitions; the total bandwidth is divided equally among them. For example, when the total provisioned throughput of 150 units is divided between three partitions, each partition gets 50 units to use.

It may happen that certain items of the table are accessed much more frequently than other items, whether from the same partition or from different partitions. This means most of the request traffic is directed toward one single partition. Those few items will quickly consume the partition's 50 units of available bandwidth, and further requests to the same partition will be throttled. This is the Hot Key Problem.
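A small simulation makes the effect concrete. The workload, the article names, and the key-to-partition mapping below are all hypothetical; in reality DynamoDB's internal hash function decides the mapping:

```python
from collections import Counter

# Hypothetical skewed workload: one hot article gets most of the reads.
requests = ["hot-article"] * 120 + ["article-a", "article-b", "article-c"] * 10

# Illustrative key-to-partition mapping for a 3-partition table.
partition_of = {"hot-article": 0, "article-a": 0, "article-b": 1, "article-c": 2}

budget = 150 // 3  # 150 total RCUs split evenly: 50 per partition
reads_per_partition = Counter(partition_of[key] for key in requests)
throttled = {p: max(0, n - budget) for p, n in reads_per_partition.items()}
# Partition 0 receives 130 reads against a budget of 50, so 80 requests
# are throttled, while partitions 1 and 2 sit nearly idle.
```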

Surely the problem can be fixed by simply increasing throughput. But then you're using just a third of the available bandwidth and wasting the other two-thirds.

A better way is to choose a proper partition key: one that distinguishes items uniquely and has a limited number of items sharing the same partition key.

How to avoid the hot key problem with proper partition keys

The goal behind choosing a proper partition key is to ensure efficient usage of provisioned throughput units and provide query flexibility.

From the AWS DynamoDB documentation:

To get the most out of DynamoDB throughput, create tables where the partition key has a large number of distinct values, and values are requested fairly uniformly, as randomly as possible.

In simpler terms, the ideal partition key is the one that has distinct values for each item of the table.

Continuing with the example of the blogging service we've used so far, let's suppose that some articles are visited several orders of magnitude more often than others. We need to choose a partition key that avoids the Hot Key Problem for the articles table.

In order to do that, the primary index must:

  1. Have distinct values for articles
  2. Have the ability to query articles by an author effectively
  3. Ensure uniqueness across items, even for items with the same article title

Using the author_name attribute as a partition key will enable us to query articles by an author effectively.

The title attribute is a good choice for the range key. Since author_name is the partition key, it does not matter how many articles with the same title are present, as long as they're written by different authors.
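In boto3-style CreateTable parameters, this primary index would look something like the sketch below (the table and attribute names are from our hypothetical blogging example, not a real schema):

```python
# Hypothetical key schema for the articles table, expressed as
# boto3-style CreateTable KeySchema parameters.
articles_key_schema = [
    {"AttributeName": "author_name", "KeyType": "HASH"},   # partition key
    {"AttributeName": "title",       "KeyType": "RANGE"},  # sort key
]
```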

To improve this further, we can choose to use a combination of author_name and the current year for the partition key, such as parth_modi_2017. This will ensure that one partition key will have a limited number of items.
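Building that composite key is trivial; the helper below is my own sketch of the idea, not an AWS API:

```python
def composite_partition_key(author_name: str, year: int) -> str:
    # Appending the year bounds how many items can ever share one
    # partition key, keeping any single partition key from getting hot.
    return f"{author_name}_{year}"

composite_partition_key("parth_modi", 2017)  # 'parth_modi_2017'
```

Queries for an author's articles then address one year's worth of items at a time, at the cost of issuing one query per year of interest.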


In this final article of my DynamoDB series, you learned how AWS DynamoDB manages to maintain single-digit millisecond latency even with massive amounts of data through partitioning. We explored the Hot Key Problem and how you can design a partition key to avoid it.

Check out the DynamoDB Developer Guide’s Guidelines For Tables and Partitions and Data Distributions for further reading. And remember that the first article in this series, Working with DynamoDB, covers the basics of DynamoDB along with batch operations, and the second article, Querying And Pagination with DynamoDB, explains in detail about query and scan operations, as well as the importance and implementation of pagination.


Join the Discussion


  • chris_lewis

    Thanks for the nice writeup on how partitioning works in DynamoDB! I wanted to dig into the hot key problem as presented here, specifically this hypothetical example with articles. First I want to repeat the problem and make sure I’m clear on what has been identified as hot. In setting up the problem, you say:

    “let’s suppose that there will be some articles that are visited several magnitude of times more often than other articles.”

    This implies that there is a set of articles that are significantly more popular than others, presumably judged by how frequently they are accessed. Therefore, the problem of hotness is about reading article objects from the database. (What isn't stated is the partition key that led to this; I assumed it was a UUID or the title of the article.)

    The proposed solution is to use a partition key of the author’s name, and a sort key of the article’s title. This might map to a url structure of `/author/title`; let’s say this very article is an example of such a popular article and hosted at a site that uses this url structure. The relative url to this article might be `/parth-modi/partitioning-behavior-of-dynamodb`, thus yielding a partition key of “parth-modi” and a sort key of “partitioning-behavior-of-dynamodb”.

    Earlier in the article it is noted that capacity units are spread evenly across partitions. That is, if DynamoDB has partitioned your data into 3 groups and you have 150 read capacity units (RCUs) allocated to the table, then each partition will receive 50 reads. Further, it notes that an object that is accessed more than others means more traffic to that partition, and that increasing the capacity only increases it by a third in this example (because there are 3 partitions and RCUs are distributed evenly). This is indeed the hot key problem, and it arises when your _partition key_ does not distribute evenly, specifically because DynamoDB will evenly distribute RCUs across the partitions.

    However, the proposed solution of using the author as the partition and the title as the sort key does not solve the problem, assuming the problem was that due to a partition key the article’s UUID or title. In fact it may make it worse! Documents that share the same partition key are stored on the same partition. Therefore, with the proposed structure using author as the partition key, every article written by the same author will reside on the same partition. This means that requests for the same article always route to the same partition, and so an increase in requests for that article (ie because it gets really popular) results in increased reads to that partition. What’s worse is that, because the partition key is the author, the more popular that author is, the more traffic is routed to that partition. So in the best case, using the author as the partition key is no worse, but it is certainly no better and quite possibly worse.

    Scenarios like this could obviously benefit from a cache in front of DynamoDB to take the read load off, but this article is about avoiding hot keys. In the case where an identifiable resource is accessed more than others, using its identity as a partition key will always result in a hot key. The only way I can think of to avoid this is by using partition key sharding. If someone knows a better approach, I’d love to hear it.

    PS: One other important factor in this that I don’t fully understand is how DynamoDB’s adaptive capacity fits into this.