Minimising memory usage for large JVM object stores

A typical JVM domain model can consume use a surprisingly large amount of memory. This post explores some techniques to radically reduce memory consumption.

Throughout this post we'll use the following cut-down version of a StackOverflow post as an example.

public interface Post {
    Integer getId(); // Nullable
    String getTitle();
    String getBody();
    Set<String> getTags();
    Integer getCreatedUserId();
    Date getCreationDate();
    Date getClosedDate(); // Nullable
    BigDecimal getScore(); // Between 0 and 100 with 2 decimal places
}

We'll demonstrate some techniques to store posts using 20% of the memory of a standard Java bean while retaining the same interface and semantics.

A standard Java bean

We'll start be creating a standard Java bean and measure how much memory an object consumes as a baseline.

public class PostBean implements Post {
    private Integer id;
    private String title;
    private String body;
    private Set<String> tags = new HashSet<String>();
    private Integer createdUserId;
    private Date creationDate;
    private Date closedDate;
    private BigDecimal score;

    public PostBean(Post post) {
        id = post.getId();
        title = post.getTitle();
        body = post.getBody();
        tags.addAll(post.getTags());
        createdUserId = post.getCreatedUserId();
        creationDate = post.getCreationDate() == null ? null : 
            new Date(post.getCreationDate().getTime());
        closedDate = post.getClosedDate() == null ? null : 
            new Date(post.getClosedDate().getTime());
        score = post.getScore();
    }

    // Getters omitted
}

As we'll be creating a few implementations of Post we'll use a simple factory to create instances of Posts.

public class PostFactory {
    public static Post create(Class clazz, Integer id, String title, 
        String body, Set<String> tags, Integer createUserId, 
        Date creationDate, Date closedDate, BigDecimal score) {

    // Constructs an instance of clazz with a contructor that accepts Post
}

And now we can use the factory to contruct and instance of a bean as follows:

Set<String> tags = new HashSet<String>(Arrays.asList("tag1", "tag2", "tag3"));
Post post = PostFactory.create(PostBean.class, 1, "Some title", "Some body", tags, 23, new Date(), new Date(), new BigDecimal("40.55"));

Measuring the memory allocation

Let's start by naively adding up how much memory we think the above object will consume by :

Integer id: 32 bits => 4 bytes
String title: "Some title" => 10 bytes (UTF-8)
String body: "Some body" => 9 bytes (UTF-8)
Set tags: 3 x 3 character tags => 9 bytes (UTF-8)
Integer createdUserId: 32 bits => 4 bytes
Date creationDate: 64 bits => 8 bytes
Date closedDate: 64 bits => 8 bytes
BigDecimal score: 32 bits => 4 bytes

If we add this up we get 56 bytes.

Clearly, there's going to be some overhead for objects, string lengths, the collection for tags etc. How much would we expect? 10%? 100%? more?

Let's measure the actual memory consumption using MemoryMeasurer:

Post post = PostFactory.create(PostBean.class, 1, "Some title", ...);
System.out.println(MemoryMeasurer.measureBytes(post));

The result is 736 bytes. Yes, 736 bytes! Or 13 times our naive estimate of how much data we're storing.

We can also get a breakdown of the where the memory has gone as follows:

System.out.println(ObjectGraphMeasurer.measure(post));
// Footprint{Objects=23, References=47, Primitives=[float, int x 25, long x 3, char x 31]}

Somehow our simple post has used 23 objects, and a bunch of references and primitives!

Objects are expensive

In Java, all objects have a header associated with them (that links to their type amongst other things). Furthermore, objects and fields within them may be byte-aligned in memory for performance reasons. In a typical 64-bit JVM with compressed oops (32-bit pointers) an object will consume 16 bytes of memory.

Exactly how much memory is used can vary depending on the JVM, but the following are typical:

private void measure(Object object) {
    System.out.println(MemoryMeasurer.measureBytes(object));
}

measure(new Object());      // 16 bytes
measure(new Integer(1));    // 16 bytes
measure(new Long(1));       // 24 bytes
measure(new BigDecimal(1)); // 40 bytes

At first glance the above looks counter-intutitive. How can an Integer consume the same amount of memory as an Object? Why is a Long 8 bytes bigger than an Integer when it's only 4 bytes more data?

The answers are due to padding and alignment. I recommend java-object-layout for futher anaylsis, but for the purposes of this blog post, we've demonstrated that Objects are expensive in terms of memory consumption.

Replacing objects with primitives (Non-nullable)

Now that we know every object takes up a minimum of 16 bytes, let's replace them with primitives where possible:

Non-nullable wrapper classes can be replaced with their primitive counterparts
Non-nullable dates can be stored as longs
BigDecimals can be replaced by integer primitives internally and we can dervive the scale. e.g. instead of storing $10.53 as a decimal, we can store it as 1,053 cents.

Applying this to our example we get:

public class PostPrimitive implements Post {
    // Only modifications to the the PostBean implementation are shown
    private long creationDate;
    private short score;

    public PostBeanPrimitive(Post post) {
        creationDate = post.getCreationDate().getTime();
        score = post.getScore().movePointRight(2).shortValueExact();
    }

    public Date getCreationDate() {
        return new Date(creationDate);
    }

    public BigDecimal getScore() {
        return new BigDecimal(score).movePointLeft(2);
    }
}

Creating an instance of our primitive bean with same data as the standard bean above results in memory usage of 656 bytes - a saving of 80 bytes.

Replacing objects with primitives (Nullable)

For nullable fields, it's harder to replace them with primitives without changing the semantics of the class. There are a couple of options however:

Use a well-known value that will not occur in the domain to indicate a null (e.g. Integer.MAX_VALUE)
Use a bitset to store whether a field is null, along side the primitive values.

We'll use the latter option and add a nulls field to keep track of values that are null. This way the semantics will remain identical to a standard bean.

public class PostNullable implements Post {
    // Only modifications to the the PostPrimitive implementation are shown
    private byte nulls;
    private int id;
    private long closedDate;

    public PostBeanPrimitiveNullable(Post post) {
        if (post.getId() == null) {
            nulls |= 1;
        } else {
            id = post.getId();
        }

        if (post.getClosedDate() == null) {
            nulls |= 2;
        } else {
            closedDate = post.getClosedDate().getTime();
        }
    }

    public Integer getId() {
        return (nulls & 1) == 1 ? null : id;
    }

    public Date getClosedDate() {
        return (nulls & 2) == 2 ? null : new Date(closedDate);
    }
}

The above changes save us another 16 bytes per instance (and our nulls bitset could track another 6 nullable objects without consuming more memory).

Replacing mutable collections with immutable

Mutable collections are generally a compromise - they tradeoff the amount of memory they consume against the cost of expanding. e.g. An empty ArrayList will create an array of 10 elements internally even though it stores nothing.

Our PostBean currently stores tags in a HashSet, which we will replace with an Array as follows:

public class PostCollection implements Post {
    // Only modifications to the the PostNullable implementation are shown
    private String[] tags;

    public PostCollection(Post post) {
        tags = post.getTags().toArray(new String[post.getTags().size()]);
    }

    public Set<String> getTags() {
        return Collections.unmodifiableSet(new HashSet<String>(Arrays.asList(tags)));
    }
}

Measuring the difference in size between this and our previous version shows that it saves a whopping 240 bytes by replacing the collection.

Compressing Strings

Strings in Java are stored internally as an array of chars - an array of 16 bit values. English (and many other languages) can be represented in 8 bits, or half the memory.

A simple compression technique is to encode the string into a byte array as UTF-8. We'll take it a little bit further by compressing the resulting UTF-8 string with Snappy.

Snappy is one of a class of algorithms that provide significant compression with high throughput. See jpountz's benchmarks for more information.

public class PostSnappy implements Post {
    // Only modifications to the the PostCollection implementation are shown
    private static final Charset UTF_8 = Charset.forName("UTF8");

    private byte[] title;
    private byte[] body;

    public PostSnappy(Post post) {
        title = post.getTitle() == null ? null : 
            Snappy.compress(post.getTitle().getBytes(UTF_8));
        body = post.getBody() == null ? null : 
            Snappy.compress(post.getBody().getBytes(UTF_8));
    }

    public String getTitle() {
        return title == null ? null : 
            new String(Snappy.uncompress(title, 0, title.length), UTF_8);
    }

    public String getBody() {
        return body == null ? null : 
            new String(Snappy.uncompress(body, 0, body.length), UTF_8);
    }
}

See the results section below for the impact of compression on the entire posts dataset.

Interning

Iterning is the mechansimn of returning a canonical representation of an object, so that there is only one instance of an object if those objects are equal.

Java's String class has an itern method that is used automatically by the compiler to reduce the memory consumption of strings.

Consider the following example:

private void memory(Object object) {
    System.out.print(MemoryMeasurer.measureBytes(object) + ": ");
    System.out.println(ObjectGraphMeasurer.measure(object));
}

memory(new String[] { "hello", "hello", "hello", "goodbye" });
memory(new String[] { new String("hello"), new String("hello"), new String("hello"), new String("goodbye") });

// Output:
// 160: Footprint{Objects=5, References=6, Primitives=[char x 12, int x 6]}
// 224: Footprint{Objects=7, References=8, Primitives=[char x 12, int x 12]}

From the output we can see that the 1st variant uses 2 less objects and subsequently less memory. (This happens because the compiler implicitly calls intern on all string literals.)

We'll apply the same interning technique to tags within posts. Here's some stats on the dataset we're using:

Number of posts: 137,918
Number of tagged values: 66,993 (sum of sizes of all tag sets)
Number of unique tags: 1,723

Top 10 tag sets by usage

Tag set	Count
[]	111932
[interview]	144
[programming-languages]	123
[java]	111
[algorithms]	104
[career-development]	92
[licensing]	83
[open-source]	81
[c++]	74
[design-patterns]	73

From this we can see:

We can remove 65,270 (66,993 - 1,723) strings from memory by interning tag values
Empty tag sets are extremely common, so we should be sure to intern an empty set to a single instance (this will remove 111,932 instances of empty tag sets)
Single word tag sets seem to be fairly common, so we can intern those to reduce the number of tag sets (not just the tags within them).

Be careful to take into account the structure you are using to intern objects as it will also consume memory (usually backed by some kind of map). You'll see best returns for interning frequently used values.

Here's our interning implementation (I've done the interning manually here, but I would recommend Guava's Interners for production use):

public class PostIntern implements Post {
    // Only modifications to the the PostSnappy implementation are shown

    private static final Map<String, String> strings = 
        new HashMap<String, String>();
    private static final Map<String, String[]> tagSets = 
        new HashMap<String, String[]>();

    private static <K, V> V intern(Map<K, V> map, K key, V value) {
        V canonical = map.get(key);
        if (canonical == null) {
            canonical = value;
            map.put(key, canonical);
        }
        return canonical;
    }

    private String[] tags;

    public PostIntern(Post post) {
        tags = internTags(post.getTags());

        if (tags.length <= 1) {
            tags = intern(tagSets, tags.length == 0 ? "" : tags[0], tags);
        }
    }

    private String[] internTags(Set<String> tags) {
        Set<String> interned = new HashSet<String>();
        for (String tag : tags) {
            interned.add(intern(strings, tag, tag));
        }

        return interned.toArray(new String[interned.size()]);
    }
}

Interning by its nature only saves memory when multiple instances share interned values, so see the results section below for the impact of interning on the whole dataset.

Results

To test our variants of Post, we'll use the 'Stack Exchange Data Dump - Jun 2013' containing 137,918 posts.

Posts excluding 'title' and 'body'

The following table shows the results of loading all posts into an array and measuring how much memory it consumes. In this example it nulls out title and body fields so we can see the impact ignoring large strings.

Class	Memory (MB)	% Reduction	% Total Reduction
PostBean	44.00	0.0	0.0
PostPrimitive	34.53	21.5	21.5
PostNullable	33.35	3.4	24.2
PostCollection	14.78	55.7	66.4
PostSnappy	14.78	0.0	66.4
PostIntern	8.62	41.7	80.4

Reading down the table, each class contains the optimisations of the previous class, plus it's own. '% Reduction' shows the reduction from the previous variant, while '% Total Reduction' shows the total reduction from the unoptimised PostBean baseline.

With all optimisations applied our memory usage is reduced from 44MB to 8.62MB, an 80% reduction.

Posts including 'title' and 'body'

We'll now run the same test again, including the Post title and body fields.

Class	Memory (MB)	% Reduction	% Total Reduction
PostBean	301.57	0.0	0.0
PostPrimitive	292.10	3.1	3.1
PostNullable	290.92	0.4	3.5
PostCollection	272.35	6.4	9.7
PostSnappy	113.77	58.2	62.3
PostIntern	107.60	5.4	64.3

Here we can see that large strings dominate the memory consumption of posts and that we get a major saving through string compression, with a total saving of 64.3% for all optimisations.

Garbage collection

Some of the above techniques may result in more garbage collection than a typcial bean, especially using immutable collections. However, the garbage created tends to be short-lived which is very efficiently handled by modern JVMs.

On the other hand, the techniques above reduce the number of objects that the garbage collector has to manage.

Whether the net result is a positive or negative really depends on the access patterns for the data, but should be considered.

Other Techniques

There's a few other minimisation techniques worth mentioning that we haven't touched on in this post:

BitSets: bitsets are an excellent technique for storing collections of a finite set of values. See Java's BitSet for more information.
Primitive collections: Libraries such as Trove provide implementations of collections that store primitives instead of object wrappers.
It is possible to use memory outside of the Java heap completely. This removes the overhead of the object header and you have complete control of the data layout, but must manage the memory yourself. See this blog post for an example.

Conclusion

The techniques above have demonstrated that a simple bean can be stored in 20% of the memory in some cases. For real-world examples with more attributes, it is reasonable to expect you could reduce memory consumption even further.

Links

https://code.google.com/p/memory-measurer/: Tool used to measure the memory consumed by object graphs

https://github.com/shipilev/java-object-layout: Tool to show the physical memory layout of a Java object

http://jpountz.github.io/lz4-java/1.1.0/lz4-compression-benchmark/: Review of fast object compression techniques

https://github.com/dain/snappy: Java Snappy compression library used

https://code.google.com/p/guava-libraries/: Google's Guava library contains immutable collections and interners

Radiant Cheese