Deobjectify

Low memory-overhead arrays of objects in Java

- Array combining/inlining in Java (bytecode) -

A common structure used in scientific programming are large arrays of small objects on which computation is performed throughout the entire program. A good example of this are graph applications where you compute on big graphs represented as lists of nodes and edges or the classic Barnes Hut application where you compute on a large list of astronomical objects.

Unfortunately, in Java, using arrays of objects leads to huge memory overhead: in Java 6 from Sun/Oracle the overhead is about 3 machine words per object (2 words for object header, 1 word for a reference). Because objects stored in large arrays are often small, a large portion of memory is wasted on storing references. In addition, such arrays are slow to allocate, the garbage collection is slow and unnecessary, and each access to an object in the array requires an additional indirect memory access.

The Deobjectify library offers arrays of objects that look and behave like regular objects, while the arrays have virtually no memory overhead, can be allocated fast and used with minimal programmer's effort. Deobjectifier automatically rewrites the code into a number of arrays of primitive types, laying the arrays according to hints from the programmer. As a result, garbage collection will be performed per-array and not per object in the array (which is typically, but not always, desired). Simple benchmarks show that the gain in memory and allocation time is easily 5-fold. Complex banchmarks are still in preparation (as of Mar, 2011).

Basic idea for the impatient

Instead of using this: Use this:
T[] arr = new T[size] NOArray<T> arr = new NOArray<T>(size)
T t = new T(params) T t = new T(params)
arr[i] = t arr.set(i, t)
a = arr[i] arr.get(i, t)

And later: pack your app in a jar file app.jar and run:

java deobjectify.Deobjectifier app.jar
This is where the magic happens: a non-generic class NOArray_T containing primitive arrays will be added on-the-fly to the jar file and used in the code.

Basic idea explained

When coding, use the "fake" class deobjectify.NOArray parametrized with the type of your choice of objects that will be stored in the array. There are some limitations about the type T, in general: it should be "simple", as best a list of fields of primitive types.

The deobjectifier applied it to your jar file changes the generic type NOArray<T> into a non-generic NOArray_T, which is created on the fly and added to the jar file. This new class contains a number of primitive type arrays and translates accesses to get() and set() into accesses to the primitive type arrays. The get(i,t) method copies values at index i into the t object. The set(i,t) method copies values from object i into, logically, object of the array at index i. In both cases the t object acts only as a box to transfer data.

The class of objects stored in the array can be optionally annotated with hints about layout, for example

@NOStoragePolicy(group = GroupPolicy.ALL_IN_ONE)

is the default policy that will store all fields within the minimal number of arrays (one or two, if objects contains references), figuring out transformations to other types where necessary. Other policies include BY_TYPE (group fields by type), ALL_SEPARATE (do not group at all), and CUSTOM grouping, for instance:

@NOStoragePolicy(group = GroupPolicy.CUSTOM, customGroups="x,y|vobj"),

which means that the Deobjectifier should make the code such that the fields x and y are stored in one group (one array), vobj in a separate group, and all remaining fields in another group. Why the policies all useful? You may want to optimize caching behavior depending on access patterns of your algorithm. In a typical access pattern (accesses to all fields in an object are grouped): more arrays means worse cache behavior.

Downloads

Download release with sources and the libraries with examples compiled with Sun's Java 6:

In case of difficulties with these files, please contact E. Krepska (e.l.krepska"AT"vu.nl).

Compile, install and run

Please refer to the short README file. Requirement Java version ≥ 5 (needs generics). Dependencies (all included in the distribution in the external directory):

Ant and JUnit are also useful to easily run tests and examples.

Licence

Deobjectifier is open-source and is licensed under GPL-3.0. In simple terms that means that you can reuse the library and the code freely (even charge for your software), but the code you release must also be open source and must include the licencing messages of this code.
A copy of the licence is included in the distribution.

Papers

Coming up!

Credits

The Deobjectify library is being developed at the VU University Amsterdam (also called Free University Amsterdam) by Elzbieta Krepska working with Thilo Kielmann, Henri Bal and Wan Fokkink.

Have fun with the Deobjectifier!
Mar 10, 2011