I just committed some rough source code for, amongst other things, ImageUtil.
You’ll find it in my Google code SVN repository
You need this code to create the colour charts I posted a couple of weeks ago.
It’s inside a Maven project, but you can probably just cherry pick this file and have things work:
And watch out for this possible issue:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4886732
I haven’t been able to track down all of the situations it occurs in.
Following on from my post about the design of the BitVector class yesterday, the question was asked, do you really need operations over ranges when you could just use views?
For example, the following two lines of code will have exactly the same result:
v.setRange(100, 200, false);
v.rangeView(100,200).set(false);
Except that the second line will create an intermediate object. Putting aside the fact that not every environment that executes Java code has performance to match the standard Java runtime, the question of whether intermediate views should replace ranges hinges on the degree to which they impair performance.
To ascertain this, I put together a hasty benchmark this evening. It repeats a large number of different operations over a BitVector, using both ranges and views. For stable results, the fastest and slowest third of each run was discarded, and the remainder averaged for each operation.
The tests were performed on a 32 bit dual core machine running
OpenJDK Runtime Environment (IcedTea6 1.11pre) (6b23~pre11-0ubuntu1.11.10.1)
OpenJDK Server VM (build 20.0-b11, mixed mode)
The following chart gives a summary of the results:

It shows there is a clear and consistent cost to the object allocations (about 100ms in each of these tests). Given the performance sensitive nature of bit handling, this isn’t something I’d necessarily want to give away for the benefit of fewer methods.
In addition to these operations, I also measured the performance of the much slower rotate operation. This threw up an interesting anomaly: in contrast to all other operations, creating the intermediate view is always faster!

From the perspective of a developer who doesn’t have visibility of the algorithms applied by the JIT, this is inexplicable. Here is the rotateRange method which is being called:
public void rotateRange(int from, int to, int distance) {
if (from < 0) throw new IllegalArgumentException();
if (from > to) throw new IllegalArgumentException();
from += start;
to += start;
if (to > finish) throw new IllegalArgumentException();
rotateAdj(from, to, distance);
}
In the other case we are calling rangeView() followed by rotate(). Here is one subcall of the rangeView method:
public BitVector duplicate(int from, int to, boolean copy, boolean mutable) {
if (mutable && !copy && !this.mutable) throw new IllegalStateException();
if (from < 0) throw new IllegalArgumentException();
if (to < from) throw new IllegalArgumentException();
from += start;
to += start;
if (to > finish) throw new IllegalArgumentException();
return duplicateAdj(from, to, copy, mutable);
}
Notice the basic similarity in structure. Both methods are doing the same checks except that the rangeView method actually ends up doing more because it then needs to create a new BitVector. Finally, here’s the rotate method that will be called on the newly created view:
public void rotate(int distance) {
rotateAdj(start, finish, distance);
}
These code snippets show that the the rotateRange method cannot do more work than a call to rangeView and rotate combined. So how can it be consistently slower?
I don’t even have the start of explanation for this anomaly (except that it may be a bug or shortcoming in the JIT), but as I observed in my comments yesterday, JITs can be capricious. Writing APIs that anticipate good results from them is fine, but there’s no escaping the fact that it actually passes risk onto the developers who use the library. The size of that risk is a function of performance: the degree to which JIT optimizations influence it and its importance to the developer.
It’s been a while since I first shared my BitVector class and in that time it has grown larger and even more useful. Yet it remains a self contained class that’s easy to use in any Java project. The source code is available under the Apache 2.0 licence.
It’s killer benefit for me is that it makes it really easy to cast bits from one form into another and to operate on them, and safely pass them around: all very cheaply. It’s so useful, I’d really like to see other developers using it, so I’ve started trying to write a code based introduction to the class.
I’m not even half done, but I think the results so far are worth sharing. Don’t be intimidated! There are lots of methods but any given application will only need a small number of them. The benefit of locating all of the methods on a single class is that it makes things convenient & efficient.
Imagine a situation where every word in a large corpus is assigned an arbitrary numeric identifier, and you need a data structure that can quickly return the word-frequency for any particular identifier. If you are not a programmer or you can’t imagine finding yourself in this situation you can safely stop reading now.
I wasn’t in exactly in this situation, but a similar one that shares the same basic constraints:
It’s not a run-of-the-mill set of requirements for a data-structure; you certainly won’t find any help in the standard Java libraries, so I thought I’d share my approach.
The underlying concept is simple - sort the map entries into key order and use a binary search on the key to find the entry and return the associated value. This requires O(log n) accesses per map lookup but the problem with this naive approach is memory consumption. Imagining that both keys and values can be accommodated in 64 bit longs, one entry requires 16 bytes. With 10 million entries, we’re looking at 160 megabytes of memory for a data structure which in my application is only subsidiary to a number of larger algorithms. Can we do better?
It turns out that we can take advantage of fact that data frequently exhibit a bias towards small values. Though some values may be very large (eg. there are a small number of very high frequency words) most values are small (eg. most words are infrequent). By padding every value to the same width (eg. 64 bits) we’re wasting a lot of space.
To trim the values we can use a universal code and this will reduce the number of bits required to store most values if most values are relatively short. But now we have a problem: how can we operate a binary search when the entries are variable length?
My solution to this was to use Fibonacci coding. This elegant code has the property that every number’s binary representation ends with 11 and there are no consecutive ones elsewhere present. How does this help?
We encode every key and value using Fibonacci coding and lay them out [key,value,key,value,…] without any padding, in ascending key order. Then we perform our binary search at a bit-level. We examine the middle bit of our data and work outwards from there looking for a pair of consecutive ones. Because of the properties of this encoding, we know that the binary sequence 11 indicates the end of one number’s encoding and we can then decode the next pair of numbers. But how do we distinguish keys from values?
I chose a simple scheme: for each key, add one and double it; for each value double it and add one. So keys are always even and values are always odd and we can easily identify if it’s a key or a value we’ve decoded based on whether it is even or odd.
So the algorithm, slightly simplified, looks like this:
With the right library support, this is very easy to code — my crinch library has excellent support for universal codes.
There may be better approaches (I’d be happy to hear recommendations), but I thought this one was interesting enough to share.
Permutations are a key abstraction that aren’t covered by the standard libraries and haven’t been well catered for outside of them (as far as I know).
My crinch library now addresses this with a new package for handling permutations. It’s efficiently implemented for speed and memory usage, robustly coded for use in security sensitive contexts, and has a nice fluent API.
It’s under the Apache 2.0 license and available from my Google Code project*.
Here’s the gist…
(*) No jar binaries yet; take the source, or build via Maven.
The Android AdapterView classes are very efficiently implemented and provide developers with a solid basis for creating fast, smooth views over arbitrarily large datasets. Their basic APIs have remained stable since Android was introduced and yet, based on the applications I try-out (and even some I use regularly) it seems that developers often make poor use of them.
This is the first of two posts that I’m hoping will improve this situation, by sharing some of the code I’ve developed for my own applications.
The performance of an AdapterView is mostly determined by the Adapter you provide it with. There are two core obligations when developing an Adapter that will provide a smooth user experience†:
It’s that simple… Except what happens when your items are records that have to be retrieved from a Web server? And, to compound things, what if displaying the record requires downloading a couple of images? These problems can actually be tackled separately. This post provides a base class that helps with 2 (I’ll provide code to help with 1 at a later date).
What makes 2 complex in this scenario is that there’s no time to do anything other than return a preliminary rendering of the View from the Adapter, so the work to produce a complete rendering of a View must be done on a background thread.
Unfortunately, what makes this still more complex, is that the efficiency of classes like ListView and GridView is only possible because they recycle the Views they use to render items. This complicates the task of asynchronously updating the View because during the interval in which the complete content for the View is being generated, the parent AdapterView may have assigned it another item to display (possibly many times). In this case the existing rendering task should probably be abandoned (it certainly shouldn’t get assigned to the view) and instead a new rendering needs to commence.
Of course, this situation may itself be temporary: A user may briefly scroll an item off-screen and then back again; we don’t want to keep restarting the same rendering task without making any progress. To handle this efficiently caching may be necessary and may need to be combined with partial updates. And, as if things weren’t complex enough, the re-appearing item may be assigned to a different View to the one that initially displayed it (even if its position on-screen does not appear to have changed for the User).
So, yes, there is some complexity around implementing a good Adapter, but many Android developers have had years to get this right! Hopefully this code will help: it handles all of this apparent complexity in what is a fairly simple base class called ViewRenderer.
It’s designed to be invoked within the getView() method of Adapter. In this approach the getView() method should be limited to:
View, or inflating/constructing a new one (but not customizing it for the item)renderView() method of ViewRenderer to perform all subsequent customization of the View.To make this work, you obviously need to customize the ViewRenderer with your own rendering logic. This is done by implementing the following three methods:
void prepare(View, Param, int) This method is called on the main application thread before the View is first displayed to the user. It’s your applications opportunity wipe-down the View (since it may have previously presented another adapter item) and display a quick placeholder/loading view.Render render(Param, int) This method is called on a background thread and does the slow work (eg. drawing or loading) to convert the item into the resources needed to display it (eg. a Bitmap or POJO).void update(View, Render, int) This method is called on the main application thread to apply a new render to a previously prepared View.And that’s pretty much it, everything is reduced to these three digestible chunks. The int that’s being passed into these methods specifies an optional rendering pass to accommodate progressive rendering. There are a few additional operational aspects that can be controlled: render caching, view tagging, thread priority and execution. Documentation for these and everything else can be found in the source code comments.
The code is amenable to a number of improvements (some possibilities are noted code comments), but I hope it’s useful and results in some smoother Android lists and grids, and here it is:
† Not creating garbage helps too.
My money library provides convenient and robust abstractions for processing monetary amounts in Java.
It’s tiny but has seen a new release today with added support for controlling the precision of calculations and reliably splitting monetary amounts by fixed proportions.
For more information see any of the following:
(The code is released under the Apache 2.0 licence)
After a few birthing pains, Maven finally popped out a 1.0 release of my Java library for GVM clustering.
It feels great to finally tick that off my list. Hopefully someone with a very big dataset will find it useful.
I just published a Java implementation of my GVM clustering algorithm to Google Code. Later, I’ll be collating all the information I’ve been compiling about the algorithm at my website.
Until then, here are some basic build steps, you’ll need svn, Java 1.6 and Maven 2.2:
svn checkout http://tomgibara.googlecode.com/svn/trunk/cluster
cd cluster/cluster-mojo/
mvn install
cd ..
mvn install
Then, if you want to try out the included demo app, try:
java -jar cluster-demo/target/cluster-demo-0.1-SNAPSHOT-jar-with-dependencies.jar