A minimal perfect hash is simply a bijection between a set of N objects and the first N integers. See the wikipedia page about perfect hashing for more information.
I’m looking to produce a fast and memory efficient map implementation over String keys and I want to use a perfect hash to pack values without risking collisions.
Casting around online for a Java implementation led me to Sux4J (this rather unpromising sounding library looks excellent and I’m going to take a closer look at it soon) but the implementations are too generic for my purposes.
So I’ve fashioned my own minimal perfect hashing algorithm for Java strings.I don’t know anything about the theory of such algorithms, but my approach was to build a data-structure as follows:
The gist of this approach is outlined visually in the following diagram. Here, the hashes for the strings “cat” and “dog” collide, so a small decision tree needs to be consulted in the case that either of these two are hashed to confirm which value should be returned.

There are some nuances in the actual implementation. For example, Strings are actually ordered by length and character sequence too; this assists the building of fast decision trees. Also each duplicated string stores its offset within the span to avoid scanning back to find the start of the span (I think there are better space/time trade-offs around this that I may investigate in the future.).
However, these complexities only come into play when there are collisions of the String hash function. The common case, even for fairly large numbers of strings, is that there are no collisions. This reduces the time complexity to a binary search over the integer array of hashcodes — not as fast as only taking the hashcode, but still fast. And even the exceptional case is still fairly fast since it typically needs to test candidate string’s length to determine which minimal hash value it should be assigned.
Note that the algorithm is designed to examine as little about the string as is necessary to determine the perfect hash value (this helps to make it fast). Furthermore, the algorithm does not keep a reference to the list of potential candidates for hashing (this helps to reduce memory overhead). As a consequence, the algorithm cannot necessarily distinguish valid candidates; though they are frequently identified as such, invalid string arguments may generate valid hashes.
The source code for this implementation is fully documented and is under the Apache 2.0 licence. It is available through my website’s Google code project
The project that’s been taking up so much of my time recently is to architect a complex B2B e-commerce system. This naturally requires some form of “monetary abstraction” which is something that Java doesn’t provide (quite a surprise really, when you think about some of the classes that have made it into the system libraries over the years).
This is also something that the Java community doesn’t appear to have addressed in any significant way. Faced with this situation on the current project, I contributed my own small collection of classes for dealing with monetary amounts via the expedient of open-sourcing them and then treating them as I would any other third-party library.
This is a one line example that just about sums-up the whole library. Here’s how you make $100:
new MoneyType(Locale.US).money(10).calc().multiply(BigDecimal.TEN).money();
A page containing all relevant links to downloads, documentation etc. is available on my website:
With just 30 minutes of coding, I knocked out a small - but pretty inefficient - class to generate an Ulam spiral. In my plot, dot radius increases with fewer unique prime factors; brightness increases with the total number of prime factors.
The source code, should anyone want it, is public domain and can be found at:
It’s surprising how fast Java can be for one-shot bespoke graphics tasks like this.
I’ve been meaning to post something about the Android Developer Challenge for a good while, but I’ve been so busy recently (architecting a multi-faceted e-commerce system at the moment) that I’ve had no time for anything else. In fact, I was so busy that the results almost passed me by.
Daisy Garden didn’t win. This was not surprising since I’d had an opportunity to judge a number of the applications in entertainment category. It’s not that I felt there were lots of significantly ‘better’ applications, but that so many of the applications were more suited to the challenge. Congratulations to all the winning entrants. My prediction had been for a music app 1,2,3 in my category and I was almost correct (it was a music app 2,3 instead).
But the results aren’t what prompted this post; I’ve been more interested in how the voting has been structured in both challenges, and I’ve had the opportunity to think about it: my entry into the first ADC flopped, then I joined a team that went on to win, this time round my entry made it to the shortlist but didn’t win - so I’ve pretty much run the gamut of possibilities.
I was mildly critical about the structure of the voting after the first ADC. In my opinion, it pretty much amounted to a map-reduce algorithm in which apps were distributed over judges. The ADC2 was essentially similar and introduced bucketing (perhaps ensure variety?) and pre-filtering with a public vote (to ensure the expert judges had a manageable task, I’m sure).
My basic criticism was, and remains, that such a voting scheme hampers people’s capacity to reach a good judgement because it doesn’t provide a medium through which they can influence each other’s evaluations. The judging process essentially atomizes the judges and fails to take advantage of their human strengths. And I think that understanding this has implications for my field of endeavour: creating software.
I think I’m paraphrasing Bjarne Stroustrup here when I say that software design involves balancing lots of subtle and complex considerations, and the best place to do that is inside the brain of a programmer. If one accepts this, there is an implicit assumption that all of the necessary considerations are known (and will fit into one brain*). I believe this perspective explains a lot about why ‘design by committee’ fails: a team of human’s can’t compete with synapses at transmitting and evaluating ideas. It also explains why ‘elite solo programmers’ may often fail: they can form good judgements but usually don’t have all the necessary data.
In trying to avoid the worst weaknesses of these two extremes, I look to encourage good design by:
With this approach I try both to preserve the clarity that solo-design frequently provides and avoid the narrowness that often accompanies it.
Back to the ADC judging. Forming an unqualified analogy between designing software and judging is application is clearly absurd, but there are similarities, and I think that a panel of judges who freely discussed their opinion about each application before recording individual ratings would provide fairer judgements than may have been the case in the Android Developer Challenge. For example, if there were five judges, and one judge knew that the application under consideration was a direct copy of an existing application, he could share that information with the other judges who could factor it into their rating. Without that exchange of information, the more knowledgeable contribution becomes marginalized.
The point is to target two human strengths: information sharing and individual decision making. Of course, the curse is that this approach doesn’t scale, but I’d love to find one that does, because I’m sure I’d learn a lot about designing software from it.
Thanks to Google for hosting and funding the Android Developer Challenges, it’s been fun.
(*) I think this is the simplest gauge of whether a software project will be successful: If the requirements can’t simultaneously fit into one brain then problems are unavoidable.
It’s been an extremely busy few weeks and I’m now looking forward to having a little more time to focus on Android development again. One thing I did achieve this week was to put live a very early version of a game I’ve been developing called Daisy Chase. It’s a “simple” game in which you have to capture flowers by linking them together. I say “simple” instead of just simple because I’m again humbled by how I can misjudge what users will really can understand simply. Nevertheless, I’m quite pleased, the game appears to work (no bug reports yet) and some users report really enjoying it. I would have liked to have entered it into the ADC2, but I had no time (and the competition in the game categories looks especially strong).
I’m really excited about two elements of this new Android application. The first is that it will be the first application that makes use of the Moseycode publishing API to allow users to publish, own and share their own levels. One of the screenshots above gives a preview of the user registration page (not yet incorporated into the application).
The second thing is that, as a big fan of the Intent system used in Android, I’m looking forward to the integration that game will have with Daisy Garden. For example, By achieving particular goals on a level, you will win one of the flowers that appears in the level and this will be added to your garden. In the full version of the game, levels will be collected into gardens and completing all of the levels in a garden will add that garden to the collection of gardens you can grow flowers in.
It’s all part a minor (but time consuming!) experiment in how games, and not just regular applications, can be broken down into separate units that have individual value. It’s a remarkably involved task to do effectively.
The alpha version of Daisy Chase is available from the Android Market.
The ADC Round 2 top 200 list isn’t ordered by category so here they are, conveniently ordered so.
I got to rate a number of really good applications that made it on to this list. Sadly I also rated some promising looking ones that aren’t.
(Oh yeah, you will will find Daisy Garden on this list - thanks to everyone who participated in the rating).
(previously named top 100 list through a spectacular failure of mental arithmetic)
Reading posts on the android developer group prompted me to reread the Java Language Specification and confirmed that I had forgotten something very important about Java, specifically:
Thread contention for Java synchronized blocks is not guaranteed to be fair.
This matters, because if you have a situation where a ‘producer’ is delivering data to a ‘consumer’ via a synchronized data structure, the producer may be permanently blocked while the consumer polls for data even though the consumer is repeatedly taking and releasing the lock.
In the context of an Android application: if the producer is the event thread delivering events to an application created thread for processing, then it’s a potential (and surprising!) source of ANRs. This is exactly what was happening to me when I filed an issue I titled: “ANR arising from touch event dispatch”. If only I’d remembered that Java threads aren’t guaranteed to be fair, I wouldn’t have wasted hours producing a (mostly) reproducible test case.
You can avoid the problem by using a fair lock, but this is relatively expensive if you’re aiming to create a high performance application, like a game say, which I was (and still am). So instead I wrote a simple event queue for delivering motion events to a thread other than the main application thread based on a synchronization-free fixed-size circular buffer that uses volatiles to manage concurrency.
I’ve been using the class below, for a while now and haven’t encountered any issues with it. Then again, I could once have said that of the code it replaced…
import android.view.MotionEvent;
/**
* <p>
* A simple event queue for delivering motion events to a thread other
* than the main application thread. The implementation is based on a
* synchronization-free fixed-size circular buffer that uses volatiles
* to manage concurrency. Adding an event to the queue is allocation
* free. Removing an event from the queue is only allocation free if
* the event object has been recycled.
* </p>
*
* <p>
* The implementation assumes that calls to the add (resp. next) method
* are single threaded (ie. made on just one thread, or externally
* synchronized).
* </p>
*
* <p>
* NOTE: Mapping of the MotionEvent fields is incomplete and does not
* include historical data; this is easily addressed by modifying the
* implementation.
* </p>
*
* @author Tom Gibara
*/
// NOTE: The safe concurrency reliability of this implementation is based on:
// 1. Calls to the add and next methods are single threaded.
// 2. The mHead and mTail fields are volatile.
// 3. The mHead (resp. mTail) field is only modified in the next (resp. add) method.
// 4. Updates to the mHead and mTail methods are the last operations in any method.
public class MotionEventQueue {
private final int mSize;
private final long[] mDownTimes;
private final long[] mEventTimes;
private final int[] mActions;
private final float[] mXs;
private final float[] mYs;
private volatile int mHead;
private volatile int mTail;
public MotionEventQueue(int size) {
mSize = size;
mDownTimes = new long[size];
mEventTimes = new long[size];
mActions = new int[size];
mXs = new float[size];
mYs = new float[size];
mHead = 0;
mTail = 0;
}
public boolean addEvent(MotionEvent event) {
final int head = mHead;
final int tail = mTail;
int nextTail = tail + 1;
if (nextTail == mSize) nextTail = 0;
if (nextTail == head) return false;
mDownTimes[tail] = event.getDownTime();
mEventTimes[tail] = event.getEventTime();
mActions[tail] = event.getAction();
mXs[tail] = event.getX();
mYs[tail] = event.getY();
mTail = nextTail;
return true;
}
public MotionEvent nextEvent() {
int head = mHead;
if (head == mTail) return null;
MotionEvent event = MotionEvent.obtain(mDownTimes[head], mEventTimes[head], mActions[head], mXs[head], mYs[head], 0);
mHead = head == mSize - 1 ? 0 : head + 1;
return event;
}
}
I just finished uploading a new version of Daisy Garden which – I hope – support high and low density devices. Along with some minor other changes I couldn’t resist adding another garden design to the application.
The garden designs are actually produced on the phone with an application that allows me to build layered tile maps with masks and shadows. Android was an ideal platform for building this tool because:
It allows me to share any code I want to, between the editor and the application because they are running on the same platform.
It allows me to doodle new designs any place I have free time.
I get to see the design (mostly) as other phone users will see it as I design it.
I may get the opportunity to release it in the future, to allow anyone to make their own garden designs.
I’ve been getting to grips with the new drawable density functionality that’s been exposed in Android 1.6 and it’s terrific - it simply saves so much effort when you want to target different screen sizes.
You can see how the layouts (originally designed around HVGA) are faithfully recovered compared with Android 1.5. For example in the first screenshot, the buttons return to the correct proportions, and in the second, the flower reclaims most of the screen estate.
I’ve just started retargeting my applications for the latest version of Android (1.6) which introduces full support for different screen densities. The responsibility for scaling bitmap resources to match the screen density falls to the BitmapFactory class.
This is a new behaviour applied to existing methods; I haven’t got enough experience with the new SDK to form an opinion, but it looks as though the new functionality will do exactly what you want it to in most situations; it should simply be a case of moving resources to an appropriate ‘density directory’ (see the resources i18n guide for details).
But there are some instances where you don’t want your images scaled to match the screen density. One case is with images that you will scale within your application, usually when drawing them to a canvas. This occurs in my Daisy Garden application – different garden designs are created from a set of masked tiles which I don’t want pre-scaled.
In 1.6 there is support for density independent bitmaps - these can be placed in a “drawable-nodpi” resources folder. Unfortunately it seems that this isn’t properly supported for 1.5 (it appears that bitmaps loaded from this folder always come out as 1px squares). Alternatively, there is also flag: BitmapFactory.Options.inScaled which you can set to false to disable the scaling, but this is new in 1.6 so isn’t available in applications that aim for compatibility with 1.5.
So I think there’s no option but to produce a utility class that switches code path depending on the API number. It’s not a big deal (it took me longer to write this post) but it might save someone else some time. It’s not documented, but simple to use, just call:
UnscaledBitmapLoader.loadFromResource()
import android.content.res.Resources;
import android.graphics.Bitmap;
import android.graphics.BitmapFactory;
import android.graphics.BitmapFactory.Options;
import android.os.Build;
public abstract class UnscaledBitmapLoader {
public static final UnscaledBitmapLoader instance;
static {
instance = Integer.parseInt(Build.VERSION.SDK) < 4 ? new Old() : new New();
}
public static Bitmap loadFromResource(Resources resources, int resId, BitmapFactory.Options options) {
return instance.load(resources, resId, options);
}
private static class Old extends UnscaledBitmapLoader {
@Override
Bitmap load(Resources resources, int resId, Options options) {
return BitmapFactory.decodeResource(resources, resId, options);
}
}
private static class New extends UnscaledBitmapLoader {
@Override
Bitmap load(Resources resources, int resId, Options options) {
if (options == null) options = new BitmapFactory.Options();
options.inScaled = false;
return BitmapFactory.decodeResource(resources, resId, options);
}
}
abstract Bitmap load(Resources resources, int resId, BitmapFactory.Options options);
}