Latest Tweets

 

BitVector: a Java class for bit twiddling

Prior to implementing a BloomFilter along the lines of the interface described in the Guava project issue tracker, I’ve reviewed my sprawling personal collection of Java libraries to identify the all of the ad-hoc bit-processing related code I’ve written with a view to unifying it all into something more reusable.

To my surprise, much of the code didn’t actually deal with tricky bitwise processing, but with the practicalities of efficiently moving bit data between different (bytes, ints, longs, byte arrays, IO streams, Strings, BigIntegers etc). All of those bits have to come from somewhere I guess.

Of course, there was plenty of bit twiddling, but another obvious sticking point was supporting competing requirements for mutability and immutability, as is common when one is seeking to design APIs that are both robust and efficient.

So I’ve spent a bit of time combining everything I need for various binary processing tasks (bitwise operations, (im)mutability control, conversion to standard Java types) and the result is a single stand-alone class called BitVector.

The source code for the class is available under the Apache 2.0 licence from:

http://tomgibara.googlecode.com/svn/trunk/crinch/crinch-bits/src/main/java/com/tomgibara/crinch/bits/BitVector.java

The class is mostly complete for my needs, though I haven’t yet begun to employ it in any of my projects.

For a better idea of what the class provides, the class Javadoc is included below.


BitVector stores fixed-length bit sequences of arbitrary length and provides a number of bit-wise operations, and methods for exposing the bit sequences as established java types.

In keeping with Java standards, bits are operated-on and exposed as big-endian. This means that, where bit sequences are input/output from methods, the least-significant bit is always on the right and the most significant bit is on the left. So, for example, the toString() method will contain the most significant bit in the character at index 0 in the string. Naturally, In the cases where this class is used without externalizing the bit representation, this distinction is irrelevant.

A consequence of this is that, in methods that are defined over ranges of bits, the from and to parameters define the rightmost and leftmost indices respectively. As per Java conventions, all from parameters are inclusive and all to parameters are exclusive.

Instances of this class may be aligned (see isAligned() and aligned(). Many operations will execute more efficiently on aligned instances. Instances may also be immutable (see isMutable(), mutable() and immutable()). In addition to a number of copying operations, the class also supports views; views create new BitVector instances that are backed by the same bit data, this allows applications to expose mutable BitVector instances ‘safely’ via immutable views.

The class extends Number which allows it to be treated as an extended length numeric type (albeit, one that doesn’t support any arithmetic operations). In addition to the regular number value methods on the interface, an bigIntValue() method is available that returns the BitVector as a positive, arbitrarily sized integer. Combined with the fromBigInteger() method, this allows, with some loss of performance, a range of arithmetic calculations to be performed.

The class also implements Iterable and provides methods to obtain ListIterator as one would for a List, but stops short of implementing the List interface. This allows a BitVector to be used with a range of Java language constructs and standard library classes.

The class is Serializable and Cloneable too (clones are shallow and essentially behave as a view of the original instance). For instances where more control over serialization is needed, read(InputStream) and write(OutputStream) methods are available, though better performance may result from calling toByteArray() and managing the writing outside this class.

In addition to all of the above methods outlined above, a full raft of bitwise operations are available, including:

  • set, and, or, xor operations over a variety of inputs
  • shifts, rotations and reversals
  • tests for bit-wise intersection, containment and equality
  • tests for all-zero and all-one ranges
  • counting the number ones/zeros in a range
  • searches for first/last ones/zeros in a given range
  • searches for next/previous ones/zeros from a given index
  • copying and viewing bit ranges
  • obtaining bits as Java primitive types

Most such methods are available in both an operation specific version and operation parameterized version to cater for different application needs.

Performance should be adequate for most uses. There is scope to improve the performance of many methods, but none of the methods operate in anything more than linear time and inner loops are mostly ‘tight’. An implementation detail (which may be important on some platforms) is that, with some exceptions, none of the methods perform any object creation. The exceptions are: methods that require an object to be returned, floatValue(), doubleValue(), and situations where an operation is applied with overlapping ranges of bits, in which case it may be necessary to create a temporary copy of the BitVector.

The class is marked as final to ensure that immutable instances can be safely used in security sensitive code (eg. within a PrivilegedAction).

blog comments powered by Disqus