Saturday, March 16, 2013

Multidimensional arrays and Java


There are no true multidimensional arrays in Java. This is Java 101. But why?

I always knew that Java's closest representation was arrays of arrays. This differs from true multidimensional arrays as the length of the second array is not bound. That is, an  M x N multidimensional array can be represented in Java but also, it might be jagged (that is, N is not the same for all M).


        int[][] array2D = new int[2][];

        array2D[0] = new int[] {1,2,3};
        array2D[1] = new int[] {1,2,3,4,5};

        // prints: [[1, 2, 3], [1, 2, 3, 4, 5]]
        System.out.println(Arrays.deepToString(array2D));

        assertEquals(3, array2D[0].length);
        assertEquals(5, array2D[1].length);

In a true multidimensional arrays, each row would have the same length.

This shouldn't be too surprising. But a silly bug this week also highlighted another aspect of why 2 (or more) dimensioned arrays are not true multidimensional arrays.


        int[][] original = new int[10][10];
        assertEquals(0, original[5][5]);
        int[][] copied = original.clone();

        // change element in the '1st dimension'
        copied[4] = new int[] {1, 2, 3, 4, 5};
        assertEquals(2, copied[4][1]);
        assertEquals(0, original[4][1]); // change not seen in the original

        // change element in the '2nd dimension'
        copied[5][5] = 100;
        assertEquals(100, copied[5][5]);
        assertEquals(0, original[5][5]); // fails. change *is* seen in the original

Cloning an array produces a deep copy of the first array but not the elements in it (the other "dimensions"). These other "dimensions" of the array are treated like any other object and are not deep copied. In this way, our "multidimensional" array is just like a 1-D array of any other type of object.

Sorted!


Surprises in SortedSet

There is more information in a collection than its elements. There is also its structure.

For instance, if the elements of a collection were steps in a recipe, the order of these steps is significant. Change the order and your meal may taste bad.

So, if I used a SortedSet to hold the steps in my recipe, I might reasonably assume that this is not the same recipe as one with the same steps but in a different order.

However, in Java this is not true.


        SortedSet sortedSet = new TreeSet();
        addElementsInAnyOrder(sortedSet);
        
        assertOrder(sortedSet); // passes
        
        Set noSortGuarantee = new HashSet();
        addElementsInAnyOrder(noSortGuarantee);
        
        assertEquals(sortedSet, noSortGuarantee);
        
        assertOrder(noSortGuarantee); // fails


TreeSet.equals doesn't take order into account. Indeed, the equals method is in the superclass - the same superclass as HashSet, which has no notion of order.

Consistent With Equals

The JavaDocs say:

"The natural ordering for a class C is said to be consistent with equals if and only if e1.compareTo(e2) == 0 has the same boolean value as e1.equals(e2) for every e1 and e2 of class C.... Virtually all Java core classes that implement Comparable have natural orderings that are consistent with equals. One exception is java.math.BigDecimal, whose natural ordering equates BigDecimal objects with equal values and different precisions (such as 4.0 and 4.00)."

BigDecimal is not consistent with equals. To check whether two BigDecimals are equal as most people would understand it, we must compare them thus:


        BigDecimal _4_00 = new BigDecimal("4.00");
        BigDecimal _4 = new BigDecimal("4");
        
        assertFalse(_4_00.equals(_4));        // wrong
        assertEquals(0, _4_00.compareTo(_4)); // right


Identity

Comparisons are useful in identity elements. An identity element "is a special type of element of a set with respect to a binary operation on that set. It leaves other elements unchanged when combined with them."

For instance, for the + operator, the identity is 0 (since x + 0 = x).

For the * operator, the identity element is 1 (since x * 1 = x).

Where it gets interesting is that for the maximum operator, the identity element is negative infinity. That is, any argument to the partially applied function max(NEGATIVE_INFINITY, ... is itself.

For floating point numbers, x > NEGATIVE_INFINITY for all x.

The converse is true for the minimum function.

Friday, March 1, 2013

When x does not equal x

When is x != x ? How can this be?

Take this:


> insert into bar (id, avalue) values (22, null);
Query returned successfully: 1 rows affected, 18 ms execution time.
> select * from bar where avalue = null;
>


Hmm, but select * shows me that the row is there. OK, what about:


> select * from bar where not avalue = null;
>


I saw a bug only this week where somebody thought that just because a comparison with null didn't return anything that noting the predicate would mean the the row will show up in the result set. This is not the case. You would have to use is null in SQL.

Comparisons with null in SQL do not compute. This is part of the SQL standard called F571 and all databases I cared to check implement it. You can't even join tables on null values. For instance:


> insert into foo (id, avalue) values (33, null);
> select * from foo f, bar b where f.avalue = b.avalue;
>


SQL and Java are different in the way that they treat nulls. SQL uses Three Value Logic where null means undefined (or unknown or whatever semantics the developer wants to associate it with).

But you might be surprised to know that Java too has a notion of x != x.

Of course,

assertTrue(null == null);

is true. However, with floating points, this:

assertTrue(aDouble == aDouble);

may not always be true. This is nothing to do with rounding errors. If aDouble = Double.NaN then this assertion will fail (where NaN is not a number). The same is true for Float.NaN as defined in the IEEE spec.

Interestingly, infinities can be compared.

assertTrue((10d/0) == Double.POSITIVE_INFINITY);
assertTrue(Double.POSITIVE_INFINITY == Double.POSITIVE_INFINITY);

passes their assertions. And although (10d/0) may return an infinity, (0d/0d) returns NaN. This is consistent with the rules of arithmetic as mathematicians rather than software engineers understand it.

Finally, note that autoboxing may change your results, thus:


    public void testNans() {
        checkDoublePrimitivesEqual(Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY);
        checkDoubleObjectsNotEqual(Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY);
    }

    private void checkDoubleObjectsNotEqual(Double d1, Double d2) {
        assertFalse(d1 == d2);
    }

    private static void checkDoublePrimitivesEqual(double d1, double d2) {
        assertTrue(d1 == d2);
    }


[1] Wikipedia.