Wednesday, December 29, 2010

More Oddities

Nothing particularly enlightening; this is more of a scratchpad for me.

Taking yesterday's code, I got to wondering what happens if I compile a separate class that calls our friend, the class that appears to violate the language spec. It has two methods:

/** This code compiles in Eclipse 3.4 but does not in Eclipse Helios */
public class ErasureTest extends TestCase {
    // ...

    public String sameMethod(Class<String> clazz) {
        return STRING;
    }

    public Integer sameMethod(Class<Integer> clazz) {
        return INTEGER;
    }
}
The Java Language Specification says this should be forbidden, but the JVM seems happy with it. What's more, I now have a class that looks like this:

package com.henryp.lang;

public class ErasureCaller {
    public static void main(String[] args) {
        new ErasureTest().sameMethod(ErasureCaller.class);
    }
}

And upon compilation, I get:

phillip:Phill henryp$ javac -d bin -cp .:/Users/henryp/Documents/workspace/Test/bin:/Users/henryp/.m2/repository/junit/junit/3.8.2/junit-3.8.2.jar com/henryp/lang/ErasureCaller.java
com/henryp/lang/ErasureCaller.java:5: cannot find symbol
symbol : method sameMethod(java.lang.Class<com.henryp.lang.ErasureCaller>)
location: class com.henryp.lang.ErasureTest
new ErasureTest().sameMethod(ErasureCaller.class);
^
1 error
This is because the generic information in the method signature is not entirely erased, as we saw yesterday.

But this is fine:

package com.henryp.lang;

public class ErasureCaller {
    public static void main(String[] args) {
        new ErasureTest().sameMethod(String.class);
    }
}
It is as if the erasures were not, well, erased at all, which leaves me wondering what the point of erasure is.

"Type erasure exists so that new code may continue to interface with legacy code."
(Type Erasure, The Java Tutorials)
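To see what erasure does remove, here is a minimal sketch (the class and method names are hypothetical, not from any library). Two differently parameterized lists turn out to share one runtime class, and a raw-typed method standing in for pre-generics legacy code accepts a generic list without complaint:

```java
import java.util.ArrayList;
import java.util.List;

public class ErasureInteropDemo {

    // Hypothetical stand-in for pre-generics "legacy" code: it only knows the raw List type.
    @SuppressWarnings("rawtypes")
    static int legacySize(List list) {
        return list.size();
    }

    // After erasure, both parameterizations are the same ArrayList class at runtime.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<String>();
        List<Integer> integers = new ArrayList<Integer>();
        return strings.getClass() == integers.getClass();
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<String>();
        strings.add("a");
        System.out.println(sameRuntimeClass());  // true
        // Generic code can hand its objects straight to the raw legacy method.
        System.out.println(legacySize(strings)); // 1
    }
}
```

The interoperability runs from new code to old: the generic caller compiles against the raw method, which is exactly the direction the tutorial describes.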

But this seems to be a unidirectional relationship. New code can use old binaries, but type erasure does not mean old code can use new binaries. Setting javac's -source and -target switches to 1.4 does not make generic code compile.

Tuesday, December 28, 2010

I am not a language lawyer...

... but this week has been a strange one for delving into the minutiae of the Java language as well as the JVM.

First, some of us upgraded Eclipse to Helios. Even though we were using the same JVM as before, some of our code that compiled perfectly well in the old version of the IDE would not compile in the new one. The problem was similar to the one outlined in this Eclipse bug and this JDK bug.

Our code looked something like this:

package com.henryp.lang;

import junit.framework.TestCase;

/** This code compiles in Eclipse 3.4 but does not in Eclipse Helios */
public class ErasureTest extends TestCase {
    private static final int INTEGER = 3;
    private static final String STRING = "String";

    public void testWhichOneIsCalled() {
        Object anObject = sameMethod(String.class);
        assertEquals(STRING, anObject);
    }

    public String sameMethod(Class<String> clazz) {
        return STRING;
    }

    public Integer sameMethod(Class<Integer> clazz) {
        return INTEGER;
    }
}

The question is: if generics are erased, how does the compiler know which sameMethod to call?

[Aside: although it has no bearing on this discussion, generic declarations are not completely removed from the class file. If a library uses generics, your code can take advantage of them even if you don't have the source code. Just to prove it, this class demonstrates that you can still see the generic parameter types via reflection:

package com.henryp.lang;

import java.lang.reflect.Method;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

import junit.framework.TestCase;

public class GenericErasureTest extends TestCase {

    public void testGetGeneric() throws Exception {
        Method method = ClassWithParametizedMethod.class.getMethod("aMethodWithGenerics", Class.class);
        assertNotNull(method);
        Type[] genericParameterTypes = method.getGenericParameterTypes();
        assertNotNull(genericParameterTypes);
        assertEquals(1, genericParameterTypes.length);
        Type type = genericParameterTypes[0];

        assertTrue(type instanceof ParameterizedType);
        ParameterizedType parameterizedType = (ParameterizedType) type;
        Type[] actualTypeArguments = parameterizedType.getActualTypeArguments();
        assertNotNull(actualTypeArguments);
        assertEquals(1, actualTypeArguments.length);
        assertEquals(String.class, actualTypeArguments[0]);
    }
}

class ClassWithParametizedMethod {
    public void aMethodWithGenerics(Class<String> clazz) {

    }
}


If we disassemble ClassWithParametizedMethod with javap, we see something like:

phillip:Test henryp$ javap -c -verbose -classpath ./bin com.henryp.lang.ClassWithParametizedMethod
.
.
public void aMethodWithGenerics(java.lang.Class);
Signature: length = 0x2
00 11
.
.


The 11 is hexadecimal and refers to an entry in the constant pool (0x11 = 17):

const #17 = Asciz (Ljava/lang/Class<Ljava/lang/String;>;)V;

and that is where our generic type information is stored [1].
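The arithmetic is small but worth spelling out. In the class file format, the body of a Signature attribute (hence its length of 2) is a single big-endian u2 index into the constant pool, so the bytes 00 11 decode as (0x00 << 8) | 0x11 = 17. A minimal sketch, with a hypothetical class name:

```java
public class SignatureIndexDemo {

    // A Signature attribute's two-byte body is one big-endian u2
    // index into the constant pool; mask each byte to treat it as unsigned.
    static int signatureIndex(byte high, byte low) {
        return ((high & 0xFF) << 8) | (low & 0xFF);
    }

    public static void main(String[] args) {
        // The bytes javap printed for aMethodWithGenerics: 00 11
        System.out.println(signatureIndex((byte) 0x00, (byte) 0x11)); // 17
    }
}
```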

]

END OF ASIDE

The Java Language Specification says:

"It is a compile-time error to declare two methods with override-equivalent signatures (defined below) in a class.

"Two methods have the same signature if they have the same name and argument types.

"Two method or constructor declarations M and N have the same argument types if all of the following conditions hold:

  • They have the same number of formal parameters (possibly zero)
  • They have the same number of type parameters (possibly zero)
  • Let <A1,...,An> be the formal type parameters of M and let <B1,...,Bn> be the formal type parameters of N. After renaming each occurrence of a Bi in N's type to Ai the bounds of corresponding type variables and the argument types of M and N are the same."
(JLS 8.4.2)

What does this mean? Let's define some terms.

Generic
"A class, interface, or method that declares one or more type variables. These type variables are known as type parameters."
(Java glossary)

Type Parameters
"A method is generic if it declares one or more type variables. These type variables are known as the formal type parameters of the method."
(JLS 8.4.4)

Formal Parameters
"The formal parameters of a method or constructor, if any, are specified by a list of comma-separated parameter specifiers. Each parameter specifier consists of a type (optionally preceded by the final modifier and/or one or more annotations) and an identifier (optionally followed by brackets) that specifies the name of the parameter."
(JLS 8.4.1)

Aside: parameters and arguments are often used synonymously, but they are slightly different. Parameters are part of the declaration of a method or constructor, fixed at compile time. Arguments are the values passed to the method or constructor at runtime [2]. Alternatively, the words actual and formal can be used to distinguish between an argument and a parameter, respectively [3] [4].
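A tiny illustration of the distinction (the class and method are hypothetical, chosen only for the example): the names in the declaration are the formal parameters, and the values at the call site are the actual arguments.

```java
public class ParameterVsArgument {

    // 'base' and 'exponent' are formal parameters: they belong to the declaration.
    static long power(long base, int exponent) {
        long result = 1;
        for (int i = 0; i < exponent; i++) {
            result *= base;
        }
        return result;
    }

    public static void main(String[] args) {
        // 2 and 10 are the actual arguments supplied at this particular call.
        System.out.println(power(2, 10)); // 1024
    }
}
```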

It appears that the last of these bullet points in JLS 8.4.2 (quoted above) is describing type erasure. It's saying that

public String sameMethod(Class<String> clazz)
and

public Integer sameMethod(Class<Integer> clazz)
are the same as far as the Java language is concerned, and so accepting them is a bug in the compiler.
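One way to see the clash is with reflection. In this sketch, two hypothetical methods stand in for the overloads under different names, since javac 7 and later refuse to compile the originals. Their erased parameter types are both plain Class, while the generic parameter types kept in the class file still differ:

```java
import java.lang.reflect.Method;
import java.lang.reflect.Type;

public class ErasedParameterDemo {

    // Hypothetical stand-ins for the two sameMethod overloads, renamed so they compile.
    public String stringVersion(Class<String> clazz) { return "String"; }
    public Integer integerVersion(Class<Integer> clazz) { return 3; }

    static Method find(String name) {
        try {
            // Lookup uses the erased parameter type: Class<X> is just Class here.
            return ErasedParameterDemo.class.getMethod(name, Class.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Method forString = find("stringVersion");
        Method forInteger = find("integerVersion");
        // The erased parameter types are identical...
        System.out.println(forString.getParameterTypes()[0] == forInteger.getParameterTypes()[0]); // true
        // ...while the generic parameter types recorded in the class file differ.
        Type genericString = forString.getGenericParameterTypes()[0];
        Type genericInteger = forInteger.getGenericParameterTypes()[0];
        System.out.println(genericString.equals(genericInteger)); // false
    }
}
```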

But the interesting thing is that although the Java language says you can't have this, the JVM does not seem to mind (although there is some confusion about this that I have not resolved; see below). We have the odd situation of Java bytecode that cannot be expressed as legal Java source.

Disassembling ErasureTest.testWhichOneIsCalled, we see:

public void testWhichOneIsCalled();
Code:
Stack=2, Locals=2, Args_size=1
0: aload_0
1: ldc #23; //class java/lang/String
3: invokevirtual #25; //Method sameMethod:(Ljava/lang/Class;)Ljava/lang/String;


The comment tells us that item #25 in the constant pool is indeed a reference to our method, and that its descriptor includes the return type:

const #25 = Method #1.#26; // com/henryp/lang/ErasureTest.sameMethod:(Ljava/lang/Class;)Ljava/lang/String;

This is confirmed by Bill Venners' splendid Inside the Java 2 Virtual Machine:

"The method_info table contains several pieces of information about the method, including the method's name and description (its return type and argument types)."
(p200)

Note that if the two methods had the same return type, the JVM would treat them as the same method, and the lookup would fail during linking, where:

"The virtual machine checks the referenced class for a method of the specified name and descriptor [including its return type]. If the machine discovers such a method, that method is the result of the successful method lookup. [...] Otherwise the method lookup fails."
(ibid, p289)
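To see the two descriptors side by side without a class file, here is a minimal sketch using java.lang.invoke.MethodType. It assumes Java 7 or later, so it is newer than the JVMs used in this post, and the class and method names are hypothetical. It builds the descriptor for each erased sameMethod overload:

```java
import java.lang.invoke.MethodType;

public class DescriptorDemo {

    // Builds the JVM method descriptor for a method taking a single Class argument.
    static String descriptorFor(Class<?> returnType) {
        return MethodType.methodType(returnType, Class.class).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // The erased parameter lists match; only the return types differ.
        System.out.println(descriptorFor(String.class));  // (Ljava/lang/Class;)Ljava/lang/String;
        System.out.println(descriptorFor(Integer.class)); // (Ljava/lang/Class;)Ljava/lang/Integer;
    }
}
```

The first string matches the descriptor in constant pool entry #25 above.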

However, as I mentioned, there is some confusion about this, because the JVM spec itself says:

"The signature of a method consists of the name of the method and the number and type of formal parameters of the method. A class may not declare two methods with the same signature."
(Java VM Spec 2.10.2)

Yet that is exactly what we have. Maybe there is something wrong with my JVMs (Sun 1.6.0_20 on Windows and 1.5.0_22 on Mac).

So, that was the first odd thing this week. The second was similar. An Ant script was not deleting the binary directory before building. As a result, old versions of classes were hanging around while the classes they called were changing. This led to a hard-to-debug exception being thrown in Hibernate that was not Hibernate's fault. I had to go hiking through Hibernate code to realise this, though.

But the way the problem manifests itself bears on the discussion above. Again, it supports Bill Venners rather than the JVM spec itself.

Try this: write the two classes below:
package com.henryp.lang;

public class VersionCaller {

    public static void main(String[] args) {
        Object object = new VersionedClass().getNumber();
        System.out.println(object);
    }

}

package com.henryp.lang;

public class VersionedClass {

    public Integer getNumber() {
        return 0;
    }

}
Compile them, then copy the class files somewhere else and run:


phillip:Test henryp$ cp bin//com/henryp/lang/VersionedClass.class /tmp/Phill/com/henryp/lang
phillip:Test henryp$ cp bin//com/henryp/lang/VersionCaller.class /tmp/Phill/com/henryp/lang
phillip:Test henryp$ java -cp /tmp/Phill com.henryp.lang.VersionCaller
0


No surprises. Now subtly change getNumber() to return a Number (after all, java.lang.Integer extends java.lang.Number, so it should be OK, right?).

package com.henryp.lang;

public class VersionedClass {

    public Number getNumber() {
        return 0;
    }

}
This time, copy only the recompiled VersionedClass and then run:


phillip:Test henryp$ cp bin//com/henryp/lang/VersionedClass.class /tmp/Phill/com/henryp/lang
phillip:Test henryp$ java -cp /tmp/Phill com.henryp.lang.VersionCaller
Exception in thread "main" java.lang.NoSuchMethodError: com.henryp.lang.VersionedClass.getNumber()Ljava/lang/Integer;
at com.henryp.lang.VersionCaller.main(VersionCaller.java:9)


This happens because VersionCaller was compiled against the old version of VersionedClass, and its bytecode still refers to getNumber by its full descriptor, return type included:

public static void main(java.lang.String[]);
Code:
Stack=2, Locals=2, Args_size=1
0: new #16; //class com/henryp/lang/VersionedClass
3: dup
4: invokespecial #18; //Method com/henryp/lang/VersionedClass."<init>":()V
7: invokevirtual #19; //Method com/henryp/lang/VersionedClass.getNumber:()Ljava/lang/Integer;
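The same resolution rule can be observed inside a single program. This is a minimal sketch, assuming Java 7 or later for java.lang.invoke (again newer than the JVMs in this post). MethodHandles.Lookup.findVirtual resolves a method by name and exact type, so asking for getNumber() with the old Integer return type fails against a class whose method now returns Number, much as the stale invokevirtual did. The nested VersionedClass is a hypothetical stand-in for the recompiled class:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class DescriptorLookupDemo {

    // Hypothetical stand-in for the recompiled VersionedClass: getNumber() now returns Number.
    public static class VersionedClass {
        public Number getNumber() {
            return 0;
        }
    }

    // True if getNumber() can be resolved with exactly this return type and no parameters.
    static boolean canResolve(Class<?> returnType) {
        try {
            MethodHandles.lookup().findVirtual(
                    VersionedClass.class, "getNumber", MethodType.methodType(returnType));
            return true;
        } catch (NoSuchMethodException e) {
            return false; // no method with that name and descriptor
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canResolve(Number.class));  // true
        System.out.println(canResolve(Integer.class)); // false
    }
}
```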


So this raises the question: if calling code can distinguish between two methods by their return types, why does the JVM spec say you cannot have two otherwise identical methods?

[1] http://stackoverflow.com/questions/937933/where-are-generic-types-stored-in-java-class-files
[2] http://mindprod.com/jgloss/parameters.html
[3] http://en.wikipedia.org/wiki/Parameter_%28computer_science%29
[4] http://stackoverflow.com/questions/156767/whats-the-difference-between-an-argument-and-a-parameter