Tuesday, September 30, 2014

Constructors and Assignment in Different Languages


If you focus on Java all the time, it's easy to forget that something as simple as construction and assignment follows very different paths in other languages.

Scala

Let's say we have a simple Scala class that takes no arguments in construction:

class ClassWithNoArgs {
  println("ClassWithNoArgs constructed")

  def fn() = { println("ClassWithNoArgs.fn") }
}

Now, let's instantiate it, assign it and use it:

    def classWithNoArgs = new ClassWithNoArgs() 
    println("classWithNoArgs assigned. About to use...")
    classWithNoArgs.fn()                        

So, a Java programmer might be surprised to see this output:

classWithNoArgs assigned. About to use...
ClassWithNoArgs constructed
ClassWithNoArgs.fn

That is, we make an assignment but not until we use the reference does instantiation take place!

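This is entirely down to us using the def keyword rather than val. A def defines a parameterless method, so the right-hand side is evaluated each time classWithNoArgs is referenced, not when it is declared. If the above code had been: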

    val classWithNoArgs = new ClassWithNoArgs()
.
.

Then the output would have been more familiar to a Java programmer.

Similarly, if we had a function that had side effects then returned 1, such as:

    def fn() = {
      println("calling something")
      1
    }

and two similar functions:

  def callByValue(x: Int) = {
    println("x1=" + x)
    println("x2=" + x)
  }

  def callByName(x: => Int) = {
    println("x1=" + x)
    println("x2=" + x)
  }

we could pass fn() to both functions. In a call to the first, fn is evaluated once, before the call; in the second, it is evaluated twice, once upon each call to println.

Consequently, since both functions can take fn, we cannot overload functions with these two different argument lists. For instance, we could not have in the same namespace functions overloaded(x: Int) and overloaded(x: => Int).
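For Java programmers, the nearest thing to a by-name parameter is an explicit supplier. The sketch below is my own rough Java analogue, not a translation the Scala compiler performs: the class name CallByNameSketch and the calls counter are made up for illustration, and IntSupplier stands in for x: => Int.

```java
import java.util.function.IntSupplier;

class CallByNameSketch {
    static int calls = 0; // counts how often fn() actually runs

    // Java counterpart of the side-effecting Scala fn()
    static int fn() {
        calls++;
        System.out.println("calling something");
        return 1;
    }

    // Like callByValue(x: Int): the argument is already evaluated on entry
    static void callByValue(int x) {
        System.out.println("x1=" + x);
        System.out.println("x2=" + x);
    }

    // Stand-in for callByName(x: => Int): evaluation is deferred to each use
    static void callByName(IntSupplier x) {
        System.out.println("x1=" + x.getAsInt());
        System.out.println("x2=" + x.getAsInt());
    }

    // Returns how many times fn() ran for the by-value and the by-name call
    static int[] demo() {
        calls = 0;
        callByValue(fn());
        int byValue = calls;
        calls = 0;
        callByName(CallByNameSketch::fn);
        int byName = calls;
        return new int[] { byValue, byName };
    }

    public static void main(String[] args) {
        int[] counts = demo();
        System.out.println("by value: " + counts[0] + ", by name: " + counts[1]);
    }
}
```

In Java the caller has to write the supplier explicitly, which is exactly the boilerplate Scala's by-name syntax hides.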

(I stole this code from this excellent clarification)

C++

Let's take two very simple classes, one extending the other. First the base class that we will imaginatively call BaseClass:

BaseClass.h

#include <iostream>
#include <string>

using namespace std;

class BaseClass {
public:
    BaseClass();
    BaseClass(string aString);
    BaseClass(const BaseClass& orig);
    virtual ~BaseClass();


    const string aString;

private:

};

BaseClass.cpp

#include "BaseClass.h"

BaseClass::BaseClass() {
    cout << "BaseClass::BaseClass()\n";
}

BaseClass::BaseClass(string aString) : aString(aString) {
    cout << "BaseClass::BaseClass(string)\n";
}

BaseClass::BaseClass(const BaseClass& orig) {
    cout << "BaseClass::BaseClass(const BaseClass& orig)\n";
}

BaseClass::~BaseClass() {
    cout << "BaseClass::~BaseClass()\n";
}

And now the sub class that we will call with equal imagination, SubClass:

SubClass.h

#include <iostream>
#include "BaseClass.h"

using namespace std;

class SubClass : public BaseClass { // public inheritance so callers can read aString
public:
    SubClass();
    SubClass(string aString);
    SubClass(const SubClass& orig);
    virtual ~SubClass();
private:

};

SubClass.cpp

#include "SubClass.h"

SubClass::SubClass() {
    cout << "SubClass::SubClass()" << endl;
}

SubClass::SubClass(string aString) : BaseClass(aString) {
    cout << "SubClass::SubClass(string aString)" << endl;
}

SubClass::SubClass(const SubClass& orig) {
    cout << "SubClass::SubClass(const SubClass& orig)" << endl;
}

SubClass::~SubClass() {
    cout << "SubClass::~SubClass()" << endl;
}

(My, isn't C++ verbose when you compare it to Scala?)

When we point to objects in C++, things are very similar to Java. But when we use code like this:

    int main(int argc, char** argv) {
.
.
    SubClass subclass; // instantiates a SubClass
    SubClass otherSubclass = subclass; // copy-constructs a second SubClass
.
.
    }

The output is very different.

BaseClass::BaseClass()
SubClass::SubClass()
BaseClass::BaseClass()
SubClass::SubClass(const SubClass& orig)
.
.
SubClass::~SubClass()
BaseClass::~BaseClass()
SubClass::~SubClass()
BaseClass::~BaseClass()

The last 4 lines are the destructors being called when we leave the method. But what of the first 4 lines?

The first two lines are fair enough. We're instantiating the SubClass via C++'s syntax for putting it on the stack. Just like Java, C++ puts an implicit call to the superclass constructor in the constructor for us automatically.

The third and fourth are the super constructor (again) and the copy constructor that assigns subclass to otherSubclass. This is very much unlike Java where there would be two references to the same object but here we have two different objects. We can prove this by writing a CppUnit test that looks like this:

    SubClass subClass;
    SubClass otherSubClass = subClass;

    CPPUNIT_ASSERT(&subClass != &otherSubClass);

Implicitly, C++ is passing subclass to otherSubclass's copy constructor and it's the job of the developer to populate the new object.

Now, let's instantiate another SubClass but this time with an argument:

    string      aString         = "a string";
    SubClass    subClassWArg    = SubClass(aString);
    CPPUNIT_ASSERT(aString == subClassWArg.aString); // must have Subclass : public BaseClass to access this

and use the copy-constructor again:

    SubClass    otherWArg       = subClassWArg;
    CPPUNIT_ASSERT(otherWArg.aString == subClassWArg.aString);

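Hmm, this fails. Our copy constructor never mentions BaseClass, so the base part of the new object is built with the default constructor and its aString is left empty. So, let's change the copy constructor thus: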

SubClass::SubClass(const SubClass& orig) : BaseClass(orig.aString) {
.
.

And that passes.

Note, you usually do assignments in C++ with:

    this->aString = orig.aString;

but in our code, we made aString a constant so this would cause a compilation error: a const member can only be initialized (in the constructor's initializer list, as above), never assigned. For the same reason, the compiler cannot implicitly generate a copy-assignment operator for this class (see here for why). This demonstrates the difference between initialization (the initializer list) and assignment (the this->aString = ... form).


Java

I won't spend too long on this as most Java programmers should know the order of instantiation and assignment. Suffice to say that Java instantiation follows the same constructor order as C++ (superclass first), Java assignment behaves like C++ pointer assignment (not object assignment), and it is like Scala's use of the val keyword with call-by-value semantics.
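To make the comparison with the C++ section concrete, here is a small sketch of my own. The class names are invented for illustration. It shows that plain assignment in Java shares one object, that a copy only happens if you write and call a copy constructor yourself, and that a final field (like the const member above) can be initialized but not assigned.

```java
class JavaAssignmentSketch {
    static class Base {
        final String aString; // final: initialized in a constructor, never reassigned

        Base(String aString) {
            this.aString = aString;
        }

        // Hand-written "copy constructor"; Java never calls this implicitly
        Base(Base orig) {
            this(orig.aString);
        }
    }

    // Plain assignment copies the reference, so both names see one object
    static boolean assignmentSharesObject() {
        Base original = new Base("a string");
        Base alias = original;
        return alias == original;
    }

    // An explicit copy is a different object with equal contents
    static boolean copyIsDistinctButEqual() {
        Base original = new Base("a string");
        Base copy = new Base(original);
        return copy != original && copy.aString.equals(original.aString);
    }

    public static void main(String[] args) {
        System.out.println("assignment shares object: " + assignmentSharesObject());
        System.out.println("copy is distinct but equal: " + copyIsDistinctButEqual());
    }
}
```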

Wednesday, September 17, 2014

Scala Crib Sheet #3

Scala for the busy Java programmer:

Type Aliasing

In Akka, you'll see this in the Actor object:

type Receive = PartialFunction[Any, Unit]

What this means is that wherever we see Receive we mean PartialFunction[Any, Unit]. This is type-aliasing much like in C where, for example, you see:

   typedef int jint;

This is from the OpenJDK's jni_x86.h where jint is defined as an int in Linux. 

Obviously, running in the JVM, Scala does not need to define the type of something like jint according to architecture. But it's more than a convenience. Scala has something called type members whose effect is a little like parameterized types but promises not to bloat as quickly as generics can.

"In Scala, the inner class is addressed using the expression Outer#Inner instead of Java's Outer.Inner. The '.' syntax is reserved for objects" [1]

Let's use a slightly modified version of Odersky's example [1] to illustrate this:

abstract class Food

class Grass extends Food {
  override def toString() = "grass"
}

class DogFood extends Food {
  override def toString() = "dog food"
}

abstract class Animal {
  type SuitableFood

  def eat(food : SuitableFood) = { 
    println(this.toString + " eats " + food.toString)
  }
}

class Dog extends Animal {
  type SuitableFood  = DogFood

  override def toString() = "dog"
}

class Cow extends Animal {
  type SuitableFood  = Grass

  override def toString() = "cow"
}

object Animals {

  def main(args : Array[String]) {
    val dog                     = new Dog
    val cow                     = new Cow
    val dogFood                 = new DogFood
    val grass                   = new Grass

    dog.eat(dogFood)
    cow.eat(grass)
  }

  def doEat(animal : Animal, food : Animal#SuitableFood) {
    /* This fails to compile with:

 found   : food.type (with underlying type com.phenry.scala.Animal#SuitableFood)
 required: animal.SuitableFood 

    */

    animal.eat(food) // <-- does not compile!
  }
}

The same effect can be achieved with Java generics but that can be subverted via erasure. This appears more solid. More information can be found here.
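As a rough illustration of that remark, here is my own Java rendering of the animals using a type parameter in place of the type member. The class GenericAnimalsSketch and its method names are invented. A raw type is one way (an assumption on my part about what "subverted" covers) to get past the compiler with only an unchecked warning.

```java
class GenericAnimalsSketch {
    static abstract class Food {}

    static class Grass extends Food {
        @Override public String toString() { return "grass"; }
    }

    static class DogFood extends Food {
        @Override public String toString() { return "dog food"; }
    }

    // The type parameter F plays the role of the SuitableFood type member
    static abstract class Animal<F extends Food> {
        String eat(F food) {
            return this + " eats " + food;
        }
    }

    static class Dog extends Animal<DogFood> {
        @Override public String toString() { return "dog"; }
    }

    static class Cow extends Animal<Grass> {
        @Override public String toString() { return "cow"; }
    }

    // Checked by the compiler: new Dog().eat(new Grass()) would not compile
    static String wellTyped() {
        return new Dog().eat(new DogFood());
    }

    // A raw Animal erases F to Food, so the dog is happily fed grass
    @SuppressWarnings({"rawtypes", "unchecked"})
    static String subverted() {
        Animal raw = new Dog();
        return raw.eat(new Grass());
    }

    public static void main(String[] args) {
        System.out.println(wellTyped());
        System.out.println(subverted());
    }
}
```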


Emptiness

There are a few ways to represent nothingness in Scala and a good description lives here and here. The main points are that null is just like Java, Nil is an empty List and Unit is just like Java's void.

This last one is interesting. There is only one way to have an instance of Unit and it's represented as (). It's an actual reference (unlike java.lang.Void which cannot be instantiated due to its private constructor) and can be used like this:

  def takesVoidToString(arg: () => String) : Unit = { 
    println("called with " + arg()) 
  }

  def main(arg : Array[String]) = {
    def fnTakesVoidReturnsString() = "fnTakesVoidReturnsString"
    takesVoidToString(fnTakesVoidReturnsString)
.
.

which in this case prints out:

called with fnTakesVoidReturnsString

Here our function takesVoidToString takes another function that in turn takes no arguments and just returns a String. The compiler wraps the method fnTakesVoidReturnsString in a Function0, and calling arg() invokes that function's apply method. Without the (), we'd see arg's toString method called and see that it is a Function0.
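For comparison, a Java programmer would probably reach for Supplier<String> where Scala uses () => String. This is only my own analogy (the class name Function0Sketch is invented), with get() playing the part of apply().

```java
import java.util.function.Supplier;

class Function0Sketch {
    // Supplier<String> is the closest standard Java type to () => String
    static String takesVoidToString(Supplier<String> arg) {
        return "called with " + arg.get();
    }

    static String fnTakesVoidReturnsString() {
        return "fnTakesVoidReturnsString";
    }

    // A method reference is converted to a Supplier, much as Scala builds a Function0
    static String demo() {
        return takesVoidToString(Function0Sketch::fnTakesVoidReturnsString);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```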

A note on syntax can be found here.

Finally, Nothing can be used when a function does not terminate properly. For example:

scala> def fnThrowsException = throw new UnsupportedOperationException
fnThrowsException: Nothing

That is, the REPL is telling us the return type of fnThrowsException is Nothing.


Partial Application of Functions in Scala

The syntax for a function that can be partially applied looks like this:

  def canBePartiallyApplied(count : Int)(f : Int => Int) { // Int => Int is syntactic sugar for Function1[Int, Int]
    for (i <- 1 to count) {
      println(f(i))
    }
  }

And we call it thus:

    def g(x : Int) = x + 1
    def partiallyApplied = app.canBePartiallyApplied(3)(_)
    partiallyApplied(g)

Note the underscore to indicate that we'll be filling that in later. This is similar but different to currying. Currying turns a function of several arguments into a chain of functions that each take one argument and return the next function. Partial application supplies some of the arguments of an existing function and returns a function of the remaining ones. See here for more information.
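The distinction can be sketched in Java, where none of this is built into the syntax. Everything below is my own illustration: applyUpTo mirrors canBePartiallyApplied but returns the values instead of printing them so they can be checked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.IntUnaryOperator;

class PartialApplicationSketch {
    // Ordinary two-argument method: applies f to 1..count and collects the results
    static List<Integer> applyUpTo(int count, IntUnaryOperator f) {
        List<Integer> out = new ArrayList<>();
        for (int i = 1; i <= count; i++) {
            out.add(f.applyAsInt(i));
        }
        return out;
    }

    // Partial application: fix count now, get back a function still waiting for f
    static Function<IntUnaryOperator, List<Integer>> partiallyApplied(int count) {
        return f -> applyUpTo(count, f);
    }

    // Currying: the same method reshaped into a chain of one-argument functions
    static final Function<Integer, Function<IntUnaryOperator, List<Integer>>> CURRIED =
            count -> f -> applyUpTo(count, f);

    public static void main(String[] args) {
        System.out.println(partiallyApplied(3).apply(x -> x + 1));
        System.out.println(CURRIED.apply(3).apply(x -> x + 1));
    }
}
```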


Scala Hierarchy

Absolutely everything extends Any.

The Scala equivalents of primitives in Java extend AnyVal. All other types extend AnyRef.

Null is a subtype of every AnyRef type.

Nothing is a subtype of every type, including Null and every AnyVal type.


Terminology

"Anonymous functions in source code are called function literals" [1].



Monday, September 15, 2014

Scala Crib Sheet #2

More Scala for Java programmers:

Case statements as functions

In Akka's sample.hello.HelloWorld:

  def receive = {
    // when the greeter is done, stop this actor and with it the application
    case Greeter.Done => context.stop(self)
  }

"Case sequences as partial functions: A sequence of cases (i.e. alternatives) in curly braces can be used anywhere a function literal can be used. Essentially, a case sequence is a function literal, only more general." [1]

Partial Functions

Not to be confused with partially applied functions, this borrows from the mathematical concept of a partial function where not all values in the domain map to other values (this contrasts with a total function). An example of a partial function in mathematics is the log function over all real numbers. Clearly, negative real numbers do not map to any real numbers.

Following on from the idea above of case statements as more generalized functions, then a case statement that does not cover every eventuality is a partial function. The Scala compiler will issue a warning but will not abort compilation of this (although you might get runtime errors).

Taking this example in the Scala docs:

    val isEven: PartialFunction[Int, String] = {
        case x if x % 2 == 0 => { println(x) ; x+" is even" }
    }

we can see that this is indeed a partial function as it is not defined when x is odd (applying it to an odd number throws a MatchError).
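Java has no PartialFunction, but the idea is easy to sketch by hand. The IntPartial class below is hypothetical (it is not a JDK or Scala type) and simply pairs a function body with the isDefinedAt test that the Scala compiler would otherwise derive from the case guard.

```java
import java.util.function.IntFunction;
import java.util.function.IntPredicate;

class PartialFunctionSketch {
    // Hypothetical minimal counterpart of PartialFunction[Int, R]
    static final class IntPartial<R> {
        private final IntPredicate defined;
        private final IntFunction<R> body;

        IntPartial(IntPredicate defined, IntFunction<R> body) {
            this.defined = defined;
            this.body = body;
        }

        boolean isDefinedAt(int x) {
            return defined.test(x);
        }

        // Outside its domain the function fails, much like Scala's MatchError
        R apply(int x) {
            if (!isDefinedAt(x)) {
                throw new IllegalArgumentException("not defined at " + x);
            }
            return body.apply(x);
        }
    }

    // Same domain as the isEven example: only even numbers are handled
    static final IntPartial<String> IS_EVEN =
            new IntPartial<>(x -> x % 2 == 0, x -> x + " is even");

    public static void main(String[] args) {
        System.out.println(IS_EVEN.isDefinedAt(3));
        System.out.println(IS_EVEN.apply(4));
    }
}
```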

All the other nice functionality that Scala gives you for this class is added by the compiler. For instance, I didn't implement the isDefinedAt function but the compiler did it for me:

$ javap -c ./target/scala-2.10/classes/com/phenry/scala/MyScala\$\$anonfun\$1.class
.
.
  public final boolean isDefinedAt(int);
    Code:
       0: iload_1       
       1: istore_2      
       2: iload_2       
       3: lookupswitch  { // 0
               default: 12
          }
      12: iload_2       
      13: iconst_2      
      14: irem          
      15: iconst_0      
      16: if_icmpne     23
      19: iconst_1      
      20: goto          24
      23: iconst_0      
      24: ireturn 
.
.

which when JADed, becomes:

    public final boolean isDefinedAt(int x1)
    {
        int i = x1;
        boolean flag;
        if(i % 2 == 0)
            flag = true;
        else
            flag = false;
        return flag;
    }


Type parameter hints

In Akka's sample.hello.HelloWorld:

    // create the greeter actor
    val greeter = context.actorOf(Props[Greeter], "greeter")

which calls:

  /**
   * Scala API: Returns a Props that has default values except for "creator" which will be a function that creates an instance
   * of the supplied type using the default constructor.
   */
  def apply[T <: Actor: ClassTag](): Props = apply(defaultDeploy, implicitly[ClassTag[T]].runtimeClass, List.empty)


Unlike Java, Scala can instantiate objects from type parameters. "What's required here is that you help the compiler by providing a runtime hint of what the actual type parameter is ... This means following the type with a colon and the class name ClassManifest."  [1]. Since Programming in Scala was published, this ClassManifest has been deprecated in favour of ClassTag which is what we see here. It carries the type information that was erased at compile time.

This adding of an implicit parameter is called a context bound.
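In Java the usual workaround for erasure is to pass the Class object by hand, which is roughly the job the implicit ClassTag does for Props[Greeter]. A minimal sketch of my own follows. The names ClassTokenSketch and create are invented, and this Greeter is a stand-in, not Akka's actor.

```java
class ClassTokenSketch {
    // Stand-in class with a default constructor
    static class Greeter {
        String greet() {
            return "hello";
        }
    }

    // Class<T> is the explicit runtime hint; Java cannot write "new T()"
    static <T> T create(Class<T> runtimeClass) {
        try {
            return runtimeClass.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + runtimeClass, e);
        }
    }

    public static void main(String[] args) {
        Greeter greeter = create(Greeter.class);
        System.out.println(greeter.greet());
    }
}
```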

"Implicits in subclasses and subobjects take precedence over implicits in base classes." [1]

[1] Programming in Scala, Odersky.

Tuesday, September 9, 2014

SSL: Performance and Behaviour


The problem

We had set up a Jetty server to use SSL much as in a previous post. This was fine until we started pooling connections (pooling connections didn't greatly improve performance in London-to-London communication but improved Hong Kong-to-London performance by about 30% since the ping time for a packet was a huge 220ms). But when we pooled connections and introduced SSL, all communication froze after some happy-path results.

The problem did not appear to be with encryption itself as the first few requests succeeded. But after 30 hits, all the Jetty threads were blocked like this (from running jstack):

"qtp401625763-13" #13 prio=5 os_prio=0 tid=0x00007f3e2021b800 nid=0x1ecb runnable [0x00007f3e0d2e7000]
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:150)
    at java.net.SocketInputStream.read(SocketInputStream.java:121)
    at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
    at sun.security.ssl.InputRecord.read(InputRecord.java:503)
    at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:954)
    - locked <0x00000000d90404a8> (a java.lang.Object)
    at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:911)
    at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
    - locked <0x00000000d906e358> (a sun.security.ssl.AppInputStream)
    at org.eclipse.jetty.io.ByteArrayBuffer.readFrom(ByteArrayBuffer.java:391)

(where we're using Jetty 7.6.15)

At first, we thought something was wrong with our certificates etc. but a handful of requests initially went through without issue. If you run with the JVM argument -Djavax.net.debug=all and see this in the logs:

*** ServerHelloDone

on both the client and server side, then you know the server completed the handshake OK. Look for:

main, READ: TLSv1.2 Change Cipher Spec, length = 1

or, in Jetty: 

qtp1418621776-17, WRITE: TLSv1.2 Change Cipher Spec, length = 1

and you'll know the whole handshake pretty much completed OK.

What happens in SSL?

Asymmetric encryption is used briefly at start-up to establish a more efficient cipher. The private keys in this exchange are ephemeral: they are generated on the server side as soon as it hears the client say "hello" and on the client side as soon as it receives the server's choice of key.

Syn

In kickstarting the SSL handshake, the client advertises all of its cipher suite codes, elliptic curve details etc to the server (see sun.security.ssl.HandshakeMessage$ClientHello.send(..) ). The client thread then blocks waiting for the server to respond.

Syn/Ack

Upon receiving the client's message, (see sun.security.ssl.ServerHandshaker.clientHello(..) ) the server chooses an algorithm that both client and server support. When I was stepping through the code, this appeared to be DSA. The server-side must have a public and private key that corresponds to this algorithm in its keystore (see ServerHandshaker.setupPrivateKeyAndChain(..)).

Here, a sun.security.ssl.DHCrypt is instantiated. From the JavaDocs:

"This class implements the Diffie-Hellman key exchange algorithm.  D-H means combining your private key with your partners public key to generate a number. The peer does the same with its private key and our public key. Through the magic of Diffie-Hellman we both come up with the same number. This number is secret (discounting MITM attacks) and hence called the shared secret."

Along with the server certificates, all of this is sent to the client and the server thread blocks.
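The arithmetic behind the Diffie-Hellman description quoted above can be seen with toy numbers. This is a sketch of the maths only, using BigInteger and a deliberately tiny prime. It is not how DHCrypt is implemented, and the class and method names are my own.

```java
import java.math.BigInteger;

class DiffieHellmanSketch {
    // Toy public parameters; real exchanges use primes thousands of bits long
    static final BigInteger P = BigInteger.valueOf(23); // modulus
    static final BigInteger G = BigInteger.valueOf(5);  // generator

    // Public key = g^private mod p; this is what goes over the wire
    static BigInteger publicKey(BigInteger privateKey) {
        return G.modPow(privateKey, P);
    }

    // Shared secret = peerPublic^ownPrivate mod p
    static BigInteger sharedSecret(BigInteger ownPrivate, BigInteger peerPublic) {
        return peerPublic.modPow(ownPrivate, P);
    }

    public static void main(String[] args) {
        BigInteger clientPrivate = BigInteger.valueOf(4);
        BigInteger serverPrivate = BigInteger.valueOf(3);

        BigInteger clientPublic = publicKey(clientPrivate);
        BigInteger serverPublic = publicKey(serverPrivate);

        // Each side combines its own private key with the other's public key
        System.out.println("client computes " + sharedSecret(clientPrivate, serverPublic));
        System.out.println("server computes " + sharedSecret(serverPrivate, clientPublic));
    }
}
```

Both sides arrive at the same number without the private keys ever being transmitted.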

Ack

The client then unblocks and deserializes the ServerHello created on the server side (via a bespoke deserialization process). It uses the cipher suite the server told it to use and checks the server's certificates and stores the server's public key. It then sends its own public key (via DHClientKeyExchange) to the server.

Symmetric cipher keys are then generated (Handshaker.calculateConnectionKeys), the client tells the server that it is ready to talk then blocks.

A little more server Ack

Given what the client and server have exchanged, the server now sets its own agreed secret key using the same method in Handshaker (that is the superclass to both ServerHandshaker and ClientHandshaker) and sends a Finished object back to the client. The server is now finished and it notifies any listeners in a separate thread (this listener can be found as an inner class in Jetty's SslConnectorEndPoint.run). The thread in SslConnectorEndPoint then awaits incoming data.

Introducing MAT

There's a very nice tool from the ladies and gentlemen of Eclipse called MAT. Very quickly, you can dump the memory of a JVM and query its contents with a SQL-like language. For instance, I found all the client-side sockets that were listening to my server port of 8192 by executing:

select * from java.net.SocksSocketImpl where port = 8192

Interestingly, all their incoming references originated in the connection pool. So that is what is keeping them open and correspondingly keeping the server threads listening for a request that never comes! All client threads doing something useful are blocked by Jetty not having any server threads to service them; and all those Jetty threads are listening on sockets whose other end is held open by the idle sockets in the client's connection pool.
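The starvation described above can be modelled without any networking. In this sketch of mine, a fixed thread pool stands in for Jetty's blocking connector threads and a latch stands in for idle pooled connections that never send a request. None of the names come from Jetty.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class PoolStarvationSketch {
    // Returns true if a real request gets no thread while idle connections pin every worker
    static boolean realRequestStarves(int workers) {
        ExecutorService server = Executors.newFixedThreadPool(workers);
        CountDownLatch idleConnectionsClosed = new CountDownLatch(1);
        try {
            // One blocking "read" per idle pooled connection uses up every server thread
            for (int i = 0; i < workers; i++) {
                server.submit(() -> {
                    try {
                        idleConnectionsClosed.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // The request that actually has work to do is queued behind them
            Future<String> real = server.submit(() -> "response");
            try {
                real.get(200, TimeUnit.MILLISECONDS);
                return false;
            } catch (TimeoutException e) {
                return true;
            } catch (ExecutionException e) {
                throw new IllegalStateException(e);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        } finally {
            idleConnectionsClosed.countDown();
            server.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("real request starved: " + realRequestStarves(4));
    }
}
```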

Conclusion

By replacing Jetty's SslSocketConnector with its SslSelectChannelConnector, the server behaves asynchronously and connection pooling doesn't make it grind to a halt.

We rolled our own nonce encryption and it performed poorly. When we got rid of it, we found that Jetty using SSL had roughly the same performance as plain, clear-text HTTP.


Further reading

1. A very good (maths) blog on elliptic curves.