Sunday, September 30, 2012

When buffering does not improve performance

We wrote a simple server earlier this month and were surprised how inefficient it was when under heavy load.

Following the Agile mantra of doing the simplest thing that could possibly work, we used standard Java classes such as java.io.BufferedReader. To our surprise, this proved a very poor performer compared to reading the raw InputStream. To illustrate, let's create an interface thus:

package com.henryp.io.stream;

import java.io.IOException;

public interface StreamReader {

    public void readStream() throws IOException;

}

and have two implementing classes. The first is a straightforward implementation that wraps the stream in a BufferedReader:


package com.henryp.io.stream;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

public class BufferedStreamReader implements StreamReader {

    private final BufferedReader bufferedReader;

    BufferedStreamReader(InputStream inputStream) {
        Reader reader = new InputStreamReader(inputStream);
        bufferedReader = new BufferedReader(reader);
    }

    public void readStream() throws IOException {
        String line = bufferedReader.readLine();
        while (line != null) {
            line = bufferedReader.readLine();
        }
    }

}


The second is a little more sophisticated. It avoids creating Strings altogether and instead works with byte arrays. What's more, it caches these byte arrays so it can reuse them:


package com.henryp.io.stream;

import java.io.IOException;
import java.io.InputStream;

public class FixedBufferStreamReader implements StreamReader {
    
    private static final int BUFFER_SIZE = 8192;
    
    private final ThreadLocal<byte[]> buffers = new ThreadLocal<byte[]>();

    private final InputStream inputStream;

    FixedBufferStreamReader(InputStream inputStream) {
        super();
        this.inputStream = inputStream;
    }

    @Override
    public void readStream() throws IOException {
        byte[] buffer = getBufferForThread();
        int read = inputStream.read(buffer);
        while (read != -1) {
            read = inputStream.read(buffer);
        }
    }

    private byte[] getBufferForThread() {
        byte[] buffer = buffers.get();
        if (buffer == null) {
            buffer = new byte[BUFFER_SIZE];
            buffers.set(buffer);
        }
        return buffer;
    }

}


Note that neither implementation actually parses the data it reads. Obviously, working with raw byte arrays is harder than working with Strings, but if you want the better performance you'll have to take that hit.
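
For instance, even something as basic as splitting the data into lines - which readLine() gives you for free - becomes your problem. A minimal sketch of what that might look like over the raw buffer (purely hypothetical, not part of the test code):


// Hypothetical sketch: count lines directly in the byte buffer without
// ever creating a String. Anything more ambitious (character encodings,
// tokens that span two reads, etc.) quickly gets fiddly.
private int countLines(InputStream inputStream, byte[] buffer) throws IOException {
    int lines = 0;
    int read = inputStream.read(buffer);
    while (read != -1) {
        for (int i = 0; i < read; i++) {
            if (buffer[i] == '\n') {
                lines++;
            }
        }
        read = inputStream.read(buffer);
    }
    return lines;
}
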

The test harness looks like this:


package com.henryp.io.stream;

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class StreamReaderStressTest {
    
    private static final int CONTENT_SIZE = 10000;

    private final int numThreads = 100;
    
    private final int numIterations = 200;
    
    private final AtomicInteger readerIndex = new AtomicInteger(-1);

    private final StreamReader[] streamReaders;
    
    private final CountDownLatch countDownLatch = new CountDownLatch(numThreads);

    public static void main(String[] args) throws InterruptedException {
        long initialMemory = memory();
        System.out.println("Initializing...");
        StreamReaderStressTest app = new StreamReaderStressTest();
        long memoryBeforeRun = memory();
        System.out.println("Sleeping...");
        Thread.sleep(4000L);
        app.runTest();
        long memoryAfterRun = memory();
        System.out.println("Memory consumed during setup: " + (initialMemory - memoryBeforeRun) + " bytes"
                + "\nChange in memory during test: " + (memoryBeforeRun - memoryAfterRun) + " bytes");
    }
    
    private static long memory() {
        System.gc();
        long freeMemory = Runtime.getRuntime().freeMemory();
        System.out.println("Free memory: " + (freeMemory / (1024L*1024L)) + " MiB");
        return freeMemory;
    }

    StreamReaderStressTest() {
        InputStream[] inputStreams = initialize();
        streamReaders = new StreamReader[inputStreams.length];
        for (int i = 0 ; i < inputStreams.length ; i++) {
            streamReaders[i] = 
                //new FixedBufferStreamReader(inputStreams[i]); // 13ms
                new BufferedStreamReader(inputStreams[i]); // 367ms
        }
    }

    private void runTest() {
        System.out.println("About to start test...");
        StreamReadingThread[]   readers = new StreamReadingThread[numThreads];
        Thread[]                threads = new Thread[numThreads];
        for (int i = 0 ; i < numThreads ; i++) {
            readers[i] = new StreamReadingThread();
            threads[i] = new Thread(readers[i]);
            threads[i].start();
        }
        waitForThreadsToEnd(threads);
        outputStats(readers);
    }
    
    private void outputStats(StreamReadingThread[] readers) {
        long totalTime = 0;
        for (int i = 0 ; i < readers.length ; i++) {
            totalTime += readers[i].getDuration();
        }
        System.out.println("\nAverage time per thred " + (totalTime / (numThreads)) + "ms\n");
    }

    private void waitForThreadsToEnd(Thread[] threads) {
        for (int i = 0 ; i < threads.length ; i++) {
            try {
                threads[i].join();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }

    class StreamReadingThread implements Runnable {
        
        private volatile long duration = 0;

        @Override
        public void run() {
            getReady();
            long start = System.currentTimeMillis();
            try {
                for (int i = 0 ; i < numIterations ; i++) {
                    streamReaders[readerIndex.incrementAndGet()].readStream();
                }
                duration = System.currentTimeMillis() - start;
            } catch (IOException x) {
                x.printStackTrace();
            }
        }

        private void getReady() {
            countDownLatch.countDown();
            try {
                countDownLatch.await();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
        
        long getDuration() {
            return duration;
        }
    }

    private InputStream[] initialize() {
        int numStreams = numThreads * numIterations;
        InputStream[] inputStreams = new InputStream[numStreams];
        String stringForStream = createArbitraryString(CONTENT_SIZE);
        for (int i = 0 ; i < numStreams ; i++) {
            inputStreams[i] = new ByteArrayInputStream(stringForStream.getBytes());
        }
        return inputStreams;
    }

    private String createArbitraryString(int length) {
        StringBuffer szb = new StringBuffer();
        for (int i = 0 ; i < length ; i++) {
            szb.append("A");
        }
        return szb.toString();
    }

}


Note:

  • All initialization is done up-front.
  • A 4-second pause is taken before the test so that the results are not affected by biased locking, which HotSpot leaves disabled for the first few seconds after start-up (see here for a full description).
  • Run the JVM with the arguments -Xmx1024m -Xmn1024m so the memory is fully initialized to begin with.

Over 5 runs each, the use of BufferedReader yielded a mean time of 1813ms whereas using the raw InputStream took a mean time of 380ms.

A clue as to why lies in the source code for BufferedReader: there is a liberal sprinkling of System.arraycopy(). On top of that, readLine() has to copy characters out of the internal buffer into a new String for every line it returns.

Furthermore, the amount of memory used by the BufferedReaders was about four times that used by the raw InputStreams (roughly 700MB versus 185MB). This is not surprising when you look at the code for BufferedReader: it allocates a char array of 8192 chars (16KB) on instantiation, and this test creates one reader per stream - 20,000 of them - rather than sharing one buffer per thread. That is over 300MB in char buffers alone, before counting the InputStreamReaders' own internal byte buffers.

Greater efficiency with the raw InputStream was achieved by shrinking the byte array: reducing BUFFER_SIZE to 1024 bytes gave a mean time of 217ms (although with a greater standard deviation).

Looking at the source code for GlassFish's Grizzly project, you can see ByteBuffers being cached and reused to read data from a SocketChannel (see the method org.glassfish.grizzly.nio.transport.TCPNIOUtils.readSimpleByteBuffer(...); the test org.glassfish.grizzly.http.server.HttpInputStreamsTest.testCharacter010() exercises this code).
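
The same pattern can be sketched in a few lines: keep one ByteBuffer per thread and reuse it for every read. To be clear, this is just an illustration of the idea, not Grizzly's actual code:


import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

// Illustration only: reuse a per-thread direct ByteBuffer instead of
// allocating a fresh buffer (or building Strings) for every read.
// Assumes a blocking channel.
public class CachedBufferChannelReader {

    private static final int BUFFER_SIZE = 8192;

    private static final ThreadLocal<ByteBuffer> BUFFERS = new ThreadLocal<ByteBuffer>() {
        @Override
        protected ByteBuffer initialValue() {
            return ByteBuffer.allocateDirect(BUFFER_SIZE);
        }
    };

    public long drain(ReadableByteChannel channel) throws IOException {
        ByteBuffer buffer = BUFFERS.get();
        long total = 0;
        int read;
        while ((read = channel.read(buffer)) != -1) {
            total += read;
            buffer.clear(); // make room for the next read; the data is simply discarded here
        }
        return total;
    }
}


As with FixedBufferStreamReader above, the cost of the buffer is paid once per thread rather than once per request.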

[Aside: to make Grizzly into an Eclipse project, you need to run:


mvn org.apache.maven.plugins:maven-eclipse-plugin:2.6:eclipse

See here for why.]




Saturday, September 8, 2012

Garbage Collection Rubbish

More memory leaks recently, and again from various byte-code manipulators. A memory dump showed huge swathes of memory taken by classes whose names indicated that they had been dynamically generated by CGLIB.

The problem manifested itself during integration test runs in which we constantly start and stop Jetty servers. Since we start and stop Jetty programmatically (see a previous post), and the way we used Spring seemed to generate many classes dynamically, these classes ultimately consumed all the memory: the objects they accumulated were held in static references - that is, they were stored with the classes themselves. And the classes were never garbage collected, since Spring's classloader was the same as the one running the test.

"Types loaded through the bootstrap class loader will always be reachable and will never be unloaded. Only types loaded through user-defined class loaders can become unreachable and can be unloaded by the virtual machine." - Inside the Java 2 Virtual Machine, Bill Venners, p266.

If we'd deployed a WAR file presumably all generated classes would have been garbage collected when the web app was unloaded in the test's tear down method.

To demonstrate our problem, let's:
  1. write our own class loader
  2. use it to load a class
  3. create an instance of that class and let it consume lots of memory by storing objects in one of its static references.
  4. allow the classloader to be garbage collected
  5. see how the memory changed during all these steps
OK, so here's the class loader:



package com.henryp.classloader;

import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

public class MyClassLoader extends ClassLoader {

    private final String baseDir;

    MyClassLoader(String baseDir) {
        super();
        this.baseDir = baseDir;
    }

    @Override
    protected Class findClass(String filename) throws ClassNotFoundException {
        filename                        = filename.replace(".", "/") + ".class";
        InputStream resourceAsStream    = null;

        try {
            URL     url         = new URL(baseDir + filename);
            System.out.println("URL = " + url);
            resourceAsStream    = url.openStream();
            System.out.println("Resource null? " + (resourceAsStream == null));
            byte[]  buffer      = new byte[4096];
            byte[]  bytes       = new byte[4096];
            int     read        = resourceAsStream.read(buffer);
            int     total       = 0;

            while (read != -1) { // this could be a lot more efficient but it does the job
                System.out.println("Read " + read + " bytes");
                byte[] dest = new byte[total + read];
                System.arraycopy(bytes,     0, dest, 0, total);
                System.arraycopy(buffer,    0, dest, total, read);
                bytes = dest;
                total += read;
                read = resourceAsStream.read(buffer);
            }
            return defineClass(null, bytes, 0, total);
        } catch (IOException e) {
            e.printStackTrace();
            throw new ClassNotFoundException(filename);
        } finally {
            try {
                resourceAsStream.close();
            } catch (Throwable e) {
                e.printStackTrace();
            }
        }
    }
}


Not awfully pretty but it doesn't need to be terribly efficient.
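
Incidentally, the growing-array dance could be replaced with a ByteArrayOutputStream. A sketch of an equivalent (and slightly tidier) findClass body, assuming an extra import of java.io.ByteArrayOutputStream:


    @Override
    protected Class findClass(String name) throws ClassNotFoundException {
        String path = name.replace(".", "/") + ".class";
        try {
            InputStream in = new URL(baseDir + path).openStream();
            try {
                // Accumulate the class bytes; ByteArrayOutputStream does the
                // array-growing for us.
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buffer = new byte[4096];
                int read;
                while ((read = in.read(buffer)) != -1) {
                    out.write(buffer, 0, read);
                }
                byte[] bytes = out.toByteArray();
                return defineClass(null, bytes, 0, bytes.length);
            } finally {
                in.close();
            }
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }
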

The main class that uses our class loader looks like this (notice that we measure the memory before and after our class loader is eligible for garbage collection):

package com.henryp.classloader;


public class Main {
    
    public static void main(String[] args) {
        Main app = new Main();
        app.doClassloading();
    }

    private void doClassloading() {
        MyClassLoader   classLoader = new MyClassLoader("file:/Users/phenry/Documents/workspace/TestOther/bin/");
        consumeMemory(classLoader);
        long            beforeNull  = gcThenLogMemory();
        System.out.println("Making classloader eligible for garbage collection");
        classLoader                 = null;
        long            afterNull   = gcThenLogMemory();
        System.out.println("Memory usage change: " + (afterNull - beforeNull) + " bytes liberated");
    }

    private void consumeMemory(MyClassLoader classLoader) {
        try {
            // class from a project that depends on this one, but on which this project does not depend
            Class clazz = classLoader.findClass("com.henryp.classloader.OtherStaticReferenceBloater");
            consumeMemory(clazz);
        } catch (Exception e) {
            e.printStackTrace();
            System.exit(-1);
        }
    }

    private static long gcThenLogMemory() {
        System.gc();
        long freeMemory = Runtime.getRuntime().freeMemory();
        System.out.println("After garbage collection, free memory = " + freeMemory);
        return freeMemory;
    }

    private void consumeMemory(Class clazz) throws Exception {
        LargeMemoryConsumable   memoryConsumer  = instantiateMemoryConsumer(clazz);
        long                    initialMemory   = Runtime.getRuntime().freeMemory();
        doCallOnForeignObject(memoryConsumer);
        System.gc();
        long                    finalMemory     = Runtime.getRuntime().freeMemory();
        System.out.println("Initial memory = " + initialMemory);
        System.out.println("Final   memory = " + finalMemory);
        System.out.println("Total consumed = " + (initialMemory - finalMemory));
    }

    private LargeMemoryConsumable instantiateMemoryConsumer(Class clazz)
            throws InstantiationException, IllegalAccessException {
        System.out.println("Class name: " + clazz.getName());
        gcThenLogMemory();
        LargeMemoryConsumable memoryConsumer = (LargeMemoryConsumable) clazz.newInstance();
        checkDifferentClassLoader(clazz);
        return memoryConsumer;
    }

    private void doCallOnForeignObject(LargeMemoryConsumable memoryConsumer) throws Exception {
        for (int i = 0 ; i < 500000 ; i++) {
            memoryConsumer.consumeMemory();
        }
    }

    private void checkDifferentClassLoader(Class clazz) {
        if (clazz.getClassLoader() == this.getClass().getClassLoader()) {
            System.out.println("Experiment useless if the classloaders are the same");
            System.exit(-1);
        }
    }
}


And the memory consumer we refer to is a simple interface:



package com.henryp.classloader;

public interface LargeMemoryConsumable {
    public void consumeMemory() throws Exception;
}


Now, this is where it gets a little more interesting. In another project we have an object that just takes up memory. Note that although this new project depends on the classes in our first project, the first project does not reference the new project at all. It does, however, use a URL that points at the directory into which the new project's classes are compiled.

(I've added a toString method to stop the compiler optimizing away the aString member.)



package com.henryp.classloader;

public class MyOtherFatObject {

    public static Object create() {
        return new MyOtherFatObject("" + System.currentTimeMillis());
    }
    
    private final String aString;

    private MyOtherFatObject(String aString) {
        super();
        this.aString = aString;
    }

    @Override
    public String toString() {
        return "MyOtherFatObject [aString=" + aString + "]";
    }
    
}


In the same project, we have a class that emulates what we saw Spring doing, that is, holding a static reference to a collection of memory-consuming objects.



package com.henryp.classloader;

import java.util.ArrayList;
import java.util.Collection;

public class OtherStaticReferenceBloater implements LargeMemoryConsumable {
    
    private static final Collection objects = new ArrayList();

    public void consumeMemory() throws Exception {
        if (this.getClass().getClassLoader() != MyOtherFatObject.class.getClassLoader()) {
            throw new Exception("Different classloaders");
        }
        objects.add(MyOtherFatObject.create());
    }

    public void addObject(Object object) {
        objects.add(object);
    }

}



Running with these JVM args:



-verbose:gc -Xmx64m -Xmn64m

the output looks like:


URL = file:/Users/phenry/Documents/workspace/TestOther/bin/com/henryp/classloader/OtherStaticReferenceBloater.class
Resource null? false
Read 1300 bytes
Class name: com.henryp.classloader.OtherStaticReferenceBloater
After garbage collection, free memory = 63333768
URL = file:/Users/phenry/Documents/workspace/TestOther/bin/com/henryp/classloader/MyOtherFatObject.class
Resource null? false
Read 896 bytes
Initial memory = 63333768
Final   memory = 15130184
Total consumed = 48203584
After garbage collection, free memory = 15130200
Making classloader eligible for garbage collection
After garbage collection, free memory = 66266816
Memory usage change: 51136616 bytes liberated


(Notice how we explicitly load one class but it implicitly pulls in another using our class loader, the same class loader that loaded it - see the lines beginning with URL = ...).

Noting the usual caveat that System.gc() is only a suggestion to the JVM and not a guarantee that the garbage collector will run, making the reference to the class loader null does seem to liberate all the objects stored in the static reference of the class it loaded.

What we finally did was make all the references our integration tests held to Spring's context loaders non-static. As the test instances were garbage collected, the memory was freed.
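
For illustration, the change amounted to something like this (a hypothetical test; the field name and the use of ClassPathXmlApplicationContext are examples only, not our actual code, which held references to Spring's test context loaders):


import org.springframework.context.support.ClassPathXmlApplicationContext;

// Hypothetical example of the fix, not our real test code.
public class SomeIntegrationTest {

    // Before: a static field keeps the context - and everything reachable
    // from its dynamically generated classes - alive for as long as the
    // test class itself, i.e. for the whole test run.
    // private static ClassPathXmlApplicationContext context;

    // After: an instance field dies with the test instance, so the context
    // becomes eligible for garbage collection once the test object is.
    private ClassPathXmlApplicationContext context;

    public void setUp() {
        context = new ClassPathXmlApplicationContext("applicationContext.xml");
    }

    public void tearDown() {
        context.close();
        context = null;
    }
}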