Tuesday, April 9, 2013

Dumpster Diving in the JVM

If you're monitoring your application, you might notice the maximum amount of heap memory changing with time. How can this be? Surely the maximum is, well, the maximum and not a moving target?

The Javadoc for MemoryUsage defines the maximum as:

"the maximum amount of memory (in bytes) that can be used for memory management. Its value may be undefined. The maximum amount of memory may change over time if defined. The amount of used and committed memory will always be less than or equal to max if max is defined. A memory allocation may fail if it attempts to increase the used memory such that used > committed even if used <= max would still be true (for example, when the system is low on virtual memory)."
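These figures are easy to inspect in-process via the MemoryMXBean; a minimal sketch (the class name is mine):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Prints the heap's used/committed/max figures for this JVM.
public class HeapUsage {
    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.println("used      = " + heap.getUsed());
        System.out.println("committed = " + heap.getCommitted());
        System.out.println("max       = " + heap.getMax()); // -1 when undefined
    }
}
```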

In Linux, thanks to memory overcommit, it's easy to reserve more memory than you could ever back with RAM. For instance:

#include <stdio.h>
#include <sys/mman.h>

int main(void) {
    size_t sillyAmountOfMemory = 1024ULL * 1024 * 1024 * 1024; // over 1 terabyte! (ULL avoids int overflow)
    printf("About to map %zu bytes\n", sillyAmountOfMemory);
    void *addr = mmap(NULL, sillyAmountOfMemory, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return addr == MAP_FAILED;
}

Meanwhile, in a shell:

[phenry@localhost MyMemoryTestsC]$ ps aux | head -1 ; ps aux | grep MyMemoryTest
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
phenry   11905  0.0  0.0 1050452  236 pts/0    S    13:43   0:00 /home/phenry/workspaceGanymedeC/MyMemoryTestsC/Debug/MyMemoryTests

Notice the VSZ value - that's the virtual size of our process:

[phenry@localhost MyMemoryTestsC]$ man ps | grep -A2 VSZ
       vsz         VSZ       virtual memory size of the process in KiB (1024-byte units). Device mappings are currently excluded; this is subject to change.

This is more than the physical memory I have on this machine:

[phenry@localhost MyMemoryTestsC]$ cat /proc/meminfo | head -1
MemTotal:        1026144 kB

Back to our Java process. Depending on the Garbage Collector being used, the maximum heap size may change to optimize performance. Oracle says:

"The statistics such as average pause time kept by the collector are updated at the end of each collection. The tests to determine if the goals have been met are then made and any needed adjustments to the size of a generation is made. The exception is that explicit garbage collections (e.g., calls to System.gc()) are ignored in terms of keeping statistics and making adjustments to the sizes of generations...

"If the maximum pause time goal is not being met, the size of only one generation is shrunk at a time. If the pause times of both generations are above the goal, the size of the generation with the larger pause time is shrunk first.

"If the throughput goal is not being met, the sizes of both generations are increased."

On a beefier machine, I can see the Parallel Scavenging collector being used. Oracle says of this type of collector:

"The parallel scavenge collector is similar to the parallel copying collector, and collects young generation garbage. The collector is targeted towards large young generation heaps and to scale with more CPUs. It works very well with large young generation heap sizes that are in gigabytes, like 12GB to 80GB or more, and scales very well with increase in CPUs, 8 CPUs or more. It is designed to maximize throughput in enterprise environments where plenty of memory and processing power is available.

"The parallel scavenge collector is again stop-the-world, and is designed to keep the pause down. The degree of parallelism can again be controlled. In addition, the collector has an adaptive tuning policy that can be turned on to optimize the collection. It balances the heap layout by resizing, Eden, Survivor spaces and old generation sizes to minimize the time spent in the collection. Since the heap layout is different for this collector, with large young generations, and smaller older generations, a new feature called "promotion undo" prevents old generation out-of-memory exceptions by allowing the parallel collector to finish the young generation collection."

(As an aside, there have been enhancements in JDK 7:

"The Parallel Scavenger garbage collector has been extended to take advantage of machines with NUMA (Non Uniform Memory Access) architecture. Most modern computers are based on NUMA architecture, in which it takes a different amount of time to access different parts of memory. Typically, every processor in the system has a local memory that provides low access latency and high bandwidth, and remote memory that is considerably slower to access.

"In the Java HotSpot Virtual Machine, the NUMA-aware allocator has been implemented to take advantage of such systems and provide automatic memory placement optimizations for Java applications. The allocator controls the eden space of the young generation of the heap, where most of the new objects are created. The allocator divides the space into regions each of which is placed in the memory of a specific node. The allocator relies on a hypothesis that a thread that allocates the object will be the most likely to use the object. To ensure the fastest access to the new object, the allocator places it in the region local to the allocating thread. The regions can be dynamically resized to reflect the allocation rate of the application threads running on different nodes. That makes it possible to increase performance even of single-threaded applications. In addition, "from" and "to" survivor spaces of the young generation, the old generation, and the permanent generation have page interleaving turned on for them. This ensures that all threads have equal access latencies to these spaces on average." [1])

Now, we connect to our Java process from another Java process and examine the first's MBeans with something like this:

        Map<String, Object> environment = new HashMap<>();
        JMXConnector c = JMXConnectorFactory.newJMXConnector(createConnectionURL(host, port), environment);
        c.connect(); // newJMXConnector does not connect for us
        Object o = c.getMBeanServerConnection().getAttribute(new ObjectName("java.lang:type=Memory"), "HeapMemoryUsage");
        CompositeData cd = (CompositeData) o;

        Object max = cd.get("max");
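The createConnectionURL helper above isn't shown in this post; a plausible implementation - an assumption on my part, using the standard RMI form of a JMX service URL - would be:

```java
import java.net.MalformedURLException;
import javax.management.remote.JMXServiceURL;

public class Jmx {
    // Hypothetical stand-in for the createConnectionURL helper used above:
    // builds the standard RMI form of a JMX service URL.
    static JMXServiceURL createConnectionURL(String host, int port) throws MalformedURLException {
        return new JMXServiceURL("service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
    }
}
```

For this to work, the monitored JVM must have been started with remote JMX enabled (the com.sun.management.jmxremote.port family of system properties).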

This is after we have changed the C++ code in hotspot/src/share/vm/services/management.cpp thus:

// Returns a java/lang/management/MemoryUsage object representing
// the memory usage for the heap or non-heap memory.
JVM_ENTRY(jobject, jmm_GetMemoryUsage(JNIEnv* env, jboolean heap))
        // ... in the loop that sums the usage of each memory pool ...
        printf("PH: MemoryPool %s has %llu bytes\n", pool->name(), u.max_size());
        // ... and once the loop is done ...
        printf("PH: so the total is %llu bytes\n", total_max);

We see output from the JVM that looks like:

PH: MemoryPool PS Survivor Space has 65536 bytes
PH: PSGenerationPool::get_memory_usage: size 1431699456: 
PH: MemoryPool PS Old Gen has 14316601344 bytes
PH: so the total is 21473656832 bytes

when we hit it with the above MBean code. The JVM being monitored has the command line switch -Xmx20g.

Now, when this monitored process starts consuming large amounts of memory, our Java code that is monitoring it prints out:

Tue Apr 09 22:59:42 BST 2013: committed = 18403360768 (17550 mb, 17972032kb), max = 19088801792 (18204 mb, 18641408kb)
Tue Apr 09 22:59:43 BST 2013: committed = 20726546432 (19766 mb, 20240768kb), max = 20726546432 (19766 mb, 20240768kb)

(notice how the maximum changes) and the output from the JVM being monitored indicates a change:

PH: PSYoungGen::resize_spaces
PH: MemoryPool PS Survivor Space has 1179648 bytes
PH: PSGenerationPool::get_memory_usage: size 1431699456: 
PH: MemoryPool PS Old Gen has 14316601344 bytes
PH: so the total is 21473722368 bytes

So the maximum is only the maximum until the next one :-)
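If you want to watch the figure move yourself, a simple polling loop will do. This sketch samples the local MemoryMXBean once a second for brevity; the remote version goes through the MBeanServerConnection exactly as earlier:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.Date;

// Samples heap committed/max once a second; with an adaptive collector
// such as the parallel scavenger, max can change between samples.
public class PollHeap {
    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 5; i++) {
            MemoryUsage u = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
            System.out.printf("%s: committed = %d (%d mb), max = %d (%d mb)%n",
                    new Date(), u.getCommitted(), u.getCommitted() / (1024 * 1024),
                    u.getMax(), u.getMax() / (1024 * 1024));
            Thread.sleep(1000);
        }
    }
}
```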
