JVM Tuning Options

JVM Settings

The Cassandra Technology Stack

  • Cassandra is a program written in Java
  • Java programs execute on a Java Virtual Machine (JVM)
  • The JVM runs on top of the computer’s operating system
  • The operating system is the software interface to the computer’s hardware

Performance Tuning

  • We can adjust parameters at each level of Cassandra’s technology stack

    • Cassandra settings (in cassandra.yaml)
    • JVM settings (in cassandra-env.sh and jvm.options)
    • Operating system settings (kernel configuration)
    • Hardware (adding or upgrading)

JVM Settings - Memory Management

  • The most significant settings in the JVM deal with memory management
  • JVM memory has several designated areas, including code, stack and heap
  • The heap is where Java programs allocate memory for objects at run time
  • Garbage collection is the process by which the JVM reclaims heap memory that the program no longer uses

cassandra-env.sh

  • We usually configure JVM settings when we launch Java programs

    • As command line options

  • cassandra-env.sh is a shell script that the Cassandra launch script sources to set up the JVM options for the server
  • Therefore we can modify cassandra-env.sh to adjust the settings
  • cassandra-env.sh also reads jvm.options, which holds additional JVM settings
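
As a minimal sketch of how a setting becomes a command line option: the script appends each option to a JVM_OPTS variable, which ends up on the java command line. The heap values below are illustrative assumptions, not recommendations.

```shell
# Illustrative values only; pick sizes that suit your own hardware and workload.
MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="800M"

JVM_OPTS=""
JVM_OPTS="$JVM_OPTS -Xms${MAX_HEAP_SIZE}"   # initial heap size
JVM_OPTS="$JVM_OPTS -Xmx${MAX_HEAP_SIZE}"   # maximum heap size
JVM_OPTS="$JVM_OPTS -Xmn${HEAP_NEWSIZE}"    # young generation size

# Show the options as they would appear on the java command line.
echo "java$JVM_OPTS ..."
```

Setting -Xms and -Xmx to the same value means the heap never has to be resized while the server is running.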

Warning: Only adjust these settings when you understand why you are doing it. The default settings are there for a reason.

Heap Sizing Options

  • MAX_HEAP_SIZE is set to a maximum of 8GB

    • Large heaps can introduce GC pauses that lead to latency
    • Different workloads can justify different settings

  • HEAP_NEWSIZE is set to 100MB per core

    • Example: 8 cores would mean 800MB
    • The larger this is, the longer GC pause times will be. The smaller it is, the more frequently GC will run
    • Different workloads may react differently

The JVM heap is divided into 2 areas: the young generation and the old generation.

Initially, the JVM allocates memory in the young generation.

Objects that stay live long enough are promoted to the old generation.

MAX_HEAP_SIZE is the size of the total heap.

HEAP_NEWSIZE is the portion of the heap allocated to the young generation.
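
The sizing rules above can be sketched as a small calculation. The function names are hypothetical, and starting from a quarter of system RAM is an assumption made for illustration; only the 8GB cap and the 100MB-per-core rule come from the notes above.

```shell
# max_heap_mb RAM_MB
# Assumption: start from a quarter of system RAM, then cap at 8 GB (8192 MB).
max_heap_mb() {
    ram_mb=$1
    quarter=$((ram_mb / 4))
    if [ "$quarter" -gt 8192 ]; then
        echo 8192
    else
        echo "$quarter"
    fi
}

# heap_newsize_mb CORES
# 100 MB of young generation per CPU core.
heap_newsize_mb() {
    cores=$1
    echo $((cores * 100))
}

echo "MAX_HEAP_SIZE=$(max_heap_mb 65536)M"   # a node with 64 GB of RAM
echo "HEAP_NEWSIZE=$(heap_newsize_mb 8)M"    # a node with 8 cores
```

For a node with 64GB of RAM and 8 cores this prints MAX_HEAP_SIZE=8192M and HEAP_NEWSIZE=800M, matching the 8-core example above.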

Tuning the Java heap

Don’t turn up Java heap size too high!

  • Java’s ability to handle garbage collection gracefully diminishes quickly above 8GB.
  • A large heap may also interfere with the operating system’s ability to maintain its page cache for frequently accessed data

Tuning Java Garbage Collection

  • Default collector is the Concurrent-Mark-Sweep (CMS) garbage collector.
  • G1 garbage collector is better than CMS for large heaps.

    • Works on regions of heap with most garbage first.
    • Compacts the heap on-the-go.

To configure Cassandra to use G1:

  • Open $CASSANDRA_HOME/conf/jvm.options
  • Comment out all lines in the ### CMS Settings section.
  • Uncomment the relevant G1 settings in the ### G1 Settings section.

After these changes, the relevant part of jvm.options looks like this:

### CMS Settings (comment out the G1 section and uncomment section below to enable)

#-XX:+UseParNewGC
#-XX:+UseConcMarkSweepGC
#-XX:+CMSParallelRemarkEnabled
#-XX:SurvivorRatio=8
#-XX:MaxTenuringThreshold=1
#-XX:CMSInitiatingOccupancyFraction=75
#-XX:+UseCMSInitiatingOccupancyOnly
#-XX:CMSWaitDuration=10000
#-XX:+CMSParallelInitialMarkEnabled
#-XX:+CMSEdenChunksRecordAlways
## some JVMs will fill up their heap when accessed via JMX, see CASSANDRA-6541
#-XX:+CMSClassUnloadingEnabled

### G1 Settings

# Use the Hotspot garbage-first collector.
-XX:+UseG1GC
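
After editing, it can help to confirm which collector flags are still active. The helper below is a hypothetical sketch: it prints every collector-selection flag in a jvm.options file that is not commented out.

```shell
# active_gc_flags FILE
# Print the collector-selection flags that are active (not commented out).
active_gc_flags() {
    grep -E '^-XX:\+Use(G1GC|ConcMarkSweepGC|ParNewGC)' "$1"
}

# Example call (assumes CASSANDRA_HOME is set on your system):
#   active_gc_flags "$CASSANDRA_HOME/conf/jvm.options"
```

If the output mixes CMS flags with -XX:+UseG1GC, one of the two sections was only partly edited.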

JMX Options

  • JMX is a Java technology that supplies tools for managing and monitoring Java applications and services.
  • You can modify the following properties in the cassandra-env.sh file to configure JMX to listen on port 7199 without authentication.

JMX Options

  • com.sun.management.jmxremote.port

    • The port on which Cassandra listens for JMX connections.

  • com.sun.management.jmxremote.ssl

    • Enable/disable SSL for JMX.

  • com.sun.management.jmxremote.authenticate

    • Enable/disable remote authentication for JMX.

  • -Djava.rmi.server.hostname

    • Sets the interface hostname or IP that JMX should use to connect. Uncomment and set if you are having trouble connecting.
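
As a sketch, the properties above are passed to the JVM as -D options appended to JVM_OPTS. The values shown correspond to the case described earlier (port 7199, no SSL, no authentication); the commented-out hostname line keeps its placeholder because the right value depends on your node.

```shell
JVM_OPTS="${JVM_OPTS:-}"   # keep any options that were already set
JMX_PORT="7199"

JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.port=$JMX_PORT"
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.ssl=false"
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.authenticate=false"

# Uncomment and set if you are having trouble connecting:
# JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=<hostname or IP>"
```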

Garbage Collection

  • Garbage collection is a process that the JVM runs continually to reclaim objects that are no longer live

    • Objects are moved and deleted to free up memory
    • GC should happen often enough to create large sections of free memory, but not so often that the CPU is churning on GC all the time

  • Since Cassandra runs in a JVM, the pauses to do garbage collection affect Cassandra’s performance

    • Sizing the JVM is important to performance
    • The number of CPUs can also affect performance

What to consider when tuning garbage collection

  • Pause time

    • Length of time the collector stops the application while it frees up memory

  • Throughput

    • Determined by how often garbage collection runs and pauses the application
    • The more often the collector runs, the lower the throughput

  • We want to minimize length of pauses as well as frequency of collection

When tuning your garbage collection configuration, the main things you need to worry about are pause time and throughput.

Pause time is the length of time the collector stops the application while it frees up memory.

Throughput is determined by how often garbage collection runs and pauses the application. The more often the collector runs, the lower the throughput.

When tuning for an OLTP database like Cassandra, the goal is to maximize the number of requests that can be serviced, and minimize the time it takes to serve them.

To do that, we need to minimize the length of the collection pauses, as well as the frequency of collection.
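
A rough worked example shows why both numbers matter. The function name and the pause figures are hypothetical; the calculation simply multiplies pause frequency by pause length to get the share of each minute the application spends stopped.

```shell
# gc_pause_pct PAUSES_PER_MINUTE AVG_PAUSE_MS
# Print the whole-number percentage of each minute spent in GC pauses.
gc_pause_pct() {
    echo $(( $1 * $2 * 100 / 60000 ))
}

echo "30 pauses/min x 200 ms: $(gc_pause_pct 30 200)% paused"
echo "6 pauses/min x 200 ms: $(gc_pause_pct 6 200)% paused"
```

With the same 200 ms pause, cutting the frequency from 30 to 6 collections per minute drops the paused share from 10% to 2%.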

JVM available memory

  • Permanent generation
  • New generation (ParNew)
  • Old generation (CMS)

With the garbage collector Cassandra ships with, the JVM’s available memory is divided into 3 sections:

The new generation, which is collected by the Parallel New (ParNew) collector.

The old generation, which is collected by the Concurrent Mark Sweep (CMS) collector.

And finally, the permanent generation.

The New Generation

  • Eden
  • 2 survivor spaces

The new generation is divided into eden, which takes up the bulk of the new generation, and 2 survivor spaces.

Eden is where new objects are allocated. Objects that survive a collection of eden are moved into the survivor spaces.

There are 2 survivor spaces, but only one holds objects at a time; the other is empty.
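
To make the split concrete, here is a sketch of the arithmetic for the -XX:SurvivorRatio option, which sets the size of eden relative to one survivor space. The function name is hypothetical; the 800MB new generation is the 8-core example from the heap sizing notes, and 8 is the SurvivorRatio value from the CMS settings.

```shell
# new_gen_layout NEWSIZE_MB SURVIVOR_RATIO
# The new generation holds eden plus 2 survivor spaces, and eden is
# SURVIVOR_RATIO times the size of one survivor space.
new_gen_layout() {
    newsize_mb=$1
    ratio=$2
    survivor_mb=$((newsize_mb / (ratio + 2)))
    eden_mb=$((survivor_mb * ratio))
    echo "eden=${eden_mb}MB survivor=${survivor_mb}MB x2"
}

new_gen_layout 800 8
```

For an 800MB new generation with a ratio of 8, this gives 640MB of eden and two 80MB survivor spaces.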

The New Generation

  • Once eden fills up with new objects, a minor GC is triggered
  • A minor GC stops execution, iterates over the objects in eden, copies any objects that are not (yet) garbage to the active survivor space, and clears eden
  • If the minor GC has filled up the active survivor space, it performs the same process on the survivor space
  • Objects that are still active are moved to the other survivor space, and the old survivor space is cleared

The New Generation

ParNew collection of the new generation:

  • It’s a stop-the-world algorithm
  • Finding and removing garbage is fast; moving active objects from eden to the survivor spaces, or from the survivor spaces to the old gen, is slow

The two most important things to keep in mind when we’re talking about ParNew collection of the new gen are:

It’s a stop-the-world algorithm, which means that every time it runs, the application is paused, the collector runs, and then the application resumes.

Finding and removing garbage is fast; moving active objects from eden to the survivor spaces, or from the survivor spaces to the old gen, is slow. If you have long ParNew pauses, it means that a lot of the objects in eden are not (yet) garbage, and they’re being copied to the survivor space or into the old gen.

The Old Generation

  • Contains objects that have survived long enough to not be collected by a minor GC
  • The CMS collector runs when the old generation is 75% full

The old generation contains objects that have survived long enough to not be collected by a minor GC.

When a pre-determined percentage of the old generation is full (75% by default in Cassandra), the CMS collector is run.

Under most circumstances, it runs while the application is running, although there are 2 stop-the-world pauses while it identifies garbage.
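
As a rough sketch of what the 75% threshold means in megabytes: the old generation is what remains of the heap after the new generation, and CMS starts once that fraction of it is occupied. The function name and the heap sizes are illustrative assumptions.

```shell
# cms_trigger_mb HEAP_MB NEWSIZE_MB FRACTION_PCT
# Old gen = total heap minus new gen; CMS starts once FRACTION_PCT of it is used.
cms_trigger_mb() {
    heap_mb=$1
    newsize_mb=$2
    fraction_pct=$3
    old_gen_mb=$((heap_mb - newsize_mb))
    echo $((old_gen_mb * fraction_pct / 100))
}

cms_trigger_mb 8192 800 75
```

With an 8192MB heap and an 800MB new generation, the old generation is 7392MB, so CMS starts at roughly 5544MB of occupancy.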

Full GC

  • Multi-second GC pauses usually mean major collections are happening
  • If the old gen fills up before the CMS collector can finish, the application is paused while a full GC runs
  • Full GC checks everything: new gen, old gen, and perm gen
  • Significant (multi-second) pauses

However, if the old gen fills up before the CMS collector can finish, it’s a different story.

The application is paused while a full GC is run.

A full GC checks everything: new gen, old gen, and perm gen, and can result in significant (multi-second) pauses.

If you’re seeing multi-second GC pauses, you’re likely seeing major collections happening.

If you’re seeing these, you need to fix your GC settings.
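
One way to look for long pauses is to filter the Cassandra log for GCInspector messages. The sketch below assumes log lines that contain the text "GC in <n>ms"; that format is an assumption and varies between Cassandra versions, so adjust the pattern to match your own log. The function name is hypothetical.

```shell
# long_gc_pauses THRESHOLD_MS LOGFILE
# Print GCInspector lines whose reported pause is at least THRESHOLD_MS.
# Assumes each line contains "GC in <n>ms" (check your own log format).
long_gc_pauses() {
    threshold_ms=$1
    logfile=$2
    awk -v t="$threshold_ms" '
        /GCInspector/ && match($0, /GC in [0-9]+ms/) {
            # "GC in " is 6 characters and "ms" is 2, so the digits sit between.
            ms = substr($0, RSTART + 6, RLENGTH - 8)
            if (ms + 0 >= t) print
        }' "$logfile"
}

# Example call (the log path is an assumption):
#   long_gc_pauses 1000 /var/log/cassandra/system.log
```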

Heap Dump

  • Useful when troubleshooting high memory utilization or OutOfMemoryErrors
  • Shows exactly which objects are consuming most of the heap
  • Cassandra starts Java with the option -XX:+HeapDumpOnOutOfMemoryError
  • Perhaps you are getting a message similar to the following:
WARN [ScheduledTasks:1] 2012-10-22 12:14:49,889 GCInspector.java (line 145) Heap is 0.9941227313009479 full. You may
need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory.
Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
  • By default, Cassandra writes the heap dump file to a subdirectory of the working directory, which is the root directory when Cassandra runs as a service
  • However:

    • If Cassandra does not have write permission to the root directory, the heap dump fails
    • If the root directory is too small to accommodate the heap dump, the server crashes

By default, Cassandra writes the heap dump file to a subdirectory of the working directory, which is the root directory when Cassandra runs as a service. If Cassandra does not have write permission to the root directory, the heap dump fails. If the root directory is too small to accommodate the heap dump, the server crashes. To ensure that a heap dump succeeds and to prevent crashes, we must configure a heap dump directory that is writable by Cassandra and large enough to accommodate a heap dump.

Configuring Heap Dump directory

  • Open cassandra-env.sh in a text editor
  • Scroll to the comment:
# set jvm HeapDumpPath with CASSANDRA_HEAPDUMP_DIR
  • Set the path on the line below it:
# set jvm HeapDumpPath with CASSANDRA_HEAPDUMP_DIR
CASSANDRA_HEAPDUMP_DIR=<path>
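
Before pointing Cassandra at a directory, it is worth checking both requirements: write access and free space. The helper below is a hypothetical sketch. It should be run as the user Cassandra runs as, and the required size is an assumption you supply; a heap dump can be roughly as large as the heap itself.

```shell
# check_heapdump_dir DIR REQUIRED_MB
# Succeed only if DIR exists, is writable by the current user,
# and has at least REQUIRED_MB of free space.
check_heapdump_dir() {
    dir=$1
    required_mb=$2
    [ -d "$dir" ] && [ -w "$dir" ] || return 1
    # df -Pk reports available space in 1 KB blocks in column 4 of line 2.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((required_mb * 1024)) ]
}

# Example call (path and size are placeholders):
#   check_heapdump_dir <path> 8192 && echo "OK for an 8 GB heap"
```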

Manual Heap Dumps using jmap

sudo -u <user> jmap -dump:file=heapdump.hprof,format=b <pid>

Analyzing Heap Dumps

  • Eclipse Memory Analyzer Tool (MAT)

Profiling

How do we discover issues with the JVM?

  • nodetool tpstats
  • OpsCenter
  • JMX Clients

nodetool tpstats

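  • Cassandra is built on a Staged Event Driven Architecture (SEDA)
  • Tasks are separated into stages that are connected by a messaging service
  • Some stages skip the messaging service and queue tasks immediately on a different stage if it exists on the same node
  • nodetool tpstats reports the active, pending, and completed tasks for each stage’s thread pool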
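
A quick way to spot a backed-up stage is to filter the tpstats output for pools with pending tasks. This sketch assumes that the pool table has the pool name in column 1 and the pending count in column 3, and that it ends at the first blank line; check this against the output of your Cassandra version. The function name is hypothetical.

```shell
# pending_stages
# Read `nodetool tpstats` output on stdin and print "<pool> <pending>"
# for every thread pool with a non-zero pending count.
pending_stages() {
    awk '
        NF == 0 { exit }                                  # pool table ends at the first blank line
        NR > 1 && $3 ~ /^[0-9]+$/ && $3 > 0 { print $1, $3 }
    '
}

# Example call:
#   nodetool tpstats | pending_stages
```

Pending counts that stay above zero across repeated runs suggest that a stage cannot keep up with its workload.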

OpsCenter

A number of graphs

  • JVM Collection Count for both ParNew and CMS
  • JVM Collection Time for both ParNew and CMS
  • Heap Max, Heap Used, Heap Committed
  • Number of pending tasks (drill down for more information)

JMX Clients

  • jconsole
  • visualvm

Exercise 7: Tune the Cluster and Measure Performance