Configuring Clusters

cassandra.yaml Configuration File

The main configuration file for Cassandra nodes

  • Cassandra nodes read this file on start-up

    • Remember, restart the node for the changes to take effect!

  • Located in the following directories:

    • Cassandra package installations: /etc/cassandra
    • Cassandra tarball installations: <INSTALL_LOCATION>/resources/cassandra/conf

Quick Start

Minimal properties needed for configuring a cluster

  • cluster_name (Default: Test Cluster) - must be the same for all nodes of the cluster
  • listen_address (Default: localhost) - IP address that other nodes in the cluster use to reach this node
  • rpc_address (Default: localhost) - address used by clients (e.g. CQLSH)
  • seeds (Default: "127.0.0.1")

    • This is a list of IP addresses of nodes to contact to join the cluster
    • Note that the list is written as a single comma-delimited string, so you must enclose it in quotes
    • Usually, all nodes in the cluster will have the same seeds list

Example YAML for cluster_name

# Cassandra storage config YAML

# NOTE:
#   See http://wiki.apache.org/cassandra/StorageConfiguration for
#   full explanations of configuration directives
# /NOTE

# The name of the cluster. This is mainly used to prevent machines in
# one logical cluster from joining another.
cluster_name: 'Test Cluster'

Example YAML for listen_address

# Address or interface to bind to and tell other Cassandra nodes to connect to.
# You _must_ change this if you want multiple nodes to be able to communicate!
#
# Set listen_address OR listen_interface, not both. Interfaces must correspond
# to a single address, IP aliasing is not supported.
#
# Leaving it blank leaves it up to InetAddress.getLocalHost(). This
# will always do the Right Thing _if_ the node is properly configured
# (hostname, name resolution, etc), and the Right Thing is to use the
# address associated with the hostname (it might not be).
#
# If you choose to specify the interface by name and the interface has an
# ipv4 and an ipv6 address you can specify which should be chosen using
# listen_interface_prefer_ipv6. If false the first ipv4 address will be used.
# If true the first ipv6 address will be used. Defaults to false preferring
# ipv4. If there is only one address it will be selected regardless of
# ipv4/ipv6.
listen_address: localhost
# listen_interface: eth0
# listen_interface_prefer_ipv6: false

Do NOT set both the address and the interface - choose one or the other.
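
As a sketch of the alternative, the fragment below binds by interface name instead of by address. The interface name eth0 comes from the commented-out sample above and is only an assumption about the host; substitute whichever interface carries inter-node traffic.

```yaml
# Bind by interface instead of by address (illustrative; eth0 is assumed).
# listen_address stays commented out so that only one of the two is set.
# listen_address: localhost
listen_interface: eth0
listen_interface_prefer_ipv6: false
```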

Example YAML for rpc_address

# The address or interface to bind the Thrift RPC service and native transport
# server to.
#
# Set rpc_address OR rpc_interface, not both. Interfaces must correspond
# to a single address, IP aliasing is not supported.
#
# Leaving rpc_address blank has the same effect as on listen_address
# (i.e. it will be based on the configured hostname of the node).
#
# Note that unlike listen_address, you can specify 0.0.0.0, but you must also
# set broadcast_rpc_address to a value other than 0.0.0.0.
#
# For security reasons, you should not expose this port to the internet.
# Firewall it if needed.
#
# If you choose to specify the interface by name and the interface has an
# ipv4 and an ipv6 address you can specify which should be chosen using
# rpc_interface_prefer_ipv6. If false the first ipv4 address will be used.
# If true the first ipv6 address will be used. Defaults to false preferring
# ipv4. If there is only one address it will be selected regardless of
# ipv4/ipv6.
rpc_address: localhost
# rpc_interface: eth1
# rpc_interface_prefer_ipv6: false
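
The comments above note that rpc_address may be set to 0.0.0.0 as long as broadcast_rpc_address is also set. A minimal sketch of that combination follows; the IP 10.0.0.11 is a hypothetical address for this node, not a value from the course material.

```yaml
# Accept client connections on all interfaces (illustrative values).
rpc_address: 0.0.0.0
# Required when rpc_address is 0.0.0.0: the address advertised to clients.
# 10.0.0.11 is a hypothetical IP for this node.
broadcast_rpc_address: 10.0.0.11
```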

Example YAML for seeds

# any class that implements the SeedProvider interface and has a
# constructor that takes a Map<String, String> of parameters will do.
seed_provider:
    # Addresses of hosts that are deemed contact points.
    # Cassandra nodes use this list of hosts to find each other and learn
    # the topology of the ring.  You must change this if you are running
    # multiple nodes!
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # seeds is actually a comma-delimited list of addresses.
          # Ex: "<ip1>,<ip2>,<ip3>"
          - seeds: "127.0.0.1"
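
Putting the four Quick Start properties together, one node of a small multi-node cluster might be configured roughly as follows. The cluster name and the 10.0.0.x addresses are hypothetical, chosen only for illustration. The other nodes would use the same cluster_name and seeds values, with their own listen_address and rpc_address.

```yaml
# Sketch for one node of a three-node cluster (all values are assumed).
cluster_name: 'Training Cluster'   # identical on every node
listen_address: 10.0.0.12          # this node's address for inter-node traffic
rpc_address: 10.0.0.12             # this node's address for client connections
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # One quoted, comma-delimited string; same list on every node.
          - seeds: "10.0.0.11,10.0.0.12"
```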

Commonly Used YAML Settings

Here are some of the commonly used configuration settings

  • endpoint_snitch (Default: SimpleSnitch) - Necessary to make your cluster topologically aware
  • initial_token (Default: disabled) / num_tokens (Default: 256) - The default is to run with VNodes
  • commitlog_directory (Default: /var/lib/cassandra/commitlog) - The location where Cassandra writes commit logs
  • data_file_directories (Default: /var/lib/cassandra/data) - The location where Cassandra writes the table data
  • hints_directory (Default: $CASSANDRA_HOME/data/hints) - The location for storing hints
  • saved_caches_directory (Default: /var/lib/cassandra/saved_caches) - The location for the key and row caches
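
For orientation, these settings might appear in cassandra.yaml roughly as shown below. The values mirror the defaults listed above, except hints_directory: its listed default refers to $CASSANDRA_HOME, so the literal path here is an assumption. Note that data_file_directories takes a list.

```yaml
endpoint_snitch: SimpleSnitch
# VNodes: leave initial_token unset and let num_tokens take effect.
num_tokens: 256
# initial_token:
commitlog_directory: /var/lib/cassandra/commitlog
# A YAML list, so more than one data directory can be given.
data_file_directories:
    - /var/lib/cassandra/data
# Assumed path for illustration only.
hints_directory: /var/lib/cassandra/hints
saved_caches_directory: /var/lib/cassandra/saved_caches
```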

Notable YAML Settings

  • hinted_handoff_enabled (Default: true) - Set to true to enable hinted handoffs
  • max_hint_window_in_ms (Default: 10800000 milliseconds [3 hours]) - Maximum length of time for which hints are generated for a dead host
  • row_cache_size_in_mb (Default: 0 - disabled) - Maximum size of the row cache in memory
  • concurrent_reads/concurrent_writes (Default: 32) - Number of reads/writes (threads) permitted to occur concurrently
  • file_cache_size_in_mb (Default: Smaller of 1/4 heap or 512 MB) - Maximum memory to use for pooling sstable buffers
  • memtable_heap_space_in_mb/memtable_offheap_space_in_mb (Default: 1/4 of heap size) - Total on-heap and off-heap allowance for memtables
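
To make the heap-relative defaults concrete, the fragment below writes out the values listed above, assuming a hypothetical 8 GB (8192 MB) heap. The heap size is an assumption for the arithmetic only; in practice the size-related settings can be left unset so that Cassandra computes them.

```yaml
hinted_handoff_enabled: true
# 3 hours = 3 * 60 * 60 * 1000 ms
max_hint_window_in_ms: 10800000
# 0 disables the row cache.
row_cache_size_in_mb: 0
concurrent_reads: 32
concurrent_writes: 32
# Smaller of 1/4 heap (8192 / 4 = 2048) or 512
file_cache_size_in_mb: 512
# 1/4 of the assumed 8192 MB heap
memtable_heap_space_in_mb: 2048
memtable_offheap_space_in_mb: 2048
```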

Exercise 1: Setting up your environment