OpsCenter

DSE Management Services

  • Performance Service
  • Backup & Restore Service
  • Repair Service
  • Best Practices Service

Let's talk about four specific management services that exist within OpsCenter and are available in DSE. Performance Service: troubleshoot faster, even if you are a novice DBA. Backup & Restore Service: an enterprise requirement. Repair Service: ensures that the cluster performs smoothly. Best Practice Service: rules based on our subject-matter expertise that tell you what you MUST fix.

Key Capabilities

Visual Monitoring and Management

  • Control automatic management services including transparent repair
  • Manage and schedule backup and restore operations
  • Perform capacity planning with historical trend analysis and forecasting capabilities
  • Proactively manage all clusters with threshold and timing-based alerts
  • Visually create new clusters with a few mouse clicks, either on premises or in the cloud
  • Built-in Automatic Failover

Architecture

[Figure: OpsCenter architecture diagram]

  • web application deployed on premises; accessible from anywhere
  • an agent runs on each node and connects to the central OpsCenter process
  • each agent communicates with its local node
  • backup instance of OpsCenter for failover
  • built-in or LDAP authentication supported
  • encryption supported on all communication channels
  • manage multiple clusters
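To make the agent/server split concrete, here is a minimal Python sketch, assuming an agent that knows an ordered list of OpsCenter servers (primary first, then the backup instance) and reports to the first one that answers. The class name, host names, and the injected `is_reachable` probe are hypothetical; this is not the real OpsCenter agent protocol.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class AgentReporter:
    """Hypothetical agent that pushes local node metrics to a central OpsCenter process."""
    servers: List[str]                   # ordered: primary first, backup instance next
    is_reachable: Callable[[str], bool]  # injected health probe (assumption)
    sent: Dict[str, List[dict]] = field(default_factory=dict)

    def pick_server(self) -> Optional[str]:
        # Fail over to the next server in the list if the primary is down.
        for server in self.servers:
            if self.is_reachable(server):
                return server
        return None

    def report(self, metrics: dict) -> Optional[str]:
        server = self.pick_server()
        if server is not None:
            # A real agent would send this over an (optionally encrypted) channel.
            self.sent.setdefault(server, []).append(metrics)
        return server


down = {"opscenter-primary"}  # simulate the primary being unavailable
agent = AgentReporter(
    servers=["opscenter-primary", "opscenter-backup"],
    is_reachable=lambda s: s not in down,
)
print(agent.report({"node": "10.0.0.1", "read_latency_ms": 1.2}))  # prints opscenter-backup
```

Injecting the reachability probe keeps the sketch testable; the ordering of `servers` is what encodes "primary, then backup".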

Performance Service - Overview

  • Collects key performance metrics to quickly troubleshoot a database cluster’s performance
  • Analyzes query-, table-, and cluster-specific metrics
  • Correlates different metrics and provides custom recommendations to address specific issues
  • Eliminates the need for custom scripting and scheduling to detect problem nodes

Covers historical metrics, DSE performance objects, recommendations, and alerts.

Performance Service - Slow Queries

  • Identify Slow Queries
  • Custom recommendations
  • Display contextual alerts
  • Visual Query tracing

[Figure: Slow Queries view]
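The idea behind slow-query identification with custom recommendations can be sketched as a threshold filter plus a simple rule. This is a minimal illustration, assuming query records with a duration and a rows-scanned count; the `QueryRecord` type, the 200 ms threshold, and the 10,000-row rule are hypothetical and are not the Performance Service's actual logic or defaults.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueryRecord:
    table: str
    cql: str
    duration_ms: float
    rows_scanned: int


SLOW_THRESHOLD_MS = 200.0  # hypothetical threshold; tune for your workload


def find_slow_queries(records: List[QueryRecord], threshold_ms: float = SLOW_THRESHOLD_MS) -> List[QueryRecord]:
    """Return queries slower than the threshold, slowest first."""
    slow = [r for r in records if r.duration_ms > threshold_ms]
    return sorted(slow, key=lambda r: r.duration_ms, reverse=True)


def recommend(record: QueryRecord) -> str:
    # Toy rule: a query that touches many rows is likely a scan or a wide-partition read.
    if record.rows_scanned > 10_000:
        return f"{record.table}: query scans {record.rows_scanned} rows; review the data model or partition key"
    return f"{record.table}: trace this query to see where the time is spent"


records = [
    QueryRecord("users", "SELECT * FROM users WHERE id = ?", 12.0, 1),
    QueryRecord("events", "SELECT * FROM events", 850.0, 50_000),
    QueryRecord("orders", "SELECT * FROM orders WHERE id = ?", 300.0, 10),
]
for r in find_slow_queries(records):
    print(recommend(r))
```

Sorting slowest first mirrors how a troubleshooting view would surface the worst offenders at the top.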

Performance Service - Table Stats

[Figure: Table Stats view]

  • Diagnose Table performance
  • Custom recommendations
  • Cluster level & node drill down
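A cluster-level view with node drill-down amounts to aggregating a per-table metric across nodes, then looking for nodes that stand out. The minimal sketch below assumes per-node p99 read latencies are already collected; the data shape and the "more than 2x the cluster average" outlier rule are illustrative assumptions, not OpsCenter's method.

```python
from statistics import mean
from typing import Dict, List, Tuple

# node -> table -> p99 read latency in ms (shape is an assumption for illustration)
NodeStats = Dict[str, Dict[str, float]]


def cluster_view(stats: NodeStats) -> Dict[str, float]:
    """Cluster-level view: average p99 read latency per table across nodes."""
    per_table: Dict[str, List[float]] = {}
    for tables in stats.values():
        for table, p99 in tables.items():
            per_table.setdefault(table, []).append(p99)
    return {table: mean(values) for table, values in per_table.items()}


def drill_down(stats: NodeStats, table: str, factor: float = 2.0) -> List[Tuple[str, float]]:
    """Node drill-down: nodes whose latency for `table` exceeds `factor` x the cluster average."""
    avg = cluster_view(stats)[table]
    outliers = [
        (node, tables[table])
        for node, tables in stats.items()
        if table in tables and tables[table] > factor * avg
    ]
    return sorted(outliers, key=lambda item: item[1], reverse=True)


stats = {
    "n1": {"users": 5.0, "events": 8.0},
    "n2": {"users": 5.0, "events": 9.0},
    "n3": {"users": 50.0, "events": 7.0},
}
print(cluster_view(stats))
print(drill_down(stats, "users"))
```

The cluster-level number alone would hide that only one node is slow for `users`; the drill-down is what points at the problem node.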

Performance Service - ThreadPool Stats

  • Investigate Thread Pools metrics
  • Historical Tracking
  • Eliminates custom scripts

[Figure: Thread Pool Stats view]
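Historical tracking of thread pools replaces the usual custom script that polls `nodetool tpstats` and greps for pending tasks. Below is a minimal sketch, assuming pending-task samples per pool are fed in periodically; the `ThreadPoolHistory` class, window size, and "pending in every sample" rule are hypothetical. `ReadStage` and `MutationStage` are real Cassandra thread pool names used only as sample keys.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List


class ThreadPoolHistory:
    """Hypothetical rolling history of pending-task samples per thread pool."""

    def __init__(self, window: int = 5) -> None:
        self.window = window
        # Each pool keeps only the most recent `window` samples.
        self.pending: Dict[str, Deque[int]] = defaultdict(lambda: deque(maxlen=self.window))

    def record(self, pool: str, pending_tasks: int) -> None:
        self.pending[pool].append(pending_tasks)

    def backed_up_pools(self, threshold: int = 0) -> List[str]:
        """Pools whose pending count exceeded `threshold` in every sample of a full window."""
        return sorted(
            pool
            for pool, samples in self.pending.items()
            if len(samples) == self.window and min(samples) > threshold
        )


history = ThreadPoolHistory(window=3)
for pending in [4, 7, 9]:
    history.record("ReadStage", pending)
for pending in [0, 3, 0]:
    history.record("MutationStage", pending)
print(history.backed_up_pools())
```

Requiring a sustained backlog across the whole window, rather than a single spike, is one simple way to avoid alerting on momentary bursts.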

Backup Service - Visual Backup Management

  • Automatic replication doesn’t preclude backups
  • Backup & Restore on distributed systems is hard
  • Enterprise-class visual backup service guards against data loss on managed database clusters.
  • Removes need for building custom scripts and tools.
  • The first question that comes to mind: why should I take backups? Doesn't replication take care of it? Replication does not mean that you can skip backups.
  • Take backups to guard against data corruption and data loss (replication is not disaster recovery). SSTables are immutable, so a snapshot simply takes hard links to the existing SSTables.
  • The OpsCenter Backup Service supports a wide range of functionality, from ad hoc snapshot backups and scheduled backups to commit log backups for point-in-time (PIT) restore.
  • Mike will explain all the capabilities we provide later, but understand this: if you don't script backups correctly yourself, you are going to be in a spot of bother.

Backup Service - Scheduled Backups

  • User-configurable
  • Sync across cluster
  • Automatic cleanup
  • Activity Reporting
  • User-configurable recurring schedules

    • Multiple schedules

  • On-server backups use Cassandra snapshots, which use hard links

    • Fast, doesn’t use disk space initially

  • Automatic cleanup via retention policy
  • Detailed activity reporting

    • Alerts on ERROR
    • Details all the way down to sstable level
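The two mechanisms above, hard-link snapshots and retention-based cleanup, can be illustrated in a few lines of Python. This is a minimal sketch, assuming a local filesystem that supports hard links; the file and directory names and the retention rule are hypothetical. In Cassandra itself, `nodetool snapshot` creates the hard links for you.

```python
import os
import tempfile
from datetime import datetime, timedelta
from typing import List


def snapshot_sstable(sstable_path: str, snapshot_dir: str) -> str:
    """Mimic a snapshot: hard-link an immutable SSTable file into a snapshot directory."""
    os.makedirs(snapshot_dir, exist_ok=True)
    link_path = os.path.join(snapshot_dir, os.path.basename(sstable_path))
    os.link(sstable_path, link_path)  # no data copied: both names point at the same inode
    return link_path


def expired_snapshots(snapshot_times: List[datetime], now: datetime, retention: timedelta) -> List[datetime]:
    """Retention policy: snapshots older than `retention` are eligible for cleanup."""
    return [t for t in snapshot_times if now - t > retention]


with tempfile.TemporaryDirectory() as data_dir:
    sstable = os.path.join(data_dir, "users-Data.db")  # hypothetical SSTable file name
    with open(sstable, "wb") as f:
        f.write(b"x" * 4096)
    link = snapshot_sstable(sstable, os.path.join(data_dir, "snapshots", "daily"))
    same_file = os.path.samefile(sstable, link)  # True: same underlying file
    link_count = os.stat(sstable).st_nlink       # 2: original name plus snapshot link
```

Because the snapshot is only a second directory entry, it costs no extra space until compaction removes the original SSTable; that is why cleanup via a retention policy matters.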

Backup Service - Remote Backups

  • Store backups in AWS S3
  • Optimized storage
  • Automatic Cleanup
  • Future Support
  • Automatically send backups to S3
  • Upload directly from nodes
  • Never upload the same SSTable twice

    • SSTables are immutable files

  • Intelligent automatic cleanup

    • Handles optimized storage

  • Future

    • NFS, Generic S3 API, Other cloud providers
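"Never upload the same SSTable twice" and "intelligent automatic cleanup" fit together: because SSTables are immutable, a file name that is already stored never needs re-uploading, and cleanup must only delete files that no remaining backup references. The sketch below is a minimal in-memory stand-in for an S3 bucket; `RemoteBackupStore` and the file names are hypothetical, and no real S3 API is called.

```python
from typing import Dict, List, Set


class RemoteBackupStore:
    """Hypothetical stand-in for remote storage that deduplicates immutable SSTables."""

    def __init__(self) -> None:
        self.objects: Set[str] = set()           # SSTable names already uploaded
        self.backups: Dict[str, List[str]] = {}  # backup id -> SSTables it needs
        self.upload_count = 0

    def backup(self, backup_id: str, sstables: List[str]) -> List[str]:
        """Upload only SSTables not already stored; safe because SSTables never change."""
        new_files = [s for s in sstables if s not in self.objects]
        for name in new_files:
            self.objects.add(name)  # a real implementation would upload the file here
            self.upload_count += 1
        self.backups[backup_id] = list(sstables)
        return new_files

    def delete_backup(self, backup_id: str) -> List[str]:
        """Cleanup: drop a backup, but keep SSTables still referenced by other backups."""
        self.backups.pop(backup_id)
        still_needed = {s for files in self.backups.values() for s in files}
        orphaned = sorted(self.objects - still_needed)
        self.objects -= set(orphaned)
        return orphaned


store = RemoteBackupStore()
store.backup("mon", ["tbl-1-Data.db", "tbl-2-Data.db"])
store.backup("tue", ["tbl-2-Data.db", "tbl-3-Data.db"])  # only tbl-3 is uploaded
```

The reference check in `delete_backup` is what "handles optimized storage": a naive cleanup that deleted every file of an expired backup would break newer backups sharing those SSTables.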

Backup Service - Restore

  • Coordinated Restore
  • Different Topologies
  • Cloning
  • PIT Restore
  • restore

    • fix corrupted data by truncating the table before restoring
    • restore missing data
    • handle Cassandra's internal SSTable formats

  • different topologies

    • restore after nodes have been moved, removed, or added

  • clone a cluster
  • PIT restore: commit log archiving, with one-second granularity
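Point-in-time restore combines a snapshot with archived commit logs: start from the newest snapshot taken at or before the target time, then replay archived writes up to the target. Below is a minimal planning sketch, assuming a simplified commit log represented as (timestamp, description) pairs; the function and data shapes are hypothetical. In Cassandra, commit log archiving is configured through `commitlog_archiving.properties`.

```python
from datetime import datetime
from typing import List, Optional, Tuple

Mutation = Tuple[datetime, str]  # (write time, description): simplified commit log entry


def plan_pit_restore(
    snapshots: List[datetime], commitlog: List[Mutation], target: datetime
) -> Tuple[Optional[datetime], List[Mutation]]:
    """Pick the base snapshot and the archived mutations to replay for a PIT restore."""
    target = target.replace(microsecond=0)  # second granularity
    candidates = [s for s in snapshots if s <= target]
    if not candidates:
        return None, []  # no snapshot old enough to restore from
    base = max(candidates)
    # Replay only writes after the snapshot and at or before the target time.
    replay = sorted(m for m in commitlog if base < m[0] <= target)
    return base, replay


snapshots = [datetime(2024, 1, 1), datetime(2024, 1, 2)]
commitlog = [
    (datetime(2024, 1, 1, 12), "w1"),
    (datetime(2024, 1, 2, 1), "w2"),
    (datetime(2024, 1, 2, 3), "w3"),
]
base, replay = plan_pit_restore(snapshots, commitlog, datetime(2024, 1, 2, 2, 0, 0, 500))
```

Writes before the chosen snapshot (`w1`) are already contained in it, and writes after the target (`w3`) are deliberately excluded.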

Visual Repair Service

  • Automatically maintains data consistency across a cluster without impacting performance
  • Ensures that your cluster operates efficiently by optimally running repairs

The term “Repair” is overloaded. Think of Repairs as synchronizing replicas. Synchronization is a hard problem to solve. Repair is how C* solves this.

There are a couple of ways Cassandra repairs data automatically, e.g., read repair. However, there are also cases where one needs to perform repairs manually.

The OpsCenter Repair Service simplifies this for you.

A node's data can become inconsistent. For example, if a node goes down for longer than max_hint_window, hints are no longer stored for it, so after bringing it back up you need to run repair to resynchronize it. Repairs also need to run within gc_grace_seconds to ensure that deleted data is not resurrected.
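The two timing rules in the paragraph above can be written as simple checks. This is a minimal sketch using Cassandra's stock defaults (`max_hint_window_in_ms` = 10800000, i.e. 3 hours; `gc_grace_seconds` = 864000, i.e. 10 days); the function names are hypothetical, and in practice hints can also be lost for other reasons, so the first check is only a lower bound on when repair is required.

```python
from datetime import timedelta

# Cassandra defaults: max_hint_window_in_ms = 10800000 (3 hours), gc_grace_seconds = 864000 (10 days)
MAX_HINT_WINDOW = timedelta(milliseconds=10_800_000)
GC_GRACE = timedelta(seconds=864_000)


def needs_repair_after_outage(downtime: timedelta, hint_window: timedelta = MAX_HINT_WINDOW) -> bool:
    """Hints stop being stored once a node has been down longer than the hint window."""
    return downtime > hint_window


def repair_schedule_is_safe(time_since_last_repair: timedelta, gc_grace: timedelta = GC_GRACE) -> bool:
    """Each replica must be repaired within gc_grace_seconds, or deleted data can be resurrected."""
    return time_since_last_repair < gc_grace


print(needs_repair_after_outage(timedelta(hours=5)))  # down longer than 3 hours
print(repair_schedule_is_safe(timedelta(days=11)))    # older than 10 days
```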

Repair Service - Under the Hood

[Figure: Repair Service view]

  • Repair within X days
  • Continuous Repairs
  • One repair per replica set
  • Parallel repairs
  • vnodes
  • repair time window - related to gc_grace
  • Repairing an entire node at once is expensive.
  • Describe the algorithm.
  • The Repair Service runs continuously.
  • Two goals:

    • repair data without affecting performance
    • run only one repair per replica set at a time

  • vnodes
  • Mention incremental repairs.
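The "repair within X days, continuously, in parallel" idea can be sketched as two pieces: spread small subrange repairs evenly over the time window, and pick a batch of subranges that can run at the same time. This is a minimal sketch under stated assumptions: the subrange list and replica sets are made up, and "one repair per replica set" is read conservatively as "no node takes part in two concurrent repairs". It is not the Repair Service's actual algorithm.

```python
from datetime import timedelta
from typing import FrozenSet, List, Set, Tuple

Subrange = Tuple[int, int, FrozenSet[str]]  # (start token, end token, replica nodes)


def time_between_repairs(subranges: int, window: timedelta) -> timedelta:
    """Spread the work so the whole ring is repaired once within `window` (keep it below gc_grace)."""
    return window / subranges


def next_parallel_batch(pending: List[Subrange]) -> List[Subrange]:
    """Greedy pick of subranges to repair concurrently: no node may be in two running repairs."""
    busy: Set[str] = set()
    batch: List[Subrange] = []
    for start, end, replicas in pending:
        if busy.isdisjoint(replicas):
            batch.append((start, end, replicas))
            busy |= replicas
    return batch


pending = [
    (0, 100, frozenset({"n1", "n2", "n3"})),
    (100, 200, frozenset({"n2", "n3", "n4"})),  # shares n2, n3 with the first: must wait
    (200, 300, frozenset({"n4", "n5", "n6"})),
]
print(time_between_repairs(864, timedelta(days=9)))
print([start for start, _, _ in next_parallel_batch(pending)])
```

Small subranges keep each individual repair cheap, which addresses "repairing an entire node is expensive", while the disjointness check limits the extra load on any one node.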

Best Practice Service

[Figure: Best Practice Service view]

  • Benefits

    • Periodically scans database clusters and automatically detects issues that threaten a cluster’s security, availability, and performance
    • Utilizes a set of expert rules that span a variety of different categories and summarizes the results

  • Under the Hood

    • OpsCenter Server orchestrates the Best Practice service to ensure relevant best practice alerts are displayed
    • OpsCenter Server gets results either by interacting with Cassandra directly or through the OpsCenter Agents
    • OpsCenter Server polls, aggregates and sends the results to OpsCenter UI for visualization/end user consumption
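The "expert rules spanning categories, with summarized results" model can be sketched as a list of rule objects run against facts gathered from each node. This is a minimal illustration; the `Rule` type, the fact keys, and these three rules are hypothetical examples chosen for the sketch and are not the service's actual rule set. `AllowAllAuthenticator` and `PasswordAuthenticator` are real Cassandra authenticator names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    name: str
    category: str                  # e.g. "Security", "Availability", "Performance"
    check: Callable[[dict], bool]  # returns True when the node passes
    advice: str


# Hypothetical rules and fact keys, for illustration only.
RULES = [
    Rule("Authentication enabled", "Security",
         lambda n: n.get("authenticator") != "AllowAllAuthenticator",
         "Enable an authenticator such as PasswordAuthenticator."),
    Rule("Replication factor >= 3", "Availability",
         lambda n: n.get("min_keyspace_rf", 1) >= 3,
         "Use RF >= 3 for production keyspaces."),
    Rule("Swap disabled", "Performance",
         lambda n: not n.get("swap_enabled", False),
         "Disable swap on Cassandra nodes."),
]


def scan(nodes: Dict[str, dict], rules: List[Rule] = RULES) -> Dict[str, List[str]]:
    """Run every rule against every node and summarize failures per category."""
    summary: Dict[str, List[str]] = {}
    for node, facts in sorted(nodes.items()):
        for rule in rules:
            if not rule.check(facts):
                summary.setdefault(rule.category, []).append(f"{node}: {rule.name} -> {rule.advice}")
    return summary


nodes = {
    "n1": {"authenticator": "PasswordAuthenticator", "min_keyspace_rf": 3, "swap_enabled": False},
    "n2": {"authenticator": "AllowAllAuthenticator", "min_keyspace_rf": 1, "swap_enabled": False},
}
for category, failures in scan(nodes).items():
    print(category, failures)
```

Keeping rules as data makes it easy to add new checks without touching the scanning and summarizing logic.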

Conclusion - OpsCenter Management Services

  • Obviate error-prone manual maintenance processes and help automate the protection of critical data
  • Eliminate the need for custom tooling and development
  • Help novice DBAs and operations personnel manage DSE Cassandra like seasoned professionals

Exercise: OpsCenter