Anyone who works with Kubernetes will quickly recognize the name “K8ssandra” as something related to Kubernetes (fondly abbreviated as K8s). And that is exactly right.
Apache Cassandra is a preferred database for many large-scale applications, and Kubernetes provides the orchestration tooling and infrastructure for many of them. Combining the two under a single umbrella makes it easier for enterprises to deploy and operate Cassandra at scale.
What is K8ssandra?
K8ssandra is a cloud-native distribution of Apache Cassandra that runs on Kubernetes. Pronounced “Kate” + “Sandra”, it is an open source project licensed under the Apache License, Version 2.0. The project bundles tools that provide data APIs and automated operations for Cassandra, including monitoring, site reliability services, and backup/restore.
Cassandra is a distributed database management system. It is a free and open source project that provides a scalable, highly available, fault-tolerant, column-oriented database to support large amounts of data across many commodity servers.
Kubernetes is a portable, extensible, open-source platform for managing containerized workloads and services, that facilitates both declarative configuration and automation. It has a large, rapidly growing ecosystem. Kubernetes services, support, and tools are widely available. (Source)
Creating, deploying, and managing the various Kubernetes components, such as pods, deployments, and ConfigMaps, can be time-intensive and complex, and it becomes harder as your architecture grows. K8ssandra does the heavy lifting for you. As an infrastructure engineer, or even as a developer, it is always better to have a reliable tool that does that job for you.
K8ssandra provides a set of components that are wired together as part of the installation process itself. The following components are packaged and installed:
Reaper for Cassandra
Medusa for backup/restoration
Metrics collector with Prometheus integration and visuals via Grafana
The last decade has seen exponential growth in the data collected across domains and sectors. All of this data needs processing. Some applications are transaction-heavy, while others require extensive analysis and computation and rely on OLAP (online analytical processing) technology.
As your data grows, problems of scale start to appear. Sometimes only a few tables grow much faster than the rest. The indexes defined on those tables then consume more space, and searching through the tables becomes time-consuming. This is where you can benefit from database sharding, which splits a large table into smaller pieces (shards) that are stored and searched separately.
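To make the idea concrete, here is a minimal sketch of hash-based sharding in Python. It is only an illustration under stated assumptions: the shard count, the `ShardedTable` class, and the `order-…` keys are all hypothetical, and the shards are plain in-memory dictionaries rather than real database servers.

```python
import hashlib

SHARD_COUNT = 4  # hypothetical number of shards


def shard_for(key: str, shard_count: int = SHARD_COUNT) -> int:
    """Map a row key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedTable:
    """Toy table whose rows are spread across several shards (plain dicts)."""

    def __init__(self, shard_count: int = SHARD_COUNT) -> None:
        self.shards = [dict() for _ in range(shard_count)]

    def put(self, key: str, row: dict) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = row

    def get(self, key: str):
        # Only the shard that owns the key is consulted.
        return self.shards[shard_for(key, len(self.shards))].get(key)


table = ShardedTable()
for i in range(1000):
    table.put(f"order-{i}", {"order_id": i, "amount": i * 10})

row = table.get("order-42")
shard_sizes = [len(shard) for shard in table.shards]
```

Each lookup touches a single shard, so the amount of data searched depends on the size of that shard rather than on the size of the whole table. Real systems also have to handle resharding and cross-shard queries, which this sketch leaves out.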
SAP HANA is an in-memory RDBMS (Relational Database Management System) developed by SAP SE. Apart from being a database server, SAP HANA also provides several other capabilities, including advanced analytics and an application server.
A Graph Database is a database that uses graph structures to store data. A graph is made up of nodes, edges, and properties. Nodes represent entities such as persons, businesses, or any other object to be tracked. An edge is a relation between two nodes, and each node can have more than one relation. A property is a piece of relevant information about a node. A database that makes use of such structures is referred to as a Graph Database.
So, what are the advantages of a Graph Database?
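The three building blocks can be sketched as a small data structure in Python. This is an illustrative model only, not how any particular graph database stores data; the class names, field names, and sample values are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    node_id: str
    label: str  # kind of entity, e.g. "Person" or "Business"
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str    # node_id where the relation starts
    target: str    # node_id where the relation ends
    relation: str  # name of the relation, e.g. "WORKS_AT"
    properties: dict = field(default_factory=dict)


class Graph:
    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}
        self.edges: List[Edge] = []

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def add_edge(self, edge: Edge) -> None:
        # Both endpoints must already exist as nodes.
        if edge.source not in self.nodes or edge.target not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append(edge)

    def relations_of(self, node_id: str) -> List[Edge]:
        """All edges that start at the given node."""
        return [e for e in self.edges if e.source == node_id]


# Hypothetical sample data.
graph = Graph()
graph.add_node(Node("p1", "Person", {"name": "Asha"}))
graph.add_node(Node("b1", "Business", {"name": "Acme"}))
graph.add_node(Node("p2", "Person", {"name": "Ben"}))
graph.add_edge(Edge("p1", "b1", "WORKS_AT", {"since": 2019}))
graph.add_edge(Edge("p1", "p2", "KNOWS"))
```

Here the node `p1` takes part in two relations, which matches the point that a node can have more than one relation.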
A Graph Database stores the relationships between nodes, as edges and properties, along with the records themselves. This lets applications retrieve connected data faster than a relational database typically can. It also reduces the need for the “join” statements that a relational database requires, because the data is already linked through edges and properties. This, in turn, can improve the performance of both the database and the application.
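The difference between joining tables and following edges can be sketched in a few lines of Python. This is a deliberately simplified comparison: the sample data is invented, and a real relational database would normally use an index rather than scan every row, so treat it as an intuition aid rather than a benchmark.

```python
from collections import defaultdict

# Relational style: one table of people, one table of friendship rows.
people = {1: "Asha", 2: "Ben", 3: "Chen", 4: "Dina"}
friend_rows = [(1, 2), (1, 3), (2, 4)]  # (person_id, friend_id)


def friends_via_join(person_id):
    """Emulates a join: scan every friendship row, then look up each name."""
    return sorted(people[f] for (p, f) in friend_rows if p == person_id)


# Graph style: each node keeps direct references to its neighbours.
adjacency = defaultdict(list)
for p, f in friend_rows:
    adjacency[p].append(f)


def friends_via_edges(person_id):
    """Follows the edges stored with the node; unrelated rows are never read."""
    return sorted(people[f] for f in adjacency.get(person_id, []))
```

Both functions return the same answer. The graph-style version does work proportional to the number of edges on one node, while the join-style version in this sketch looks at every friendship row.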
You can find a good comparison of Graph Database and Relational Database here.
The underlying implementation of Graph Databases may vary. Some use a relational engine and store the “graph” data in a separate table. Others use a key-value store or a document database (both NoSQL stores) for storage. As a result, reaping the benefits of the graph structure generally requires a dedicated query language rather than standard SQL. Some of the available query languages are Gremlin, SPARQL, and Cypher. Note that GraphQL is not a query language for Graph Databases; it is a query language for APIs.
A Graph Database is a good fit for highly connected data, such as social networks or e-commerce recommendations. For example, a user of a social networking site (represented by a node) can be a member of various groups and can have several friends. Each of those friends, in turn, has similar connections; these relationships are represented by edges. Attributes such as birthdate and college are represented by properties.
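The social network example can be sketched in plain Python to show how a recommendation falls out of the connections. Everything here is hypothetical: the user names, property values, relation names (`FRIEND`, `MEMBER_OF`), and helper functions are invented for illustration, and a real Graph Database would express the same question in a query language such as Gremlin or Cypher.

```python
# Nodes with their properties (all values invented for illustration).
nodes = {
    "asha": {"type": "User", "birthdate": "1990-04-12", "college": "State U"},
    "ben": {"type": "User", "birthdate": "1988-11-02", "college": "State U"},
    "chen": {"type": "User", "birthdate": "1992-07-30", "college": "City College"},
    "dina": {"type": "User", "birthdate": "1991-01-15", "college": "City College"},
    "hikers": {"type": "Group", "topic": "hiking"},
}

# Edges as (source, relation, target).
edges = [
    ("asha", "FRIEND", "ben"),
    ("ben", "FRIEND", "chen"),
    ("chen", "FRIEND", "dina"),
    ("asha", "MEMBER_OF", "hikers"),
    ("chen", "MEMBER_OF", "hikers"),
]


def friends_of(user):
    """Users linked to `user` by a FRIEND edge, in either direction."""
    found = set()
    for src, rel, dst in edges:
        if rel != "FRIEND":
            continue
        if src == user:
            found.add(dst)
        elif dst == user:
            found.add(src)
    return found


def members_of(group):
    """Users that have a MEMBER_OF edge pointing at `group`."""
    return sorted(src for src, rel, dst in edges if rel == "MEMBER_OF" and dst == group)


def suggest_friends(user):
    """Friends of friends who are not already friends with `user`."""
    direct = friends_of(user)
    candidates = set()
    for friend in direct:
        candidates |= friends_of(friend)
    return candidates - direct - {user}
```

For the sample data, `suggest_friends("asha")` walks two FRIEND edges out from the starting node and returns the users found there who are not yet direct friends. This "friends of friends" traversal is the kind of query that highly connected data makes common.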