Anyone who works with Kubernetes will quickly recognize the word “K8ssandra” as something related to Kubernetes (fondly abbreviated as K8s). And yes, that is right.
Apache Cassandra is a preferred database for many large-scale applications, and Kubernetes provides the orchestration tooling and infrastructure for many of those same applications. Combining the two under a single umbrella makes it easier for enterprises to run and operate Cassandra alongside the rest of their workloads.
What is K8ssandra?
K8ssandra is a cloud-native distribution of Apache Cassandra that runs on Kubernetes. Pronounced “Kate” + “Sandra”, it is an open source project licensed under the Apache License 2.0. The project bundles tools that provide data APIs and automated operations for Cassandra, including monitoring, site reliability services, and backup/restore.
Cassandra is a free and open source distributed database management system. It provides a scalable, highly available, fault-tolerant wide-column store designed to handle large amounts of data across many commodity servers.
Kubernetes is a portable, extensible, open-source platform for managing containerized workloads and services, that facilitates both declarative configuration and automation. It has a large, rapidly growing ecosystem. Kubernetes services, support, and tools are widely available. (Source)
Creating, deploying, and managing the various Kubernetes resources, such as pods, Deployments, and ConfigMaps, can be time-intensive and complex, and it only gets harder as your architecture grows. K8ssandra does the heavy lifting for you. Whether you are an infrastructure engineer or a developer, it is always better to have a reliable tool that does that job for you.
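To get a feel for that manual work, here is a minimal sketch in Python that hand-builds just two of those resources, a ConfigMap and a Deployment that reads its environment from it. The resource names, the image, and the settings are hypothetical and chosen only for illustration; none of this is taken from K8ssandra itself.

```python
import json


def config_map(name: str, data: dict) -> dict:
    """Build a ConfigMap manifest holding plain key/value settings."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        "data": data,
    }


def deployment(name: str, image: str, replicas: int, config_name: str) -> dict:
    """Build a Deployment manifest whose pods load env vars from a ConfigMap."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the pod template labels below.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "envFrom": [{"configMapRef": {"name": config_name}}],
                        }
                    ]
                },
            },
        },
    }


# Hypothetical names and values, for illustration only.
manifests = [
    config_map("demo-settings", {"LOG_LEVEL": "info"}),
    deployment("demo-app", "nginx:1.25", replicas=2, config_name="demo-settings"),
]

# Print the manifests as JSON, one of the formats Kubernetes tooling accepts.
print(json.dumps(manifests, indent=2))
```

Even this toy example needs matching labels, selectors, and cross-references between resources. A real Cassandra cluster adds storage, networking, and operational tooling on top, which is exactly the kind of wiring K8ssandra takes care of.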
K8ssandra provides a set of components that are wired together as part of the installation process itself. The following components are packaged and installed:
Reaper for scheduling Cassandra repairs
Medusa for backup and restore
Metrics collector with Prometheus integration and visuals via Grafana