The growth of data has given rise to several open-source projects that have produced world-class frameworks. Apache Spark is one such open-source cluster computing framework; it originated in 2009 at UC Berkeley. It has gained popularity among developers and data scientists because of its speed. Let’s learn more about this framework and how it compares with MapReduce.
Sharding
As your data grows, it starts to cause problems of scale. Sometimes only a few tables grow much faster than the rest. In such cases, the indexes defined on these tables also consume more space, and searching through them becomes time-consuming. This is where database sharding, which splits a table's rows across multiple database instances, can help.
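To make the idea concrete, here is a minimal sketch of hash-based shard routing in Python. It is only an illustration under stated assumptions: the shards are plain in-memory dictionaries standing in for separate database instances, and the names `shard_for` and `ShardedTable` are hypothetical, not part of any real database product.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a row key to a shard index using a stable hash.

    hashlib is used instead of the built-in hash() so the mapping
    stays the same across processes and restarts.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedTable:
    """Hypothetical table whose rows are spread over several shards.

    Each shard is a dict here; in a real deployment it would be a
    separate database instance holding its own, smaller index.
    """

    def __init__(self, num_shards: int):
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, row: dict) -> None:
        # Route the write to the single shard that owns this key.
        self.shards[shard_for(key, len(self.shards))][key] = row

    def get(self, key: str):
        # A lookup touches only one shard, not the whole data set.
        return self.shards[shard_for(key, len(self.shards))].get(key)


table = ShardedTable(num_shards=4)
for i in range(100):
    table.put(f"user-{i}", {"id": i})

print(table.get("user-42"))
print([len(shard) for shard in table.shards])
```

Because each lookup is routed to one shard, every shard searches a smaller table and a smaller index. One trade-off of this simple modulo scheme is that changing the number of shards remaps most keys, which is why production systems often use other strategies such as range-based or consistent-hash partitioning.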
Hadoop
Big Data has been a buzzword for the last few years. It refers to data sets so voluminous and complex that traditional data processing applications become inadequate. Such data needs special provisioning for storage, analysis, sharing, and querying. This is where Apache Hadoop and Apache Spark come to your rescue.
SAP HANA
SAP HANA is an in-memory RDBMS (Relational Database Management System) developed by SAP SE. Apart from being a database server, it also provides several other capabilities, including advanced analytics and an application server.
Data Science
Data science, also known as data-driven science, is the practice of extracting knowledge from data collected in various forms, either structured or unstructured. The methods it uses include machine learning, data mining, data analysis, and data visualization.
It is an umbrella term that covers many other fields, such as machine learning, data mining, Big Data, statistics, data visualization, and data analytics. It is a “concept to unify statistics, data analysis and their related methods” in order to “understand and analyze actual phenomena” with data.