Apache Spark

The growth of data has given rise to several open-source projects, some of which have become world-class frameworks. Apache Spark is one such open-source cluster computing framework; it originated in 2009 at UC Berkeley. It has gained popularity among developers and data scientists because of its speed. Let’s learn more about this framework and how it compares with MapReduce.

Continue reading “Apache Spark”

Sharding

As your data grows, problems of scale start to appear. Sometimes only a few tables grow much faster than the others. In such cases, the indexes defined on these tables also consume more space, and searching through the tables becomes time-consuming. This is where you can benefit from database sharding, which splits a large table's rows across multiple databases.
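To make the idea concrete, here is a minimal sketch of hash-based sharding, which is only one of several common strategies and is an assumption here, since the post does not say which one it covers. The names `shard_for` and `route_rows` are hypothetical, and the shards are plain in-memory lists standing in for separate database servers.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a row key to a shard index using a stable hash.

    A cryptographic digest is used instead of Python's built-in hash(),
    which is randomized per process for strings and so is not stable.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def route_rows(keys, num_shards):
    """Group row keys by the shard each one would be stored on."""
    shards = {i: [] for i in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


if __name__ == "__main__":
    # Hypothetical customer IDs spread over three shards.
    customer_ids = [f"customer-{n}" for n in range(10)]
    for shard, rows in route_rows(customer_ids, 3).items():
        print(shard, rows)
```

Because the same key always maps to the same shard, a lookup only has to search one smaller table and its smaller index. The trade-off of this simple modulo scheme is that changing the number of shards moves most keys to a different shard.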

Continue reading “Sharding”

Hadoop

Big Data: the buzzword that has been making the rounds for the last few years! Big Data refers to data sets so voluminous and complex that traditional data processing applications become inadequate. Such data needs special provisioning for storage, analysis, sharing, and querying. This is where Apache Hadoop and Apache Spark come to your rescue.

Continue reading “Hadoop”

SAP HANA

SAP HANA is an in-memory RDBMS (Relational Database Management System) developed by SAP SE. Besides serving as a database server, it provides other capabilities, including advanced analytics and an application server.

Continue reading “SAP HANA”

Data Science

Data science is also known as data-driven science. It works with data collected in various forms, either structured or unstructured. The methods it uses include machine learning, data mining, data analysis, and data visualization.

It is an umbrella term covering many other fields, such as machine learning, data mining, big data, statistics, data visualization, and data analytics. It is a “concept to unify statistics, data analysis and their related methods” in order to “understand and analyze actual phenomena” with data.

Continue reading “Data Science”