Overview
Apache Spark is a lightning-fast unified analytics engine for big data and machine learning. It was originally developed at the AMPLab at UC Berkeley in 2009 and is now an open-source project of the Apache Software Foundation.
This post points you to some good resources that will help you become well-versed in Apache Spark.
Prerequisites
- SQL and data background
  With a data background, you already know how to do transformations, joins, and so on. You just need to perform the same operations in a distributed way using the Spark APIs in your language of choice.
- Programming knowledge
  Even though you can write SQL queries to perform data transformations, this is not the approach most commonly followed in industry. Familiarity with a programming language such as Scala, Python, Java, or R will give you an edge. If you don’t have any programming background, I would recommend learning Python.
- Spark environment
  You need a Spark environment, either a local installation or a cluster, to run Spark code. There are various options, such as:
  - Installing Spark on your local machine
  - Using a managed service from a cloud provider (e.g., Amazon EMR)
  - Trying Databricks Community Edition, and so on
I would recommend Databricks Community Edition, which is free of charge and gives you the additional benefits of the cloud and the Databricks unified platform.
Video resources
Watch the playlist below, in order.
Do hands-on
Perform the exercises discussed in the video sessions in your Spark environment.
Books
Every Spark developer should have at least one of the books below. You will enjoy them most if you read them after watching the videos.
- Spark: The Definitive Guide: Big Data Processing Made Simple
- Learning Spark: Lightning-Fast Data Analytics
More hands-on…
Try out all the code samples given in the books.
Familiarize yourself with the Spark documentation
When you use a specific Spark transformation or action, look it up in the Spark documentation to understand how it works in more detail.
And that’s a wrap, yo! You survived the first waves of learning Apache Spark…
More to follow, and good luck!