Learn to use the Apache SystemML API with the Apache Spark Shell to run computations in Scala.
Data Analytics
Apache SystemML is an open source machine learning tool that can run locally or in conjunction with big data tools such as Apache Spark. This journey highlights how pleasant the Apache SystemML API is to work with from the Spark Shell. It also walks through some foundational steps in a data scientist's workflow, such as parallelizing data, reading in matrices as RDDs, computing sums over matrices, and getting your data into Apache Spark. This journey is built for beginners who are less familiar with Apache Spark or Jupyter Notebook and who are brand new to Apache SystemML. It demonstrates the ease of use of the API and the efficiency it brings to your data science pipeline!
by Madison J. Myers
https://github.com/MadisonJMyers/Using-the-Apache-SystemML-API-on-a-Spark-Shell-
Apache SystemML and Apache Spark are invaluable big data tools, but they can be confusing at first and take a while to get used to, especially with so few beginner tutorials out there. In this journey I will demonstrate how to set up your environment for the Apache SystemML API and use it on the Spark Shell locally on your computer. This flow is ideal for data exploration and initial testing in your data science project. After getting the API set up, I will quickly show you some basic Scala steps to help you get up and running on your project!
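As a quick preview, the setup boils down to launching the Spark Shell with SystemML on the classpath and creating an MLContext. This is a minimal sketch: the jar path and memory settings are assumptions, so point --jars at wherever your SystemML jar actually lives.

```scala
// Launch the Spark Shell with SystemML on the classpath (run from a terminal).
// The jar path and memory sizes below are assumptions -- adjust to your setup:
//
//   spark-shell --executor-memory 4G --driver-memory 4G --jars SystemML.jar
//
// Inside the shell, create a SystemML MLContext from the running SparkSession:
import org.apache.sysml.api.mlcontext._
import org.apache.sysml.api.mlcontext.ScriptFactory._

val ml = new MLContext(spark)
```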
When you have completed this journey, you will understand how to:
Set up your environment for SystemML and its API.
Download data into the Spark Shell.
Parallelize the data, read in two matrices as RDDs, and get the output (see the sketch after this list).
Execute your script.
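To make those steps concrete, here is a minimal sketch assuming the MLContext ml from the setup above. The matrix values and variable names are illustrative; the pattern of parallelizing CSV rows and binding them to DML variables follows the SystemML MLContext API.

```scala
// Parallelize two small matrices as RDDs of CSV-formatted rows:
val rdd1 = sc.parallelize(Array("1.0,2.0", "3.0,4.0"))
val rdd2 = sc.parallelize(Array("5.0,6.0", "7.0,8.0"))

// A tiny DML script that sums each matrix:
val sums = """
s1 = sum(m1)
s2 = sum(m2)
"""

// Bind the RDDs to the DML variables, declare the outputs, and execute:
val sumScript = dml(sums).in(Map("m1" -> rdd1, "m2" -> rdd2)).out("s1", "s2")
val results = ml.execute(sumScript)
println(results.getDouble("s1")) // 10.0
println(results.getDouble("s2")) // 26.0
```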
The user sets up their environment following the highlighted steps.
The user opens a new Jupyter Notebook with Spark and SystemML using the code provided.
The user downloads the data and loads it into a DataFrame.
The user starts a SystemML context.
The user defines a kernel for Poisson nonnegative matrix factorization (PNMF) in DML.
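The PNMF kernel is written in DML (SystemML's declarative language) and handed to the SystemML context as a string. Below is a condensed sketch using the standard multiplicative updates; trainData stands in for the data loaded in the earlier step, and the rank and max_iter values are illustrative.

```scala
// A condensed PNMF (Poisson nonnegative matrix factorization) kernel in DML.
// X, rank, and max_iter are bound from Scala rather than read from files.
val pnmf = """
W = rand(rows=nrow(X), cols=rank, min=0, max=0.01)
H = rand(rows=rank, cols=ncol(X), min=0, max=0.01)
losses = matrix(0, rows=max_iter, cols=1)
for (i in 1:max_iter) {
  # multiplicative updates for the two factors
  H = H * (t(W) %*% (X / (W %*% H))) / t(colSums(W))
  W = W * ((X / (W %*% H)) %*% t(H)) / t(rowSums(H))
  # negative Poisson log-likelihood (up to a constant)
  losses[i,1] = -1 * (sum(X * log(W %*% H)) - as.scalar(colSums(W) %*% rowSums(H)))
}
"""

// trainData is a placeholder for the matrix you loaded in the previous step:
val pnmfScript = dml(pnmf).in("X", trainData).in("rank", 10).in("max_iter", 50).out("W", "H", "losses")
val pnmfResults = ml.execute(pnmfScript)
```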
Get familiar with Apache SystemML and its API while also using the Apache Spark Shell.
Apache SystemML API: an API for the SystemML machine learning platform, which is optimized for big data.
Apache Spark: a fast and general engine for big data processing, with built-in modules for streaming, SQL, machine learning and graph processing.
https://spark.apache.org/
https://www.ibm.com/analytics/us/en/technology/spark/
http://researcher.watson.ibm.com/researcher/view_group.php?id=3174
http://systemml.apache.org/
SystemML on Spark Shell? Yes!
A very simple way of using SystemML for all of your machine learning and big data needs. This tutorial will get you set up and running SystemML on the Spark Shell like a star. But first, to refresh your memory, let me remind you that I am on a quest to create a life-changing app! I am new to the world of data science and am currently tackling the challenge of building an app using Apache SystemML and Apache Spark one step at a time. If you haven't already, make sure to check out my previous tutorials, which start here.
So far we've daydreamed about delightful data, complained about how hard it is to find good data, found good data, learned how to write Scala and NOW we will learn how to access SystemML from the Spark Shell.
Not familiar with the Spark shell? Here's a great tutorial. Not sure what SystemML is? Look here!
At a high level, SystemML handles the machine learning and mathematical parts of your data science project. You can launch the Spark Shell, load SystemML into it, load your data, and write your linear algebra, statistical equations, matrix operations, and so on in far less code than the equivalent Spark syntax would require. Not only does it help with mathematical exploration and machine learning algorithms, it also puts you on Spark, where you can do all of the above with data too big for your local computer. Focusing on this step of your project, let's walk through how to set your computer up to meet SystemML's prerequisites, how to launch the Spark Shell, load SystemML, load data, and work through a few examples in Scala. (I promise a PySpark tutorial will come in the future!)
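To give you a taste of that brevity before we start, here is a minimal sketch along the lines of the min/max/mean example in the SystemML MLContext guide. The input matrix is made up, and it assumes the Spark Shell is running with SystemML loaded and an MLContext named ml, as in the setup sketch earlier.

```scala
// Three statistics in three lines of DML -- noticeably more ceremony in raw Spark:
val minMaxMean = """
minOut = min(Xin)
maxOut = max(Xin)
meanOut = mean(Xin)
"""
val Xin = sc.parallelize(Array("1.0,2.0,3.0", "4.0,5.0,6.0"))
val mmScript = dml(minMaxMean).in("Xin", Xin).out("minOut", "maxOut", "meanOut")
val (mn, mx, avg) = ml.execute(mmScript).getTuple[Double, Double, Double]("minOut", "maxOut", "meanOut")
// mn = 1.0, mx = 6.0, avg = 3.5
```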