2017-11-08: 3 Apache Spark APIs: RDDs, DataFrames, and Datasets
Loyola University Water Tower Campus (Chicago/Michigan Area)
111 E. Pearson Street, Chicago IL 60611
Beane Ballroom (13th Floor, Lewis Towers)
Admission: Free, General Admission, open to the public
Apache Spark is an open-source cluster-computing framework that provides programming interfaces for large-scale data processing with parallelism and fault-tolerance.
Few things delight developers more than a set of APIs that makes them productive and that is easy to use, intuitive, and expressive. Apache Spark offers such APIs across components such as Spark SQL, Streaming, Machine Learning, and Graph Processing, for distributed processing of large data sets in Scala, Java, Python, and R. In this talk, I will explore the evolution of the three sets of APIs (RDDs, DataFrames, and Datasets) available in Apache Spark 2.x. In particular, I will emphasize three key takeaways:
Why and when to use each set of APIs, as a matter of best practice
The performance and optimization benefits that Apache Spark provides
Scenarios in which to use DataFrames and Datasets instead of RDDs for distributed big data processing
Through simple notebook demonstrations with API code examples, you will learn how to process big data using RDDs, DataFrames, and Datasets, and how to interoperate among them. The talk is a spoken version of the blog post, updated with the latest developments in the Apache Spark 2.x DataFrame/Dataset and Spark SQL APIs.
Jules S. Damji is an Apache Spark Community Evangelist and Developer Advocate at Databricks. He is a hands-on developer with over 15 years of experience and has worked at leading companies building large-scale distributed systems. He holds a B.Sc. and an M.Sc. in Computer Science and an M.A. in Political Advocacy and Communication, from Oregon State University, Cal State, and Johns Hopkins University, respectively.
While there will be light refreshments available, feel free to "brown bag" it and bring in food from the outside to eat during the social hour.
Reservations:
Click here to Reserve for Wednesday, November 8
or send an e-mail to greg@neumarke.net
Proposed Future Meeting Dates:
12/13/2017