Scala Workshop

Functional Data Science

Neville Li
Jul 2014

Quick Sort

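def quickSort(a: List[Double]): List[Double] = a match {
  case Nil      => Nil  // an empty list is already sorted
  case x :: xs  =>      // x is the pivot, xs the remaining elements
    // lt holds elements below the pivot; gt holds the rest (>= x)
    val (lt, gt) = xs.partition(_ < x)
    quickSort(lt) ++ List(x) ++ quickSort(gt)
}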

Pattern matching → match/case
Powerful collections → partition/::/Nil
Anonymous functions, a.k.a. λ → (_ < x)
Tuple decomposition → val (lt, gt) = ...
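
The four features above can also be tried one at a time. The following is a minimal sketch rather than part of the original slides; the names FeatureTour, describe, numbers, small and rest are made up for illustration.

```scala
object FeatureTour {
  // Pattern matching: branch on the shape of a list.
  def describe(xs: List[Int]): String = xs match {
    case Nil          => "empty"
    case head :: Nil  => s"one element: $head"
    case head :: tail => s"starts with $head, then ${tail.length} more"
  }

  // Collections: :: prepends an element to a list, Nil is the empty list.
  val numbers: List[Int] = 3 :: 1 :: 4 :: 1 :: 5 :: Nil

  // Anonymous function: (_ < 3) is shorthand for (n => n < 3).
  // Tuple decomposition: partition returns a pair, unpacked into two vals.
  val (small, rest) = numbers.partition(_ < 3)
}
```

Here small holds the elements below 3 and rest holds everything else, in their original order.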

About Scala

The Good Parts

Functional & object-oriented
Statically typed + type inference
Concise, expressive and flexible syntax
Java ecosystem
Great for modelling data flow
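
A small sketch of how the first few points combine in practice. It is illustrative only; Track, totalBy and the sample values are hypothetical and not taken from the workshop material.

```scala
object GoodParts {
  // Object-oriented: a case class is an immutable data type with a method.
  case class Track(title: String, seconds: Int) {
    def minutes: Double = seconds / 60.0
  }

  // Functional: totalBy takes another function f as an argument.
  def totalBy(tracks: List[Track])(f: Track => Double): Double =
    tracks.map(f).sum

  // Type inference: no annotations here; the compiler infers
  // List[Track] for tracks and Double for totalMinutes.
  val tracks = List(Track("a", 120), Track("b", 90))
  val totalMinutes = totalBy(tracks)(_.minutes)
}
```

The types are still checked at compile time, so passing a function of the wrong type to totalBy would be rejected by the compiler.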

About Scala

The Not So Good Parts

Complexity - traits, implicits, advanced types, ...
Slow compilation and hard-to-read stack traces
Performance overhead
Many ways to do one thing

Most can be avoided in data applications
We will visit the rest during this workshop

Scala in Big Data

Spark - high-performance in-memory computing
Kafka - messaging and logging, from LinkedIn
Scalding - Twitter, eBay, Etsy, Airbnb, Square, Stripe...
Summingbird - unifies Scalding and Storm, from Twitter
Scrunch - Scala wrapper for Crunch
Scoobi - type-safe MapReduce framework, Foursquare
Algebird - abstract algebra for Scala, Twitter
Shark, GraphX - Hive-compatible SQL and graph algorithms on Spark
ScalaNLP - numerical (Breeze) and NLP (Epic) tools
ScalaLab - MATLAB-like scientific computing in Scala
BIDMach - ML on GPU

About The Workshop

Python and Java/C++ knowledge helps
Functional data structures and composition
Common patterns in Scalding/Scrunch/Spark/etc.
The same patterns also apply in Java (through Guava)
Does not cover web frameworks (Play, Lift, Scalatra, etc.)
Nor distributed systems (Akka, Finagle, Kestrel, etc.)
Not an endorsement of Scala for production backends
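
As a rough idea of what "common patterns" means here, the sketch below counts words with plain Scala collections. It deliberately uses no Scalding, Scrunch or Spark API; WordCount and count are hypothetical names, and the assumption is only that those frameworks expose a similar flatMap/filter/group-then-aggregate shape on their own distributed collections.

```scala
object WordCount {
  def count(lines: List[String]): Map[String, Int] =
    lines
      // one line becomes many lower-cased words
      .flatMap(line => line.toLowerCase.split("\\s+").toList)
      // drop empty tokens produced by blank lines or leading whitespace
      .filter(_.nonEmpty)
      // group equal words together
      .groupBy(identity)
      // reduce each group to its size
      .map { case (word, occurrences) => (word, occurrences.size) }
}
```

For example, WordCount.count(List("a b a", "b c")) yields a map with a -> 2, b -> 2 and c -> 1.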

Resources

Scala Tutorial: Getting Started with Scala - beginner tutorial from Udemy
Scala for the Impatient - recommended for beginners
Scala School - Twitter's online course
Functional Programming Principles in Scala - taught by Martin Odersky, slightly academic
Scala in Depth - advanced topics

For the Impatient

Chapters and sections are marked with skill levels
A1-A3 for application programmers
L1-L3 for library designers
We will cover mostly A1 and A2

Outline