Scala Workshop

Functional Data Science

Neville Li
Jul 2014

Why Functional? Why Scala?

Quick Sort

def quickSort(a: List[Double]): List[Double] = a match {
  case Nil      => Nil
  case x :: xs  =>
    val (lt, gt) = xs.partition(_ < x)
    quickSort(lt) ++ List(x) ++ quickSort(gt)
}
  • Pattern matching → match/case
  • Powerful collections → partition/::/Nil
  • Anonymous functions, a.k.a. λ → (_ < x)
  • Tuple decomposition → val (lt, gt) = ...
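
The four features listed above can also be seen in isolation. The following is a minimal sketch; the object name `QuickSortFeatures` and the sample values are illustrative assumptions, not part of the original slides.

```scala
object QuickSortFeatures {
  // Pattern matching on list structure: empty list vs. head :: tail
  def describe(a: List[Int]): String = a match {
    case Nil      => "empty"
    case x :: Nil => s"one element: $x"
    case x :: _   => s"starts with $x"
  }

  // partition splits a collection in two using a predicate; the
  // resulting tuple is decomposed into two values in one step
  val (small, large) = List(5, 1, 8, 3).partition(_ < 4)

  // (_ < 4) is shorthand for the anonymous function (n: Int) => n < 4
  val isSmall: Int => Boolean = _ < 4
}
```

Note that in `quickSort`, `partition(_ < x)` places every element that is not less than the pivot into `gt`, so duplicates of the pivot are kept and end up on the right-hand side.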

About Scala

The Good Parts

  • Functional & object-oriented
  • Statically typed + type inference
  • Concise, expressive and flexible syntax
  • Java ecosystem
  • Great for modelling data flow
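
As a rough illustration of how type inference and collection methods combine to model a data flow, here is a small word count sketch. The object name `DataFlowSketch` and the input lines are made up for this example.

```scala
object DataFlowSketch {
  // Hypothetical input: a few lines of text
  val lines = List("a b a", "b c")

  // No type annotations are needed; the compiler infers each step
  val counts = lines
    .flatMap(_.split(" ").toList)          // List[String] of words
    .groupBy(identity)                     // Map[String, List[String]]
    .map { case (w, ws) => (w, ws.size) }  // Map[String, Int]
}
```

Each step reads as one stage of the flow, and a type mismatch between stages is caught at compile time.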

The Not So Good Parts

  • Complexity - traits, implicits, advanced types, ...
  • Slow compilation and hard-to-read stack traces
  • Performance overhead
  • Many ways to do one thing
Most of these can be avoided in data applications.
We will visit the rest during this workshop.
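
To make the "many ways to do one thing" point concrete, here is a small sketch that sums the same list in four ways. The object name `ManyWays` and the sample list are assumptions chosen for illustration.

```scala
object ManyWays {
  val xs = List(1.0, 2.0, 3.0)

  // Built-in aggregate
  val viaSum = xs.sum

  // Fold with an explicit starting value
  val viaFold = xs.foldLeft(0.0)(_ + _)

  // Reduce without a starting value (throws on an empty list)
  val viaReduce = xs.reduce(_ + _)

  // Imperative loop with a mutable accumulator
  val viaLoop = {
    var total = 0.0
    for (x <- xs) total += x
    total
  }
}
```

All four produce the same result here; picking one style and using it consistently keeps data code easier to read.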

Scala in Big Data

  • Spark - high-performance in-memory computing
  • Kafka - messaging and logging, from LinkedIn
  • Scalding - Twitter, eBay, Etsy, Airbnb, Square, Stripe...
  • Summingbird - unifies Scalding and Storm, from Twitter
  • Scrunch - Scala wrapper for Crunch
  • Scoobi - type-safe MapReduce framework, Foursquare
  • Algebird - Abstract algebra for Scala, Twitter
  • Shark, GraphX - Hive, graph algorithms on Spark
  • ScalaNLP - numerical (Breeze) and NLP (Epic) tools
  • ScalaLab - Matlab-like scientific computing in Scala
  • BIDMach - ML on GPU

About The Workshop

  • Python and Java/C++ knowledge helps
  • Functional data structures and composition
  • Common patterns in Scalding/Scrunch/Spark/etc.
  • Even those in Java (through Guava)
  • Does not cover web frameworks (Play, Lift, Scalatra, etc.)
  • Nor distributed systems (Akka, Finagle, Kestrel, etc.)
  • Not an endorsement for production backend use

Resources

For the Impatient

Scala for the Impatient
  • Chapters and sections with skill levels
  • A1-3 for application programmers
  • L1-3 for library designers
  • We will cover mostly A1 and A2

Outline

  1. Getting started
  2. Functional patterns
  3. Advanced topics
  4. Case studies - Scalding and Spark