Scala Workshop
Functional Data Science
Neville Li
Jul 2014
Quick Sort
def quickSort(a: List[Double]): List[Double] = a match {
  case Nil => Nil
  case x :: xs =>
    val (lt, gt) = xs.partition(_ < x)
    quickSort(lt) ++ List(x) ++ quickSort(gt)
}
- Pattern matching → match/case
- Powerful collections → partition/::/Nil
- Anonymous functions, a.k.a. λ → (_ < x)
- Tuple decomposition → val (lt, gt) = ...
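The four features can also be tried in isolation. The following is a minimal sketch, not part of the original slides: the object name `QuickSortPieces`, the helpers `split` and `describe`, and the sample values are made up for illustration.

```scala
object QuickSortPieces {
  // Anonymous function + powerful collections:
  // partition returns a pair (elements matching the predicate, the rest).
  def split(xs: List[Double], pivot: Double): (List[Double], List[Double]) =
    xs.partition(_ < pivot)

  // Pattern matching on the two List shapes: Nil (empty) and head :: tail.
  def describe(xs: List[Double]): String = xs match {
    case Nil          => "empty"
    case head :: tail => s"head=$head, tail size=${tail.size}"
  }

  def main(args: Array[String]): Unit = {
    // Tuple decomposition: bind both halves of the pair in one statement.
    val (lt, rest) = split(List(3.0, 1.0, 4.0, 1.0, 5.0), 3.0)
    println(lt)   // List(1.0, 1.0)
    println(rest) // List(3.0, 4.0, 5.0)
    println(describe(Nil)) // empty
  }
}
```

Note that elements equal to the pivot land in the second list, so in quickSort the `gt` half really holds "greater than or equal" values; duplicates are still sorted correctly.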
About Scala
The Good Parts
- Functional & object-oriented
- Statically typed + type inference
- Concise, expressive and flexible syntax
- Java ecosystem
- Great for modelling data flow
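As a rough illustration of type inference and data-flow modelling, here is a minimal sketch using only the standard collections. The `Play` record, its fields, and the 30-second threshold are hypothetical and not taken from the workshop.

```scala
object DataFlowSketch {
  // Hypothetical record type for illustration only.
  case class Play(user: String, track: String, seconds: Int)

  // Each step is a plain collection transformation; the compiler infers
  // every intermediate type, so only the signature is annotated.
  def secondsPerUser(plays: List[Play]): Map[String, Int] =
    plays
      .filter(_.seconds >= 30)            // drop short plays
      .groupBy(_.user)                    // Map[String, List[Play]]
      .map { case (user, ps) => user -> ps.map(_.seconds).sum }

  def main(args: Array[String]): Unit = {
    val plays = List(Play("a", "t1", 40), Play("a", "t2", 10), Play("b", "t1", 60))
    println(secondsPerUser(plays)) // a maps to 40, b maps to 60
  }
}
```

The same filter/group/aggregate shape is what frameworks such as Scalding and Spark expose on distributed data, although their exact method names differ.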
About Scala
The Not So Good Parts
- Complexity - traits, implicits, advanced types, ...
- Slow compilation and crazy stack traces
- Performance overhead
- Many ways to do one thing
Most can be avoided in data applications
We will visit the rest during this workshop
Scala in Big Data
- Spark - high-performance in-memory computing
- Kafka - messaging and logging, from LinkedIn
- Scalding - Twitter, eBay, Etsy, Airbnb, Square, Stripe...
- Summingbird - unifies Scalding and Storm, from Twitter
- Scrunch - Scala wrapper for Crunch
- Scoobi - type-safe MapReduce framework, Foursquare
- Algebird - Abstract algebra for Scala, Twitter
- Shark, GraphX - Hive, graph algorithms on Spark
- ScalaNLP - numerical (Breeze) and NLP (Epic) tools
- ScalaLab - Matlab-like scientific computing in Scala
- BIDMach - ML on GPU
About The Workshop
- Python and Java/C++ knowledge helps
- Functional data structure and composition
- Common patterns in Scalding/Scrunch/Spark/etc.
- Even those in Java (through Guava)
- Does not cover web frameworks (Play, Lift, Scalatra, etc.)
- Nor distributed systems (Akka, Finagle, Kestrel, etc.)
- Not an endorsement for production backend use
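To give a flavour of what "composition" means here, the sketch below chains small functions with `andThen` from the standard library. The function names and the word-count task are assumptions chosen for illustration, not workshop material.

```scala
object CompositionSketch {
  // Small single-purpose functions as values.
  val trim: String => String = _.trim
  val tokenize: String => List[String] =
    _.split("\\s+").toList.filter(_.nonEmpty) // filter guards the empty-string case
  val countWords: List[String] => Int = _.size

  // andThen composes left to right: trim, then tokenize, then countWords.
  val wordCount: String => Int = trim andThen tokenize andThen countWords

  def main(args: Array[String]): Unit = {
    println(wordCount("  functional data science ")) // 3
    println(wordCount(""))                           // 0
  }
}
```

Building a pipeline from small, independently testable functions is the pattern that recurs in the data frameworks listed earlier.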
For the Impatient
- Chapters and sections with skill levels: A1-3 for application programmers, L1-3 for library designers
- We will cover mostly A1 and A2