Install Scala & SBT on Mac OS X
brew install scala sbt
A basic Scala console
neville@foobar:~$ scala
Welcome to Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_55).
Type in expressions to have them evaluated.
Type :help for more information.
scala>
Scala as a scripting language
#!/usr/bin/env scala
// hello.scala
object World {
def hello = "Hello"
}
println(World.hello)
neville@foobar:~$ scala hello.scala
Hello
neville@foobar:~$ chmod +x hello.scala
neville@foobar:~$ ./hello.scala # look mum, no compiling!
Hello
Global repository setting
~/.sbt/global.sbt (for SBT < 0.13)
~/.sbt/0.13/global.sbt (for SBT >= 0.13)
resolvers ++= Seq(
"Local Maven Repository" at "file://" + Path.userHome.absolutePath + "/.m2/repository"
)
Override JVM memory flags
~/.sbtconfig
SBT_OPTS="-XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=1024M"
~/.sbt/plugins/build.sbt (for SBT < 0.13)
~/.sbt/0.13/plugins/build.sbt (for SBT >= 0.13)
addSbtPlugin("com.github.mpeltonen" % "sbt-idea" % "1.5.2")
hello/project/build.properties
sbt.version=0.13.5
hello/build.sbt
organization := "com.spotify"
name := "hello"
version := "0.1.0" // com.spotify:hello_2.10-0.1.0.jar
scalaVersion := "2.10.4" // per project Scala version
libraryDependencies ++= Seq(
// Scala library, SBT will append language version
"org.scalanlp" %% "breeze" % "0.8.1", // breeze_2.10-0.8.1.jar
// Java library
"org.jblas" % "jblas" % "1.2.3", // jblas-1.2.3.jar
// Scala library, scope = test
"org.specs2" %% "specs2" % "2.3.13" % "test" // specs2_2.10-2.3.13.jar
)
hello/src/main/scala/com/spotify/hello/Hello.scala
package com.spotify.hello
import breeze.linalg.DenseVector // Scala library
import org.jblas.DoubleMatrix // Java library
object Hello {
def main(args: Array[String]) { // public static void main
println("Hello " + args.mkString(" "))
}
// DenseVector.dot(DenseVector) -> infix operator
def d1(): Double = DenseVector(0.1, 0.2) dot DenseVector(1.0, 2.0)
def vec(a: Double*) = new DoubleMatrix(a.toArray)
def d2(): Double = vec(0.1, 0.2) dot vec(1.0, 2.0) // infix operator
}
Scala allows multiple classes in the same file
hello/src/test/scala/com/spotify/hello/HelloTest.scala
package com.spotify.hello
import org.specs2.mutable._ // Specs2: Software specification for Scala
class HelloTest extends Specification {
// Hello defines d1() and d2(), so the specs call those
"Hello.d1()" should {
"be 0.5" in {
Hello.d1() must_==(0.5)
}
}
"Hello.d2()" should {
"be 0.5" in {
Hello.d2() must_==(0.5)
}
}
}
Also see ScalaTest; both have IntelliJ IDEA support
Run Forrest Run!
sbt compile
sbt run # run default main class without argument
sbt "run-main com.spotify.hello.Hello ARG ..."
Other useful SBT tasks
sbt clean
sbt test # run unit tests
sbt console # Scala console with project dependencies loaded
sbt package # generate JAR file
sbt assembly # JAR with bundled deps, requires sbt-assembly plugin
Continuous mode (for compile/test)
Executes task when source changes are detected
neville@foobar hello $ sbt
[info] Loading global plugins from /Users/neville/.sbt/0.13/plugins
[info] Set current project to hello (in build file:/Users/neville/hello/)
> ~ compile
[success] Total time: 1 s, completed Jan 22, 2014 4:13:05 AM
1. Waiting for source changes... (press enter to interrupt)
IntelliJ IDEA
>= 13 → "Open Project" directly
< 13 → sbt-idea plugin
neville@foobar hello $ sbt gen-idea
[info] Loading global plugins from /Users/neville/.sbt/0.13/plugins
[info] Set current project to hello (in build file:/Users/neville/hello/)
[info] Creating IDEA module for project 'hello' ...
[info] Created /Users/neville/hello/.idea/IdeaProject.iml
[info] Created /Users/neville/hello/.idea
[info] Created /Users/neville/hello/.idea_modules/hello.iml
[info] Created /Users/neville/hello/.idea_modules/hello-build.iml
IntelliJ is your best friend
Java counterparts
Int, Double, Boolean, etc. → int, double, boolean, etc.
String, Array → same class as in Java
scala> val i: Int = 1
i: Int = 1
scala> val s: String = "Hello"
s: String = Hello
scala> val b: Boolean = true
b: Boolean = true
Collections
scala> val a1: Array[Double] = Array[Double](1.0, 2.0, 3.0)
a1: Array[Double] = Array(1.0, 2.0, 3.0)
scala> val a2 = Array(1.0, 2.0, 3.0) // type inference
a2: Array[Double] = Array(1.0, 2.0, 3.0)
scala> a1(0)
res1: Double = 1.0
scala> val a2 = Array(1.5, 2, "Hello") // more type inference
a2: Array[Any] = Array(1.5, 2, Hello) // Any == Object in Java
Immutability
val is immutable (cannot be reassigned), var is not
Tuples
Tuple1[T1], Tuple2[T1, T2], Tuple3[T1, T2, T3], ...
scala> val a = (1, 2.5, "three")
a: (Int, Double, String) = (1,2.5,three)
scala> a._1
res0: Int = 1
scala> a._2
res1: Double = 2.5
scala> a._3
res2: String = three
Also see Crunch Tuple, e.g. Pair<K,V>, Tuple3<V1,V2,V3>, Tuple4<V1,V2,V3,V4>, TupleN
Tuple decomposition
Think of a tuple as a lightweight, type-safe struct
scala> val t = (1, "a", (Array(1.0, 2.0), true))
t: (Int, String, (Array[Double], Boolean)) = (1,a,(Array(1.0, 2.0),true))
scala> val (i, s, (a, b)) = t
i: Int = 1
s: String = a
a: Array[Double] = Array(1.0, 2.0)
b: Boolean = true
Some or None
Both are subclasses of Option[T]
val nothing: Option[String] = Option(null) // None
val nil = None
val some = Some("data") // Option[String]
val callMe: Option[String] = nil
callMe.isDefined
callMe.getOrElse("bye")
val maybe: Option[String] = some
maybe.isDefined
maybe.getOrElse("what?")
Option has methods that behave like those of other Scala collections, e.g. map, filter, foreach
Also see Guava Optional (JavaDoc)
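A minimal sketch of treating Option like a small collection. The `ages` map and the `lookup` helper are hypothetical names made up for this illustration.

```scala
// Hypothetical lookup table, for illustration only
val ages = Map("alice" -> 30, "bob" -> 17)

// Map.get returns Option[Int]: Some(value) when the key exists, None otherwise
def lookup(name: String): Option[Int] = ages.get(name)

val doubled = lookup("alice").map(_ * 2)      // function applied inside Some
val adult   = lookup("bob").filter(_ >= 18)   // predicate fails, so None
val missing = lookup("carol").map(_ * 2)      // None stays None, map is skipped

// foreach runs its body only when a value is present
lookup("alice").foreach(a => println(s"age $a"))

// A for comprehension combines several Options; any None short-circuits
val total = for (a <- lookup("alice"); b <- lookup("bob")) yield a + b
```

Because None simply flows through map and filter, explicit null checks are not needed along the way; getOrElse supplies a default at the end.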
Type gymnastics
import scala.collection.SortedSet
val a = Set(1, 2)
a.getClass // Set[Int] ...
val b = SortedSet(1, 2) // sub-class of Set
b.getClass // SortedSet[Int] ...
a.isInstanceOf[Set[Int]] // true
b.isInstanceOf[Set[Int]] // also true
a.isInstanceOf[SortedSet[Int]] // false, default Set is not sorted
b.isInstanceOf[SortedSet[Int]] // true
// cast up class hierarchy
b.asInstanceOf[Set[Int]] // SortedSet is also Set
b.asInstanceOf[Iterable[Int]] // most collections are iterable
Type aliases
List<ListenableFuture<Map<String, Metadata>>>
class Track
class ListenableFuture[T]
type TrackFuture = ListenableFuture[Map[String, Track]]
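A small sketch of how an alias keeps signatures readable. `Row`, `Table` and `columnSum` are hypothetical names for illustration, and the snippet assumes it is run as a script or in the console.

```scala
// Aliases are just new names; Row and Map[String, Double] are interchangeable
type Row = Map[String, Double]
type Table = List[Row]

// The signature reads better than List[Map[String, Double]]
def columnSum(table: Table, column: String): Double =
  table.map(row => row.getOrElse(column, 0.0)).sum

val table: Table = List(
  Map("x" -> 1.0),
  Map("x" -> 2.0, "y" -> 5.0),
  Map("y" -> 1.0))

val sumX = columnSum(table, "x")
```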
Exercises
("pi" -> 3.14), ("e", 2.72), ("one", 2, 3.0)
Some("one"), and None
isInstanceOf[]
Think in expressions, not statements
// use :paste mode in Scala console
val volume = 11
var result = ""
if (volume >= 11)
result = "loud"
else
result = "suck"
val a = Array(1.0, 2.0, 3.0)
for (i <- 0 until a.length) a(i) *= 10
Statements - assign value, call function, loop over counter
// if/else evaluates to "loud"/"suck"
val result = if (volume >= 11) "loud" else "suck"
val a = Array(1.0, 2.0, 3.0)
a.map(_ * 10.0) // more on this later
Expressions - given data, apply logic, evaluate to a new value
Loops are expressions too
for (i <- 0 until 10) yield i * i
for (i <- 0 until 10 if i % 2 == 0) yield i
for (i <- 1 to 10;
j <- i to 10
if j % i == 0) yield (i, j)
// no C/C++/Java style - for (int i = 0; i < 10; i++) { ... }
// i is mutable and loop body is side effect, impure
// No break (possible but ugly) or continue, and you do not need them
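As a rough sketch of why these loops count as expressions: for/yield is rewritten by the compiler into calls to withFilter, map and flatMap on the underlying collection. The value names below are illustrative only.

```scala
// A for/yield with a guard ...
val viaFor = for (i <- 0 until 10 if i % 2 == 0) yield i * i

// ... corresponds roughly to withFilter followed by map
val viaMap = (0 until 10).withFilter(_ % 2 == 0).map(i => i * i)

// Two generators correspond to flatMap on the outer one and map on the inner one
val pairsFor = for (i <- 1 to 3; j <- i to 3) yield (i, j)
val pairsMap = (1 to 3).flatMap(i => (i to 3).map(j => (i, j)))
```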
Yes Scala for loops are slow, but you will almost never use them
So are try/catch/finally blocks
def safeGet(a: Array[Int], n: Int): Int = try {
a(n)
} catch {
case e: ArrayIndexOutOfBoundsException => 0
} finally {
println(s"safeGet($n)") // string interpolation
}
val a = Array(1, 2, 3)
safeGet(a, 2)
safeGet(a, 3)
No worries, we almost never use exceptions in data applications :)
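When a failure does need handling, `scala.util.Try` (available since Scala 2.10) wraps the outcome in a value instead of a try/catch block. A minimal sketch, where `safeGetTry` is a hypothetical variant of the function above:

```scala
import scala.util.{Try, Success}

// Try(...) captures non-fatal exceptions as Failure, normal results as Success
def safeGetTry(a: Array[Int], n: Int): Int = Try(a(n)).getOrElse(0)

val xs = Array(1, 2, 3)
val inRange    = safeGetTry(xs, 2)   // index exists
val outOfRange = safeGetTry(xs, 3)   // falls back to the default

// Try composes like Option: map runs only on Success
val parsed = Try("42".toInt).map(_ + 1)
val broken = Try("forty-two".toInt)  // NumberFormatException is captured
```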
Pattern matching
Way more powerful than switch/case or if/else if/else
val color = "black"
val noun = color match {
case "red" => "sox"
case "blue" => "sky"
case "yellow" => "cab"
case "green" => "bay"
case "black" => "metal"
case _ => "crap" // wildcard
}
println(color + " " + noun)
More on this later
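A short sketch of what goes beyond switch/case: one match can test literals, types, guards, and the shape of tuples or lists. The `describe` function is a hypothetical example.

```scala
def describe(x: Any): String = x match {
  case 0               => "zero"                          // literal
  case n: Int if n < 0 => "negative int"                  // type test plus guard
  case n: Int          => "positive int"                  // type test
  case s: String       => s"string of length ${s.length}"
  case (a, b)          => s"pair of $a and $b"            // tuple decomposition
  case head :: _       => s"list starting with $head"     // non-empty list
  case _               => "something else"                // wildcard
}
```

Cases are tried top to bottom, so the more specific patterns come first.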
Exercises
for/yield math.sin for integers between [0, 10)
Lots of things optional
def addOne(m: Int): Int = m + 1 // no {}
def factorial(m: Int) = { // no return type
var f = 1
for (i <- 1 to m) f *= i // not idiomatic, more later
f // last expression is return value
}
// java.lang.* (including System) imported by default
def t = System.currentTimeMillis // no (), either declaring or invoking
def hello(name: String) { // no =, return type Unit (void in C/C++)
println("Hello " + name)
}
return statement
Optional and named arguments
def trainSuperFancyUberAwesomeModel(
data: Array[Array[Double]],
rank: Int,
alpha: Double = 0.1,
beta: Double = 0.2,
lambda: Double = 0.5) = {
42
}
val data = Array(Array(1.0))
trainSuperFancyUberAwesomeModel(data, 10)
trainSuperFancyUberAwesomeModel(data, 10, 0.3)
trainSuperFancyUberAwesomeModel(data, 10, beta=0.9)
trainSuperFancyUberAwesomeModel(rank=5, data=null)
Variable arguments
def plus(l: Int*) = l.sum
plus()
plus(1)
plus(1, 2)
plus(1, 2, 3)
plus(Array(1, 2, 3): _*)
Anonymous functions - a.k.a. λ
Key component of Scala or any FP language
val addOne = (x: Int) => x + 1 // function is data
addOne(1)
List(1, 2, 3).map(addOne)
List(1, 2, 3).map((x: Int) => x + 1)
List(1, 2, 3).map(x => x + 1)
List(1, 2, 3).map(_ + 1)
// commonly seen in Scalding/Spark
List((1, 2), (2, 3), (3, 4)).map { t => // t is (Int, Int)
val (a, b) = t // tuple decomposition
val r = a + b
println(s"$a + $b = $r") // side-effect
r // return value
} // {} and () interchangeable but () recommended for one-liners
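A common alternative to decomposing the tuple inside the body is a pattern-matching anonymous function, written with `case` inside braces. A minimal sketch with illustrative value names:

```scala
val pairs = List((1, 2), (2, 3), (3, 4))

// { case (a, b) => ... } decomposes each tuple in the parameter position
val sums = pairs.map { case (a, b) => a + b }

// collect filters and maps in one pass: tuples that fail the guard are dropped
val products = pairs.collect { case (a, b) if a % 2 == 0 => a * b }
```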
Higher-order functions
def applyFn(x: Int, fn: Int => Int) = fn(x)
applyFn(10, _ + 10) // x + 10
def applyFn2(x: Int, y: Int, fn2: (Int, Int) => Int) = fn2(x, y)
applyFn2(10, 20, _ + _) // x + y
def applyFn3(x: Int, y: Int, z: Int, fn3: (Int, Int, Int) => Int) =
fn3(x, y, z)
applyFn3(10, 20, 30, _ + _ + _) // x + y + z
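Higher-order functions can also return functions, and Function1 values can be chained with `andThen` and `compose`. A small sketch; `adder`, `addTen` and `twice` are hypothetical names.

```scala
// Returns a new function that remembers n
def adder(n: Int): Int => Int = x => x + n

val addTen = adder(10)
val twice  = (x: Int) => x * 2

val addThenTwice = addTen andThen twice   // twice(addTen(x))
val twiceThenAdd = addTen compose twice   // addTen(twice(x))

val mapped = List(1, 2, 3).map(addTen andThen twice)
```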
Currying
Curried functions
def add(m: Int, n: Int) = m + n // function (Int, Int) => Int
// partially apply one argument
val add2 = add(2, _: Int) // new function Int => Int
add2(3)
def multiply(m: Int)(n: Int): Int = m * n // function (Int)(Int) => Int
multiply(2)(3)
def multiplyByTwo = multiply(2)(_) // new function Int => Int
multiplyByTwo(3)
map('inFields -> 'outFields)(function)
Also fold(z)(function) & mapReduceMap(mapFn1)(reduceFn)(mapFn2)
That's why Haskell function signatures look like Int -> Int -> Int
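A brief sketch of moving between the two styles: `curried` on a Function2 produces the one-argument-at-a-time form, and `foldLeft` is a standard library method that takes its arguments in two parameter lists. The value names are illustrative.

```scala
def add(m: Int, n: Int) = m + n

// add _ turns the method into a function value (Int, Int) => Int;
// curried converts it to Int => Int => Int
val addCurried = (add _).curried
val addTwo = addCurried(2)   // Int => Int

// foldLeft(z)(op): initial value first, combining function second
val total = List(1, 2, 3).foldLeft(0)(_ + _)
```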
Exercises
n: Int
n
square(x: Double): Double
cube(x: Double): Double
Packages
// AwesomeAlgorithm.scala
package com.spotify.prism
import com.spotify.common._ // wildcard
import scala.io.{Codec, Source} // import many
// rename Java collections to avoid collision
import java.util.{List => JList, Map => JMap}
class AwesomeAlgorithm { // com.spotify.prism.AwesomeAlgorithm
def compute = 42
}
Objects
Basically singletons with static methods
object ProblemFactory {
private var count = 0 // private mutable member
def newProblem() = { // static method
count += 1
count * scala.util.Random.nextDouble
}
}
ProblemFactory.newProblem
Apply method
object RandomDouble {
def apply() = scala.util.Random.nextDouble
}
RandomDouble() // syntactic sugar for RandomDouble.apply()
// classes can have apply member methods too
val v = Vector(10, 20, 30)
v.apply(1)
v(1) // -> 20, same as above
Constructors
class Universe { // default constructor with no argument
val answer = 42 // public immutable member
}
new Universe().answer
class Square(x: Double) { // constructor with one argument
// constructor body {
val size = x * x
// }
}
new Square(10).size
new Square(10).x // error: value x is not a member of Square
More constructors
// x and y are public immutable members
class Vector(val x: Double, val y: Double) { // default constructor
val magnitude = math.sqrt(x * x + y * y)
def this() = this(0.0, 0.0) // constructor overloading
}
val v = new Vector(2, 3)
(v.x, v.y, v.magnitude)
val v = new Vector()
(v.x, v.y, v.magnitude)
Companion objects
Tired of typing new SomeObject(arg1, arg2)?
class Vector(val x: Double, val y: Double) {
val magnitude = math.sqrt(x * x + y * y)
println(s"|($x, $y)| = $magnitude") // s prefix enables string interpolation
}
object Vector {
def apply(x: Double = 0.0, y: Double = 0.0) = new Vector(x, y)
}
Vector()
Vector(1.0, 2.0)
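Companion objects can also hold named factory methods next to apply, and they may call a private constructor of their class. A minimal sketch with a hypothetical `Celsius` class; in the console, paste the class and the object together with :paste so they are treated as companions.

```scala
// The constructor is private: instances come only from the companion
class Celsius private (val degrees: Double)

object Celsius {
  def apply(degrees: Double): Celsius = new Celsius(degrees)
  // A named factory alongside apply
  def fromFahrenheit(f: Double): Celsius = new Celsius((f - 32) * 5 / 9)
}

val boiling  = Celsius(100)
val freezing = Celsius.fromFahrenheit(32)
```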
Functions are objects
// Function1[T1, R], Function2[T1, T2, R], ...
// Function22[T1, T2, ... T22, R]
class AddOne extends Function1[Int, Double] {
def apply(x: Int): Double = x + 1.0
}
val plusOne = new AddOne()
plusOne(10)
// shorthand
class AddOne extends (Int => Double) {
def apply(x: Int): Double = x + 1.0
}
// even shorter
val plusOne: Int => Double = _ + 1.0
plusOne(10)
Functions in Java
Function<Integer, Double> addOne = new Function<Integer, Double>() {
@Nullable
@Override
public Double apply(@Nullable Integer x) {
if (x == null) return 1.0;
return x + 1.0;
}
};
addOne.apply(10);
Guava function explained
DoFn<Integer, Double> addOne = new DoFn<Integer, Double>() { // Crunch
@Override
public void process(Integer x, Emitter<Double> emitter) {
emitter.emit(x + 1.0);
}
};
Crunch DoFn (not exactly a function due to internal state, and hence not compatible with Java 8 lambdas)
Case classes
// public immutable fields, toString and apply method for free
case class Metal(name: String, country: String, femaleSinger: Boolean = false,
violin: Boolean = false, flameThrower: Boolean = false)
Think C struct or Python namedtuple
val inflames = Metal("In Flames", "SE") // apply, no need for new
println(inflames) // free toString
val epica = Metal("Epica", "NL", true)
val delain = epica.copy(name="Delain") // free copy constructor
val korpiklaani = Metal("Korpiklaani", "FI", violin=true)
val rammstein = Metal("Rammstein", "DE", flameThrower=true)
def classify(m: Metal) = m match {
case Metal(_, "SE", false, false, false) => "melodic death"
case Metal(_, "FI", false, true, _) => "folk"
case Metal(_, "NL", true, _, _) => "goth"
case Metal(_, _, _, _, true) => "Tanz Metal"
case _ => "unknown"
}
Powerful with pattern matching
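Case classes also come with structural equality, and they are often grouped under a sealed trait so that a match can cover every variant. A small sketch with hypothetical `Shape` types:

```scala
// sealed: all subclasses must live in this file, so matches can be checked
sealed trait Shape
case class Circle(r: Double) extends Shape
case class Rect(w: Double, h: Double) extends Shape

def area(s: Shape): Double = s match {
  case Circle(r)  => math.Pi * r * r   // fields extracted by position
  case Rect(w, h) => w * h
}

val sameFields = Rect(2, 3) == Rect(2, 3)   // compared by field values
val bigger     = Circle(1.0).copy(r = 2.0)  // copy with one field changed
```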
Exercises
Time with hour, minute, and second
AM to the constructor
Midnight and Noon companion objects with apply()
Further reading