Overview
scala.collection
mutable and immutable
List, Set, Map, ... imported by default
import scala.collection.mutable
val s1 = Set("A", "B") // default immutable version
val s2 = mutable.Set("A", "B") // imported mutable version
s1.add("C") // error, no such method
s2.add("C") // true, success!
Abstract class hierarchy
Set/Map → HashSet/HashMap
SortedSet/SortedMap → TreeSet/TreeMap
IndexedSeq → Vector, Array, String, ...
LinearSeq → List, Stream, Queue, Stack, ...
Unified API
import scala.collection.immutable._
// some common collection interfaces
// can traverse, iterate, access linearly (head/tail)
Traversable(1, 2, 3) // creates a List by default
Iterable("x", "y", "z") // also List
LinearSeq(1.0, 2.0, 3.0) // List again
// can also random access
IndexedSeq(1.0, 2.0) // creates a Vector by default
Set("a", "b", "c")
Map("a" -> 1, "b" -> 2, "c" -> 3)
// more on these later
List(1, 2, 3).map(_ + 1)
Set(1, 2, 3).map(_ + 1)
Lists
val list = List(1, 2, 3)
list.head // element, 1
list.tail // rest, List(2, 3)
list.tail.tail.tail // List()
// Haskell style syntax
1 :: 2 :: Nil // head :: (head :: tail), right associative
def getHead(list: List[Int]) = list match {
case 1 :: _ => "one" // decompose list in pattern matching
case 2 :: _ => "two"
case _ => "many"
}
getHead(list)
getHead(list.tail)
getHead(list.tail.tail)
Concatenation
// ++ for 2 collections
List(1, 2) ++ List(3, 4)
Set(1, 2) ++ Set(2, 3)
Map("A" -> 1, "B" -> 2) ++ Map("C" -> 3)
// +:/:+ for prepending/appending an element in Seq (List, Vector, ...)
val list = List(1, 2, 3)
0 +: list // colon facing the collection
list :+ 4 // linked list, append is O(n)!
// +/- for adding/subtracting an element in set/map
val set = Set(1, 2, 3)
set + 4 - 3
val map = Map("A" -> 1, "B" -> 2, "C" -> 3)
map + ("D" -> 4) - "A"
map + ("A" -> 10) // overwrites existing key
Set/Map as functions
val s = Set(1, 3, 5)
(s(1), s(2)) // (true, false), actually s.apply(1) & s.apply(2)
(1 to 5).map(s) // Vector(true, false, true, false, true)
val m = Map(1 -> "a", 2 -> "b").withDefaultValue(null)
(m(1), m(3)) // (a, null), actually m.apply(1) & m.apply(3)
(1 to 5).map(m) // Vector(a, b, null, null, null)
Ranges
Range(0, 10) // Range(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
Range(0, 10, 2) // Range(0, 2, 4, 6, 8)
0 until 10 // Range(0, 10)
0 until 10 by 2 // Range(0, 10, 2)
1 to 10 // Range(1, 11)
1 to 10 by 2 // Range(1, 11, 2)
(1 to 10).toList
Commonly used in for comprehensions
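To make the remark about for comprehensions concrete, here is a minimal sketch that iterates over ranges. The names `evens` and `pairs` are illustrative only and do not come from the slides.

```scala
// A range with a step drives a simple for comprehension.
val evens = for (i <- 0 until 10 by 2) yield i

// Nested generators: the inner range may depend on the outer variable.
val pairs = for {
  i <- 1 to 3
  j <- i to 3
} yield (i, j)
```

The first expression yields 0, 2, 4, 6, 8; the second yields every pair (i, j) with 1 <= i <= j <= 3.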
Java interoperability
val s = List(1, 2, 3)
val j = java.util.Collections.singletonList(1)
// implicits for pimp my library pattern, more later
import scala.collection.JavaConverters._
s.asJava // extra pimped method for scala.collection.List
j.asScala // extra pimped method for java.util.List
// implicit conversion methods
import scala.collection.JavaConversions._
def countS(s: Seq[Int]) = s.size
def countJ(j: java.util.List[Int]) = j.size
countS(j) // java.util.List -> Seq
countJ(s) // List -> java.util.List
Exercises
List[Int]
List[Int] and Set[Int]
Set[Int] and keys in a Map[Int, String]
s: Set[String] and m: Map[String, Int], build a new Map with only keys from s
Basics
val l1 = List(1, 2, 3)
val l2 = l1.map(_ + 1) // new List with +1 applied to every element
// break down
l1.map { x =>
val r = x + 1
println(s"$x + 1 = $r") // side-effect
r
}
// like map, but for side-effects only
l1.foreach(println) // returns Unit
Also see Lambdas and Streams in Java 8 Libraries
Slicing
val l1 = List(1, 2, 3, 4, 5)
val l2 = l1.filter(_ > 2)
l1.filter { x =>
val r = x > 2
println(x + " " + (if (r) "keep" else "drop"))
r
}
l1.drop(1) // drop first item
l1.drop(2) // drop first 2 items
l1.dropWhile(_ % 2 == 1) // drop the leading run of odd numbers
l1.take(2)
l1.takeWhile(_ % 2 == 1)
// also see takeRight and dropRight
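The right-hand variants mentioned in the last comment behave like their left-hand counterparts, counted from the end. A small sketch, with `span` added as a related method that combines `takeWhile` and `dropWhile` in one pass (the value names are illustrative):

```scala
val l1 = List(1, 2, 3, 4, 5)

val lastTwo = l1.takeRight(2)        // keep the last 2 items
val withoutLastTwo = l1.dropRight(2) // drop the last 2 items

// span = (takeWhile(p), dropWhile(p)) computed together
val (leadingOdds, rest) = l1.span(_ % 2 == 1)
```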
Combining/splitting
val l1 = List("a", "b", "c")
val l2 = List(1, 2, 3)
l1.zip(l2) // -> List[(String, Int)]
l1.zipWithIndex // -> List[(String, Int)], like Python's enumerate
// Map entries are Tuple2:s (Pair)
l1.zip(l2).toMap // -> Map[String, Int]
Map("a" -> 1, "b" -> 2).toList // -> List[(String, Int)]
(1 to 10).partition(_ % 2 == 0) // -> (IndexedSeq[Int], IndexedSeq[Int])
Flatten out
// List[List[Int]] -> List[Int]
List(List(1, 2, 3), List(4, 5, 6), List()).flatten // empty member out
// List[Map[String, Int]] -> List[(String, Int)]
List(Map("A" -> 1, "B" -> 2), Map("C" -> 3, "D" -> 4)).flatten
val lyrics = List("We all live in Amerika", "Amerika ist wunderbar")
lyrics.flatMap(_.split(" ")) // flatMap = map then flatten
lyrics.map(_.split(" ")).flatten // equivalent
// break down
lyrics.flatMap { line =>
val tokens = line.split(" ") // split returns Array[String]
println(tokens.mkString(" | ")) // and Java Array has ugly .toString()
tokens
}
Reduce
// reduce function is (T, T) => T for List[T]
List(2.0, 3.0, 4.0).reduce(math.pow)
// ((2.0 ^ 3.0) ^ 4.0)
Combining values into a single result of the same type
// break down
List(2.0, 3.0, 4.0).reduce { (x, y) =>
val r = math.pow(x, y)
println(s"$x ^ $y = $r")
r
}
List("A", "B", "C").reduce("(%s, %s)".format(_, _))
List("A", "B", "C").reduceRight("(%s, %s)".format(_, _))
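The direction of a reduction only matters when the operator is not associative. The following sketch contrasts addition with subtraction; the value names are illustrative.

```scala
val nums = List(1, 2, 3, 4)

// Addition is associative: both directions agree.
val sumLeft = nums.reduceLeft(_ + _)    // ((1 + 2) + 3) + 4
val sumRight = nums.reduceRight(_ + _)  // 1 + (2 + (3 + 4))

// Subtraction is not: the grouping changes the result.
val diffLeft = nums.reduceLeft(_ - _)   // ((1 - 2) - 3) - 4
val diffRight = nums.reduceRight(_ - _) // 1 - (2 - (3 - 4))
```

Both sums are 10, while the differences are -8 and -2 respectively.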
Reduce visualized
☜ and ☞
fn is associative and commutative (Monoid!)
Fold
List(1, 1, 1, 2, 2, 3, 4).foldLeft(Set[Int]())(_ + _) // Set[Int] + Int
Combining values into an initial value (of possibly a different type)
val bytes = List(222, 173, 190, 239) // List[Int]
// 2 arguments, start value for the accumulator: String
// and binary operator: (String, Int) => String
// operator folds each item into accumulator
bytes.foldLeft("0x")(_ + _.toHexString.toUpperCase)
// break down
bytes.foldLeft("0x") { (str, byte) =>
println("byte = \"%d\", str = %s".format(byte, str))
str + byte.toHexString.toUpperCase
}
foldRight
Fold visualized
☜ and ☞
Folding a linked-list of 1 → 2 → 3 → 4 → 5 → []
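As a rough textual stand-in for the picture, the sketch below folds the same five-element list from both ends and records the nesting as a string. The names `left`, `right`, `reversed`, and `copied` are illustrative.

```scala
val xs = List(1, 2, 3, 4, 5)

// foldLeft: the accumulator comes first and nests to the left.
val left = xs.foldLeft("[]")((acc, x) => s"($acc <- $x)")

// foldRight: the accumulator comes second and nests to the right.
val right = xs.foldRight("[]")((x, acc) => s"($x -> $acc)")

// Consing while folding left reverses; folding right rebuilds the list.
val reversed = xs.foldLeft(List.empty[Int])((acc, x) => x :: acc)
val copied = xs.foldRight(List.empty[Int])((x, acc) => x :: acc)
```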
Scan
val range = (1 to 10)
val partials = range.scanLeft("List")(_ + ":" + _)
partials.foreach(println)
scanRight
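A scan keeps every intermediate accumulator of the corresponding fold. This small sketch compares the two directions on a running sum; the value names are illustrative.

```scala
val nums = List(1, 2, 3)

// scanLeft: start value first, then sums of growing prefixes.
val prefixSums = nums.scanLeft(0)(_ + _)

// scanRight: sums of shrinking suffixes, start value last.
val suffixSums = nums.scanRight(0)(_ + _)
```

The last element of `prefixSums` and the first element of `suffixSums` both equal the full fold result.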
Grouping
val lyrics = List("We all live in Amerika", "Amerika ist wunderbar")
val tokens = lyrics.flatMap(_.split(" ")) // List[String]
// use :paste mode in Scala console
tokens
.groupBy(identity) // Map[String, List[String]]
.map(p => (p._1, p._2.size)) // Map[String, Int]
.toVector // Vector[(String, Int)]
.sortBy(_._2) // sort by second item
.reverse
.take(3)
// longest token
tokens.groupBy(_.length).toVector.sortBy(_._1).reverse.take(1)
Exercises
Range and reduce
lyrics has most number of tokens
Map[String, Int]
Map[Int, Int])
Functions are objects
object addOne extends Function1[Int, Int] {
def apply(x: Int): Int = x + 1
}
object add extends Function2[Int, Int, Int] {
def apply(x: Int, y: Int): Int = x + y
}
object mul extends ((Double, Double) => Double) {
def apply(x: Double, y: Double): Double = x * y
}
val div = (x: Double, y: Double) => x / y
Chaining functions
def sqSum(v: Iterable[Double]) = v.map(math.pow(_, 2.0)).reduce(_ + _)
val l2norm1 = math.sqrt _ compose sqSum _ // l^2norm = sqrt(sqSum(v))
val l2norm2 = sqSum _ andThen math.sqrt _ // same as above
l2norm1(List(3.0, 4.0))
l2norm2(List(3.0, 4.0))
Functions are data
val square = math.pow(_: Double, 2.0) // partially applied function
square.getClass // a synthetic Function1 class; the static type is Double => Double
// math.* are class methods (JVM), _ converts them to functions (object)
val functions = List(square, math.sqrt _, math.log _, math.log10 _)
// List[Double => Double]
functions.map(_(10.0)) // apply same argument to all functions
// chaining rightward
functions.reduce(_ compose _)(10.0) // square(sqrt(log(log10(10.0))))
// chaining leftward
functions.reduce(_ andThen _)(10.0) // log10(log(sqrt(square(10.0))))
Predicates
// Int => Boolean
val isEven = { x: Int => x % 2 == 0 }
val isSquare = { x: Int => math.pow(math.sqrt(x).toInt, 2.0) == x }
val range = 1 to 30
range.filter(isEven)
range.filter(isSquare)
// take 2 predicates, create a new one
// more on type parameters (generics) later
def and[A](fn1: A => Boolean, fn2: A => Boolean) = {
x: A => fn1(x) && fn2(x)
}
range.filter(and(isEven, isSquare))
See also Guava function explained
Composing predicates
// function returning function
def not[A](fn: A => Boolean) = { x: A => !fn(x) }
range.filter(not(isEven))
range.filter(not(isSquare))
// function with variable number of arguments
def and[A](fns: (A => Boolean)*) = { x: A => fns.forall(fn => fn(x)) }
def or[A](fns: (A => Boolean)*) = { x: A => fns.exists(fn => fn(x)) }
range.filter(and(isEven, isSquare))
range.filter(or(isEven, isSquare))
Now you can control predicates programmatically, e.g. from query parameters or streams
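One way this could look in practice: predicates are looked up by name and combined at runtime. This is only a sketch; the `available` registry and the `requested` list (standing in for a parsed query parameter) are hypothetical and not part of the slides.

```scala
// Variadic combinator, same shape as the one on the slide.
def and[A](fns: (A => Boolean)*): A => Boolean =
  x => fns.forall(fn => fn(x))

val isEven: Int => Boolean = _ % 2 == 0
val isSmall: Int => Boolean = _ < 10

// Hypothetical registry of predicates that a caller may select by name.
val available: Map[String, Int => Boolean] =
  Map("even" -> isEven, "small" -> isSmall)

// Hypothetical user input, e.g. parsed from "?filter=even,small".
val requested = List("even", "small", "unknown")

// Unknown names are silently skipped by flatMap over Option.
val selected = requested.flatMap(name => available.get(name))
val combined = and[Int](selected: _*)

val result = (1 to 30).filter(combined)
```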
Closure
// a function that creates new functions
def makePowFn(n: Double): Double => Double = {
x: Double => math.pow(x, n) // n is from enclosing scope
}
val square = makePowFn(2.0) // n = 2.0 in square's closure
val cube = makePowFn(3.0) // n = 3.0 in cube's closure
square(10.0)
cube(10.0)
Partial functions
val one: PartialFunction[Int, String] = { case 1 => "one" }
one.isDefinedAt(1) // true
one.isDefinedAt(2) // false
val two: PartialFunction[Int, String] = { case 2 => "two" }
val three: PartialFunction[Int, String] = { case 3 => "three" }
val wildcard: PartialFunction[Int, String] = { case _ => "something else" }
// chaining partial functions
val oneTwoThree = one orElse two orElse three
val number = one orElse two orElse three orElse wildcard
List(1, 2, 3, 4, 5).map(oneTwoThree.isDefinedAt)
List(1, 2, 3, 4, 5).map(number.isDefinedAt)
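Partial functions pair naturally with `collect`, which keeps only the elements where the function is defined and maps them in the same step. A minimal sketch, redefining the partial functions so that it runs on its own:

```scala
val one: PartialFunction[Int, String] = { case 1 => "one" }
val two: PartialFunction[Int, String] = { case 2 => "two" }
val three: PartialFunction[Int, String] = { case 3 => "three" }

val oneTwoThree = one orElse two orElse three

// collect = filter(isDefinedAt) followed by map, in a single pass
val names = List(1, 2, 3, 4, 5).collect(oneTwoThree)

// An anonymous partial function works as well.
val bigOnes = List(1, 2, 3, 4, 5).collect { case n if n > 3 => n * 10 }
```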
Lifting partial functions
val one: PartialFunction[Int, String] = { case 1 => "one" }
one(1) // defined
one(2) // error!
val plainOne = one.lift // plain function that returns Option
plainOne(1) // Some(one)
plainOne(2) // None
val partialOne = Function.unlift(plainOne)
partialOne(1) // defined
partialOne(2) // error!
Anonymous partial functions
val lyrics = List("We all live in Amerika", "Amerika ist wunderbar")
// use :paste mode in Scala console
lyrics
.flatMap(_.split(" ")) // List[String]
.groupBy(identity) // Map[String, List[String]]
.map { case (token, list) => (token, list.size) }
// more readable than .map(p => (p._1, p._2.size))
case class Band(name: String, members: Int)
val bands = List(
Band("Pungent Stench", 3),
Band("Rammstein", 6),
Band("Haggard", 18))
// don't care about band name
bands.filter { case Band(_, members) => members > 10 }
Exercises
mkLogWithBase(b: Double): Double => Double
and
, find capitalized tokens that have 3+ characters and contain 'i'
Lazy val and views
def plog(x: Int) = {
println(x)
math.log(x)
}
lazy val data = plog(10)
println(data) // first access: plog runs (prints 10), then the result is printed
println(data) // cached: plog is not called again
// Scala collections are strictly (eagerly) evaluated
(1 to 100).map(plog).take(5) // waste of CPU cycles
// .view to convert to lazy view, .force to strictly evaluate
(1 to 100).view.map(plog).take(5).force
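The claims above can be checked by counting evaluations instead of printing. The sketch below uses a hypothetical helper `countCalls` and calls `.toList` rather than `.force` so that it does not depend on a particular Scala version. The exact number of calls made through a view may vary slightly between versions, so only the strict count is stated exactly.

```scala
// lazy val: the initializer runs once, on first access.
var evaluations = 0
lazy val data = { evaluations += 1; math.log(10.0) }
val before = evaluations // nothing evaluated yet
val first = data         // triggers the computation
val second = data        // served from the cache
val after = evaluations

// Hypothetical helper: counts how often the supplied function is called.
def countCalls(run: (Int => Double) => Any): Int = {
  var calls = 0
  val counted = (x: Int) => { calls += 1; math.log(x.toDouble) }
  run(counted)
  calls
}

// Strict: all 100 elements are mapped before take(5).
val strictCalls = countCalls(f => (1 to 100).map(f).take(5))

// View: only roughly the first five elements are mapped.
val lazyCalls = countCalls(f => (1 to 100).view.map(f).take(5).toList)
```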
Streams
// recursive, lazy, and infinite, n, n+1, n+2, ...
def intsFrom(n: BigInt): Stream[BigInt] = n #:: intsFrom(n + 1)
intsFrom(10).take(5).force // Stream(10, 11, 12, 13, 14)
Stream.from(10).take(5).force // same as above
// use :paste mode in Scala console
val factorial = Stream
.from(2) // 2, 3, ...
.scanLeft(1)(_ * _) // 1, 2 * 1, 3 * 2 * 1, ...
factorial.take(10).force
Examples
// Fibonacci take 1
val fibs1: Stream[Int] = 0 #:: 1 #:: fibs1.zip(fibs1.tail).map { n => n._1 + n._2 }
fibs1.take(10).force
// Fibonacci take 2
val fibs2: Stream[Int] = 0 #:: fibs2.scanLeft(1)(_ + _)
fibs2.take(10).force
// prime numbers
def sieve(s: Stream[Int]): Stream[Int] = {
s.head #:: sieve(s.tail.filter(_ % s.head != 0))
}
val primes: Stream[Int] = sieve(Stream.from(2))
primes.take(10).force
See Project Euler for more challenges
Choose the right data structures
Unnecessary copies
val m = Map("A" -> 1, "B" -> 2, "C" -> 3)
m.toList.map(t => (t._1, t._2 + 1)).toMap
for ((k, v) <- m) yield (k, v + 1)
m.map { case (k, v) => (k, v + 1) }
m.mapValues(_ + 1)
How many copies now?
val m1 = Map("A" -> 1.0, "B" -> 2.0, "C" -> 3.0)
val m2 = Map("A" -> 1.5, "B" -> 2.5, "D" -> 3.5)
def addMaps(m1: Map[String, Double], m2: Map[String, Double]): Map[String, Double] = {
val i = m1.keySet intersect m2.keySet // Set[String]
val m = i.map { k => k -> (m1(k) + m2(k)) } // Set[(String, Double)]
(m1 -- i) ++ (m2 -- i) ++ m // better: m1 ++ m2 ++ m
}
// 50 million Map:s
val pipe = FancyBigDataPipe[Map[String, Double]]("hdfs://...")
// how many copies now?
pipe.foldLeft(Map[String, Double]())(addMaps)
Slightly fewer
val m1 = Map("A" -> 1.0, "B" -> 2.0, "C" -> 3.0)
val m2 = Map("A" -> 1.5, "B" -> 2.5, "D" -> 3.5)
// how many copies?
(m1.keySet ++ m2.keySet) map { k =>
k -> (m1.getOrElse(k, 0.0) + m2.getOrElse(k, 0.0))
}
Harder Faster Better Shorter
m1 ++ m2.map { case (k, v) => k -> (v + m1.getOrElse(k, 0.0)) }
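A worked example of the one-liner, using the two maps from the previous slides. Wrapping it in a function named `addMaps` and folding over a short list is an illustrative assumption that stands in for the large pipeline.

```scala
val m1 = Map("A" -> 1.0, "B" -> 2.0, "C" -> 3.0)
val m2 = Map("A" -> 1.5, "B" -> 2.5, "D" -> 3.5)

// Entries of m2 are summed with m1 where keys overlap, then override m1.
def addMaps(a: Map[String, Double], b: Map[String, Double]): Map[String, Double] =
  a ++ b.map { case (k, v) => k -> (v + a.getOrElse(k, 0.0)) }

val merged = addMaps(m1, m2)

// The same function works as a fold operator over many maps.
val total = List(m1, m2, m1).foldLeft(Map.empty[String, Double])(addMaps)
```

Only `m2` is traversed and only one new map is built per call, which is why this version makes fewer intermediate copies than the earlier ones.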
Cheat with mutable collections
import scala.collection.mutable.{Map => MMap}
def addMaps(m1: MMap[String, Double], m2: MMap[String, Double]) = {
m2.foreach { case (k, v) => m1(k) = v + m1.getOrElse(k, 0.0) }
m1
}
pipe.foldLeft(MMap[String, Double]())(addMaps)
Impure, but it gets the job done. Beware of side effects!
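The side effect in question is that the first argument is modified in place and returned, so every reference to it sees the change. A small sketch that makes this visible (the sample values are illustrative):

```scala
import scala.collection.mutable.{Map => MMap}

def addMaps(m1: MMap[String, Double], m2: MMap[String, Double]) = {
  m2.foreach { case (k, v) => m1(k) = v + m1.getOrElse(k, 0.0) }
  m1
}

val acc = MMap("A" -> 1.0)
val other = MMap("A" -> 2.0, "B" -> 3.0)

val result = addMaps(acc, other)

// result and acc are the same object: acc itself has been changed.
val sameObject = result eq acc
```

This is safe inside a fold that owns its accumulator, but risky if the first argument is shared with other code.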
Third party libraries
vs. agility
Further reading