Three Reasons a Data Engineer Should Learn Scala
This article was written in collaboration with Hakka Labs (original link)
There has been a lot of debate over Scala lately, including criticisms like this, this, and this, and defenses like this and this. Most of the criticisms focus on the language’s complexity, performance, and integration with existing tools and libraries, while the defenses praise its elegant syntax, powerful type system, and good fit for domain-specific languages.
However, most of these discussions seem to be based on experience building production backend or web systems, where there are already plenty of other options. There are mature, battle-tested choices like Java, Erlang, or even PHP, and there are Go, Node.js, or Python for those who are more adventurous or prefer agility over performance.
Here I want to argue that there’s a best tool for every job, and Scala shines for data processing and machine learning, for the following reasons:
- good balance between productivity and performance
- integration with the big data ecosystem
- functional paradigm
Productivity without sacrificing performance
In the big data and machine learning world, where most developers come from a Python/R/MATLAB background, Scala’s syntax, or at least the subset needed for the domain, is a lot less intimidating than that …