I’m kind of known as a polyglot among coworkers. We often argue that instead of hiring great Java/Python/C++ developers, we should strive to hire great engineers with strong CS fundamentals who can pick up any language easily. I come from a scientific computing background, having done mostly C/C++/Python many years ago. Over the last three years at my current job I have coded in seven languages professionally, some out of interest and some out of necessity. I enjoyed learning all these different things and want to share the experience here: what I learned from each language and how it has helped me become a better engineer.
The first language I used seriously, apart from LOGO & BASIC when I was a kid of course. It’s probably the closest one can get to the operating system and bare metal without dropping down to assembly (although you still can in C). It’s a simple language whose syntax served as the basis for many successors like C++ & Java. It doesn’t offer fancy features like OOP or namespaces, but rather depends on the developer’s skill for organizing a large code base (think …
One topic that comes up a lot when optimizing Scala data applications is the performance of the standard collections, or rather the hidden cost of temporary copies. The collections API is easy to learn and maps well to many Python concepts that a lot of data engineers are familiar with. But the performance penalty can be pretty big when it’s repeated over millions of records in a JVM with limited heap.
Let’s take a look at the most naive example first: mapping the values of a Map.
val m = Map("A" -> 1, "B" -> 2, "C" -> 3)
m.toList.map(t => (t._1, t._2 + 1)).toMap
Looks simple enough, but it’s obviously not optimal. Two temporary List[(String, Int)] are created, one from toList and one from map. map also allocates 3 new (String, Int) tuples, one per entry.
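To make the hidden allocations easier to see, here is a minimal sketch that splits the one-liner into its individual steps. The object and method names (TempCopiesSketch, incrementViaList) are made up for illustration and are not part of any library.

```scala
object TempCopiesSketch {
  // Hypothetical helper: the same logic as the one-liner, one step per line.
  def incrementViaList(m: Map[String, Int]): Map[String, Int] = {
    // Temporary #1: every entry of the Map is copied into a new List.
    val entries: List[(String, Int)] = m.toList
    // Temporary #2: map builds a second List and allocates a new tuple per entry.
    val bumped: List[(String, Int)] = entries.map(t => (t._1, t._2 + 1))
    // Only now is the final Map built; both Lists are garbage afterwards.
    bumped.toMap
  }
}
```

For three entries this is harmless, but the same pattern repeated per record is where the heap pressure comes from.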
There are a few commonly seen variations. These don’t create temporary collections, but they still allocate a new key-value tuple for every entry.
for ((k, v) <- m) yield k -> (v + 1)
m.map { case (k, v) => k -> (v + 1) }
If one reads the ScalaDoc closely, there’s already a mapValues method, and it is probably the shortest and most performant option.
m.mapValues(_ + 1)
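One caveat worth checking against your Scala version: in 2.12 mapValues returns a lazy view over the original map, and in 2.13 calling it directly on a Map is deprecated in favor of view.mapValues. In both cases the function runs again on every lookup, so it is cheap to create but not necessarily cheap to read repeatedly. The sketch below assumes Scala 2.13 or later; the object and method names are hypothetical.

```scala
object MapValuesSketch {
  // Counts how often the mapping function runs when the same key
  // is read `reads` times through the lazy view.
  def callsThroughView(m: Map[String, Int], key: String, reads: Int): Int = {
    var calls = 0
    // Creating the view does not run the function yet.
    val view = m.view.mapValues { v => calls += 1; v + 1 }
    // Each lookup re-applies the function to the underlying value.
    (1 to reads).foreach(_ => view(key))
    calls
  }

  // Strict variant: the function runs once per entry and the result is a
  // regular immutable Map, at the cost of building that Map up front.
  def incrementStrict(m: Map[String, Int]): Map[String, Int] =
    m.view.mapValues(_ + 1).toMap
}
```

Which variant is faster depends on how often the result is read: the view suits a single pass, while the strict Map is likely the better choice when values are looked up many times.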
Similar problem exists …