Ramping up your DevOps-fu for big data developers


Reproducible setups for test & deployment are hard. Even harder on a cluster. This talk presents lessons learned while building a Spark distribution.


We developers work hard on applications that we sometimes like to forget about once they ship. Yet modern applications are not only part of a larger stack; when dealing with big data, cluster deployment can also become a headache.

Typesafe recently developed a distribution of Spark, a next-generation distributed computation engine. This meant testing it with a dizzying array of persistence layers, cluster managers, and input sources. Not to mention versioning hell.

This talk will explain the lessons we learned and which tools helped us make the test and deployment process easy. We'll see how orchestration scripts helped us manage many machines at once, both in the cloud and on premises. We'll see how virtualization let us create reproducible configurations without tying us to a vendor. We will also explain how containers and fine-grained resource managers helped us make the best use of our machines.

And while this tour will show well-known tools like Docker in action, it will also report on niche tools that perform well in critical parts of the process.

Modern DevOps tools let developers test and deploy their applications on a cluster in an automated and reproducible way. The audience should come away with a better grasp of this fast-moving environment.


#TALK in English

François Garillot

Working on Spark, passionate about data analysis in Scala.