PySpark Functional Programming: Stop Writing Imperative Spark Pipelines

Fri, 10 Apr 2026 10:00:00 +0100

On a recent project I had to review a set of PySpark notebooks in Microsoft Fabric: 14 notebooks, some of them over 3,000 lines long, with hundreds of cells and multiple data domains crammed into a single file. The code worked, but reading it felt like archaeology. Every notebook started the same way: df = spark.read..., then df = df.withColumn(...) repeated dozens of times, sprinkled with display(df) calls and bare except: blocks. I kept asking myself: how did we end up writing Spark code like this?

Functional Programming on KaPa Consulting
