PySpark 101: Introduction to Big Data with Spark
A beginner-friendly introduction to using PySpark for big data processing with Apache Spark, covering the fundamentals.
A guide to performing data operations using PySpark, pandas, DuckDB, Polars, and DataFusion within a pre-configured Docker environment.
A guide to using the alexmerced/datanotebook Docker image as a quick data notebook environment with pre-installed libraries such as pandas, Polars, and PySpark.
A troubleshooting guide for fixing the 'java.lang.ClassNotFoundException: delta.DefaultSource' error when using Delta Lake with PySpark in Jupyter.
A hands-on tutorial exploring LakeFS for data versioning and branching using PySpark and Jupyter notebooks in a data engineering context.