Learning Apache Flink S01E02: What *is* Flink?
An introductory overview of Apache Flink, explaining its core concepts as a distributed stream processing framework, its history, and primary use cases.
The author explains their move to Decodable to dive deeper into stream processing and Apache Flink, and to work with experts in the field.
A weekly tech learning digest covering Microsoft Fabric, AI topics, computer vision, Azure AI Document Intelligence, embeddings, and vector search.
An introduction to analytical data warehouses, explaining their purpose, differences from transactional databases, and their role in team-based analytics.
Analysis of Hacker News job posts shows the Data Scientist role declining while ML Engineer roles rise, indicating a shift in the data job market.
Interview with Chad Sanderson on data platform leadership, experimentation culture, data quality, and the rise of data contracts.
Explains Project Nessie, an open-source data catalog for Apache Iceberg tables, and its importance for data engineers and architects building data lakehouses.
How to handle mismatched Parquet file schemas when querying multiple files in DuckDB using the UNION_BY_NAME option.
An update on how Monzo integrated machine learning across its organization in 2022, covering team structure, growth, and new initiatives.
Explores the shift to ELT in data engineering, focusing on modern tools like dbt, Fivetran, and Airbyte for loading and transforming data.
A software engineer explains their decision to join Decodable, a startup building a serverless real-time data platform, focusing on stream processing.
A technical walkthrough of using dbt and DuckDB to clean and analyze session feedback data from a tech conference.
A hands-on exploration of using dbt (data build tool) with DuckDB for local data engineering, based on a tutorial project.
Explains the evolution from ETL to ELT in data engineering, clarifying the role of modern tools like dbt in the transformation process.
A hands-on tutorial exploring lakeFS for data versioning and branching using PySpark and Jupyter notebooks in a data engineering context.
A curated list of essential resources for data engineering, including articles, newsletters, podcasts, and tools.
Explores modern data engineering trends in 2022, focusing on analytical data storage formats, organization, and access patterns.
A data engineer explores the evolution of the data ecosystem, comparing past practices with modern tools and trends in 2022.
A PhD student reflects on the complexities of ML engineering, distinguishing between Task MLEs and Platform MLEs, and shares practical lessons from production systems.
An introduction to modern data systems, explaining OLTP, OLAP, data warehouses, data lakes, and the roles of data engineers, analysts, and scientists.