Delta Lake essential Fundamentals: Part 3 - compaction and checkpoint

Let’s look at what Delta Lake compaction and checkpoints are and why they matter. Checkpoint: Two well-known checkpoint mechanisms in Apache Spark are easily confused with the Delta Lake checkpoint, so let’s see what they are and how they differ from each other. Spark RDD Checkpoint: A checkpoint in Spark RDD is a mechanism that persists the current RDD to a file in a dedicated checkpoint directory and removes all references to its parent RDDs.
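To make the idea of persisting an RDD and dropping its parent references concrete, here is a minimal toy model in plain Python. It is not Spark's API: `ToyRDD`, `lineage_depth`, and the `rdd-checkpoint.json` file name are hypothetical names made up for this sketch, which only illustrates lineage truncation.

```python
import json
import os
import tempfile


class ToyRDD:
    """Toy stand-in for an RDD: data is derived lazily from a parent."""

    def __init__(self, parent=None, fn=None, source=None):
        self.parent = parent          # reference to the parent in the lineage
        self.fn = fn                  # transformation applied to the parent's rows
        self.source = source          # in-memory rows for a root dataset
        self.checkpoint_file = None   # set once the data has been persisted

    def map(self, fn):
        return ToyRDD(parent=self, fn=lambda rows: [fn(r) for r in rows])

    def collect(self):
        # A checkpointed dataset is read back from its file, not recomputed.
        if self.checkpoint_file is not None:
            with open(self.checkpoint_file) as f:
                return json.load(f)
        if self.parent is None:
            return list(self.source)
        return self.fn(self.parent.collect())

    def lineage_depth(self):
        return 0 if self.parent is None else 1 + self.parent.lineage_depth()

    def checkpoint(self, directory):
        # Persist the current data, then drop every reference to the parents.
        data = self.collect()
        path = os.path.join(directory, "rdd-checkpoint.json")
        with open(path, "w") as f:
            json.dump(data, f)
        self.checkpoint_file = path
        self.parent = None
        self.fn = None


checkpoint_dir = tempfile.mkdtemp()
rdd = ToyRDD(source=[1, 2, 3]).map(lambda x: x * 2).map(lambda x: x + 1)

depth_before = rdd.lineage_depth()   # two map steps behind this dataset
rdd.checkpoint(checkpoint_dir)
depth_after = rdd.lineage_depth()    # lineage is cut after the checkpoint
result = rdd.collect()               # served from the checkpoint file
```

After the checkpoint, the dataset no longer needs its parents to produce its rows, which is the property the excerpt describes.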

Read more →

Delta Lake essential Fundamentals: Part 2 - The DeltaLog

In the previous part, you learned what ACID transactions are. In this part, you will see how the Delta transaction log, named DeltaLog, achieves ACID. Transaction Log: A transaction log is a history of the actions executed by a (TaDa 💡) database management system, kept to guarantee ACID properties even across a crash. Delta Lake transaction log - DeltaLog: DeltaLog is a transaction log directory that holds an ordered record of every transaction committed on a Delta Lake table since it was created.
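As a rough illustration of an ordered record of commits, the sketch below writes a few simplified commit files into a `_delta_log` directory and replays them in version order to find the data files that are currently part of the table. It uses only the Python standard library, not the Delta Lake library. The helper names `write_commit` and `replay` and the Parquet file names are hypothetical, and the `add` and `remove` actions are reduced to a bare `path` field, whereas real commits carry more metadata.

```python
import json
import os
import tempfile


def write_commit(log_dir, version, actions):
    """Write one commit as newline-delimited JSON, named by its version."""
    path = os.path.join(log_dir, f"{version:020d}.json")
    with open(path, "w") as f:
        for action in actions:
            f.write(json.dumps(action) + "\n")


def replay(log_dir):
    """Replay commits in version order and return the set of live data files."""
    live = set()
    # Zero-padded names make lexicographic order match version order.
    for name in sorted(os.listdir(log_dir)):
        if not name.endswith(".json"):
            continue
        with open(os.path.join(log_dir, name)) as f:
            for line in f:
                action = json.loads(line)
                if "add" in action:
                    live.add(action["add"]["path"])
                elif "remove" in action:
                    live.discard(action["remove"]["path"])
    return live


table_dir = tempfile.mkdtemp()
log_dir = os.path.join(table_dir, "_delta_log")
os.makedirs(log_dir)

# Two appends, then one commit that swaps both small files for a single file.
write_commit(log_dir, 0, [{"add": {"path": "part-0000.parquet"}}])
write_commit(log_dir, 1, [{"add": {"path": "part-0001.parquet"}}])
write_commit(log_dir, 2, [
    {"remove": {"path": "part-0000.parquet"}},
    {"remove": {"path": "part-0001.parquet"}},
    {"add": {"path": "part-0002.parquet"}},
])

live_files = replay(log_dir)
```

Because each commit is a single file, a reader sees either all of a commit's actions or none of them, which is the intuition behind the log's role in providing atomicity.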

Read more →