Common Data loading models

1. Full Load – Truncate & Reload

2. Incremental Load – Append or Upsert

3. Differential Load – Snapshot Comparison

4. Change Data Capture (CDC) – Log or Trigger Based

1. Full Load – Truncate & Reload

Definition:

  • In this approach, the target table is completely emptied (usually via TRUNCATE) and then fully reloaded with the current source data.

How It Works:

  • Delete all records in the data warehouse table.
  • Read the full dataset from the source.
  • Load it into the warehouse table from scratch.

When to Use:

  • Data volume is relatively small.
  • No reliable method exists to detect changes.
  • Business requires full refresh due to compliance or data integrity.

Pros:

  • Simplicity: Easy to implement.
  • Consistency: Guarantees target is always in sync with source.

Cons:

  • Inefficient for large datasets: Wastes resources by reloading unchanged data.
  • Downtime: Table is unavailable during reload.
  • Heavy I/O and compute usage.

2. Incremental Load – Append or Upsert

Definition:

  • Only the new or modified records since the last load are processed and written to the warehouse.

Types:

  • Append: Only new records are added.
  • Upsert: Existing records are updated, and new ones are inserted.

How It Works:

  • Track changes using mechanisms like:
    • last_updated timestamp.
    • Surrogate keys (monotonically increasing IDs).
    • Audit columns in source system.

When to Use:

  • Large datasets with few changes per load.
  • High frequency data refresh required.

Pros:

  • Efficient and fast: Only processes deltas.
  • Scalable: Supports large, growing datasets.

Cons:

  • Change detection must be accurate.
  • Complex logic for deduplication and conflict resolution.

3. Differential Load – Delta Based on Snapshot Comparison

Definition:

  • Compares current source snapshot with a previously stored snapshot to detect inserts, updates, and deletes.

How It Works:

  1. Extract a full snapshot from the source system.
  2. Store it and compare it row-by-row with the previous snapshot.
  3. Identify:
    • Inserts: Rows in current snapshot but not in the previous one.
    • Updates: Rows with matching keys but different values.
    • Deletes: Rows in previous snapshot but not in the current one.

When to Use:

  • No last_updated or CDC mechanisms available.
  • Source system doesn’t support incremental logic.

Pros:

  • Independent of source system features.
  • Flexible: Works even with legacy systems.

Cons:

  • High compute due to comparison logic.
  • Additional storage needed for snapshots.
  • Latency: Cannot support real-time loads.

4. Change Data Capture (CDC) – Log-based or Trigger-based

Definition:

  • CDC captures row-level changes (INSERT, UPDATE, DELETE) from database logs or triggers, and replicates them to the target.

Types:

  • Log-based CDC: Reads changes from database transaction logs (e.g., MySQL binlog, SQL Server CDC, Oracle redo logs).
  • Trigger-based CDC: Uses database triggers to capture changes and write them to tracking tables.

How It Works:

  • The CDC process continuously reads the change log.
  • Translates changes into actionable operations.
  • Applies changes to the data warehouse with minimal latency.

When to Use:

  • Real-time or near real-time use cases.
  • Modern data pipelines (streaming ETL/ELT).
  • Systems with high data change velocity.

Pros:

  • Real-time updates: Keeps target highly synchronized.
  • Accurate: Captures precise changes.

Cons:

  • Complex setup: Requires deep integration with source DB.
  • Permissions and performance: May require elevated privileges and impact source DB performance.
  • Not universally supported: Some databases don’t expose log access.

Comparison Table:

ModelData VolumeComplexityPerformanceReal-TimeChange Tracking Needed?
Full LoadSmallLowLowNoNo
Incremental LoadMedium–HighMediumHighNoYes (e.g., timestamps)
Differential LoadMedium–HighHighMediumNoNo (needs snapshots)
CDCHighHighVery HighYesYes (via DB logs)

For Code Please REFER HERE

Leave a comment

About the author

Sophia Bennett is an art historian and freelance writer with a passion for exploring the intersections between nature, symbolism, and artistic expression. With a background in Renaissance and modern art, Sophia enjoys uncovering the hidden meanings behind iconic works and sharing her insights with art lovers of all levels.

Get updates

Spam-free subscription, we guarantee. This is just a friendly ping when new content is out.