
Data Transfer Workflow: Source to Target in DBConvert Streams ​

Overview ​

DBConvert Streams is a high-performance data transfer system designed to efficiently move data between different database systems. This document outlines the conceptual workflow of how data flows from a source database to a target database, explaining the key components, processes, and mechanisms involved.

Architecture Components ​

The data transfer process involves several key components:

  1. Source Reader: Reads data from the source database
  2. Event Hub: Manages the flow of events between components
  3. Target Writer: Processes events and writes data to the target database
  4. Sequencer: Ensures events are processed in the correct order
  5. Statistics Tracker: Monitors and reports on the transfer process
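
These roles can be pictured as a few small Go interfaces. The sketch below is illustrative only: the type and method names are assumptions, not the actual DBConvert Streams API.

```go
package streams

import "context"

// Event is one change captured from the source (illustrative shape).
type Event struct {
	TableID     string
	LogPosition uint64
	Action      string // INSERT, UPDATE, or DELETE
	Before      map[string]any
	After       map[string]any
}

// SourceReader reads batches of events from the source database.
type SourceReader interface {
	ReadBatch(ctx context.Context) ([]Event, error)
}

// TargetWriter applies batches of events to the target database.
type TargetWriter interface {
	WriteBatch(ctx context.Context, batch []Event) error
}

// Sequencer decides whether an ordered batch may be processed yet.
type Sequencer interface {
	Ready(batch []Event) bool
	MarkDone(batch []Event)
}
```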

Detailed Workflow ​

1. Initialization Phase ​

  1. Connection Setup:

    • The source reader establishes a connection to the source database
    • The target writer establishes a connection to the target database
    • Both components connect to the Event Hub (NATS JetStream)
  2. Stream Creation:

    • Event Hub creates streams for data and control messages
    • Consumers are set up to process these streams
  3. Table Structure Creation (if needed):

    • The target writer may create the necessary table structures in the target database
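
Conceptually, the Event Hub side of this setup resembles the following sketch using the nats.go client. The server URL, stream names, and subject patterns are assumptions for illustration; the product configures its own names internally.

```go
package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	// Connect to the Event Hub (NATS JetStream). The URL is illustrative.
	nc, err := nats.Connect("nats://localhost:4222")
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// Create streams for data and control messages.
	// Stream and subject names here are assumed for illustration only.
	for _, cfg := range []*nats.StreamConfig{
		{Name: "DATA", Subjects: []string{"data.>"}},
		{Name: "CONTROL", Subjects: []string{"control.>"}},
	} {
		if _, err := js.AddStream(cfg); err != nil {
			log.Fatalf("create stream %s: %v", cfg.Name, err)
		}
	}
}
```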

2. Data Reading Phase ​

  1. Source Reading:

    • The source reader reads data from tables in batches
    • Data is converted into events containing:
      • Table ID
      • Log position
      • Action type (INSERT, UPDATE, DELETE)
      • Row data (before/after states)
  2. Event Publishing:

    • Events are published to the Event Hub
    • Events are marked as ordered or unordered based on requirements
    • The source reader tracks and reports statistics on read events
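
A rough sketch of how read rows might be turned into events and published to the Event Hub. The event shape and subject naming are assumptions, not the exact wire format DBConvert Streams uses.

```go
package streams

import (
	"encoding/json"

	"github.com/nats-io/nats.go"
)

// DataEvent is an illustrative event payload for one batch of rows.
type DataEvent struct {
	TableID     string           `json:"tableId"`
	LogPosition uint64           `json:"logPosition"`
	Action      string           `json:"action"` // INSERT, UPDATE, DELETE
	Before      []map[string]any `json:"before,omitempty"`
	After       []map[string]any `json:"after,omitempty"`
}

// publishBatch serializes events and publishes them to a per-table subject.
func publishBatch(js nats.JetStreamContext, table string, events []DataEvent) error {
	for _, ev := range events {
		payload, err := json.Marshal(ev)
		if err != nil {
			return err
		}
		// One subject per table lets the target consume table streams separately.
		if _, err := js.Publish("data."+table, payload); err != nil {
			return err
		}
	}
	return nil
}
```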

3. Data Transfer Phase ​

  1. Event Consumption:

    • The target writer subscribes to data streams from the Event Hub
    • Events are received in batches
  2. Deduplication:

    • Each event batch is assigned a unique key based on:
      • First and last row IDs
      • Number of rows
      • Action type
      • Table ID
      • Data hash
    • Duplicate batches are detected and skipped
  3. Event Sequencing:

    • Ordered events are processed through the sequencer
    • Sequencer ensures events are processed in the correct order
    • Unordered events bypass the sequencer for faster processing
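
On the consuming side, a JetStream pull subscription fetching events in batches could look roughly like this. The subject, durable consumer name, and batch size are illustrative assumptions.

```go
package streams

import (
	"time"

	"github.com/nats-io/nats.go"
)

// consumeBatches pulls event batches from the Event Hub and hands each batch
// to a processing function, acknowledging messages only after success.
func consumeBatches(js nats.JetStreamContext, process func([]*nats.Msg) error) error {
	sub, err := js.PullSubscribe("data.>", "target-writer")
	if err != nil {
		return err
	}
	for {
		msgs, err := sub.Fetch(100, nats.MaxWait(2*time.Second))
		if err == nats.ErrTimeout {
			continue // no new events yet
		}
		if err != nil {
			return err
		}
		if err := process(msgs); err != nil {
			return err
		}
		for _, m := range msgs {
			m.Ack()
		}
	}
}
```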

4. Data Writing Phase ​

  1. Event Processing:

    • Events are dispatched to worker goroutines
    • Workers process events concurrently
  2. Database Operations:

    • For INSERT events: Bulk insert or upsert operations
    • For UPDATE events: Update existing records
    • For DELETE events: Remove records based on primary keys
  3. Transaction Management:

    • Operations are executed within transactions
    • Transactions are committed or rolled back based on success
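
A simplified sketch of how one batch might be applied inside a transaction. The SQL is PostgreSQL-flavored and the two-column table shape is assumed purely for illustration; the actual writer builds dialect-specific bulk statements, and the worker-pool dispatch is sketched later under Performance Considerations.

```go
package streams

import (
	"context"
	"database/sql"
	"fmt"
)

// applyBatch writes one event batch to the target inside a transaction.
// The (id, val) column layout is an illustrative assumption.
func applyBatch(ctx context.Context, db *sql.DB, table string, rows [][]any, action string) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op once the transaction has been committed

	for _, row := range rows {
		switch action {
		case "INSERT":
			_, err = tx.ExecContext(ctx,
				fmt.Sprintf("INSERT INTO %s VALUES ($1, $2) ON CONFLICT (id) DO UPDATE SET val = EXCLUDED.val", table),
				row...)
		case "UPDATE":
			_, err = tx.ExecContext(ctx,
				fmt.Sprintf("UPDATE %s SET val = $2 WHERE id = $1", table), row...)
		case "DELETE":
			_, err = tx.ExecContext(ctx,
				fmt.Sprintf("DELETE FROM %s WHERE id = $1", table), row[0])
		}
		if err != nil {
			return err // deferred Rollback undoes any partial work
		}
	}
	return tx.Commit()
}
```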

5. Monitoring and Reporting ​

  1. Statistics Tracking:

    • Per-table statistics (events, data size)
    • Overall transfer statistics (total events, processing rate)
  2. Progress Reporting:

    • Periodic reporting of transfer progress
    • Final report of completed transfer
  3. Event Count Monitoring:

    • Continuous monitoring of source vs. processed event counts
    • Detection of potential issues (large differences, stuck processing)
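
A minimal monitoring sketch: two atomic counters compared on a ticker, with warnings when the target falls far behind or stops making progress. The thresholds and log messages are illustrative, not the product's actual defaults.

```go
package streams

import (
	"log"
	"sync/atomic"
	"time"
)

// Monitor compares events read by the source with events written by the target.
type Monitor struct {
	SourceEvents    atomic.Int64
	ProcessedEvents atomic.Int64
}

// Run reports progress on a fixed interval until stop is closed.
func (m *Monitor) Run(interval time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	var lastProcessed int64
	for {
		select {
		case <-stop:
			return
		case <-ticker.C:
			src, done := m.SourceEvents.Load(), m.ProcessedEvents.Load()
			log.Printf("progress: %d/%d events processed", done, src)
			if src-done > 100_000 {
				log.Printf("warning: target is %d events behind the source", src-done)
			}
			if done == lastProcessed && done < src {
				log.Printf("warning: processing appears stuck at %d events", done)
			}
			lastProcessed = done
		}
	}
}
```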

6. Completion Phase ​

  1. Finalization:

    • Verification that all events have been processed
    • Grace period for processing remaining events
    • Status set to FINISHED when complete
  2. Cleanup:

    • Connections closed
    • Resources released
    • Final statistics reported
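
The finalization check can be pictured as a polling loop with a grace period, as in this sketch; the poll interval and counter types are assumptions.

```go
package streams

import (
	"context"
	"sync/atomic"
	"time"
)

// waitForCompletion polls until all source events have been processed or the
// grace period expires; it returns true when the transfer can be FINISHED.
func waitForCompletion(ctx context.Context, source, processed *atomic.Int64, grace time.Duration) bool {
	deadline := time.Now().Add(grace)
	for time.Now().Before(deadline) {
		if processed.Load() >= source.Load() {
			return true // every read event has been written
		}
		select {
		case <-ctx.Done():
			return false
		case <-time.After(500 * time.Millisecond):
		}
	}
	return false
}
```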

Data Flow Diagram ​

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚             β”‚     β”‚               β”‚     β”‚             β”‚
β”‚   Source    │────▢│   Event Hub   │────▢│   Target    β”‚
β”‚  Database   β”‚     β”‚  (NATS/JS)    β”‚     β”‚  Database   β”‚
β”‚             β”‚     β”‚               β”‚     β”‚             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
       β”‚                    β–²                    β”‚
       β”‚                    β”‚                    β”‚
       β–Ό                    β”‚                    β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚             β”‚     β”‚               β”‚     β”‚             β”‚
β”‚   Source    │────▢│  Event Streams│────▢│   Target    β”‚
β”‚   Reader    β”‚     β”‚  & Consumers  β”‚     β”‚   Writer    β”‚
β”‚             β”‚     β”‚               β”‚     β”‚             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                                β”‚
                                                β”‚
                                                β–Ό
                                          β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                                          β”‚             β”‚
                                          β”‚  Worker     β”‚
                                          β”‚   Pool      β”‚
                                          β”‚             β”‚
                                          β”‚             β”‚
                                          β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key Mechanisms ​

Event Batching ​

Events are processed in batches to optimize performance. Each batch contains:

  • Table identifier
  • Log position
  • Action type
  • Row data (before/after states)
  • Size information
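
A batching loop in this spirit might accumulate rows from a channel and flush whenever the configured batch size is reached. The Batch shape below is illustrative; log position and size information would be filled in by the reader and are omitted here for brevity.

```go
package streams

// Row is one record's column values (illustrative).
type Row map[string]any

// Batch groups rows for a single table and action type.
type Batch struct {
	TableID string
	Action  string
	Rows    []Row
}

// batchRows collects rows into fixed-size batches and flushes the remainder.
func batchRows(tableID, action string, rows <-chan Row, batchSize int, out chan<- Batch) {
	current := Batch{TableID: tableID, Action: action}
	for row := range rows {
		current.Rows = append(current.Rows, row)
		if len(current.Rows) >= batchSize {
			out <- current
			current = Batch{TableID: tableID, Action: action}
		}
	}
	if len(current.Rows) > 0 {
		out <- current // flush the final partial batch
	}
	close(out)
}
```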

Deduplication Strategy ​

To prevent duplicate processing:

  1. Each batch is assigned a unique key
  2. The key includes first ID, last ID, row count, action type, table ID, and data hash
  3. Processed batch keys are stored in a sync.Map
  4. Periodic cleanup prevents memory leaks
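
A sketch of the deduplication idea: build a composite key from the fields listed above, hash the row data, and use sync.Map.LoadOrStore to detect repeats. The exact key format used by DBConvert Streams may differ.

```go
package streams

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
	"sync"
)

// processed holds the keys of batches that have already been written.
var processed sync.Map

// batchKey combines identifying fields with a hash of the row data.
func batchKey(firstID, lastID string, rowCount int, action, tableID string, rows []map[string]any) string {
	data, _ := json.Marshal(rows)
	hash := sha256.Sum256(data)
	return fmt.Sprintf("%s:%s:%d:%s:%s:%x", firstID, lastID, rowCount, action, tableID, hash[:8])
}

// alreadyProcessed records the key and reports whether it was seen before.
func alreadyProcessed(key string) bool {
	_, seen := processed.LoadOrStore(key, struct{}{})
	return seen
}
```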

Sequencing Logic ​

For ordered events:

  1. Events are checked against the sequencer
  2. If not ready for processing, they're requeued
  3. When ready, they're processed and marked as complete
  4. Unordered events bypass this check
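
The sequencing rule can be sketched with a mutex-protected counter: a batch is processed only when its sequence number is the next one expected, otherwise the caller requeues it. Field and method names are illustrative.

```go
package streams

import "sync"

// Sequencer gates ordered batches so they are applied in sequence.
type Sequencer struct {
	mu   sync.Mutex
	next uint64 // sequence number expected next
}

// Ready reports whether the batch with sequence seq may be processed now;
// callers requeue the batch when this returns false.
func (s *Sequencer) Ready(seq uint64) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return seq == s.next
}

// Done marks the batch with sequence seq as processed, unblocking the next one.
func (s *Sequencer) Done(seq uint64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if seq == s.next {
		s.next++
	}
}
```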

Error Handling ​

  1. Transactional Safety: Database operations use transactions
  2. Retry Mechanism: Failed operations can be retried
  3. Pause/Resume: Processing can be paused and resumed
  4. Monitoring: Continuous monitoring detects issues
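
A generic retry wrapper in this spirit, with linear backoff; the attempt count and backoff schedule are illustrative, not the product's configured values.

```go
package streams

import (
	"context"
	"time"
)

// withRetry re-runs a failed operation a bounded number of times,
// backing off a little longer after each failure.
func withRetry(ctx context.Context, attempts int, op func(context.Context) error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = op(ctx); err == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(time.Duration(i+1) * time.Second):
		}
	}
	return err
}
```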

Performance Considerations ​

  1. Worker Pool: Up to 40 concurrent workers process events
  2. Channel Buffering: Channels are buffered to handle bursts
  3. Batch Processing: Data is processed in batches for efficiency
  4. Ordered vs. Unordered: Events can be processed in order or out of order
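
A worker-pool sketch matching this description: a buffered channel absorbs bursts and a bounded set of goroutines drains it concurrently. The buffer size below is an assumption; the document states the pool runs up to 40 workers.

```go
package streams

import "sync"

// runWorkers starts workerCount goroutines that process batches from a
// buffered channel and returns the channel plus a WaitGroup for shutdown.
func runWorkers(workerCount int, process func(batch []map[string]any)) (chan<- []map[string]any, *sync.WaitGroup) {
	jobs := make(chan []map[string]any, 256) // buffered to absorb bursts
	var wg sync.WaitGroup
	for i := 0; i < workerCount; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for batch := range jobs {
				process(batch)
			}
		}()
	}
	return jobs, &wg
}
```

Callers submit batches on the returned channel, close it once reading is finished, and then wait on the WaitGroup before reporting final statistics.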

Statistics and Monitoring ​

The system tracks and reports:

  1. Per-Table Statistics:

    • Number of events processed
    • Data size transferred
    • Percentage of total transfer
  2. Overall Statistics:

    • Total events from source
    • Total events processed
    • Processing rate
    • Elapsed time
  3. Health Metrics:

    • Difference between source and processed events
    • Channel utilization
    • Processing status
