Data Transfer Workflow: Source to Target in DBConvert Streams
Overview
DBConvert Streams is a high-performance data transfer system designed to efficiently move data between different database systems. This document outlines the conceptual workflow of how data flows from a source database to a target database, explaining the key components, processes, and mechanisms involved.
Architecture Components
The data transfer process involves several key components:
- Source Reader: Reads data from the source database
- Event Hub: Manages the flow of events between components
- Target Writer: Processes events and writes data to the target database
- Sequencer: Ensures events are processed in the correct order
- Statistics Tracker: Monitors and reports on the transfer process
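The concrete types behind these components are internal to DBConvert Streams; the Go sketch below only illustrates how the roles listed above could fit together. All type, field, and method names are hypothetical.

```go
package streams

import "context"

// Event is a hypothetical representation of one change batch captured from
// the source database (see "Data Reading Phase" below for the fields the
// real events carry).
type Event struct {
	TableID  string
	Action   string           // INSERT, UPDATE, DELETE
	Position uint64           // log position
	Before   []map[string]any // row states before the change
	After    []map[string]any // row states after the change
}

// SourceReader reads data from the source database and emits events.
type SourceReader interface {
	Read(ctx context.Context, out chan<- Event) error
}

// TargetWriter consumes events and applies them to the target database.
type TargetWriter interface {
	Write(ctx context.Context, in <-chan Event) error
}

// Sequencer decides whether an ordered event may be processed yet.
type Sequencer interface {
	Ready(e Event) bool
	MarkDone(e Event)
}
```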
Detailed Workflow
1. Initialization Phase
Connection Setup:
- Source reader establishes a connection to the source database
- Target writer establishes a connection to the target database
- Both components connect to the Event Hub (NATS JetStream)
Stream Creation:
- Event Hub creates streams for data and control messages
- Consumers are set up to process these streams (see the sketch after this list)
Table Structure Creation (if needed):
- Target writer may create the necessary table structures in the target database
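As a rough illustration of the steps above, the sketch below connects to NATS, creates JetStream streams for data and control messages, and sets up a durable consumer. The stream, subject, and consumer names are placeholders, not the ones DBConvert Streams actually uses.

```go
package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	// Connect to the NATS server acting as the Event Hub.
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// One stream for data events, one for control messages
	// (names and subjects are placeholders).
	for _, cfg := range []*nats.StreamConfig{
		{Name: "DATA", Subjects: []string{"data.>"}},
		{Name: "CONTROL", Subjects: []string{"control.>"}},
	} {
		if _, err := js.AddStream(cfg); err != nil {
			log.Fatalf("create stream %s: %v", cfg.Name, err)
		}
	}

	// A durable consumer lets the target side resume where it left off.
	if _, err := js.AddConsumer("DATA", &nats.ConsumerConfig{
		Durable:   "target-writer",
		AckPolicy: nats.AckExplicitPolicy,
	}); err != nil {
		log.Fatal(err)
	}
}
```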
2. Data Reading Phase
Source Reading:
- Source reader reads data from tables in batches
- Data is converted into events containing:
  - Table ID
  - Log position
  - Action type (INSERT, UPDATE, DELETE)
  - Row data (before/after states)
Event Publishing:
- Events are published to the Event Hub
- Events are marked as ordered or unordered based on ordering requirements
- Source reader tracks and reports statistics on read events
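A minimal sketch of the publishing step, reusing the hypothetical Event type from the Architecture Components sketch. The per-table subject scheme is an assumption, not the documented one.

```go
package streams

import (
	"encoding/json"

	"github.com/nats-io/nats.go"
)

// publishBatch serializes one event batch and sends it to the Event Hub.
func publishBatch(js nats.JetStreamContext, e Event) error {
	payload, err := json.Marshal(e)
	if err != nil {
		return err
	}
	// Publishing on a per-table subject keeps ordered events for one table
	// in sequence while different tables can proceed independently.
	_, err = js.Publish("data."+e.TableID, payload)
	return err
}
```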
3. Data Transfer Phase
Event Consumption:
- Target writer subscribes to data streams from the Event Hub
- Events are received in batches
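One plausible shape for batch consumption is a JetStream pull subscription that fetches messages in groups; the subject, durable name, and batch size below are assumptions, and the Event type is the hypothetical one sketched earlier.

```go
package streams

import (
	"encoding/json"
	"time"

	"github.com/nats-io/nats.go"
)

// consumeBatches pulls events from the Event Hub in batches and forwards
// them to the processing pipeline.
func consumeBatches(js nats.JetStreamContext, out chan<- Event) error {
	sub, err := js.PullSubscribe("data.>", "target-writer")
	if err != nil {
		return err
	}
	for {
		msgs, err := sub.Fetch(100, nats.MaxWait(2*time.Second))
		if err == nats.ErrTimeout {
			continue // nothing to read right now; keep polling
		}
		if err != nil {
			return err
		}
		for _, m := range msgs {
			var e Event
			if err := json.Unmarshal(m.Data, &e); err != nil {
				return err
			}
			out <- e
			m.Ack() // acknowledge so JetStream does not redeliver the message
		}
	}
}
```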
Deduplication:
- Each event batch is assigned a unique key based on:
  - First and last row IDs
  - Number of rows
  - Action type
  - Table ID
  - Data hash
- Duplicate batches are detected and skipped
Event Sequencing:
- Ordered events are processed through the sequencer
- Sequencer ensures events are processed in the correct order
- Unordered events bypass the sequencer for faster processing
4. Data Writing Phase
Event Processing:
- Events are dispatched to worker goroutines
- Workers process events concurrently
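A minimal sketch of the fan-out to workers; the pool size and error handling in the real writer differ, and the handle callback stands in for the database operations described next.

```go
package streams

import "sync"

// dispatch fans events out to a fixed pool of worker goroutines and waits
// for them to drain the channel.
func dispatch(events <-chan Event, workers int, handle func(Event) error) {
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for e := range events {
				if err := handle(e); err != nil {
					// A real worker would retry or report the failure
					// (see "Error Handling" below).
					continue
				}
			}
		}()
	}
	wg.Wait()
}
```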
Database Operations:
- For INSERT events: Bulk insert or upsert operations
- For UPDATE events: Update existing records
- For DELETE events: Remove records based on primary keys
Transaction Management:
- Operations are executed within transactions
- Transactions are committed or rolled back based on success
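The sketch below shows this transactional pattern with database/sql: each batch is applied inside a transaction that is committed on success and rolled back otherwise. The table, columns, and $n placeholders are illustrative only; the real writer builds bulk statements per database dialect.

```go
package streams

import (
	"context"
	"database/sql"
	"fmt"
)

// applyBatch applies one event batch to the target database inside a
// single transaction.
func applyBatch(ctx context.Context, db *sql.DB, e Event) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback() // harmless if the transaction was already committed

	switch e.Action {
	case "INSERT":
		for _, row := range e.After {
			// Stand-in for a bulk INSERT ... ON CONFLICT (upsert) statement.
			if _, err := tx.ExecContext(ctx,
				"INSERT INTO orders (id, total) VALUES ($1, $2)",
				row["id"], row["total"]); err != nil {
				return err
			}
		}
	case "UPDATE":
		// Assumes Before and After hold matching row states.
		for i, row := range e.After {
			if _, err := tx.ExecContext(ctx,
				"UPDATE orders SET total = $1 WHERE id = $2",
				row["total"], e.Before[i]["id"]); err != nil {
				return err
			}
		}
	case "DELETE":
		for _, row := range e.Before {
			if _, err := tx.ExecContext(ctx,
				"DELETE FROM orders WHERE id = $1", row["id"]); err != nil {
				return err
			}
		}
	default:
		return fmt.Errorf("unsupported action %q", e.Action)
	}
	return tx.Commit()
}
```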
5. Monitoring and Reporting
Statistics Tracking:
- Per-table statistics (events, data size)
- Overall transfer statistics (total events, processing rate)
Progress Reporting:
- Periodic reporting of transfer progress
- Final report of completed transfer
Event Count Monitoring:
- Continuous monitoring of source vs. processed event counts
- Detection of potential issues (large differences, stuck processing)
6. Completion Phase
Finalization:
- Verification that all events have been processed
- Grace period for processing remaining events
- Status set to FINISHED when complete
Cleanup:
- Connections closed
- Resources released
- Final statistics reported
Data Flow Diagram

```
┌─────────────┐      ┌───────────────┐      ┌─────────────┐
│             │      │               │      │             │
│   Source    │─────▶│   Event Hub   │─────▶│   Target    │
│  Database   │      │   (NATS/JS)   │      │  Database   │
│             │      │               │      │             │
└─────────────┘      └───────────────┘      └─────────────┘
       │                     ▲                     │
       │                     │                     │
       ▼                     │                     ▼
┌─────────────┐      ┌───────────────┐      ┌─────────────┐
│             │      │               │      │             │
│   Source    │─────▶│ Event Streams │─────▶│   Target    │
│   Reader    │      │  & Consumers  │      │   Writer    │
│             │      │               │      │             │
└─────────────┘      └───────────────┘      └─────────────┘
                                                   │
                                                   │
                                                   ▼
                                            ┌─────────────┐
                                            │             │
                                            │   Worker    │
                                            │    Pool     │
                                            │             │
                                            └─────────────┘
```
Key Mechanisms
Event Batching
Events are processed in batches to optimize performance. Each batch contains:
- Table identifier
- Log position
- Action type
- Row data (before/after states)
- Size information
Deduplication Strategy
To prevent duplicate processing:
- Each batch is assigned a unique key
- The key includes first ID, last ID, row count, action type, table ID, and data hash
- Processed batch keys are stored in a sync.Map
- Periodic cleanup prevents memory leaks
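A small sketch of this strategy; the exact key layout and cleanup policy are not specified here, so the composition below is only an illustration.

```go
package streams

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"sync"
)

// seen holds keys of batches that were already processed; a periodic sweep
// of this map would prevent unbounded growth, as noted above.
var seen sync.Map

// batchKey combines first/last row IDs, row count, action type, table ID
// and a hash of the row data, mirroring the components listed above.
func batchKey(e Event, firstID, lastID string) string {
	raw, _ := json.Marshal(e.After)
	sum := sha256.Sum256(raw)
	return fmt.Sprintf("%s|%s|%d|%s|%s|%s",
		firstID, lastID, len(e.After), e.Action, e.TableID,
		hex.EncodeToString(sum[:8]))
}

// alreadyProcessed reports whether a key was seen before and records it if
// not; LoadOrStore is atomic, so concurrent workers cannot both claim the
// same batch.
func alreadyProcessed(key string) bool {
	_, loaded := seen.LoadOrStore(key, struct{}{})
	return loaded
}
```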
Sequencing Logic
For ordered events:
- Events are checked against the sequencer
- If not ready for processing, they're requeued
- When ready, they're processed and marked as complete
- Unordered events bypass this check
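A hedged sketch of per-table sequencing. It assumes that an event is ready when its log position is the next one expected for its table and that positions advance by one, which is a simplification of the real logic.

```go
package streams

import "sync"

// positionSequencer tracks, per table, the next log position that may be
// processed.
type positionSequencer struct {
	mu   sync.Mutex
	next map[string]uint64 // table ID -> next expected log position
}

func newPositionSequencer() *positionSequencer {
	return &positionSequencer{next: make(map[string]uint64)}
}

// Ready reports whether the event may be processed now; if not, the caller
// requeues it, as described above.
func (s *positionSequencer) Ready(e Event) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	expected, ok := s.next[e.TableID]
	return !ok || e.Position == expected
}

// MarkDone advances the expected position once an event has been processed.
func (s *positionSequencer) MarkDone(e Event) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.next[e.TableID] = e.Position + 1
}
```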
Error Handling
- Transactional Safety: Database operations use transactions
- Retry Mechanism: Failed operations can be retried
- Pause/Resume: Processing can be paused and resumed
- Monitoring: Continuous monitoring detects issues
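The retry mechanism could look roughly like the helper below, which retries a failing operation with exponential backoff; the actual retry policy used by DBConvert Streams is not specified here.

```go
package streams

import "time"

// withRetry runs op up to attempts times, doubling the delay between tries,
// and returns the last error if every attempt fails.
func withRetry(attempts int, delay time.Duration, op func() error) error {
	var lastErr error
	for i := 0; i < attempts; i++ {
		if lastErr = op(); lastErr == nil {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(delay)
			delay *= 2 // exponential backoff between attempts
		}
	}
	return lastErr
}
```

A write could then be wrapped as withRetry(3, time.Second, func() error { return applyBatch(ctx, db, e) }), keeping the retry policy separate from the database logic.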
Performance Considerations
- Worker Pool: Up to 40 concurrent workers process events
- Channel Buffering: Channels are buffered to handle bursts
- Batch Processing: Data is processed in batches for efficiency
- Ordered vs. Unordered: Events can be processed in order or out of order
Statistics and Monitoring
The system tracks and reports:
Per-Table Statistics:
- Number of events processed
- Data size transferred
- Percentage of total transfer
Overall Statistics:
- Total events from source
- Total events processed
- Processing rate
- Elapsed time
Health Metrics:
- Difference between source and processed events
- Channel utilization
- Processing status
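The counters behind these reports can be as simple as a pair of atomic integers per table; the sketch below is a minimal illustration, not the actual tracker.

```go
package streams

import "sync/atomic"

// tableStats holds the per-table counters described above; rates and
// percentages are derived from these values at reporting time.
type tableStats struct {
	events atomic.Int64 // events processed for the table
	bytes  atomic.Int64 // data size transferred for the table
}

func (s *tableStats) record(size int64) {
	s.events.Add(1)
	s.bytes.Add(size)
}

// lag is the health metric mentioned above: how far processing trails the
// number of events reported by the source.
func lag(sourceEvents, processedEvents int64) int64 {
	return sourceEvents - processedEvents
}
```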