Project Design Document

SeEDM (Secure Enterprise Data Masking & Migration) Platform

Version: 1.0

Date: January 26, 2026

1. Executive Summary

1.1 The Business Problem

In the modern software lifecycle, enterprises face a critical bottleneck: data friction. Developers need realistic, production-like data to test applications effectively, but privacy regulations (GDPR, HIPAA, DPDP, CCPA) restrict copying raw PII (Personally Identifiable Information) to lower environments such as Staging and QA.

Current solutions (manual SQL scripts or legacy ETL tools) are slow, error-prone, and expensive. As a result, developers either wait days for data or fall back on poor-quality synthetic data that hides bugs.

1.2 The SeEDM Solution

SeEDM (Secure Enterprise Data Masking & Migration) is a purpose-built infrastructure platform that automates the secure streaming of data from Production to Non-Production environments.

By leveraging Java 25 virtual threads and Spring Boot 4, SeEDM moves terabytes of data while masking it in memory and in real time, using techniques such as format-preserving encryption and realistic fake-value substitution. Sensitive data never lands on disk in plain text, and the design goal is to reduce data provisioning time from days to minutes.

Key Value Propositions:

2. Solution Architecture

The platform follows a Hub-and-Spoke architecture, designed for high throughput and security isolation.

2.1 Component Breakdown

  1. The SeEDM Engine (Backend):
  2. The Cockpit (Frontend):
  3. The Vault (Metadata DB):
  4. Connectors & Adaptors:

3. Technical Specifications

3.1 The "Power Stack"

| Layer | Technology | Strategic Justification |
|---|---|---|
| Framework | Spring Boot 4.0.0 | Uses Project Leyden for fast startup; modular architecture. |
| Language | Java 25 (LTS) | Native support for Scoped Values, allowing immutable context to be shared across 100k+ virtual threads without per-thread copies. |
| Batch Core | Spring Batch 6.x | Optimized for fault tolerance and auto-cleanup of failed sub-tasks. |
| Frontend | React 19 | High-performance concurrent rendering for real-time dashboards. |
| Security | Spring Security 7 | OAuth2 / RBAC for accessing the dashboard. |
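
To illustrate why the stack leans on lightweight threads, the sketch below masks chunks of rows concurrently, one virtual thread per chunk. It is a minimal example, not the engine's implementation: `VirtualThreadChunkRunner`, `maskAll`, and the string-based row model are hypothetical, and it assumes Java 21 or later (the release in which virtual threads became final).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.UnaryOperator;

/** Hypothetical sketch: masks chunks of rows concurrently, one virtual thread per chunk. */
final class VirtualThreadChunkRunner {

    /** Splits rows into chunks of at most chunkSize and masks each chunk on its own virtual thread. */
    static List<String> maskAll(List<String> rows, int chunkSize, UnaryOperator<String> masker) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<Future<List<String>>> pending = new ArrayList<>();
        // try-with-resources: close() waits for all submitted tasks to finish.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int start = 0; start < rows.size(); start += chunkSize) {
                List<String> chunk = rows.subList(start, Math.min(start + chunkSize, rows.size()));
                pending.add(executor.submit(() -> chunk.stream().map(masker).toList()));
            }
        }
        // Collect results in submission order so the output order matches the input order.
        List<String> masked = new ArrayList<>(rows.size());
        for (Future<List<String>> future : pending) {
            try {
                masked.addAll(future.get());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while collecting masked chunks", e);
            } catch (ExecutionException e) {
                throw new IllegalStateException("Masking failed for a chunk", e.getCause());
            }
        }
        return masked;
    }
}
```

Because each chunk gets its own virtual thread, blocking JDBC calls inside a real masker would park the virtual thread rather than tie up an operating-system thread, which is what makes very high chunk-level concurrency plausible on a single host.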

3.2 Hardware Requirements (Host Node)

4. Core Functional Features

4.1 High-Performance Migration

4.2 Security & Data Integrity

5. Legacy System Migration Strategy

Legacy databases (Oracle 9i/10g, SQL Server 2008, DB2) often present unique challenges such as missing constraints or complex keys. SeEDM includes specific features to handle these "Brownfield" environments.

5.1 Virtual Foreign Keys (The "Implicit Link" Solver)

Legacy applications often enforce relationships in application code (Java/C++) rather than in the database (FK constraints) to improve write performance. Standard ETL tools fail here because, with no declared constraints to read, they cannot infer the load order.
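
One way to recover the load order is to treat each declared relationship as a parent-to-child edge and sort the tables topologically. The sketch below is illustrative only: `LoadOrderPlanner` and `VirtualLink` are hypothetical names, not part of the SeEDM codebase, and it assumes relationships are declared at table level.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: derives a parent-before-child load order from declared links. */
final class LoadOrderPlanner {

    /** One declared parent -> child link between two tables. */
    record VirtualLink(String parentTable, String childTable) {}

    /** Returns tables ordered so that every parent precedes its children (Kahn's algorithm). */
    static List<String> loadOrder(List<VirtualLink> links) {
        Map<String, Set<String>> children = new LinkedHashMap<>();
        Map<String, Integer> pendingParents = new LinkedHashMap<>();
        for (VirtualLink link : links) {
            children.computeIfAbsent(link.parentTable(), k -> new LinkedHashSet<>());
            children.computeIfAbsent(link.childTable(), k -> new LinkedHashSet<>());
            pendingParents.putIfAbsent(link.parentTable(), 0);
            pendingParents.putIfAbsent(link.childTable(), 0);
            // Count each distinct edge once, even if it is declared twice.
            if (children.get(link.parentTable()).add(link.childTable())) {
                pendingParents.merge(link.childTable(), 1, Integer::sum);
            }
        }
        // Tables with no unloaded parents can be loaded immediately.
        Deque<String> ready = new ArrayDeque<>();
        pendingParents.forEach((table, count) -> {
            if (count == 0) {
                ready.add(table);
            }
        });

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String table = ready.poll();
            order.add(table);
            for (String child : children.get(table)) {
                if (pendingParents.merge(child, -1, Integer::sum) == 0) {
                    ready.add(child);
                }
            }
        }
        if (order.size() != pendingParents.size()) {
            throw new IllegalStateException("Cycle detected among virtual relationships");
        }
        return order;
    }
}
```

Given the links Customer_Master -> Loan_Accounts and Loan_Accounts -> Loan_Payments (the last table name is made up for illustration), `loadOrder` returns Customer_Master, Loan_Accounts, Loan_Payments. A circular declaration fails fast instead of producing a partial order.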

5.2 Composite Key Handling

Many legacy banking/telecom tables do not have a single ID column. Instead, they use a combination of columns as a Primary Key (e.g., Branch_ID + Sequence_No + Fiscal_Year).

SQL: SELECT * FROM Transactions ORDER BY Branch_ID, Sequence_No, Fiscal_Year
OFFSET ? LIMIT ?
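
The paging query for a composite key can be generated from the key definition rather than written by hand. The sketch below is a minimal illustration under stated assumptions: `CompositeKeyPaging`, `pageQuery`, and `offsetOf` are hypothetical names, and table and column names are assumed to come from trusted job configuration, since they are concatenated into the SQL text.

```java
import java.util.List;

/** Hypothetical helper that builds the paging query for a table with a composite key. */
final class CompositeKeyPaging {

    /** Orders by every key column so that page boundaries are deterministic. */
    static String pageQuery(String table, List<String> keyColumns) {
        if (keyColumns.isEmpty()) {
            throw new IllegalArgumentException("At least one key column is required");
        }
        return "SELECT * FROM " + table
                + " ORDER BY " + String.join(", ", keyColumns)
                + " OFFSET ? LIMIT ?";
    }

    /** Row offset of a zero-based page: page 3 with 1000 rows per page starts at row 3000. */
    static long offsetOf(long pageIndex, int pageSize) {
        return Math.multiplyExact(pageIndex, (long) pageSize);
    }
}
```

Ordering by the full key matters: if any key column is left out of the ORDER BY, rows that tie on the remaining columns can shift between pages, so a row may be read twice or skipped. Pagination syntax also varies by database (older Oracle releases need a ROWNUM wrapper, for example), so a real reader would likely choose the clause per dialect.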

5.3 Legacy Data Type Support

6. Configuration & Rules Engine

SeEDM uses a Declarative Configuration model. Migration logic is decoupled from the code, allowing Data Engineers to manage rules via simple YAML files.

6.1 The Job Profile (job_config.yaml)

This file defines what to move and how to mask it.

YAML

job_profile:
  name: "End_of_Month_Refresh"
  batch_mode: "PARALLEL_PARTITIONING"
  chunk_size: 1000
  threads: 50

connections:
  source:
    url: "jdbc:oracle:thin:@prod-db:1521:ORCL"
    username: "${VAULT_PROD_USER}"
  target:
    url: "jdbc:postgresql://stage-db:5432/app_db"
    username: "${VAULT_STAGE_USER}"

# LEGACY SUPPORT: Define relationships not present in DB Schema
virtual_relationships:
  - parent: "Customer_Master.CIF_Number"
    child:  "Loan_Accounts.Cust_Ref_ID"

masking_rules:
  - table: "Customer_Master"
    columns:
      - name: "Email_Address"
        action: "FAKER_EMAIL"      # Generates realistic fake emails
       
      - name: "Phone_Number"
        action: "FAKER_PHONE_IN"   # Generates +91 format phones
       
      - name: "National_ID"
        action: "FPE_ENCRYPT"      # Format-Preserving Encryption
        key_ref: "master-key-v1"
       
      - name: "Salary"
        action: "NUMERIC_VARIANCE" # Varies value by +/- 10%
        variance_percent: 10
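
As an illustration of how one of these actions might behave, the sketch below implements the idea behind NUMERIC_VARIANCE: the value is shifted by a random percentage within the configured bound while keeping its decimal scale. `NumericVarianceMasker` and its `mask` signature are hypothetical, not the engine's API, and the choice of random source is an assumption.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Random;

/** Hypothetical masker illustrating the NUMERIC_VARIANCE action. */
final class NumericVarianceMasker {

    /**
     * Shifts the value by a random amount within +/- variancePercent percent,
     * keeping the scale (number of decimal places) of the original value.
     */
    static BigDecimal mask(BigDecimal original, int variancePercent, Random random) {
        if (variancePercent < 0 || variancePercent > 100) {
            throw new IllegalArgumentException("variancePercent must be between 0 and 100");
        }
        // nextDouble() is in [0, 1); map it to [-1, 1).
        double signedFraction = random.nextDouble() * 2.0 - 1.0;
        BigDecimal factor = BigDecimal.valueOf(1.0 + signedFraction * variancePercent / 100.0);
        return original.multiply(factor).setScale(original.scale(), RoundingMode.HALF_UP);
    }
}
```

With `variance_percent: 10`, a Salary of 50000.00 is masked to a value between 45000.00 and 55000.00. A seeded `java.util.Random` makes test runs repeatable; whether production masking should instead use `SecureRandom` or a keyed, deterministic function is a design decision this sketch leaves open.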

6.2 Internal Metadata DB Schema

The engine uses an internal PostgreSQL database (The Vault) to track state.

7. Operational Monitoring (The Cockpit)

The User Interface is designed for Observability, not Data Browsing.

7.1 Blind Dashboard Features

7.2 Real-Time Architecture

[Figure: UML diagram]

8. Development Roadmap

| Phase | Duration | Goals & Deliverables |
|---|---|---|
| Phase 1: The Core | Weeks 1-4 | Deliverable: Engine.jar. Spring Batch setup; connection management; composite key readers. |
| Phase 2: Security | Weeks 5-6 | Deliverable: Masking Service. Implement FPE & Faker logic; unit tests for referential integrity. |
| Phase 3: The Cockpit | Weeks 7-9 | Deliverable: React Dashboard. SSE integration; YAML configuration UI. |
| Phase 4: Hardening | Weeks 10-12 | Deliverable: Production Release. End-to-end stress testing (1TB); Docker/Kubernetes deployment scripts. |

9. Future Roadmap (Version 2.0)

We have laid the groundwork for these future features in our architectural choices:

  1. Distributed Remote Workers: Using Project Leyden, we will create tiny, instant-start Docker containers. If a job is too big for one server, the system will spin up 50 worker containers across a Kubernetes cluster to share the load.
  2. AI PII Auto-Discovery: An intelligent scanner that looks at column data (not just names) and suggests: "This column looks like a Passport Number. Apply Masking?"
  3. Adaptive Throttling: The engine will monitor the Production Database's latency. If Production slows down, SeEDM will automatically pause or slow its reading to prevent impacting real users ("Good Neighbor Policy").
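
The adaptive throttling idea in item 3 could take the form of a simple back-off loop driven by latency samples. The sketch below is one possible shape, not a committed design: `AdaptiveThrottle`, the 100 ms starting pause, and the doubling/halving policy are all assumptions made for illustration.

```java
import java.time.Duration;

/** Hypothetical sketch of a latency-driven throttle for the source reader. */
final class AdaptiveThrottle {
    private static final Duration MIN_PAUSE = Duration.ofMillis(100);

    private final Duration targetLatency;
    private final Duration maxPause;
    private Duration currentPause = Duration.ZERO;

    AdaptiveThrottle(Duration targetLatency, Duration maxPause) {
        this.targetLatency = targetLatency;
        this.maxPause = maxPause;
    }

    /**
     * Feeds in the latest observed source latency and returns how long the
     * reader should pause before fetching its next chunk.
     */
    Duration onLatencySample(Duration observed) {
        if (observed.compareTo(targetLatency) > 0) {
            // Source is slow: double the pause (starting at MIN_PAUSE), capped at maxPause.
            Duration doubled = currentPause.isZero() ? MIN_PAUSE : currentPause.multipliedBy(2);
            currentPause = doubled.compareTo(maxPause) > 0 ? maxPause : doubled;
        } else {
            // Source is healthy: halve the pause, dropping to zero once it falls below MIN_PAUSE.
            Duration halved = currentPause.dividedBy(2);
            currentPause = halved.compareTo(MIN_PAUSE) < 0 ? Duration.ZERO : halved;
        }
        return currentPause;
    }
}
```

With a 50 ms latency target, three consecutive slow samples raise the pause to 100 ms, 200 ms, and then 400 ms; the first healthy sample afterwards brings it back down to 200 ms.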

10. Conclusion

The SeEDM Platform transforms data migration from a high-risk manual task into a secure, automated infrastructure service. By solving the specific challenges of legacy database compatibility and providing a flexible configuration engine, SeEDM fits into complex enterprise environments while keeping unmasked PII out of non-production systems.