
case study

Atlas Bridge

ETL
Backend
Node.js
Koa
Datadog
CronJob
Transactions
Conflict resolution

Built a bi-directional ETL bridge that keeps the legacy Atlas Phoenix monolith and the modern Atlas micro-services platform consistent during a long deprecation period.

Atlas Bridge cover

Overview & Problem

The Atlas ecosystem consists of a legacy Phoenix monolith and the new full-stack JavaScript Atlas Platform built as micro-services.

Both systems are temporarily running in parallel in production until we deprecate the legacy monolith. Without a sync mechanism, data between the systems would quickly become inconsistent, leading to operational issues and increased maintenance overhead.

Solution

I built Atlas Bridge, an ETL service written in Node.js with Koa that keeps both systems in sync automatically. It handles bi-directional data flow, ensures transactional integrity, and leans on Datadog observability for production reliability.

Atlas Bridge runs on a scheduled CronJob that checks for data changes every minute, comparing the updated_at field in both databases to determine which side holds the most recent change.
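The last-write-wins comparison behind this can be sketched as follows. `SyncRecord`, `resolveDirection`, and the field names are illustrative assumptions, not the actual Atlas Bridge types:

```typescript
// Minimal last-write-wins sketch: each side's copy of a record carries an
// updated_at timestamp; the side that changed most recently wins and its
// version is written to the other system.
interface SyncRecord {
  id: string;
  updatedAt: Date;
  payload: Record<string, unknown>;
}

type Direction = 'legacy→platform' | 'platform→legacy' | 'none';

// Decide which way a record should flow, given the copy on each side.
// A record missing on one side is simply copied over; otherwise the newer
// updated_at wins, and equal timestamps mean there is nothing to do.
function resolveDirection(
  legacy: SyncRecord | undefined,
  platform: SyncRecord | undefined,
): Direction {
  if (legacy && !platform) return 'legacy→platform';
  if (platform && !legacy) return 'platform→legacy';
  if (!legacy || !platform) return 'none';
  if (legacy.updatedAt > platform.updatedAt) return 'legacy→platform';
  if (platform.updatedAt > legacy.updatedAt) return 'platform→legacy';
  return 'none';
}
```

Running this per record each minute keeps both databases converging on whichever write happened last.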

System Architecture

Atlas Bridge (ETL + Koa) system architecture diagram

Highlights

Technical Challenges & Solutions

Code Snippets

// Paginate the tenant Postgres database (via Knex) with offset/limit: process each
// page in parallel, recurse until a short page signals the end. Stages run in
// dependency order: users/staff → teams/categories → geofences/members → reconcile deletes,
// e.g. await Promise.all([ batchProcess(users...), batchProcess(staff...), ... ])

async function batchProcess<R, P>(
  retrieve: (offset: number, limit: number) => Promise<R[]>,
  processor: (row: R) => Promise<P>,
  offset = 0,
  limit = 100,
): Promise<P[]> {
  const records = await retrieve(offset, limit);
  const processed = await Promise.all(records.map(processor));
  if (records.length === limit) {
    return [...processed, ...(await batchProcess(retrieve, processor, offset + limit, limit))];
  }
  return processed;
}

Batched, ordered ETL from tenant DB into platform services with explicit reconciliation passes.
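To make the staging concrete, here is a sketch of how batchProcess can drive ordered stages. The in-memory retrieveUsers and runSync below are illustrative stand-ins for the real Knex queries and service calls:

```typescript
// batchProcess as in the snippet above: page through a source, process each
// page in parallel, recurse until a short page signals the end.
async function batchProcess<R, P>(
  retrieve: (offset: number, limit: number) => Promise<R[]>,
  processor: (row: R) => Promise<P>,
  offset = 0,
  limit = 100,
): Promise<P[]> {
  const records = await retrieve(offset, limit);
  const processed = await Promise.all(records.map(processor));
  if (records.length === limit) {
    return [...processed, ...(await batchProcess(retrieve, processor, offset + limit, limit))];
  }
  return processed;
}

// Illustrative in-memory "table" standing in for a paginated Knex query.
const users = Array.from({ length: 250 }, (_, i) => ({ id: i }));
const retrieveUsers = async (offset: number, limit: number) =>
  users.slice(offset, offset + limit);

async function runSync(): Promise<number> {
  // Independent entities sync in parallel within a stage…
  const synced = await Promise.all([
    batchProcess(retrieveUsers, async (u) => u.id),
    // batchProcess(retrieveStaff, syncStaff), …
  ]);
  // …but each stage is awaited before the next begins, so dependent entities
  // (teams, geofence members, delete reconciliation) see a consistent base.
  return synced.flat().length;
}
```

The short-page termination check (`records.length === limit`) means a final page of exactly `limit` rows triggers one extra, empty retrieve, which is harmless.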

// Per-tenant CronJob: labels for org/tenant, schedule from config, tight job history,
// single completion. Container runs the compiled sync entrypoint (not the HTTP server).
const cronJob = {
  kind: 'CronJob',
  metadata: { labels: { /* org id, tenant name, component: sync-job */ } },
  spec: {
    schedule: config.cron.schedule,
    failedJobsHistoryLimit: 0,
    successfulJobsHistoryLimit: 1,
    startingDeadlineSeconds: 120,
    jobTemplate: {
      spec: {
        completions: 1,
        template: {
          spec: {
            containers: [{
              name: 'sync',
              image: config.job.image,
              command: ['/nodejs/bin/node', '--require=dd-trace/init', 'src/job.js'],
              env: [ /* TENANT_*, DATABASE_*, service hostnames, … */ ],
            }],
            restartPolicy: 'OnFailure',
          },
        },
      },
    },
  },
};

Kubernetes CronJobs created/updated from Node; sync workload runs under dd-trace.
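The per-tenant manifests can be produced by a small pure builder before being applied through the Kubernetes Batch API. `buildTenantCronJob` and the `Tenant` shape below are illustrative assumptions, not the production code:

```typescript
// Illustrative per-tenant config; the real Atlas Bridge config differs.
interface Tenant {
  orgId: string;
  name: string;
  schedule: string; // cron expression, e.g. '* * * * *'
  image: string;
}

// Build one CronJob manifest per tenant, mirroring the shape shown above:
// tight job history, single completion, sync entrypoint under dd-trace.
function buildTenantCronJob(tenant: Tenant) {
  return {
    apiVersion: 'batch/v1',
    kind: 'CronJob',
    metadata: {
      name: `sync-${tenant.name}`,
      labels: { 'org-id': tenant.orgId, tenant: tenant.name, component: 'sync-job' },
    },
    spec: {
      schedule: tenant.schedule,
      failedJobsHistoryLimit: 0,
      successfulJobsHistoryLimit: 1,
      startingDeadlineSeconds: 120,
      jobTemplate: {
        spec: {
          completions: 1,
          template: {
            spec: {
              containers: [{
                name: 'sync',
                image: tenant.image,
                command: ['/nodejs/bin/node', '--require=dd-trace/init', 'src/job.js'],
              }],
              restartPolicy: 'OnFailure',
            },
          },
        },
      },
    },
  };
}
```

Keeping the builder pure makes it easy to unit-test label and schedule derivation separately from the create-or-replace call against the cluster.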

Impact & Metrics

Skills Demonstrated