Why CI/CD Matters
Traditional Deployment (2010):
Developer codes → Manual testing → Manual build →
Manual deployment → Hope it works → It breaks →
Rollback manually → Repeat
Time: Days to weeks
Risk: High
Stress: Maximum
Modern CI/CD (2022):
Developer commits → Automated tests → Automated build →
Automated deployment → Monitoring alerts if issues →
Automated rollback if needed
Time: Minutes
Risk: Low
Stress: Minimal
Real Impact:
Company: Etsy (2022 Data)
- Deployments per day: 50+
- Time to deploy: 15 minutes
- Failed deployments: < 1%
- Rollback time: 2 minutes
Company: Amazon
- Deployments per second: 1 (during peak)
- Automated: 99%
- Downtime: Near zero
This guide shows you how.
CI/CD Fundamentals
Continuous Integration (CI):
- Automatically test every code change
- Merge to main branch frequently
- Catch bugs early
Continuous Deployment (CD):
- Automatically deploy passing builds
- Production updates multiple times daily
- Reduce deployment risk
Benefits:
- Faster releases (hours → minutes)
- Fewer bugs (caught automatically)
- Less stress (automated, repeatable)
- Better quality (tested every commit)
- Quick rollbacks (automated)
Choosing CI/CD Platform
GitHub Actions (Best for GitHub repos)
- Free for public repos
- Integrated with GitHub
- Huge marketplace
- Easy YAML syntax
- Drawback: locked to GitHub
GitLab CI/CD (Best for GitLab)
- Integrated with GitLab
- Auto DevOps feature
- Built-in container registry
- Free tier generous
- Drawback: locked to GitLab
Jenkins (Most flexible)
- Open source & free
- Works with any VCS
- Massive plugin ecosystem
- Self-hosted (full control)
- Drawback: requires setup and ongoing maintenance
CircleCI (Easy to use)
- Fast builds
- Docker-native
- Good free tier
- Drawback: can get expensive at scale
This guide uses GitHub Actions (the most popular choice in 2022) for its examples; the same pipeline stages carry over to Jenkins and the other platforms.
GitHub Actions Basics
Workflow File:
.github/workflows/ci.yml
name: CI Pipeline
# Trigger on push or PR to main
on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3
      - name: Setup Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '18'
          cache: 'npm'
      - name: Install dependencies
        run: npm ci
      - name: Run linter
        run: npm run lint
      - name: Run tests
        run: npm test
      - name: Upload coverage
        uses: codecov/codecov-action@v3
        with:
          files: ./coverage/coverage-final.json
How It Works:
- Developer pushes code
- GitHub triggers workflow
- Checks out code
- Sets up Node.js
- Installs dependencies
- Runs linter & tests
- Reports results
Result: Automatic testing on every commit.
Complete Node.js CI/CD Pipeline
Full Pipeline:
- Lint code
- Run tests
- Build application
- Build Docker image
- Push to registry
- Deploy to staging
- Run integration tests
- Deploy to production
.github/workflows/deploy.yml:
name: Build and Deploy
on:
  push:
    branches: [ main ]
env:
  NODE_VERSION: '18'
  DOCKER_IMAGE: myapp
  DOCKER_REGISTRY: ghcr.io
  REGISTRY_USER: ${{ github.actor }}
jobs:
  # Job 1: Test
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Setup Node.js
        uses: actions/setup-node@v3
        with:
          node-version: ${{ env.NODE_VERSION }}
          cache: 'npm'
      - name: Install dependencies
        run: npm ci
      - name: Run linter
        run: npm run lint
      - name: Run unit tests
        run: npm test
      - name: Run integration tests
        run: npm run test:integration
        env:
          DATABASE_URL: postgresql://test:test@localhost:5432/test
    services:
      postgres:
        image: postgres:15
        env:
          POSTGRES_DB: test
          POSTGRES_USER: test
          POSTGRES_PASSWORD: test
        # Publish the port so steps running on the runner can reach localhost:5432
        ports:
          - 5432:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
  # Job 2: Build
  build:
    needs: test
    runs-on: ubuntu-latest
    # GITHUB_TOKEN needs package write access to push to ghcr.io
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v3
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
      - name: Login to GitHub Container Registry
        uses: docker/login-action@v2
        with:
          registry: ${{ env.DOCKER_REGISTRY }}
          username: ${{ env.REGISTRY_USER }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v4
        with:
          images: ${{ env.DOCKER_REGISTRY }}/${{ github.repository }}
          tags: |
            type=ref,event=branch
            type=sha,prefix={{branch}}-
            type=semver,pattern={{version}}
      - name: Build and push
        uses: docker/build-push-action@v4
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
  # Job 3: Deploy to Staging
  deploy-staging:
    needs: build
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - name: Deploy to staging
        uses: appleboy/ssh-action@master
        with:
          host: ${{ secrets.STAGING_HOST }}
          username: ${{ secrets.STAGING_USER }}
          key: ${{ secrets.STAGING_SSH_KEY }}
          script: |
            cd /app
            docker-compose pull
            docker-compose up -d
            docker-compose exec -T app npm run migrate
      - name: Wait for deployment
        run: sleep 30
      - name: Run smoke tests
        run: |
          curl -f https://staging.myapp.com/health || exit 1
          curl -f https://staging.myapp.com/api/status || exit 1
  # Job 4: Deploy to Production
  deploy-production:
    needs: deploy-staging
    runs-on: ubuntu-latest
    environment: production
    steps:
      - name: Deploy to production
        uses: appleboy/ssh-action@master
        with:
          host: ${{ secrets.PROD_HOST }}
          username: ${{ secrets.PROD_USER }}
          key: ${{ secrets.PROD_SSH_KEY }}
          script: |
            cd /app
            docker-compose pull
            # Recreate only the app container from the image that was just pulled
            docker-compose up -d --no-deps app
            docker-compose exec -T app npm run migrate
      - name: Verify deployment
        run: |
          curl -f https://myapp.com/health || exit 1
      - name: Notify team
        uses: 8398a7/action-slack@v3
        if: always()
        with:
          status: ${{ job.status }}
          text: 'Production deployment ${{ job.status }}'
        env:
          # action-slack v3 reads the webhook from this environment variable
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}
Features:
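- Lint, unit, and integration tests on every push to main
- Docker image caching
- Staged deployment (staging → production)
- Smoke tests
- Slack notifications
- Manual approval for production (only when the production environment has required reviewers configured in the repository settings)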
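The fixed `sleep 30` before the smoke tests can be replaced by polling the health endpoint until it answers. The following is a minimal sketch, not part of the pipeline above: `waitForHealthy` and its option names are hypothetical, and it assumes Node.js 18+ for the built-in `fetch`.

```javascript
// Hypothetical helper: poll a health URL until it returns a 2xx response.
// `fetchImpl` and `sleep` are injectable so the logic can be exercised offline.
async function waitForHealthy(url, options = {}) {
  const {
    attempts = 10,
    delayMs = 3000,
    fetchImpl = globalThis.fetch, // built into Node.js 18+
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = options;

  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      const response = await fetchImpl(url);
      if (response.ok) return attempt; // how many attempts it took
    } catch (err) {
      // Connection refused or DNS failure: the service is not up yet.
    }
    if (attempt < attempts) await sleep(delayMs);
  }
  throw new Error(`${url} not healthy after ${attempts} attempts`);
}
```

A smoke-test script could call `waitForHealthy('https://staging.myapp.com/health')` and exit non-zero when the promise rejects, which fails the job the same way `curl -f` does.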
Multi-Environment Strategy
Environments:
1. Development
- Every branch deployed automatically
- Ephemeral (deleted after merge)
- Quick feedback
2. Staging
- Mirror of production
- All changes deployed here first
- Integration testing
3. Production
- Manual approval required
- Blue-green or canary deployment
- Rollback ready
Branch Strategy:
main (production)
├── develop (staging)
├── feature/user-auth
├── feature/payment
└── bugfix/login-issue
Workflow:
Feature branch → CI tests pass →
Merge to develop → Deploy to staging →
Integration tests pass →
Merge to main → Deploy to production
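The branch-to-environment mapping above can be written down as a small function, for example in a deploy script that receives the branch name. This is a sketch only; `environmentForBranch` is a hypothetical helper, and the rule for unknown branches is an assumption.

```javascript
// Hypothetical helper: map a Git branch name to the environment it deploys to,
// following the branch strategy described above.
function environmentForBranch(branch) {
  if (branch === 'main') return 'production';
  if (branch === 'develop') return 'staging';
  // Feature and bugfix branches get an ephemeral development environment.
  if (/^(feature|bugfix)\//.test(branch)) return 'development';
  return null; // assumption: other branches are tested but not deployed
}

console.log(environmentForBranch('feature/user-auth')); // development
```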
Database Migrations in CI/CD
Challenge: Database changes with zero downtime
Solution: Backward-Compatible Migrations
Bad approach:
-- Breaking change
ALTER TABLE users DROP COLUMN old_email;
ALTER TABLE users ADD COLUMN email VARCHAR(255);
Result: Application breaks between migration and code deploy.
Good Approach:
Phase 1: Add new column
-- Migration 1: Add new column
ALTER TABLE users ADD COLUMN email VARCHAR(255);
Phase 2: Dual-write (code change)
// Write to both columns
await db.query(
'UPDATE users SET old_email = $1, email = $1 WHERE id = $2',
[email, userId]
);
Phase 3: Backfill data
-- Migration 2: Copy data
UPDATE users SET email = old_email WHERE email IS NULL;
Phase 4: Switch reads (code change)
// Read from new column
const user = await db.query('SELECT email FROM users WHERE id = $1', [userId]);
Phase 5: Remove old column (weeks later)
-- Migration 3: Drop old column
ALTER TABLE users DROP COLUMN old_email;
In Pipeline:
- name: Run migrations
  run: npm run migrate
  env:
    DATABASE_URL: ${{ secrets.DATABASE_URL }}
- name: Verify migration
  run: npm run migrate:verify
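The guide does not show what `migrate:verify` runs. One plausible check, sketched here under assumptions, is to confirm that the Phase 3 backfill left no row with an `old_email` but no `email` before Phase 4 switches reads. `verifyEmailBackfill` is a hypothetical name, and `query` is assumed to behave like `pool.query` from the `pg` package (resolving to an object with a `rows` array).

```javascript
// Hypothetical verification step: count rows the backfill missed.
// `query` is assumed to be compatible with pg's pool.query(text).
async function verifyEmailBackfill(query) {
  const result = await query(
    'SELECT COUNT(*) AS missing FROM users ' +
      'WHERE email IS NULL AND old_email IS NOT NULL'
  );
  // COUNT(*) may arrive as a string, so convert before comparing.
  const missing = Number(result.rows[0].missing);
  return { ok: missing === 0, missing };
}
```

A verify script would exit non-zero when `ok` is false, which stops the pipeline before the code that reads only the new column is deployed.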
Blue-Green Deployment
Strategy: Two identical environments
Blue: Current production
Green: New version
Process:
1. Deploy to Green
2. Test Green
3. Switch traffic to Green
4. Keep Blue as rollback
5. If issues: Switch back to Blue (instant!)
6. If good: Blue becomes next Green
Implementation:
docker-compose.yml:
version: '3.9'
services:
  app-blue:
    image: myapp:stable
    container_name: app-blue
    ports:
      - "3001:3000"
  app-green:
    image: myapp:latest
    container_name: app-green
    ports:
      - "3002:3000"
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      # Mounted as a conf.d snippet because the file has no events/http wrapper
      - ./nginx.conf:/etc/nginx/conf.d/default.conf:ro
nginx.conf (initially pointing to Blue):
upstream backend {
    server app-blue:3000;  # Traffic to Blue
}
server {
    listen 80;
    location / {
        proxy_pass http://backend;
    }
}
Deployment Script:
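#!/bin/bash
set -euo pipefail
# Deploy to Green
docker-compose up -d app-green
# Wait for health check (retry instead of a fixed sleep)
curl -f --retry 10 --retry-delay 2 --retry-connrefused http://localhost:3002/health || exit 1
# Switch traffic to Green
sed -i 's/app-blue/app-green/g' nginx.conf
# Recreate nginx so it picks up the edited file: sed -i writes a new file,
# which a running container's single-file bind mount does not follow.
# This causes a brief proxy restart.
docker-compose up -d --no-deps --force-recreate nginx
# Verify
curl -f --retry 5 --retry-delay 1 --retry-connrefused http://localhost/health || {
    # Rollback
    sed -i 's/app-green/app-blue/g' nginx.conf
    docker-compose up -d --no-deps --force-recreate nginx
    exit 1
}
# Success - stop old Blue (defer this while you still want Blue as an instant rollback target)
docker-compose stop app-blue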
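The script above hardcodes Blue as live and Green as the target, so it only works for the first switch. A small sketch of the color-alternation logic is shown below; `idleColor`, `renderUpstream`, and `planSwitch` are hypothetical helpers, and the container names follow the `app-blue` / `app-green` convention from the compose file.

```javascript
// Hypothetical helpers for alternating between the two environments.
function idleColor(liveColor) {
  if (liveColor === 'blue') return 'green';
  if (liveColor === 'green') return 'blue';
  throw new Error(`Unknown color: ${liveColor}`);
}

// Render the nginx upstream block that sends traffic to one color.
function renderUpstream(color, port = 3000) {
  return `upstream backend {\n    server app-${color}:${port};\n}\n`;
}

// Describe one deployment: where to deploy, what to switch to, what to keep.
function planSwitch(liveColor) {
  const target = idleColor(liveColor);
  return {
    deployTo: `app-${target}`,
    upstream: renderUpstream(target),
    rollbackTo: `app-${liveColor}`,
  };
}

console.log(planSwitch('blue').deployTo); // app-green
```

Generating the upstream block from the target color avoids the in-place `sed` substitution and makes each release the mirror image of the previous one.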
Canary Deployment
Strategy: Gradual rollout
Process:
1. Deploy new version to 10% of servers
2. Monitor metrics (errors, latency)
3. If good: Increase to 50%
4. If good: Increase to 100%
5. If issues at any point: Rollback
Implementation with Kubernetes:
# Stable deployment (90% of traffic)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-stable
spec:
  replicas: 9
  selector:
    matchLabels:
      app: myapp
      version: stable
  template:
    metadata:
      labels:
        app: myapp
        version: stable
    spec:
      containers:
        - name: myapp
          image: myapp:1.0
---
# Canary deployment (10% of traffic)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-canary
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
      version: canary
  template:
    metadata:
      labels:
        app: myapp
        version: canary
    spec:
      containers:
        - name: myapp
          image: myapp:2.0
---
# Service (routes to both)
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp  # Matches both stable and canary
  ports:
    - port: 80
      targetPort: 3000
Gradual Promotion:
# 10% canary
kubectl scale deployment myapp-stable --replicas=9
kubectl scale deployment myapp-canary --replicas=1
# Monitor metrics for 30 minutes
# 50% canary
kubectl scale deployment myapp-stable --replicas=5
kubectl scale deployment myapp-canary --replicas=5
# Monitor metrics for 30 minutes
# 100% canary
kubectl scale deployment myapp-stable --replicas=0
kubectl scale deployment myapp-canary --replicas=10
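The replica counts above follow from the canary percentage: because the Service balances across all matching pods, the traffic share is roughly the canary's share of pods. A minimal sketch of that arithmetic follows; `canaryReplicas` is a hypothetical helper, and rounding up to at least one canary pod is an assumption.

```javascript
// Hypothetical helper: split a fixed pod count between stable and canary.
function canaryReplicas(totalReplicas, canaryPercent) {
  if (!Number.isInteger(totalReplicas) || totalReplicas < 1) {
    throw new RangeError('totalReplicas must be a positive integer');
  }
  if (canaryPercent < 0 || canaryPercent > 100) {
    throw new RangeError('canaryPercent must be between 0 and 100');
  }
  let canary = Math.round((totalReplicas * canaryPercent) / 100);
  // Assumption: any non-zero percentage gets at least one canary pod.
  if (canaryPercent > 0 && canary === 0) canary = 1;
  return { stable: totalReplicas - canary, canary };
}

console.log(canaryReplicas(10, 10)); // { stable: 9, canary: 1 }
```

With few replicas the achievable percentages are coarse (3 pods cannot express 10%), which is one reason finer-grained traffic splitting is usually done at the ingress or service-mesh layer.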
Feature Flags
Problem: Want to deploy code without releasing feature
Solution: Feature flags
LaunchDarkly / Unleash / Flagsmith:
const featureFlags = require('./featureFlags');
app.get('/checkout', async (req, res) => {
  const userId = req.user.id;
  // Check if new checkout enabled for this user
  const newCheckout = await featureFlags.isEnabled('new-checkout', userId);
  if (newCheckout) {
    return res.render('checkout-v2');
  } else {
    return res.render('checkout-v1');
  }
});
Benefits:
- Deploy anytime (flag off)
- Test in production (flag on for internal users)
- Gradual rollout (flag on for 10%, then 50%, then 100%)
- Instant rollback (toggle flag off)
- A/B testing (compare metrics)
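Gradual rollout depends on each user landing consistently on the same side of the flag as the percentage grows. One common way to get that, sketched here as an assumption rather than how any particular vendor implements it, is to hash the flag name and user ID into a stable bucket from 0 to 99. `bucketFor` and `isEnabled` are hypothetical names.

```javascript
const crypto = require('crypto');

// Hypothetical helper: map (flag, user) to a stable bucket in 0..99.
function bucketFor(flagName, userId) {
  const digest = crypto
    .createHash('sha256')
    .update(`${flagName}:${userId}`)
    .digest();
  return digest.readUInt32BE(0) % 100;
}

// A user is in the rollout when their bucket is below the percentage,
// so raising 10 -> 50 -> 100 only ever adds users, never removes them.
function isEnabled(flagName, userId, rolloutPercent) {
  return bucketFor(flagName, userId) < rolloutPercent;
}
```

Including the flag name in the hash input keeps the first 10% of one feature from always being the same users as the first 10% of every other feature.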
Flagsmith Example:
const Flagsmith = require('flagsmith-nodejs');
const flagsmith = new Flagsmith({
  environmentKey: process.env.FLAGSMITH_KEY
});
async function checkoutHandler(req, res) {
  // Get flags for this user
  const flags = await flagsmith.getIdentityFlags(req.user.id);
  // Check specific flag
  if (flags.isFeatureEnabled('new_checkout')) {
    // New checkout flow
    return newCheckout(req, res);
  }
  // Old checkout flow
  return oldCheckout(req, res);
}
Automated Rollback
Monitoring Triggers:
# prometheus-rules.yml
groups:
  - name: deployment_alerts
    rules:
      # High error rate: share of requests answered with 5xx over 5 minutes
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
            / sum(rate(http_requests_total[5m])) > 0.05
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"
      # Slow response time
      - alert: SlowResponseTime
        expr: histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) > 2
        labels:
          severity: warning
        annotations:
          summary: "95th percentile latency > 2s"
Auto-Rollback Script:
const { exec } = require('child_process');
// getErrorRate, getLatency, and sendSlackAlert are placeholders for your own
// monitoring and alerting helpers. Latency is assumed to be in milliseconds.
let rolledBack = false;
// Check metrics every minute
const timer = setInterval(async () => {
  if (rolledBack) return;
  const errorRate = await getErrorRate();
  const latency = await getLatency();
  if (errorRate > 0.05 || latency > 2000) {
    console.log('Metrics degraded - rolling back!');
    // Roll back only once: a second "rollout undo" would restore the bad revision
    rolledBack = true;
    clearInterval(timer);
    // Rollback deployment
    exec('kubectl rollout undo deployment/myapp', (error, stdout) => {
      if (error) {
        console.error('Rollback failed:', error);
        // Alert team
        sendSlackAlert('ROLLBACK FAILED - MANUAL INTERVENTION NEEDED');
      } else {
        console.log('Rollback successful');
        sendSlackAlert('Automatic rollback executed');
      }
    });
  }
}, 60000);
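The script leaves the metric lookups undefined. One way to back them is Prometheus's instant-query HTTP endpoint (`/api/v1/query`), which returns a vector of `[timestamp, "value"]` samples. The sketch below is an assumption about how such a helper could look: `queryScalar` is a hypothetical name, the base URL is a placeholder, and Node.js 18+ is assumed for the built-in `fetch`.

```javascript
// Hypothetical helper: run an instant PromQL query and return the first
// sample as a number, or null when no series matched.
async function queryScalar(baseUrl, promql, fetchImpl = globalThis.fetch) {
  const url = `${baseUrl}/api/v1/query?query=${encodeURIComponent(promql)}`;
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Prometheus returned HTTP ${response.status}`);
  }
  const body = await response.json();
  if (body.status !== 'success') {
    throw new Error(`Prometheus query failed: ${body.error || 'unknown error'}`);
  }
  const sample = body.data.result[0];
  // Each sample is { metric: {...}, value: [unixTime, "stringValue"] }.
  return sample ? Number(sample.value[1]) : null;
}
```

A `null` result (no traffic, so no series) is returned rather than treated as zero, leaving it to the caller to decide whether missing data should block or allow the rollout.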
Security in CI/CD
Secret Management:
Never commit secrets:
// BAD
const API_KEY = 'sk_live_abc123';
Use environment variables:
// GOOD
const API_KEY = process.env.API_KEY;
GitHub Secrets:
# In workflow file
steps:
  - name: Deploy
    env:
      API_KEY: ${{ secrets.API_KEY }}
      DATABASE_URL: ${{ secrets.DATABASE_URL }}
    run: npm run deploy
Add secret:
GitHub Repo → Settings → Secrets and variables → Actions →
New repository secret
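Reading secrets from the environment works best when the application fails fast on a missing value instead of starting with `undefined`. A minimal sketch, assuming the two variable names used above; `loadConfig` is a hypothetical helper.

```javascript
// Hypothetical helper: read required settings from the environment and
// fail at startup if any are missing. Only names are reported, never values.
function loadConfig(env = process.env, required = ['API_KEY', 'DATABASE_URL']) {
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
  return Object.fromEntries(required.map((name) => [name, env[name]]));
}
```

Calling `loadConfig()` once at startup turns a forgotten repository secret into an immediate, clearly named deployment failure rather than a runtime error on the first API call.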
Dependency Scanning:
# Scan for vulnerabilities
- name: Run Snyk security scan
  uses: snyk/actions/node@master
  env:
    SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
# Or use npm audit (--omit=dev replaces the deprecated --production flag)
- name: Audit dependencies
  run: npm audit --omit=dev --audit-level=moderate
SAST (Static Application Security Testing):
- name: Initialize CodeQL
  uses: github/codeql-action/init@v2
  with:
    languages: javascript
- name: Run CodeQL Analysis
  uses: github/codeql-action/analyze@v2
Container Scanning:
- name: Scan Docker image
  uses: aquasecurity/trivy-action@master
  with:
    image-ref: myapp:latest
    format: 'sarif'
    output: 'trivy-results.sarif'
Monitoring Deployments
Key Metrics:
1. Deployment Frequency
- How often deploying?
- Target: Multiple times per day (elite)
2. Lead Time
- Commit to production time
- Target: < 1 hour (elite)
3. Change Failure Rate
- % of deployments causing issues
- Target: < 15% (elite)
4. Time to Restore
- How fast can you recover?
- Target: < 1 hour (elite)
DORA Metrics Dashboard:
# .github/workflows/dora-metrics.yml
name: DORA Metrics
on:
  deployment:
  deployment_status:
jobs:
  track-metrics:
    runs-on: ubuntu-latest
    steps:
      - name: Track deployment
        # Illustrative action name; substitute the integration for your metrics tool
        uses: dorametrics/track-deployment@v1
        with:
          api_key: ${{ secrets.DORA_API_KEY }}
          environment: ${{ github.event.deployment.environment }}
          status: ${{ github.event.deployment_status.state }}
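The four metrics can also be computed directly from deployment records if no dedicated tool is in place. The sketch below assumes a simple record shape (`committedAt`, `deployedAt`, `failed`, optional `restoredAt`); both the shape and the `doraMetrics` name are hypothetical.

```javascript
// Hypothetical helper: compute the four DORA metrics from deployment records.
function doraMetrics(deployments, periodDays) {
  const hoursBetween = (start, end) => (new Date(end) - new Date(start)) / 3600000;
  const mean = (values) =>
    values.length === 0 ? null : values.reduce((a, b) => a + b, 0) / values.length;

  const failures = deployments.filter((d) => d.failed);
  return {
    // Deployment frequency
    deploymentsPerDay: deployments.length / periodDays,
    // Lead time: commit to production
    leadTimeHours: mean(deployments.map((d) => hoursBetween(d.committedAt, d.deployedAt))),
    // Change failure rate: share of deployments that caused an issue
    changeFailureRate: deployments.length === 0 ? null : failures.length / deployments.length,
    // Time to restore: failed deployment to recovery
    timeToRestoreHours: mean(
      failures.filter((d) => d.restoredAt).map((d) => hoursBetween(d.deployedAt, d.restoredAt))
    ),
  };
}
```

Feeding it the events that the `deployment` and `deployment_status` triggers already emit gives a baseline to compare against the elite targets listed above.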
Complete Real-World Example
Scenario: E-commerce platform
Stack: Node.js + PostgreSQL + Redis
Requirements: Zero-downtime deployments
Pipeline:
- Commit → Triggers GitHub Actions
- Lint → ESLint (1 min)
- Unit Tests → Jest (2 min)
- Integration Tests → With test DB (3 min)
- Build Docker → Multi-stage build (4 min)
- Security Scan → Snyk + Trivy (2 min)
- Deploy Staging → Auto-deploy (1 min)
- Smoke Tests → Health checks (1 min)
- Manual Approval → Slack notification
- Deploy Production → Blue-green (2 min)
- Monitor → 15 min observation
- Auto-Rollback → If metrics degrade
Total Time: ~16 minutes of automated steps (commit to production), not counting the wait for manual approval or the 15-minute observation window
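The 16-minute figure is the sum of the timed stages in the list above; the approval wait and the observation window carry no fixed duration and are left out. As a quick check of the arithmetic:

```javascript
// Stage durations in minutes, taken from the pipeline list above.
const stages = [
  { name: 'Lint', minutes: 1 },
  { name: 'Unit Tests', minutes: 2 },
  { name: 'Integration Tests', minutes: 3 },
  { name: 'Build Docker', minutes: 4 },
  { name: 'Security Scan', minutes: 2 },
  { name: 'Deploy Staging', minutes: 1 },
  { name: 'Smoke Tests', minutes: 1 },
  { name: 'Deploy Production', minutes: 2 },
];

const totalMinutes = stages.reduce((sum, stage) => sum + stage.minutes, 0);
console.log(totalMinutes); // 16
```

This is the sequential total; stages that run as parallel jobs shorten the wall-clock time to the length of the longest path.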
Full Pipeline File:
.github/workflows/production.yml
name: Production Pipeline
on:
  push:
    branches: [ main ]
env:
  NODE_VERSION: '18'
  DOCKER_REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}
jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    services:
      postgres:
        image: postgres:15
        env:
          POSTGRES_DB: test
          POSTGRES_USER: test
          POSTGRES_PASSWORD: test
        # Publish ports so the steps can reach the services on localhost
        ports:
          - 5432:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
      redis:
        image: redis:7-alpine
        ports:
          - 6379:6379
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 10s
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
        with:
          node-version: ${{ env.NODE_VERSION }}
          cache: 'npm'
      - run: npm ci
      - name: Lint
        run: npm run lint
      - name: Unit tests
        run: npm run test:unit
      - name: Integration tests
        run: npm run test:integration
        env:
          DATABASE_URL: postgresql://test:test@localhost:5432/test
          REDIS_URL: redis://localhost:6379
      - name: Upload coverage
        uses: codecov/codecov-action@v3
  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Run Snyk
        uses: snyk/actions/node@master
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
      - name: Audit dependencies
        run: npm audit --omit=dev
  build:
    needs: [test, security]
    runs-on: ubuntu-latest
    # GITHUB_TOKEN needs package write access to push to ghcr.io
    permissions:
      contents: read
      packages: write
    outputs:
      image-tag: ${{ steps.meta.outputs.tags }}
    steps:
      - uses: actions/checkout@v3
      - uses: docker/setup-buildx-action@v2
      - uses: docker/login-action@v2
        with:
          registry: ${{ env.DOCKER_REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - id: meta
        uses: docker/metadata-action@v4
        with:
          images: ${{ env.DOCKER_REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=sha,prefix={{branch}}-
      - uses: docker/build-push-action@v4
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
      - name: Scan image
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: ${{ steps.meta.outputs.tags }}
  deploy-staging:
    needs: build
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - name: Deploy
        uses: appleboy/ssh-action@master
        with:
          host: ${{ secrets.STAGING_HOST }}
          username: ${{ secrets.STAGING_USER }}
          key: ${{ secrets.STAGING_SSH_KEY }}
          script: |
            cd /app
            docker-compose pull
            docker-compose up -d
            docker-compose exec -T app npm run migrate
      - name: Smoke tests
        run: |
          sleep 30
          curl -f https://staging.myapp.com/health
          curl -f https://staging.myapp.com/api/products
  deploy-production:
    # build must be listed here: the needs context only exposes direct dependencies
    needs: [build, deploy-staging]
    runs-on: ubuntu-latest
    environment: production
    steps:
      - name: Deploy Blue-Green
        uses: appleboy/ssh-action@master
        with:
          host: ${{ secrets.PROD_HOST }}
          username: ${{ secrets.PROD_USER }}
          key: ${{ secrets.PROD_SSH_KEY }}
          script: |
            /app/scripts/blue-green-deploy.sh ${{ needs.build.outputs.image-tag }}
      - name: Monitor deployment
        run: |
          sleep 60
          # Check error rate (the Prometheus host must be reachable from the runner)
          ERROR_RATE=$(curl -s http://prometheus:9090/api/v1/query?query=rate... | jq ...)
          # Inside [ ], ">" is a file redirection, so compare the floats with awk
          if awk -v rate="$ERROR_RATE" 'BEGIN { exit !(rate > 0.05) }'; then
            echo "High error rate detected!"
            exit 1
          fi
      - name: Notify team
        uses: 8398a7/action-slack@v3
        if: always()
        with:
          status: ${{ job.status }}
          text: |
            Production Deployment ${{ job.status }}
            Image: ${{ needs.build.outputs.image-tag }}
        env:
          # action-slack v3 reads the webhook from this environment variable
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}
Conclusion: CI/CD is Essential
2022 Reality:
- Manual deployments are obsolete
- CI/CD is standard practice
- Elite performers deploy multiple times daily
Benefits:
- Faster releases (minutes instead of days)
- Higher quality (automated testing)
- Less stress (automation reduces human error)
- Quick recovery (automated rollbacks)
- Better collaboration (everyone sees the pipeline)
Getting Started:
- Week 1: Set up basic CI (automated tests)
- Week 2: Add automated builds
- Week 3: Deploy to staging automatically
- Week 4: Deploy to production with approval
- Month 2: Add blue-green or canary deployments
- Month 3: Implement feature flags
- Month 4: Add automated rollbacks
Don’t try to do everything at once. Iterate and improve.
Key Takeaways:
- CI/CD pipelines automate testing, building, and deployment
- GitHub Actions provides easy pipeline creation for GitHub repos
- Blue-green deployments enable zero-downtime releases
- Canary deployments allow gradual rollouts with risk mitigation
- Feature flags decouple deployment from release
- Automated rollbacks reduce recovery time
- Database migrations must be backward-compatible
- Security scanning should be part of every pipeline
- Monitor DORA metrics to measure DevOps performance
- Start simple and iterate - perfect is the enemy of good