Back to RoadmapsDatabase & Infrastructure Engineer
Master the systems that power every application
10 milestones in this roadmap
Step 1beginner6-8 weeks
Relational Database Fundamentals
Build rock-solid relational database skills that form the foundation of every data-driven application and infrastructure role.
Curriculum
- 1SQL deep dive: complex joins, subqueries, window functions, CTEs, and recursive queries
- 2Normalisation: 1NF through BCNF, denormalisation trade-offs, and practical schema design
- 3Indexing strategies: B-tree indexes, composite indexes, covering indexes, and partial indexes
- 4Query execution plans: EXPLAIN ANALYZE output, sequential vs index scans, and join algorithms
- 5Data types: choosing appropriate types, domain constraints, and custom types
- 6Transactions: ACID properties, isolation levels, phantom reads, and deadlock prevention
Tools & Platforms
PostgreSQLMySQL / MariaDBDBeaver / pgAdminSQLite (learning)
Step 1beginner6-8 weeks
Relational Database Fundamentals
Build rock-solid relational database skills that form the foundation of every data-driven application and infrastructure role.
Curriculum
- 1SQL deep dive: complex joins, subqueries, window functions, CTEs, and recursive queries
- 2Normalisation: 1NF through BCNF, denormalisation trade-offs, and practical schema design
- 3Indexing strategies: B-tree indexes, composite indexes, covering indexes, and partial indexes
- 4Query execution plans: EXPLAIN ANALYZE output, sequential vs index scans, and join algorithms
- 5
Step 2intermediate6-8 weeks
PostgreSQL Mastery
Develop deep expertise in the world's most advanced open-source database, the backbone of modern application infrastructure.
Curriculum
- 1Advanced SQL: lateral joins, GROUPING SETS, FILTER clause, and JSON/JSONB operations
- 2PL/pgSQL: stored procedures, functions, triggers, and custom aggregates
- 3Extensions: PostGIS, pg_trgm, pgcrypto, pg_stat_statements, and extension development
- 4Table partitioning: range, list, and hash partitioning, partition pruning, and maintenance
Step 3intermediate6-8 weeks
NoSQL Databases
Understand when and how to use NoSQL databases, learning to model data for each paradigm and choose the right database for the job.
Curriculum
- 1MongoDB document modeling: embedding vs referencing, schema design patterns, and aggregation pipeline
- 2Redis data structures: strings, hashes, lists, sets, sorted sets, streams, and pub/sub
- 3Cassandra wide-column: partition key design, clustering columns, compaction strategies, and consistency levels
- 4DynamoDB: single-table design, GSI/LSI strategies, capacity modes, and DynamoDB Streams
Step 4intermediate4-6 weeks
Database Performance Tuning
Develop the diagnostic and optimisation skills to keep databases performant as data volume and query complexity grow.
Curriculum
- 1Query optimisation: rewriting queries, materialised views, and query plan hints
- 2Index design: index selection methodology, index-only scans, and index maintenance overhead
- 3Slow query analysis: identifying problematic queries, pg_stat_statements analysis, and query profiling
- 4Connection pooling: PgBouncer transaction vs session mode, pool sizing, and connection limits
Step 5intermediate6-8 weeks
High Availability & Replication
Design and operate database infrastructure that survives hardware failures, network partitions, and data centre outages without data loss.
Curriculum
- 1Primary-replica replication: streaming replication setup, WAL shipping, and slot management
- 2Multi-master replication: conflict resolution, BDR (Bi-Directional Replication), and use cases
- 3Synchronous vs asynchronous: consistency-availability trade-offs, quorum commit, and lag monitoring
- 4Automated failover: Patroni cluster management, etcd consensus, and fencing mechanisms
Step 6advanced6-8 weeks
Data Streaming & Event Stores
Build real-time data pipelines that capture, process, and distribute data changes across the entire infrastructure stack.
Curriculum
- 1Apache Kafka architecture: brokers, partitions, replication factor, and ISR (in-sync replicas)
- 2Consumer groups: partition assignment, rebalancing strategies, and offset management
- 3Exactly-once semantics: idempotent producers, transactional messaging, and EOS configuration
- 4Schema Registry: Avro/Protobuf schema management, compatibility modes, and schema evolution
Step 7advanced6-8 weeks
Search & Analytics Engines
Add powerful search and analytics capabilities to your infrastructure stack with distributed search engines.
Curriculum
- 1Elasticsearch architecture: nodes, shards, replicas, and cluster state management
- 2Inverted indexes: analyzer chains, tokenizers, filters, and custom analyzer design
- 3Relevance scoring: BM25, function score, boosting, and search result ranking optimisation
- 4Aggregations: metric, bucket, and pipeline aggregations for analytics dashboards
Step 8advanced4-6 weeks
Database Security
Protect the most valuable asset in any organisation: its data, with defence-in-depth security practices specific to database systems.
Curriculum
- 1Encryption at rest: transparent data encryption, tablespace encryption, and key management
- 2Encryption in transit: SSL/TLS configuration, certificate management, and cipher suite selection
- 3Row-level security: policy design, per-user data isolation, and multi-tenant security patterns
- 4Audit logging: pgAudit extension, statement logging, and compliance-ready audit trails
Step 9advanced4-6 weeks
Infrastructure Automation
Apply infrastructure-as-code principles to database management, enabling repeatable, auditable, and version-controlled infrastructure changes.
Curriculum
- 1Terraform for databases: RDS/Cloud SQL provisioning, parameter group management, and state handling
- 2Ansible: database configuration management, playbook design, and idempotent database operations
- 3Configuration management: parameter tuning automation, consistency enforcement, and drift detection
- 4Blue-green database migrations: zero-downtime schema changes, dual-write patterns, and cutover strategies
Step 10advanced6-8 weeks
Distributed Database Systems
Explore the frontier of database technology with distributed systems that offer global scale while maintaining transactional consistency.
Curriculum
- 1Sharding strategies: hash-based, range-based, geographic, and directory-based sharding
- 2Consistent hashing: virtual nodes, rebalancing, and minimal data movement during scaling
- 3CAP theorem trade-offs: CP vs AP systems, real-world implications, and hybrid approaches
- 4CockroachDB: distributed SQL, multi-region deployment, and serialisable isolation at scale
Ready to start this journey?
Browse our courses and books to begin your learning path.
Data types: choosing appropriate types, domain constraints, and custom types
6Transactions: ACID properties, isolation levels, phantom reads, and deadlock preventionTools & Platforms
PostgreSQLMySQL / MariaDBDBeaver / pgAdminSQLite (learning)
5Logical replication: publication/subscription model, selective replication, and conflict resolution6Vacuum and autovacuum: dead tuple cleanup, bloat prevention, and autovacuum tuning parametersTools & Platforms
PostgreSQL 16+PgBouncer / Pgpool-IIpg_stat_statementspgBadger (log analysis)
Step 2intermediate6-8 weeks
PostgreSQL Mastery
Develop deep expertise in the world's most advanced open-source database, the backbone of modern application infrastructure.
Curriculum
- 1Advanced SQL: lateral joins, GROUPING SETS, FILTER clause, and JSON/JSONB operations
- 2PL/pgSQL: stored procedures, functions, triggers, and custom aggregates
- 3Extensions: PostGIS, pg_trgm, pgcrypto, pg_stat_statements, and extension development
- 4Table partitioning: range, list, and hash partitioning, partition pruning, and maintenance
- 5Logical replication: publication/subscription model, selective replication, and conflict resolution
- 6Vacuum and autovacuum: dead tuple cleanup, bloat prevention, and autovacuum tuning parameters
Tools & Platforms
PostgreSQL 16+PgBouncer / Pgpool-IIpg_stat_statementspgBadger (log analysis)
5Graph databases: Neo4j, property graph model, Cypher queries, and graph traversal patterns6Database selection: CAP theorem trade-offs, workload analysis, and polyglot persistence strategiesTools & Platforms
MongoDB Atlas / CompassRedis / Redis InsightApache Cassandra / ScyllaDBAmazon DynamoDB
Step 3intermediate6-8 weeks
NoSQL Databases
Understand when and how to use NoSQL databases, learning to model data for each paradigm and choose the right database for the job.
Curriculum
- 1MongoDB document modeling: embedding vs referencing, schema design patterns, and aggregation pipeline
- 2Redis data structures: strings, hashes, lists, sets, sorted sets, streams, and pub/sub
- 3Cassandra wide-column: partition key design, clustering columns, compaction strategies, and consistency levels
- 4DynamoDB: single-table design, GSI/LSI strategies, capacity modes, and DynamoDB Streams
- 5Graph databases: Neo4j, property graph model, Cypher queries, and graph traversal patterns
- 6Database selection: CAP theorem trade-offs, workload analysis, and polyglot persistence strategies
Tools & Platforms
MongoDB Atlas / CompassRedis / Redis InsightApache Cassandra / ScyllaDBAmazon DynamoDB
5Read replicas: read/write splitting, replica lag monitoring, and application-level routing6Caching layers: application cache, query result cache, Redis caching patterns, and cache invalidationTools & Platforms
pg_stat_statementsPgBouncerRedis (caching)Datadog / New Relic (DB monitoring)
Step 4intermediate4-6 weeks
Database Performance Tuning
Develop the diagnostic and optimisation skills to keep databases performant as data volume and query complexity grow.
Curriculum
- 1Query optimisation: rewriting queries, materialised views, and query plan hints
- 2Index design: index selection methodology, index-only scans, and index maintenance overhead
- 3Slow query analysis: identifying problematic queries, pg_stat_statements analysis, and query profiling
- 4Connection pooling: PgBouncer transaction vs session mode, pool sizing, and connection limits
- 5Read replicas: read/write splitting, replica lag monitoring, and application-level routing
- 6Caching layers: application cache, query result cache, Redis caching patterns, and cache invalidation
Tools & Platforms
pg_stat_statementsPgBouncerRedis (caching)Datadog / New Relic (DB monitoring)
5pgpool-II: connection pooling, load balancing, and watchdog for high availability6Disaster recovery: backup strategies (pg_basebackup, pgBackRest), PITR, and RTO/RPO planningTools & Platforms
PatronipgBackRest / Barmanetcd (consensus store)pgpool-IIrepmgr
Step 5intermediate6-8 weeks
High Availability & Replication
Design and operate database infrastructure that survives hardware failures, network partitions, and data centre outages without data loss.
Curriculum
- 1Primary-replica replication: streaming replication setup, WAL shipping, and slot management
- 2Multi-master replication: conflict resolution, BDR (Bi-Directional Replication), and use cases
- 3Synchronous vs asynchronous: consistency-availability trade-offs, quorum commit, and lag monitoring
- 4Automated failover: Patroni cluster management, etcd consensus, and fencing mechanisms
- 5pgpool-II: connection pooling, load balancing, and watchdog for high availability
- 6Disaster recovery: backup strategies (pg_basebackup, pgBackRest), PITR, and RTO/RPO planning
Tools & Platforms
PatronipgBackRest / Barmanetcd (consensus store)pgpool-IIrepmgr
5Kafka Connect: source and sink connectors, single message transforms, and dead letter queues6Debezium CDC: change data capture from PostgreSQL, MySQL, and MongoDB with outbox patternTools & Platforms
Apache Kafka / Confluent PlatformDebeziumSchema RegistryKafka ConnectksqlDB
Step 6advanced6-8 weeks
Data Streaming & Event Stores
Build real-time data pipelines that capture, process, and distribute data changes across the entire infrastructure stack.
Curriculum
- 1Apache Kafka architecture: brokers, partitions, replication factor, and ISR (in-sync replicas)
- 2Consumer groups: partition assignment, rebalancing strategies, and offset management
- 3Exactly-once semantics: idempotent producers, transactional messaging, and EOS configuration
- 4Schema Registry: Avro/Protobuf schema management, compatibility modes, and schema evolution
- 5Kafka Connect: source and sink connectors, single message transforms, and dead letter queues
- 6Debezium CDC: change data capture from PostgreSQL, MySQL, and MongoDB with outbox pattern
Tools & Platforms
Apache Kafka / Confluent PlatformDebeziumSchema RegistryKafka ConnectksqlDB
5Index lifecycle management: hot-warm-cold architecture, rollover policies, and data retention6Cluster management: shard allocation, node roles, rolling upgrades, and capacity planningTools & Platforms
Elasticsearch / OpenSearchKibana / OpenSearch DashboardsLogstash / FluentdCerebro (cluster admin)
Step 7advanced6-8 weeks
Search & Analytics Engines
Add powerful search and analytics capabilities to your infrastructure stack with distributed search engines.
Curriculum
- 1Elasticsearch architecture: nodes, shards, replicas, and cluster state management
- 2Inverted indexes: analyzer chains, tokenizers, filters, and custom analyzer design
- 3Relevance scoring: BM25, function score, boosting, and search result ranking optimisation
- 4Aggregations: metric, bucket, and pipeline aggregations for analytics dashboards
- 5Index lifecycle management: hot-warm-cold architecture, rollover policies, and data retention
- 6Cluster management: shard allocation, node roles, rolling upgrades, and capacity planning
Tools & Platforms
Elasticsearch / OpenSearchKibana / OpenSearch DashboardsLogstash / FluentdCerebro (cluster admin)
5Backup strategies: full, incremental, and differential backups, backup verification, and offsite storage6Point-in-time recovery: WAL archiving, PITR procedures, and recovery testing drillsTools & Platforms
pgAuditHashiCorp Vault (key management)pgBackRestAWS KMS / Azure Key Vault
Step 8advanced4-6 weeks
Database Security
Protect the most valuable asset in any organisation: its data, with defence-in-depth security practices specific to database systems.
Curriculum
- 1Encryption at rest: transparent data encryption, tablespace encryption, and key management
- 2Encryption in transit: SSL/TLS configuration, certificate management, and cipher suite selection
- 3Row-level security: policy design, per-user data isolation, and multi-tenant security patterns
- 4Audit logging: pgAudit extension, statement logging, and compliance-ready audit trails
- 5Backup strategies: full, incremental, and differential backups, backup verification, and offsite storage
- 6Point-in-time recovery: WAL archiving, PITR procedures, and recovery testing drills
Tools & Platforms
pgAuditHashiCorp Vault (key management)pgBackRestAWS KMS / Azure Key Vault
5Schema versioning: Flyway, Liquibase, and sqitch for version-controlled database migrations6GitOps for databases: schema review workflows, migration CI/CD, and automated testingTools & Platforms
TerraformAnsibleFlyway / LiquibaseSchemaHeroAtlas (schema management)
Step 9advanced4-6 weeks
Infrastructure Automation
Apply infrastructure-as-code principles to database management, enabling repeatable, auditable, and version-controlled infrastructure changes.
Curriculum
- 1Terraform for databases: RDS/Cloud SQL provisioning, parameter group management, and state handling
- 2Ansible: database configuration management, playbook design, and idempotent database operations
- 3Configuration management: parameter tuning automation, consistency enforcement, and drift detection
- 4Blue-green database migrations: zero-downtime schema changes, dual-write patterns, and cutover strategies
- 5Schema versioning: Flyway, Liquibase, and sqitch for version-controlled database migrations
- 6GitOps for databases: schema review workflows, migration CI/CD, and automated testing
Tools & Platforms
TerraformAnsibleFlyway / LiquibaseSchemaHeroAtlas (schema management)
5TiDB: MySQL-compatible distributed database, TiKV storage layer, and HTAP workloads6Global databases: multi-region replication, conflict-free data types, and global transaction protocolsTools & Platforms
CockroachDBTiDBYugabyteDBVitess (MySQL sharding)Citus (PostgreSQL sharding)
Step 10advanced6-8 weeks
Distributed Database Systems
Explore the frontier of database technology with distributed systems that offer global scale while maintaining transactional consistency.
Curriculum
- 1Sharding strategies: hash-based, range-based, geographic, and directory-based sharding
- 2Consistent hashing: virtual nodes, rebalancing, and minimal data movement during scaling
- 3CAP theorem trade-offs: CP vs AP systems, real-world implications, and hybrid approaches
- 4CockroachDB: distributed SQL, multi-region deployment, and serialisable isolation at scale
- 5TiDB: MySQL-compatible distributed database, TiKV storage layer, and HTAP workloads
- 6Global databases: multi-region replication, conflict-free data types, and global transaction protocols
Tools & Platforms
CockroachDBTiDBYugabyteDBVitess (MySQL sharding)Citus (PostgreSQL sharding)