addc01 · domain controller · promoted
Windows Server 2025 Server Manager showing AD DS, DNS, and File and Storage Services roles online after the first domain controller was promoted
// cover · ADDC01 minutes after promotion — Active Directory and DNS online, the forest root alive. Nothing on this screen was clicked by hand.

Most people stand up a Windows network by clicking through setup wizards for the better part of a day: install the server, promote it to a domain controller, configure DNS, set up DHCP to hand out addresses, join the PCs, then do the second location all over again. This project does all of it from code. You point it at blank installer files and walk away. About an hour later there's a complete two-site corporate network: two domain controllers, DNS, DHCP, a certificate authority, and Windows and Linux workstations. Everything is joined to the same domain and talking over a simulated long-distance link. Then, to prove it's real, it deliberately takes a whole site offline and shows the network keeps running.

Built to production-grade standards — every machine is described in Ansible, every step re-runs safely, nothing is configured by hand, and the finished environment checks itself at the end. This page is the plain-English tour; the code and the deep technical detail live on GitHub.

01 // The Challenge

Active Directory is the backbone of most company networks. It's what decides who can log in, what they're allowed to touch, and which security rules apply to every machine. Setting one up well — let alone two locations that back each other up — is slow, manual, and easy to get subtly wrong in ways you won't notice until something breaks.

And there's a part almost everyone skips: actually proving it survives a failure. A domain controller you've never tested failing over is a domain controller you're hoping will fail over. So the goal here was twofold: build a realistic, multi-site Active Directory environment entirely from code — and then actually rehearse the disaster, on purpose, before a real one ever happens.

02 // From a Script to a Self-Building Data Center

An earlier project of mine automated a single domain controller with a PowerShell script, turning a half-day of click-ops into a 15-minute run for a managed-services team (that write-up is Automating AD Domain Controller Provisioning for MSPs). It solved one machine, on a server that already existed.

This is the next order of magnitude. Not one server configured by a script, but an entire data center built from nothing: two sites, two domain controllers, multiple operating systems, the network router between them, and the disaster-recovery story — all described as code and assembled from bare install media. The jump is from "automate a task on a box" to "automate the box, the network, and the recovery plan too."

03 // How It Works

The whole thing runs as virtual machines on a single physical host. Here's the shape of what gets built:

   corp.markandrewmarquez.com   ·   one forest, one domain, built from code

   HQ SITE — 10.10.0.0/24
     ADDC01   · Windows Server 2025 · all 5 FSMO roles + Global Catalog
              · DNS · DHCP · AD Certificate Services · WSUS · NTP · GPO
     CLIENT01 · Windows 11 Enterprise        (domain-joined)
     UBUNTU01 · Ubuntu 24.04                 (domain-joined via realmd/sssd)
        │
        │   VyOS router  ·  ~40 ms simulated wide-area link
        │   AD replication every 15 min · cross-site DNS · DHCP failover routes
        ▼
   BRANCH SITE — 10.20.0.0/24
     ADDC02   · Windows Server 2025 · replica DC + Global Catalog
              · self-first DNS · standby DHCP failover
              · clients fail over here automatically when HQ goes dark

   ───────────────────────────────────────────────────────────────────────
   Every box starts as blank install media. Ansible (23 roles) provisions the
   whole topology on KVM/libvirt — UEFI Secure Boot + TPM 2.0 — and verifies
   it end to end.  Full two-site build: 60–75 minutes from bare media.


04 // The Build, in Pictures

It starts with nothing but an installer file. From there, no human touches the keyboard.

server 2025 · setup · launching
Windows Server 2025 setup starting unattended with the 'We're getting a few things ready' screen
// the VM powers on and Windows Setup begins on its own — driven by an answer file, no clicks

An Autounattend.xml answer file drives Windows Setup from start to finish, and the Linux machines use cloud-init for the same job. The installer runs through unattended:

installing · 83% · unattended
Windows Server installing unattended at 83 percent complete
// unattended install in progress — it reboots itself as many times as it needs to

About twelve minutes after power-on, the machine is sitting at a working desktop with remote management (WinRM for Windows, SSH for Linux) already switched on — which is how Ansible takes over from here.
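The handoff point is the moment a management port starts answering. In a real playbook, Ansible's own wait_for and wait_for_connection modules handle this. The Python sketch below shows the same idea. It assumes the default WinRM listener ports (5985 for HTTP, 5986 for HTTPS) and SSH on 22; which listener this lab actually enables is an assumption, and so is the example address.

```python
import socket
import time

# Default ports; which listener the lab actually enables is an assumption.
WINRM_HTTP, WINRM_HTTPS, SSH = 5985, 5986, 22


def wait_for_port(host: str, port: int, timeout: float = 900.0,
                  interval: float = 5.0) -> bool:
    """Poll host:port until a TCP connect succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True  # a listener answered: setup has handed over
        except OSError:
            pass  # refused, unreachable, or timed out: still installing
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# e.g. wait_for_port("10.10.0.10", WINRM_HTTPS)  # hypothetical ADDC01 address
```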

first boot · auto-logon · desktop
Windows Server reaching the desktop automatically after an unattended install and auto-logon
// ~12 minutes from blank media to a logged-in, remotely-manageable desktop

Before it's promoted to a domain controller, the server is already fully patched — the cumulative updates were slipstreamed straight into the install image, so it's current on its very first boot. At this point it's just a clean, up-to-date server with a single role.
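Slipstreaming means applying update packages to the offline install image before it ever boots. One common way to do that is DISM's mount, add-package, and commit cycle. The Python sketch below only builds those command lines and does not run them. The image path, edition index, mount folder, and package names are hypothetical placeholders, not the repo's actual layout.

```python
def slipstream_commands(wim: str, index: int, mount_dir: str,
                        packages: list[str]) -> list[list[str]]:
    """Return DISM invocations that mount one edition of a WIM, add each
    update package in list order, then unmount and save the changes."""
    cmds = [["dism", "/Mount-Image", f"/ImageFile:{wim}",
             f"/Index:{index}", f"/MountDir:{mount_dir}"]]
    for pkg in packages:  # applied in the order given
        cmds.append(["dism", f"/Image:{mount_dir}", "/Add-Package",
                     f"/PackagePath:{pkg}"])
    cmds.append(["dism", "/Unmount-Image", f"/MountDir:{mount_dir}", "/Commit"])
    return cmds


# Hypothetical paths; on a Windows build host each list could go to subprocess.run.
for cmd in slipstream_commands(r"C:\iso\sources\install.wim", 2, r"C:\mount",
                               [r"C:\updates\cumulative.msu"]):
    print(" ".join(cmd))
```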

patched · pre-promotion · 1 role
Server Manager showing a single role on a fully patched Windows Server before domain controller promotion
// fully patched, one role — the calm before promotion

Then Ansible promotes it into the first domain controller of a brand-new forest, turning on Active Directory and DNS in the process — that's the cover shot at the top of this page. With HQ's controller alive, the workstations join the domain. A Windows 11 machine joins the normal way:

client01 · windows 11 · domain-joined
A domain-joined Windows 11 Enterprise desktop in the lab
// CLIENT01 — a Windows 11 Enterprise workstation, joined to the domain

…and so does a Linux box. The Ubuntu server authenticates against Active Directory through realmd and sssd — same domain logins, same security groups, real Kerberos — exactly like the Windows machines. The terminal below is the proof: it's a recognized domain member, it can see an admin's AD group memberships, and it even picked up its reserved address from the same DHCP server:

ubuntu01 · realmd / sssd · domain-joined
Ubuntu terminal showing realm list confirming Active Directory membership, an AD user's group memberships via id, a reserved DHCP address, and sssd active
// UBUNTU01 — Linux logging into Active Directory: kerberos-member, AD groups resolved, DHCP-reserved IP, sssd running
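The realm list command prints each joined realm as an unindented name followed by indented "key: value" lines. That layout makes the "recognized domain member" check easy to automate as a smoke test. The Python sketch below is minimal. The function names are hypothetical, and the sample output is abbreviated and illustrative rather than captured from the lab.

```python
def parse_realm_list(text: str) -> dict[str, dict[str, str]]:
    """Parse realm list output: unindented realm names, indented key: value lines.
    Repeated keys (e.g. required-package) keep only their last value."""
    realms: dict[str, dict[str, str]] = {}
    current: dict[str, str] | None = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            current = realms.setdefault(line.strip(), {})
        elif current is not None and ":" in line:
            key, _, value = line.strip().partition(":")
            current[key] = value.strip()
    return realms


def is_ad_member(text: str, domain: str) -> bool:
    """True if `domain` is joined as an AD kerberos member handled by sssd."""
    info = parse_realm_list(text).get(domain, {})
    return (info.get("configured") == "kerberos-member"
            and info.get("server-software") == "active-directory"
            and info.get("client-software") == "sssd")


SAMPLE = """\
corp.markandrewmarquez.com
  type: kerberos
  realm-name: CORP.MARKANDREWMARQUEZ.COM
  domain-name: corp.markandrewmarquez.com
  configured: kerberos-member
  server-software: active-directory
  client-software: sssd
"""
print(is_ad_member(SAMPLE, "corp.markandrewmarquez.com"))  # True
```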

05 // The Real Test: Disaster Recovery

Building it is only half the project. The other half is breaking it — on purpose, in a controlled way — to prove the redundancy isn't just on paper. There are two drills.

Drill one — the site goes dark. I gracefully power off the HQ domain controller, simulating an outage at the main office, and then check that the branch carries the load: logins still work, DNS still resolves, addresses still get handed out, and clients automatically find the branch domain controller across the wide-area link. The screenshot below is a client proving exactly that — it follows its DHCP-delivered backup route over the VyOS link and resolves the whole directory through the branch DC.

cross-site failover · verified
PowerShell on CLIENT01 verifying the cross-site failover path: a DHCP-delivered route to the branch over the VyOS link, the branch DC reachable, and the forest resolving via the branch DC
// HQ is down, and CLIENT01 still works — reaching the Branch DC over the WAN via its DHCP backup route, resolving the forest cross-site. Failover VERIFIED.
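The standard carrier for a DHCP-delivered static route is option 121 (classless static routes, RFC 3442). It packs each route as three parts: a prefix length, only the significant octets of the destination, and then the gateway. The Python sketch below assumes option 121 is the mechanism used here and encodes a route from HQ to the branch subnet. The gateway 10.10.0.1, standing in for the VyOS router's HQ-side address, is hypothetical.

```python
import ipaddress


def encode_option_121(routes: list[tuple[str, str]]) -> bytes:
    """Encode (destination CIDR, gateway) pairs as RFC 3442 classless static routes."""
    out = bytearray()
    for dest, gateway in routes:
        net = ipaddress.IPv4Network(dest)
        significant = (net.prefixlen + 7) // 8  # octets covered by the mask
        out.append(net.prefixlen)
        out += net.network_address.packed[:significant]
        out += ipaddress.IPv4Address(gateway).packed
    return bytes(out)


# Branch subnet via the (hypothetical) router address, plus a default route.
payload = encode_option_121([("10.20.0.0/24", "10.10.0.1"),
                             ("0.0.0.0/0", "10.10.0.1")])
print(payload.hex())  # 180a14000a0a0001000a0a0001
```

The default route is included because RFC 3442 tells clients to ignore the plain Router option (option 3) whenever option 121 is present.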

Drill two — HQ isn't coming back. This is the nastier scenario, and the one most teams have never practiced. I clone the branch domain controller into a sealed-off network and rehearse seizing all five FSMO roles — think of those as the domain's crown-jewel responsibilities that normally live on one controller. Doing it for the first time during a real outage is how mistakes happen; doing it here, in isolation, turns it into muscle memory. Because the rehearsal runs in a sandbox, it can never accidentally touch the live forest.
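On Windows Server, the ActiveDirectory PowerShell module's Move-ADDirectoryServerOperationMasterRole cmdlet transfers FSMO roles, and adding -Force seizes them when the current holder is unreachable. The Python sketch below assembles that seizure command for all five roles. The isolated flag is a hypothetical stand-in for whatever check keeps the rehearsal inside its sandbox; it is not taken from the project's runbook.

```python
# The five FSMO roles, as named by the ActiveDirectory PowerShell module.
FSMO_ROLES = (
    "SchemaMaster",          # forest-wide
    "DomainNamingMaster",    # forest-wide
    "PDCEmulator",           # per domain
    "RIDMaster",             # per domain
    "InfrastructureMaster",  # per domain
)


def seize_command(target_dc: str, isolated: bool) -> str:
    """Build the PowerShell line that seizes every FSMO role onto target_dc.

    `isolated` is a hypothetical guard: refuse unless the caller has confirmed
    the clone sits on the sealed-off rehearsal network."""
    if not isolated:
        raise RuntimeError("refusing to seize FSMO roles outside the sandbox")
    return ("Move-ADDirectoryServerOperationMasterRole "
            f"-Identity {target_dc} "
            f"-OperationMasterRole {','.join(FSMO_ROLES)} -Force")


print(seize_command("ADDC02", isolated=True))
```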

The point of both drills is simple: backups you've never restored aren't really backups, and failover you've never triggered is just a hope. Both are scripted and written down as a runbook, so the recovery steps are tested and repeatable rather than improvised.

06 // Results & What I Took From It

▸ build
Two sites in ~70 min
A complete two-site Active Directory forest (DCs, DNS, DHCP, certificates, and clients), from bare install media to verified in about an hour.
▸ code
23 Ansible roles
Every machine, and the network between them, is described as code. Re-running is safe, and the automation never touches the control host itself.
▸ boot
~12 min, bare media → managed
Unattended Windows and Linux installs reach a working, remotely manageable state with no one at the keyboard.
▸ patched
Current on first boot
Updates are slipstreamed into the install image, so the domain controllers are fully patched before they're ever promoted.
▸ dr
2 disaster drills
A live site failover and an isolated FSMO-seizure rehearsal: the recovery plan is tested, not assumed.
▸ verify
Checks itself at the end
Replication round-trips and smoke tests confirm the two sites are really in sync and authenticating across the link.

The full source — the Ansible roles and playbooks, the unattended-install assets, the router and DHCP configuration, the disaster-recovery runbook, and the architecture notes — is on GitHub at github.com/marky224/windows-ad-ansible-kvm. The repository is source-available for review: proprietary, all rights reserved, no reuse without prior written permission.