Production

Operations

March 26, 2026

Defines how GPU compute capacity is provisioned, operated, and scaled across 9 production stages with a 3-FTE remote team, $45M CAPEX, and $1M/yr OPEX.

9
Production Stages
288
GPUs per Facility
6,912
GPU-hr / Day Max
3
FTE (Remote)
$4.3M
Revenue / FTE / Year
85%
Target Utilization
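The headline capacity figures above follow directly from the rack count; a minimal Python sketch (all inputs taken from this document):

```python
# Capacity arithmetic behind the headline figures above.
# Inputs come straight from this document; nothing else is assumed.
RACKS = 4
GPUS_PER_RACK = 72           # one NVL72 rack = 72 GPUs
HOURS_PER_DAY = 24
TARGET_UTILIZATION = 0.85    # stated target

gpus = RACKS * GPUS_PER_RACK                    # 288 GPUs per facility
max_gpu_hours = gpus * HOURS_PER_DAY            # 6,912 GPU-hr/day ceiling
sellable = max_gpu_hours * TARGET_UTILIZATION   # ~5,875 GPU-hr/day at target

print(gpus, max_gpu_hours, round(sellable))
```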
01 Production Process
1
Site Selection & Energy Validation
Identify a hydropower plant (HPP) with ≥1 MW guaranteed output. Validate behind-the-meter (BTM) interconnection, river-water cooling (4-12°C), seismic profile, and connectivity corridor.
2
JV Structuring & Legal Setup
Establish Joint Venture with HPP owner. Ring-fence operational entity (1For AI LLC, Georgia Virtual Zone). Formalize $0 effective energy cost.
3
Infrastructure Procurement
Order 4x NVIDIA GB200 NVL72 racks, InfiniBand NDR800, NVMe storage (~1.5 PB), DLC cooling, power distribution (UPS 200 kW).
4
Civil Works & Facility Build
Deploy prefab modular containers (seismic-rated). Power infra (grid-ready, no transformer CAPEX). Fiber corridor ($500K) + Starlink backup.
5
Hardware Integration
Rack installation (4x NVL72 @ ~155 kW/rack). DLC loop to river water heat exchanger. InfiniBand fat-tree topology. Storage provisioning.
6
Software Stack Deployment
OS/firmware, NVIDIA AI Enterprise licensing, Kubernetes/Slurm orchestration, monitoring & telemetry, remote NOC tooling.
7
Testing & Burn-in
Full-load GPU stress test (288 GPUs, ~640 kW). Cooling validation. Network benchmarking. Failover/UPS testing. Security audit.
8
Client Onboarding
Anchor client integration. Network peering / VPN setup. SLA finalization. Workload migration or initial deployment.
9
Steady-State Operations
Remote NOC 24/7. NVIDIA hardware maintenance. Cooling maintenance. Performance reporting. SLA compliance tracking.
02 CAPEX Breakdown - $45M Total
GPU Compute (4x GB200 NVL72)
$26.5M
Networking (IB NDR800)
$4.8M
Storage (NVMe 1.5 PB)
$2.1M
DLC Cooling System
$1.6M
Power Infrastructure
$1.06M
Civil / Prefab Containers
$1.6M
Connectivity + Software + Integration
$2.0M
Contingency
$5.34M
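As a sanity check, the line items above do sum to the stated $45M total; a short Python sketch (figures copied from the breakdown above, labels abbreviated):

```python
# CAPEX line items in $M, as listed in section 02.
# The script verifies the $45M total and prints each item's share.
capex = {
    "GPU compute (4x NVL72 racks)": 26.5,
    "Networking (IB NDR800)": 4.8,
    "Storage (NVMe 1.5 PB)": 2.1,
    "DLC cooling": 1.6,
    "Power infrastructure": 1.06,
    "Civil / prefab containers": 1.6,
    "Connectivity + software + integration": 2.0,
    "Contingency": 5.34,
}
total = sum(capex.values())
for item, cost in capex.items():
    print(f"{item}: ${cost}M ({cost / total:.1%})")
print(f"Total: ${total:.2f}M")
```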
03 Annual OPEX - $1.0M / Year
NVIDIA Hardware Maintenance - $400K (40%)
Insurance + Connectivity + Visits - $185K (18.5%)
Software Licenses - $150K (15%)
Remote NOC + Managed Ops - $120K (12%)
Additional (Starlink, consumables, security) - ~$30K (3%)
Electricity: $0 (BTM hydro JV) · OPEX Growth: 3% / yr · Energy = $0 is structural, not market-dependent
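Under the stated 3%/yr growth assumption, the $1.0M OPEX base compounds as follows; a minimal sketch (the year offsets printed are illustrative):

```python
# OPEX projection under the document's stated 3%/yr growth rate,
# from the $1.0M/yr base. Energy stays at $0 (structural, BTM hydro JV).
BASE_OPEX = 1.0   # $M/yr
GROWTH = 0.03     # stated annual OPEX growth

def opex(year: int) -> float:
    """Annual OPEX ($M) `year` years out, at 3% compound growth."""
    return BASE_OPEX * (1 + GROWTH) ** year

for y in (0, 1, 5, 10):
    print(f"Year {y}: ${opex(y):.2f}M")
```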
04 Process Strengths & Limitations
Strengths
$0 energy cost - BTM hydro eliminates the 30-50% of industry OPEX typically spent on energy
Zero-carbon structural - verifiable at source, not offset-based
DLC + river water cooling - PUE <1.11, winter = optimal season
Modular prefab - replicable at new HPP sites without redesign
Remote NOC - 3 FTE, $120K/yr, no on-site permanent staff
Dedicated client model - no multi-tenant noise, deep SLA alignment
Single-vendor NVIDIA - simplified maintenance and spare parts
Limitations
NVIDIA single-vendor dependency + BIS export control risk
Client concentration - 1-2 clients/facility = revenue SPOF
Remote mountain location - longer hardware replacement lead times
HPP output ceiling - 1 MW per site = hard capacity limit
Connectivity - $500K fiber build + Starlink latency limitations
Per-site revenue ceiling - scaling requires new HPP sites + $45M each
05 Production Workforce
Role | Count | Type | Key Skills
Remote NOC Operator | 2 | Full-time, Remote | Linux, InfiniBand, DCGM, incident response
Infrastructure Engineer | 1 | Full-time, Remote | DLC cooling, power, NVIDIA-certified, HW lifecycle
Total Production FTE | 3 | Flat structure → Operations Lead → CEO
Build-phase temporary: 5-10 construction + 3-5 integration · Zero temp workers in steady-state
Cross-functional requirement: energy infrastructure knowledge (BTM hydro, PUE)
06 Capacity Utilization
85%
GPU Compute
91%
Power (1 MW)
75%
DLC Cooling
65%
Workforce
07 Capacity Track Allocation
20%
Pay-per-Use
~1,382 GPU-hr/day
$5-7 / GPU-hr
30%
Reserved
~2,074 GPU-hr/day
$3.5-4.5 / GPU-hr
50%
Private AI Factory
~3,456 GPU-hr/day
$1.0M / month
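The three tracks imply a rough monthly revenue envelope; an illustrative Python sketch using the allocations and price ranges above, with a simplifying 30-day-month assumption and no utilization or contract adjustments:

```python
# Illustrative monthly revenue envelope from the three capacity tracks.
# Allocations and prices come from section 07; the 30-day month is a
# simplifying assumption, and utilization effects are ignored.
DAYS = 30

def track_revenue(gpu_hr_per_day, price_low, price_high):
    """Monthly revenue range ($) for a usage-priced track."""
    monthly_hours = gpu_hr_per_day * DAYS
    return monthly_hours * price_low, monthly_hours * price_high

ppu_lo, ppu_hi = track_revenue(1382, 5.0, 7.0)   # Pay-per-Use (20%)
res_lo, res_hi = track_revenue(2074, 3.5, 4.5)   # Reserved (30%)
factory = 1_000_000                               # Private AI Factory, flat $1.0M/mo

total_lo = ppu_lo + res_lo + factory
total_hi = ppu_hi + res_hi + factory
print(f"~${total_lo / 1e6:.2f}M - ${total_hi / 1e6:.2f}M / month")
```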
08 Quality Control Cycle
Continuous
GPU DCGM telemetry, InfiniBand health, DLC temperature, power/UPS monitoring, NVMe SMART metrics
Weekly
GPU stress test on uncommitted capacity. DLC coolant quality check.
Monthly
Full infrastructure health report. SLA performance review with client.
Quarterly
Planned maintenance window: firmware updates, hardware inspection, cooling service.
Annual
Comprehensive facility audit: seismic, electrical, cooling systems integrity.
09 Expansion Model
Within Current Site
Not feasible - 1 MW BTM fully allocated
New HPP Sites
Replicate 1 MW facility
$45M CAPEX / site
12-18 months to ops
Key Constraints
Capital ($45M/site)
HPP availability
NVIDIA supply
Anchor client before FID
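Because expansion is strictly site-by-site, fleet capacity scales linearly with capital; a minimal sketch using the per-site constants above:

```python
# Multi-site scaling under the stated constraints: $45M CAPEX, 1 MW,
# and 288 GPUs per site, with expansion only via new HPP sites.
CAPEX_PER_SITE = 45   # $M
GPUS_PER_SITE = 288
MW_PER_SITE = 1

def fleet(sites: int) -> dict:
    """Aggregate capital and capacity for a fleet of identical sites."""
    return {
        "capex_musd": sites * CAPEX_PER_SITE,
        "gpus": sites * GPUS_PER_SITE,
        "power_mw": sites * MW_PER_SITE,
        "max_gpu_hr_day": sites * GPUS_PER_SITE * 24,
    }

print(fleet(3))
```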