VRAM ESTIMATION
Ghost Scan analyzes your training script to estimate peak VRAM usage. No GPU required. Runs locally in seconds. Know whether your job will fit before you burn a dollar of compute.
FROM ALLOCGUY
Here's a number that surprised me: the average ML team runs 10–30 trial configurations before finding hardware that works. Each failed run costs real money: an OOM after 20 minutes of provisioning and queuing, a wrong dtype, a batch size that was too aggressive. On an A100-80GB at $2.50/hr, even short failed runs add up to hundreds of dollars a week.
The irony is that most of these failures are predictable. If you know the model's parameter count, dtype, optimizer, and batch size, you can estimate peak VRAM within a reasonable range without ever touching a GPU. That's what Ghost Scan does. It's not magic. It's arithmetic that most people just don't do.
Training a 7B model in bf16 with AdamW? You're looking at ~14 GB of weights + ~14 GB of gradients + ~28 GB of optimizer states, plus activations and allocator overhead on top. That adds up to 67+ GB at peak. An A100-40GB was never going to work, but a lot of people find that out the expensive way.
Ghost Scan runs in seconds, costs nothing, and catches the obvious mismatches before they waste real GPU hours. We think every training job should start with a pre-flight check.
– allocguy
Ghost Scan is Alloc's VRAM estimation tool. Point it at your training script and it analyzes your model architecture (parameter count, data type, optimizer choice, and batch size) to produce an itemized breakdown of GPU memory usage. It does not require access to a GPU. Everything runs locally on your machine, which means it works in air-gapped and VPC-locked environments with no outbound internet.
Think of it as a pre-flight check for your training job. The same way you'd run terraform plan before provisioning infrastructure, Ghost Scan tells you what resources your job is likely to need before it hits the queue.
Every scan gives you:
An itemized estimate for weights, gradients, optimizer states, activations, and buffer overhead.
Estimated peak VRAM as a range, not a single number, honest about the uncertainty inherent in static analysis.
A verdict on whether your target GPU can handle the workload, or which GPUs are viable alternatives.
Run Ghost Scan from the command line. Point it at your training script and specify your target dtype and batch size:
$ alloc ghost train_7b.py --dtype bf16 --batch-size 32
Alloc Ghost Scan v0.3.0
Analyzing: train_7b.py
VRAM Breakdown (estimated)
──────────────────────────────────────
Model weights 14.0 GB (7B params x 2 bytes/bf16)
Gradients 14.0 GB
Optimizer states 28.0 GB (AdamW, m + v in bf16)
Activations ~8.2 GB (batch=32, seq=2048)
Buffer / overhead ~3-6 GB
──────────────────────────────────────
Total estimate ~67-70 GB
⚠ Will not fit on A100-40GB (40 GB).
✓ Fits on A100-80GB (80 GB) with ~10-13 GB headroom.
✓ Fits on H100-80GB (80 GB) with ~10-13 GB headroom.
Use Ghost Scan programmatically in notebooks or scripts. Pass your model's parameter count and dtype to get a VRAM report object:
import alloc

# Estimate VRAM for a 7B parameter model in bf16
report = alloc.ghost(
    param_count_b=7.0,
    dtype="bf16",
    batch_size=32,
    seq_len=2048,
)
# Access the breakdown
print(report.total_gb) # e.g. 67.2
print(report.fits_gpu("A100-80GB")) # True
print(report.breakdown) # dict of components
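Building on that, a small usage sketch: assuming fits_gpu accepts the same GPU name strings shown in the CLI output above, you can pick the smallest card the estimate fits on.
import alloc

report = alloc.ghost(param_count_b=7.0, dtype="bf16", batch_size=32, seq_len=2048)

# GPU names as they appear in the fit checks above, smallest memory first.
candidates = ["A100-40GB", "A100-80GB", "H100-80GB"]

# First (smallest) card whose capacity covers the estimated peak, if any.
viable = next((gpu for gpu in candidates if report.fits_gpu(gpu)), None)
print(viable or "No single-GPU fit; consider sharding or a smaller batch.")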
Don't have the model locally? Use alloc scan to estimate VRAM for well-known model architectures via Alloc's API. No local setup needed:
$ alloc scan --model llama-3-70b --gpu A100-80GB
Scanning: llama-3-70b on A100-80GB
⚠ Estimated peak VRAM ~142 GB. Does not fit on a single A100-80GB.
✓ Consider: 2x A100-80GB with FSDP, or 1x H100-80GB with quantization (4-bit).
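The figures in that output are easy to sanity-check by hand. A rough back-of-envelope (plain arithmetic, not the tool's internal model):
# llama-3-70b back-of-envelope: weights alone settle the single-GPU question.
params = 70e9
print(params * 2 / 1e9)    # ~140 GB of bf16 weights; with buffers and overhead
                           # that is the ~142 GB above, so one 80 GB card is out.
print(params * 0.5 / 1e9)  # ~35 GB at 4 bits per weight, which is why a single
                           # H100-80GB becomes viable with quantization.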
GPU memory during training is consumed by several distinct components. Understanding what each one is helps you reason about where memory goes and what levers you have to reduce it.
Model weights: the learned parameters of your model. Memory usage scales directly with parameter count and the bytes-per-element of your chosen data type. A 7B parameter model in fp32 uses ~28 GB for weights alone (7 billion x 4 bytes). In bf16 or fp16 that halves to ~14 GB. Quantized formats (int8, int4) reduce it further.
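The scaling is simple enough to check by hand. A quick sketch of the weights line (plain arithmetic, not Ghost Scan's estimator):
# Weight memory = parameter count x bytes per element.
BYTES_PER_ELEMENT = {"fp32": 4, "bf16": 2, "fp16": 2, "int8": 1, "int4": 0.5}

def weight_gb(param_count_b, dtype):
    return param_count_b * 1e9 * BYTES_PER_ELEMENT[dtype] / 1e9

for dtype in ("fp32", "bf16", "int8", "int4"):
    print(f"7B in {dtype}: {weight_gb(7.0, dtype):.1f} GB")
# fp32 ~28 GB, bf16 ~14 GB, int8 ~7 GB, int4 ~3.5 GB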
Gradients: during training, a gradient tensor is stored for each trainable parameter. Gradients are the same shape as the weights, so they consume roughly the same amount of memory. If your weights are 14 GB in bf16, expect another ~14 GB for gradients.
Optimizer states: adaptive optimizers like Adam and AdamW maintain additional state for each parameter, a first-moment estimate (m) and a second-moment estimate (v). In mixed-precision training these are often kept in fp32 alongside an fp32 master copy of the weights, which means up to three fp32 copies of the parameters and frequently makes the optimizer the single largest consumer of VRAM. Lighter configurations keep m and v in the weight dtype instead, which is what the ~28 GB figure in the example above assumes.
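How much this costs per parameter depends on where the states live. A rough comparison of common AdamW setups (the per-state byte counts are standard conventions, not Ghost Scan output):
# Optimizer-state bytes per parameter under a few common AdamW configurations.
OPT_BYTES_PER_PARAM = {
    "fp32 master + m + v": 12,   # three fp32 copies, 4 bytes each
    "m + v in bf16": 4,          # matches the 28 GB line in the 7B example above
    "8-bit m + v": 2,            # e.g. bitsandbytes-style 8-bit Adam
}

for setup, bytes_per_param in OPT_BYTES_PER_PARAM.items():
    print(f"{setup}: {7e9 * bytes_per_param / 1e9:.0f} GB for a 7B model")
# ~84 GB, ~28 GB, ~14 GB respectively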
Activations: intermediate tensors stored during the forward pass so they can be reused in the backward pass. Activation memory depends on batch size, sequence length, hidden dimension, and the number of layers. Doubling your batch size roughly doubles activation memory. Techniques like gradient checkpointing trade compute for memory by recomputing activations instead of storing them.
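You can see that scaling directly from the Python interface. This sketch assumes the breakdown dict shown earlier exposes an "activations" entry; the exact key name is an assumption.
import alloc

# Compare the activation estimate at two batch sizes, everything else fixed.
for batch in (16, 32):
    report = alloc.ghost(param_count_b=7.0, dtype="bf16",
                         batch_size=batch, seq_len=2048)
    print(batch, report.breakdown.get("activations"))
# Expect roughly a 2x difference between the two lines.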
Buffers and overhead: GPU memory allocators (like PyTorch's caching allocator) reserve extra memory beyond what tensors strictly need. This includes memory fragmentation, temporary buffers for operations like matrix multiplications, CUDA context overhead, and framework bookkeeping. Ghost Scan adds a headroom buffer to account for this, because the theoretical minimum rarely matches what you actually observe.
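Putting the components together, here is a back-of-envelope version of the itemization above. It mirrors what Ghost Scan prints but is not its internal model; the default activation and overhead figures are lifted from the 7B example rather than derived.
def rough_peak_gb(param_count_b,
                  weight_bytes=2.0,         # bf16 weights
                  grad_bytes=2.0,           # bf16 gradients
                  opt_bytes=4.0,            # m + v in bf16
                  activations_gb=8.2,       # taken from the example above
                  overhead_gb=(3.0, 6.0)):  # allocator / buffer headroom
    """Back-of-envelope peak VRAM range in GB; not Ghost Scan's estimator."""
    static_gb = param_count_b * (weight_bytes + grad_bytes + opt_bytes)
    low = static_gb + activations_gb + overhead_gb[0]
    high = static_gb + activations_gb + overhead_gb[1]
    return low, high

print(rough_peak_gb(7.0))  # roughly (67.2, 70.2), matching the CLI output above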
Ghost Scan is a static estimator. It gives you a strong starting point, but it cannot account for everything a live training run will encounter.
For workloads where Ghost Scan's static analysis is not enough, use Alloc Probe to run a short calibration on real hardware and get measured VRAM usage.
Install the CLI and run your first scan in under two minutes. Read guide →
Learn how Alloc Probe measures real hardware utilization and recommends optimal GPU configurations. Read guide →
Sign up for a free account. Ghost Scan is included on every tier. No credit card required. Sign up →
Want to see it in action? Try the interactive demo.