See What Alloc Does

Alloc takes you from pre-flight VRAM checks through run analysis, code diagnosis, and cost tracking. Try the interactive scan below, then scroll to see what a real Alloc workflow looks like.

Pre-flight check: will your model fit?

Pick a model and GPU below. Alloc estimates VRAM, cost, and feasibility instantly.
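For intuition, a rough model-state VRAM estimate is simple arithmetic. A hypothetical sketch (not Alloc's actual estimator), assuming mixed-precision Adam with fp16 weights and gradients plus fp32 master weights and both moment buffers, sharded evenly across GPUs, activations excluded:

```python
# Back-of-envelope VRAM estimate for full fine-tuning with Adam in
# mixed precision. Hypothetical sketch, not Alloc's estimator:
# fp16 weights (2 B/param) + fp16 grads (2 B) + fp32 master weights,
# Adam m, and Adam v (4 + 4 + 4 B) = 16 bytes per parameter.
def estimate_vram_gb(params_b: float, num_gpus: int = 1,
                     activation_gb: float = 0.0) -> float:
    bytes_per_param = 2 + 2 + 4 + 4 + 4
    # params_b is in billions, so billions * bytes/param = decimal GB
    model_state_gb = params_b * bytes_per_param
    return model_state_gb / num_gpus + activation_gb

# 8.03B-parameter model sharded across 4 GPUs, activations excluded:
per_gpu = estimate_vram_gb(8.03, num_gpus=4)
print(f"{per_gpu:.1f} GB per GPU")  # 32.1 GB per GPU
```

Activations, KV caches, and framework overhead add on top of this, which is why a measured pre-flight scan beats the napkin math.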

Live · 8.03B params

GPU Comparison

You just saw the pre-flight. Here's what happens when you train.

Run Analysis

Representative example

After training, Alloc surfaces bottlenecks, phase breakdowns, and right-sizing recommendations.

alloclabs.com/runs/llama3-8b-finetune

llama3-8b-finetune

completed · Underutilized · Step Timing

4x A100-80GB · FSDP · feat/llama-finetune

Peak VRAM

51.2 GB

/ 80 GB per GPU

GPU Busy %

47%

across 4 GPUs

Step Time (p50)

284 ms

p90: 312 ms

Dataloader Wait

42%

of step time

Step Phase Breakdown

Dataloader 42%
Forward 31%
Backward 22%
Optimizer 5%

GPU Utilization

46%

VRAM Usage (GB)

51 GB

Recommendations

DataLoader bottleneck detected

high

42% of step time spent waiting on data loading. Your GPUs are idle during this time.

  • Set num_workers=8 (currently 0)
  • Enable pin_memory=True
  • Set prefetch_factor=4
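Applied together, those three settings look like the following; `train_dataset` here is a small synthetic stand-in for the real dataset:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for the real training set.
train_dataset = TensorDataset(torch.randn(64, 8),
                              torch.randint(0, 2, (64,)))

train_loader = DataLoader(
    train_dataset,
    batch_size=32,
    num_workers=8,      # was 0: loading ran on the main process
    pin_memory=True,    # page-locked host memory for faster H2D copies
    prefetch_factor=4,  # batches each worker keeps queued in advance
)

for xb, yb in train_loader:
    pass  # training step goes here
```

Worker processes overlap data loading with GPU compute, so the 42% idle wait shrinks toward zero once the workers can keep up with step time.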

Strategy optimization available

medium

FSDP with gradient checkpointing could reduce per-GPU VRAM by ~30%, enabling larger batch sizes.

  • Enable gradient checkpointing
  • Consider FSDP cpu_offload for optimizer states
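Gradient checkpointing itself is easy to try in isolation. A minimal single-process sketch using `torch.utils.checkpoint` (the FSDP wrapping and `cpu_offload` configuration are omitted here): activations inside the checkpointed block are recomputed during backward instead of being stored, trading extra compute for lower VRAM.

```python
import torch
from torch.utils.checkpoint import checkpoint

# A toy block standing in for a transformer layer.
block = torch.nn.Sequential(
    torch.nn.Linear(16, 64), torch.nn.ReLU(), torch.nn.Linear(64, 16)
)
x = torch.randn(4, 16, requires_grad=True)

# Recompute-on-backward version of block(x): intermediate activations
# are discarded after the forward pass and rebuilt during backward.
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()
print(x.grad.shape)
```

The gradients match the uncheckpointed forward exactly; only the memory/compute trade-off changes.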

Consider GPU right-sizing

low

Peak VRAM is 51.2 GB on 80 GB GPUs (64% utilization). With FSDP sharding across 4 GPUs, per-GPU usage drops to ~13 GB — an A10G (24 GB) could handle this at lower cost.

  • Run pre-flight scan on A10G
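The right-sizing arithmetic above reduces to a feasibility check. A hypothetical sketch (`fits` and the 10% headroom are illustrative assumptions, not Alloc's actual rule):

```python
# Does a sharded workload fit on a smaller GPU? 51.2 GB peak across
# 4 GPUs is ~12.8 GB per GPU; require 10% headroom before declaring fit.
def fits(peak_gb: float, num_gpus: int, gpu_gb: float,
         headroom: float = 0.10) -> bool:
    per_gpu = peak_gb / num_gpus
    return per_gpu <= gpu_gb * (1 - headroom)

print(fits(51.2, 4, 24.0))  # A10G-24GB, 4-way sharded: True
print(fits(51.2, 1, 24.0))  # single A10G: False
```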

Config Comparison

GPU            Strategy   Est. Cost    Status
4x A100-80GB   FSDP       $12.40/hr    current
4x A10G-24GB   FSDP       $5.40/hr     explore
2x A100-80GB   FSDP       $6.20/hr     in fleet
4x H100-80GB   FSDP       $16.80/hr    in fleet

Now let's catch issues before they cost you GPU hours.

Code Diagnosis

Representative example

Alloc statically analyzes your training code to find common performance issues and suggests improvements.

terminal

$ alloc diagnose train.py

DL001    num_workers=0 (default)        high    train.py:47

PREC001  No mixed precision detected    high    train.py:82

DIST002  FSDP configured correctly              train.py:31

2 issues found, 1 check passed

Run alloc diagnose --diff for patches

$  

train.py · DL001 fix

@@ -45,3 +45,5 @@
 train_dataset = load_dataset("custom/data")
-train_loader = DataLoader(train_dataset, batch_size=32)
+train_loader = DataLoader(
+    train_dataset, batch_size=32,
+    num_workers=8, pin_memory=True, prefetch_factor=4,
+)
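Checks like DL001 can be approximated with Python's `ast` module. A toy sketch (`find_default_num_workers` is illustrative, not Alloc's analyzer): walk the syntax tree and flag `DataLoader(...)` calls that omit `num_workers`, which defaults to 0 and forces data loading onto the main process.

```python
import ast

def find_default_num_workers(source: str) -> list[int]:
    """Return line numbers of DataLoader calls missing num_workers."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "DataLoader"
                and not any(kw.arg == "num_workers"
                            for kw in node.keywords)):
            flagged.append(node.lineno)
    return flagged

snippet = "loader = DataLoader(ds, batch_size=32)\n"
print(find_default_num_workers(snippet))  # [1]
```

A real analyzer also has to resolve imports and attribute calls like `torch.utils.data.DataLoader`, but the principle is the same: the issue is visible in the source before a single GPU hour is spent.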

All of this rolls up into budget and savings tracking.

Budget & Cost Tracking

Representative example

Track GPU spend, see realized savings, and set budget guardrails for your team.

Monthly Budget

$2,847 / $5,000

$2,153 remaining

57% burned

Potential Savings

$1,200

Realized Savings

$680

Jobs Right-Sized

4

OOMs Prevented

2
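The guardrail math behind these numbers is straightforward. A hypothetical sketch (`budget_status` is illustrative, not a real Alloc API):

```python
def budget_status(spent: float, budget: float):
    """Return (amount remaining, percent of budget burned)."""
    remaining = budget - spent
    burned_pct = round(100 * spent / budget)
    return remaining, burned_pct

print(budget_status(2847, 5000))  # (2153, 57)
```

A guardrail is then just a threshold on the burned percentage, e.g. alert at 80% and block new runs at 100%.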

Recent Runs

Run                  GPU             Cost     Status
llama3-8b-finetune   4x A100-80GB    $4.23    underutilized
mistral-7b-eval      1x A100-80GB    $0.89    balanced
qwen-72b-pretrain    8x H100-80GB    $18.41   compute bound
llama3-70b-lora      4x A100-80GB    $6.72    failed

Ready to stop guessing?

Install Alloc in one line. Get VRAM estimates, bottleneck detection, and cost tracking from your first run.

Sign up free