QUICKSTART

Get Started with Alloc in 60 Seconds

Profile GPU workloads, estimate VRAM requirements, and get right-sizing suggestions, all without modifying your training code.

FROM ALLOCGUY

The single biggest time sink in ML isn't training. It's the iteration loop before training. Picking a GPU, guessing at batch sizes, waiting in a cloud queue, hitting OOM, adjusting, re-queuing. I've watched teams burn through an entire week just finding a config that doesn't crash.

Cloud GPU costs are steep: H100s at $3–5/hr, A100s at $2–4/hr, even a modest L4 at $0.50–0.80/hr. Every wasted hour has a real price tag. A single failed 8-GPU training run can easily waste $100+ before you even see an error message.

Alloc takes 60 seconds to install and run. That first scan could save you hours of trial-and-error and hundreds in wasted compute. I'd rather you find out your job needs 67 GB of VRAM on your laptop for free than on an A100 for $3/hr.

– allocguy

1. Install

Install the Alloc CLI from PyPI. Works on Linux and macOS with Python 3.8+.

pip install alloc

For GPU monitoring with NVIDIA Management Library (NVML) support, install the GPU extras:

pip install "alloc[gpu]"

Verify the installation:

alloc version

2. Ghost Scan: Static Analysis

Ghost scan analyzes your training script and model definition without executing anything. It produces a VRAM breakdown estimate showing parameters, optimizer states, activations, and framework overhead.

alloc ghost train_7b.py --dtype bf16

Ghost scan will output:

  • Estimated VRAM range for your model and dtype
  • Parameter count and memory breakdown by category
  • Whether your model fits on the target GPU
  • Suggestions for reducing memory if it does not fit

No GPU required. Ghost scan runs entirely on CPU using static analysis.
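
For intuition about what goes into such an estimate, here is the standard back-of-envelope accounting for full fine-tuning with AdamW. This is generic arithmetic, not Alloc's internal estimator, and it omits activations, which scale with batch size, sequence length, and gradient checkpointing:

# Rough VRAM accounting for training a 7B model in bf16 with AdamW.
# Generic arithmetic for intuition only -- not Alloc's estimator.
GB = 1e9
params = 7e9                      # 7B parameters

weights   = params * 2            # bf16 weights: 2 bytes each    -> 14 GB
gradients = params * 2            # bf16 gradients                -> 14 GB
optimizer = params * (4 + 4)      # AdamW m and v in fp32         -> 56 GB

total = weights + gradients + optimizer
print(f"Before activations and overhead: {total / GB:.0f} GB")    # ~84 GB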

3. Probe Run: Live GPU Profiling

Wrap your training command with alloc run to profile actual GPU utilization during execution. By default, Alloc uses calibrate-and-exit mode: it automatically stops when GPU metrics stabilize, so you don't need to wait for a full training run.

alloc run python train.py

The probe captures:

  • Real-time VRAM usage, GPU utilization, and memory bandwidth
  • Multi-GPU process-tree discovery
  • Hardware context (driver version, CUDA version, SM architecture)
  • A fit/no-fit verdict with right-sizing suggestions

For a full-duration profile instead of calibrate-and-exit:

alloc run --full python train.py

Alloc never modifies your training code or changes its exit code. If Alloc encounters an error, your training continues unaffected.
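
Live profiling of this kind is built on NVML, which is why the GPU extra exists. The sketch below shows the general shape of calibrate-and-exit sampling using the pynvml bindings (assumed to be what the extra installs): poll VRAM until the readings stop moving, then exit. It illustrates the idea only; Alloc's actual stabilization criterion isn't specified in this guide.

import time
from collections import deque

import pynvml  # NVML bindings; assumed to be provided by the [gpu] extra

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)   # first GPU

window = deque(maxlen=10)                       # last 10 VRAM samples
try:
    while True:
        mem  = pynvml.nvmlDeviceGetMemoryInfo(handle)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        window.append(mem.used)
        print(f"VRAM {mem.used / 2**30:.2f} GiB  GPU {util.gpu}%")

        # "Stable" here: a full window whose spread is under 1% of its peak.
        stable = (
            len(window) == window.maxlen
            and max(window) > 0
            and max(window) - min(window) < 0.01 * max(window)
        )
        if stable:
            break
        time.sleep(1.0)
finally:
    pynvml.nvmlShutdown()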

4. Remote Scan: No GPU Needed

Run a VRAM estimation from anywhere: your laptop, a CI runner, or a Slack bot. No GPU required. Alloc uses its model catalog to estimate memory requirements for known architectures.

alloc scan --model llama-3-70b --gpu A100-80GB

Returns an estimated VRAM breakdown and fit verdict for the specified model and GPU combination. Useful for capacity planning before provisioning hardware.
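
At its core, the fit verdict boils down to comparing an estimated requirement against the target card's capacity. A minimal sketch of that check follows; the capacity table, the fits() helper, and the flat 10% overhead factor are illustrative assumptions, not Alloc's catalog values.

# Minimal fit check: catalog-style estimate vs. device capacity.
# Capacities are published VRAM sizes; the overhead factor is a guess.
GPU_VRAM_GB = {"A100-80GB": 80, "A100-40GB": 40, "H100-80GB": 80, "L4": 24}

def fits(param_count: float, bytes_per_param: int, gpu: str,
         overhead: float = 0.10) -> bool:
    need_gb = param_count * bytes_per_param / 1e9 * (1 + overhead)
    return need_gb <= GPU_VRAM_GB[gpu]

# llama-3-70b weights alone are ~140 GB in bf16:
print(fits(70e9, 2, "A100-80GB"))   # False -- needs sharding or quantization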

5. Upload: Dashboard Integration

Optionally, log in and upload your profiling artifacts to the Alloc dashboard for historical tracking, team sharing, and AI-powered optimization suggestions.

alloc login
alloc upload alloc_artifact.json.gz

Uploads are optional and never block your training. All profiling works fully offline. Upload when you're ready.

All Commands

Command          Description
alloc ghost      Static VRAM estimation from a training script or model definition
alloc run        Live GPU profiling with auto-calibrate (wraps your training command)
alloc scan       Remote VRAM scan using the model catalog (no GPU needed)
alloc login      Authenticate with your Alloc account
alloc upload     Upload a profiling artifact to the Alloc dashboard
alloc catalog    List supported models and GPU configurations
alloc version    Print the installed Alloc CLI version

Configuration

Alloc can be configured through environment variables. All are optional. The CLI works out of the box with sensible defaults.

Variable         Description
ALLOC_API_URL    API endpoint for uploads and remote scans. Defaults to the Alloc cloud API. Set this to your own endpoint for air-gapped deployments.
ALLOC_TOKEN      Authentication token. Automatically set by alloc login. Can also be set manually for CI/CD environments.
ALLOC_UPLOAD     Set to 1 to automatically upload artifacts after each run. Disabled by default.
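
In a CI job, these variables can be set before the CLI is invoked. Here is one way to do it from Python; the endpoint URL and the secret name are placeholders, and driving the CLI through subprocess is just one option.

import os
import subprocess

# Placeholders: substitute your own endpoint and CI secret name.
os.environ["ALLOC_API_URL"] = "https://alloc.internal.example/api"  # air-gapped endpoint
os.environ["ALLOC_TOKEN"]   = os.environ["MY_CI_ALLOC_TOKEN"]       # hypothetical secret
os.environ["ALLOC_UPLOAD"]  = "1"                                   # auto-upload artifacts

# Child processes inherit the environment, so the CLI picks these up.
subprocess.run(["alloc", "run", "python", "train.py"], check=True)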

Python API

You can also use Alloc programmatically from Python. The alloc.ghost() function accepts a PyTorch model and returns a VRAM estimation report.

import alloc

# Pass your model to ghost for static VRAM estimation
report = alloc.ghost(model)

# Access the estimation results
print(report.estimated_vram)
print(report.breakdown)
print(report.verdict)

The Python API performs the same static analysis as the CLI ghost command, making it easy to integrate VRAM checks into your training pipelines or notebooks.
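
For example, a pipeline could gate an expensive launch on the estimate. The toy model below is a stand-in, and treating report.estimated_vram as a byte count is an assumption; consult the report schema for your Alloc version.

import alloc
from torch import nn

# Toy stand-in; use your real model here. (Assumes PyTorch is installed.)
model = nn.Sequential(nn.Linear(8192, 8192), nn.GELU(), nn.Linear(8192, 8192))

report = alloc.ghost(model)

# Gate a job on the estimate. estimated_vram's exact units/type aren't
# specified in this guide -- bytes is assumed for illustration.
TARGET_BYTES = 80e9          # A100-80GB
if report.estimated_vram > TARGET_BYTES:
    raise SystemExit(f"Estimated {report.estimated_vram / 1e9:.1f} GB "
                     f"exceeds 80 GB target; not launching.")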

Next Steps

Want to see it in action first? Try the interactive demo.