QUICKSTART
Profile GPU workloads, estimate VRAM requirements, and get right-sizing suggestions, without modifying your training code.
FROM ALLOCGUY
The single biggest time sink in ML isn't training. It's the iteration loop before training. Picking a GPU, guessing at batch sizes, waiting in a cloud queue, hitting OOM, adjusting, re-queuing. I've watched teams burn through an entire week just finding a config that doesn't crash.
Cloud GPU costs are steep: H100s at $3–5/hr, A100s at $2–4/hr, even a modest L4 at $0.50–0.80/hr. Every wasted hour has a real price tag. A single failed 8-GPU training run can easily waste $100+ before you even see an error message.
Alloc takes 60 seconds to install and run. That first scan could save you hours of trial-and-error and hundreds in wasted compute. I'd rather you find out your job needs 67 GB of VRAM on your laptop for free than on an A100 for $3/hr.
– allocguy
Install the Alloc CLI from PyPI. Works on Linux and macOS with Python 3.8+.
```
pip install alloc
```

For GPU monitoring with NVIDIA Management Library (NVML) support, install the GPU extras:

```
pip install alloc[gpu]
```

Verify the installation:

```
alloc version
```

Ghost scan analyzes your training script and model definition without executing anything. It produces an estimated VRAM breakdown covering parameters, optimizer states, activations, and framework overhead.

```
alloc ghost train_7b.py --dtype bf16
```

Ghost scan prints the per-component breakdown described above, along with a total estimate.
No GPU required. Ghost scan runs entirely on CPU using static analysis.
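To build intuition for what a static estimate like this involves, here is generic back-of-the-envelope arithmetic for mixed-precision training memory. The byte counts per parameter are standard rules of thumb, not Alloc's actual estimator:

```python
# Rough VRAM estimate for a 7B-parameter model trained in bf16 with Adam.
# Generic back-of-the-envelope math; not Alloc's estimation model.
GiB = 1024 ** 3

params = 7_000_000_000
weights = params * 2          # bf16 weights: 2 bytes/param
grads = params * 2            # bf16 gradients: 2 bytes/param
adam_states = params * 4 * 2  # fp32 first and second moments: 8 bytes/param

total_gib = (weights + grads + adam_states) / GiB
print(f"~{total_gib:.0f} GiB before activations and framework overhead")
```

Activation memory comes on top of this and depends on batch size and sequence length, which is exactly why a per-config estimate beats a rule of thumb.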
Wrap your training command with alloc run to profile actual GPU utilization during execution. By default, Alloc uses calibrate-and-exit mode: it automatically stops when GPU metrics stabilize, so you don't need to wait for a full training run.
```
alloc run python train.py
```

The probe captures live GPU metrics while your training command runs.
For a full-duration profile instead of calibrate-and-exit:
```
alloc run --full python train.py
```

Alloc never modifies your training code or changes its exit code. If Alloc encounters an error, your training continues unaffected.
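Calibrate-and-exit style logic amounts to "stop sampling once the metric stops moving." Here is a minimal illustration of that idea; the window size, tolerance, and readings are made-up values, not Alloc's actual parameters or algorithm:

```python
def is_stable(samples, window=5, tolerance=2.0):
    """True once the last `window` samples lie within `tolerance`
    of each other. Illustrative only; not Alloc's real algorithm."""
    if len(samples) < window:
        return False
    recent = samples[-window:]
    return max(recent) - min(recent) <= tolerance

# Simulated GPU-utilization readings (%): ramp-up, then steady state.
readings = []
for util in [10, 40, 70, 88, 90, 91, 90, 89, 90]:
    readings.append(util)
    if is_stable(readings):
        print(f"stabilized after {len(readings)} samples")
        break
```

The payoff of exiting at stabilization is that a profile costs seconds of GPU time instead of a full training run.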
Run a VRAM estimation from anywhere: your laptop, a CI runner, or a Slack bot. No GPU required. Alloc uses its model catalog to estimate memory requirements for known architectures.
```
alloc scan --model llama-3-70b --gpu A100-80GB
```

Returns an estimated VRAM breakdown and fit verdict for the specified model and GPU combination. Useful for capacity planning before provisioning hardware.
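At its core, a fit verdict compares an estimated requirement against the target GPU's capacity. A toy illustration follows; the catalog entries and the 10% headroom factor are assumptions for the sketch, not Alloc's data:

```python
# Toy fit check: estimated VRAM need vs. GPU capacity.
# Catalog values and the 10% headroom are illustrative assumptions.
GPU_VRAM_GIB = {"A100-80GB": 80, "A100-40GB": 40, "L4": 24}

def fit_verdict(estimated_gib, gpu, headroom=0.10):
    capacity = GPU_VRAM_GIB[gpu] * (1 - headroom)
    return "fits" if estimated_gib <= capacity else "does not fit"

print(fit_verdict(67, "A100-80GB"))  # 67 GiB estimate vs. an 80 GiB card
print(fit_verdict(67, "A100-40GB"))
```

Reserving some headroom matters in practice because fragmentation and transient buffers mean a card can OOM well before its nominal capacity.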
Optionally, log in and upload your profiling artifacts to the Alloc dashboard for historical tracking, team sharing, and AI-powered optimization suggestions.
```
alloc login
alloc upload alloc_artifact.json.gz
```

Uploads are optional and never block your training. All profiling works fully offline. Upload when you're ready.
| Command | Description |
|---|---|
| alloc ghost | Static VRAM estimation from a training script or model definition |
| alloc run | Live GPU profiling with auto-calibrate (wraps your training command) |
| alloc scan | Remote VRAM scan using the model catalog (no GPU needed) |
| alloc login | Authenticate with your Alloc account |
| alloc upload | Upload a profiling artifact to the Alloc dashboard |
| alloc catalog | List supported models and GPU configurations |
| alloc version | Print the installed Alloc CLI version |
Alloc can be configured through environment variables. All are optional. The CLI works out of the box with sensible defaults.
| Variable | Description |
|---|---|
| ALLOC_API_URL | API endpoint for uploads and remote scans. Defaults to the Alloc cloud API. Set this to your own endpoint for air-gapped deployments. |
| ALLOC_TOKEN | Authentication token. Automatically set by alloc login. Can also be set manually for CI/CD environments. |
| ALLOC_UPLOAD | Set to 1 to automatically upload artifacts after each run. Disabled by default. |
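A typical way a CLI consumes variables like these can be sketched in Python. The default URL below is a placeholder, not Alloc's real endpoint, and the function is illustrative rather than Alloc's actual config loader:

```python
import os

# Illustrative config resolution; the default URL is a placeholder,
# not Alloc's real cloud endpoint.
def load_config(env=os.environ):
    return {
        "api_url": env.get("ALLOC_API_URL", "https://api.example.com"),
        "token": env.get("ALLOC_TOKEN"),                # None = not logged in
        "auto_upload": env.get("ALLOC_UPLOAD") == "1",  # disabled by default
    }

cfg = load_config({"ALLOC_UPLOAD": "1"})
print(cfg["auto_upload"])  # True: auto-upload enabled for this run
```

In CI/CD, setting ALLOC_TOKEN as a secret and ALLOC_UPLOAD=1 gives unattended runs that upload automatically.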
You can also use Alloc programmatically from Python. The alloc.ghost() function accepts a PyTorch model and returns a VRAM estimation report.
```python
import alloc

# Pass your model to ghost for static VRAM estimation
report = alloc.ghost(model)

# Access the estimation results
print(report.estimated_vram)
print(report.breakdown)
print(report.verdict)
```
The Python API performs the same static analysis as the CLI ghost command, making it easy to integrate VRAM checks into your training pipelines or notebooks.
Full reference for all Alloc CLI commands and configuration options.
Learn how ghost scan estimates VRAM without executing your training script.
Understand how Alloc surfaces GPU right-sizing suggestions for your workloads.
Sign up to access the dashboard, historical tracking, and team collaboration.
Want to see it in action first? Try the interactive demo