GPU VPS Hosting

RTX 6000 Pro. A100. RTX 5090.
Dedicated, not sliced.

Full GPU passthrough. RTX 6000 Pro, A100, RTX 5090, RTX 4090. CUDA and cuDNN pre-installed; PyTorch-ready.
NVMe + 40 Gbps networking. Independent cloud since 2008.

4.6 · 705 reviews on Trustpilot

Starting at $506.35/mo · 35% off annual · No credit card required

$ ssh root@gpu-train-001
Connected to gpu-train-001.
root@gpu-train-001:~# nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv
name, memory.total, driver_version
NVIDIA RTX 6000 Pro, 49152 MiB, 560.94
root@gpu-train-001:~# python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"
True NVIDIA RTX 6000 Pro
root@gpu-train-001:~# python train.py --model llama-3-8b --epochs 3
Training step 1/2400 · 4.2s/step · loss=2.143
root@gpu-train-001:~# _

GPU VPS at a glance

Cloudzy sells GPU VPS plans with dedicated RTX 6000 Pro, Nvidia A100, RTX 5090, and RTX 4090 cards in 1× to 4× configurations, starting at $506.35 per month. Each plan ships pre-installed with the latest CUDA, cuDNN, and Nvidia drivers, runs on AMD EPYC + DDR5 with NVMe-only storage and 40 Gbps uplinks, and provisions in 60 seconds. GPUs are dedicated passthrough, not vGPU, not MIG, not shared. Cloudzy has operated independently since 2008 and is rated 4.6 / 5 by 705+ reviewers on Trustpilot.

Starting price: $506.35 / mo
GPU types: RTX 6000 Pro · A100 · RTX 5090 · RTX 4090
Configs: 1× to 4×
CUDA: Pre-installed
Annual discount: 35% off
Money-back: 14 days

Why ML teams pick Cloudzy

GPU compute the unboring way.

The four reasons teams move to Cloudzy from AWS, GCP, and other hyperscaler GPUs.

Dedicated GPU passthrough

The full physical card is yours, no vGPU slicing, no MIG partitions, no contention with other tenants. CUDA cores, VRAM, PCIe lanes, all dedicated.

CUDA-ready images

Latest Nvidia drivers, CUDA toolkit, and cuDNN pre-baked into the Ubuntu image. PyTorch, TensorFlow, JAX, Hugging Face: one pip install and you're training.
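
A minimal first-boot sanity check, assuming you've run pip install torch on the stock image:

import torch

# Confirm the pre-baked driver/toolkit pair is visible to PyTorch.
print(torch.__version__, "CUDA", torch.version.cuda)
print("GPU visible:", torch.cuda.is_available())
print("Device:", torch.cuda.get_device_name(0))

# One real matmul on the card, so you know compute works, not just detection.
x = torch.randn(4096, 4096, device="cuda")
y = x @ x
torch.cuda.synchronize()
print("matmul OK:", tuple(y.shape))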

NVMe + 40 Gbps

Pure NVMe storage so dataset loading isn't the bottleneck. 40 Gbps networking means pulling a 100 GB Hugging Face model finishes in seconds, not minutes.

24/7 human support

Real engineers on chat. We've helped enough teams set up multi-GPU training, debug CUDA OOMs, and tune Llama inference that the answers come back fast.

GPU lineup

Four families.
Nine ways to scale.

RTX 6000 Pro for pro-grade inference and rendering with 48 GB ECC VRAM. A100 for training and large-VRAM workloads. RTX 5090 for inference on the newest Blackwell architecture. RTX 4090 for cost-effective inference, up to 70B at 4-bit on the 2× plan. Multi-GPU plans available; pick what your VRAM budget needs.

Full GPU passthrough, not sliced, not shared
RTX 6000 Pro: 48 GB GDDR6 ECC · pro-grade
Nvidia A100: 80 GB HBM2e · ML training
RTX 5090: 32 GB GDDR7 · Blackwell
RTX 4090: 24 GB GDDR6X · cost-effective
1× to 4× GPU: multi-GPU plans available
CUDA preinstalled: PyTorch · TF · JAX ready
Pure NVMe: fast dataset I/O
40 Gbps uplink: pull 100 GB models in ~30 s

Use cases

The workloads our
customers actually train.

LLM inference

Serve Llama 3, Mistral, DeepSeek, or Qwen with vLLM or Text Generation Inference. 2× RTX 4090 (48 GB) handles 70B at 4-bit, a single 80 GB A100 handles 70B at 8-bit, and multi-A100 plans handle unquantized 70B.
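
A minimal serving sketch with vLLM's Python API, assuming the 2× RTX 4090 plan and an AWQ-quantized 70B checkpoint (the model name is illustrative):

from vllm import LLM, SamplingParams

# Tensor-parallel across both dedicated cards; AWQ keeps 70B inside 48 GB.
llm = LLM(
    model="TheBloke/Llama-2-70B-AWQ",  # illustrative AWQ checkpoint
    quantization="awq",
    tensor_parallel_size=2,
)
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain GPU passthrough in one paragraph."], params)
print(outputs[0].outputs[0].text)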

Stable Diffusion · image gen

Run SDXL, Flux, or fine-tuned Stable Diffusion checkpoints with ComfyUI or Automatic1111. RTX 4090 hits 30+ images/min on standard 1024×1024 SDXL.
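
If you'd rather script it than click through ComfyUI, a minimal diffusers sketch using the public SDXL base checkpoint:

import torch
from diffusers import StableDiffusionXLPipeline

# fp16 weights keep SDXL comfortably inside 24 GB of VRAM.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

image = pipe(
    prompt="isometric render of a datacenter rack, studio lighting",
    height=1024, width=1024,
).images[0]
image.save("sdxl_out.png")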

ML training + fine-tuning

LoRA, QLoRA, full fine-tuning. A100 is the sweet spot for 7B-13B unquantized fine-tuning; 4× A100 handles up to 70B with proper sharding (FSDP / DeepSpeed).
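
A minimal LoRA setup with Hugging Face PEFT, assuming an 8B base model you have access to; the target modules shown are the usual attention projections:

import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",  # assumes access to this gated repo
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # adapt only attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of the base model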

3D rendering · Blender

Cycles + OptiX on RTX cards is the fastest path for animation studios. The 24 GB VRAM on RTX 4090 covers the vast majority of single-frame production scenes.

Speech + vision pipelines

Whisper Large, Faster-Whisper, YOLO, Segment Anything. Even the RTX 4090 plan runs real-time inference on these models with comfortable headroom.
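
As one example, transcription with Faster-Whisper takes a handful of lines (the audio file name is a placeholder):

from faster_whisper import WhisperModel

# fp16 on a dedicated card; large-v3 fits in a few GB of VRAM.
model = WhisperModel("large-v3", device="cuda", compute_type="float16")

segments, info = model.transcribe("meeting.wav", beam_size=5)
print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
for seg in segments:
    print(f"[{seg.start:6.1f}s - {seg.end:6.1f}s] {seg.text}")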

Long-running batch jobs

Embedding generation, retrieval pipelines, dataset preprocessing. Pay hourly, run the job, snapshot the output, destroy the box: cheaper than renting on AWS/GCP for the same workload.
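
As a concrete sketch of the embedding case, a minimal sentence-transformers batch job (the model choice and file names are assumptions):

from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("BAAI/bge-large-en-v1.5", device="cuda")

with open("corpus.txt") as f:  # one document per line, hypothetical input
    texts = [line.strip() for line in f]

# Large batches keep the GPU saturated for the whole run.
embeddings = model.encode(texts, batch_size=256, show_progress_bar=True)
np.save("embeddings.npy", embeddings)  # snapshot the output, then destroy the box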

A100 VRAM: 80 GB
Uplink: 40 Gbps
Image: CUDA-ready
Max GPUs: 4×
Annual discount: 35% off
Money-back: 14 days

Pricing

Featured GPU plans. Hourly or annual.

Annual billing is currently 35% off on every GPU plan.

FAQ. GPU VPS

Common questions, straight answers.

Which GPUs does Cloudzy offer?

Four families: RTX 6000 Pro (1×, 48 GB GDDR6 ECC VRAM, pro-grade for inference and rendering), Nvidia A100 (1× / 2× / 4×, 80 GB HBM2e per card, for ML training and fp16/bf16 workloads), RTX 5090 (1× / 2×, newer Blackwell architecture, ideal for inference and rendering), and RTX 4090 (1× / 2× / 4×, cost-effective for Stable Diffusion, LLM inference, and 3D rendering).

Are the GPUs dedicated or shared?

Dedicated. Each plan is a passthrough of the full physical GPU(s), not a slice, not vGPU, not MIG. The CUDA cores, the VRAM, the PCIe bandwidth, all yours. Multi-GPU plans use NVLink where the physical hardware supports it (A100 multi-GPU plans).

Is CUDA pre-installed?

Yes. Every GPU VPS ships with the latest stable CUDA toolkit, cuDNN, and Nvidia drivers pre-baked into the Ubuntu image. PyTorch, TensorFlow, JAX, and the Hugging Face stack run out of the box. You can re-image to a clean Ubuntu without CUDA if you want to install a specific version.

How much VRAM do I get?

Per GPU: RTX 6000 Pro = 48 GB GDDR6 ECC, A100 = 80 GB HBM2e, RTX 5090 = 32 GB GDDR7, RTX 4090 = 24 GB GDDR6X. Multi-GPU plans aggregate that: a 4× A100 plan has 320 GB of total VRAM. The plan list above shows system RAM separately.

Can I run Stable Diffusion / Llama / Whisper on a GPU VPS?

Yes. The 1× RTX 4090 plan is a good starting point: enough VRAM for SDXL inference, 4-bit quantized models up to the ~30B class, or Whisper Large. Bump to 2× RTX 4090 or an A100 for Llama 3 70B at 4-bit, and to multi-A100 plans if you need to run unquantized 70B models or train LoRAs on large bases.

How does the pricing compare to AWS / Google Cloud / Lambda Labs?

Generally cheaper for steady-state workloads: we don't price-discriminate between 'on-demand' and 'spot', and we don't charge egress fees. We won't quote competitor numbers (those change monthly). The 14-day money-back guarantee lets you A/B against your current provider with your own benchmarks.

Is there an annual discount?

Yes, 35% off annual billing on every GPU plan (lower than the 50% on regular CPU because GPU hardware costs more to amortize). No auto-renewal; you'll get an invoice before each yearly cycle so you can downgrade, upgrade, or cancel without surprise charges.

What about networking? Is it really 40 Gbps?

Yes. Same 40 Gbps uplinks as our flagship Cloud VPS, with no egress fees on monthly transfer up to the plan allowance. That matters when you're moving large datasets in and out of the GPU node: 100 GB is 800 gigabits, or 20 seconds of transfer at line rate, so a 100 GB Hugging Face model lands in about 30 seconds once protocol overhead is counted.
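
To actually approach line rate when pulling weights, the hf_transfer backend helps; a minimal sketch, assuming pip install "huggingface_hub[hf_transfer]" (the model name is illustrative):

import os
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"  # must be set before the import below

from huggingface_hub import snapshot_download

# Multi-threaded Rust downloader; the default Python path rarely fills 40 Gbps.
path = snapshot_download("Qwen/Qwen2.5-72B-Instruct")
print("weights at:", path)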

Can I run multi-node training (multiple GPU VPS together)?

Yes, within a region. VPS in the same datacenter share the local network with sub-millisecond latency. We don't currently offer InfiniBand interconnect; multi-node training over standard Ethernet is fine for fine-tuning and small-scale distributed jobs, but it isn't competitive with bare-metal HPC for large pre-training.
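
A minimal two-node all-reduce smoke test, assuming two VPS in the same datacenter (the IP, port, and GPU counts are placeholders); launch the same script on each node with torchrun:

# torchrun --nnodes=2 --nproc_per_node=1 --node_rank=<0 or 1> \
#          --rdzv_backend=c10d --rdzv_endpoint=10.0.0.1:29500 allreduce_test.py
import os
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")  # NCCL over the datacenter Ethernet
local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
torch.cuda.set_device(local_rank)

t = torch.ones(1, device="cuda")
dist.all_reduce(t)  # sums the tensor across every GPU on both nodes
print(f"rank {dist.get_rank()}: all_reduce total = {t.item()}")
dist.destroy_process_group()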

Money-back guarantee on GPU plans?

14 days, no questions asked. Refund within one billing cycle. Plenty of time to benchmark CUDA throughput, run a real training step, and decide if Cloudzy is the right fit for your workload.

Stop paying hyperscaler prices.
Train on dedicated GPUs.

Pick a card, pick a region, click. CUDA is already installed.

No credit card required · 14-day money-back guarantee · Cancel anytime