Best Used GPU for Local LLMs: Buying Guide 2026
The used GPU market in March 2026 is still a strong source of VRAM for home lab LLM builders. Crypto mining is dead, gamers are upgrading to RTX 40-series and 50-series cards, and there is plenty of used NVIDIA hardware available. But prices have climbed across the board — the days of sub-$1,000 RTX 3090s and $200 Tesla P40s are over.
Buying used, though, comes with real risks. Dead fans, degraded VRAM, missing thermal pads, no warranty, and sellers who conveniently forget to mention their card spent two years in a mining rig. This guide covers where to buy, what to check, which cards offer the best value for local LLM inference, and how to test your purchase before the return window closes.
If you’re still deciding between new and used, start with our best GPU for local LLMs roundup. If you’ve already decided to buy used, keep reading.
Where to Buy Used GPUs
Not all used GPU marketplaces are equal. Your priority is buyer protection — the ability to return a card that doesn’t work as advertised.
eBay
The default recommendation for most buyers. eBay’s buyer protection program covers you for 30 days on most listings, and the platform sides with buyers in disputes more often than sellers. Look for listings with the “eBay Money Back Guarantee” badge. Filter for sellers with 98%+ positive feedback and at least 100 transactions.
Pricing benchmarks (March 2026): eBay tends to run 5-10% above the absolute cheapest prices you’ll find, but the buyer protection is worth the premium.
r/hardwareswap (Reddit)
The enthusiast marketplace. Prices on r/hardwareswap run 10-20% below eBay because sellers avoid platform fees. The catch: protection depends entirely on using PayPal Goods & Services (never Friends & Family). The subreddit’s reputation system (confirmed trades) helps, but you’re trusting individuals, not a platform.
Best for: Experienced buyers who know exactly what card they want and can evaluate seller history. Worst for first-time used GPU buyers.
Amazon Renewed
Amazon’s refurbished program offers cards with a 90-day warranty and Amazon’s standard return process. The selection is limited and prices run 10-20% above eBay, but the warranty and hassle-free returns make it the safest option if you’re risk-averse.
Best for: Buyers who want warranty coverage and don’t mind paying a premium for peace of mind.
Local Marketplaces (Facebook Marketplace, Craigslist, OfferUp)
The advantage is testing in person before paying. Bring a laptop with GPU-Z installed, a USB-C to DisplayPort adapter, and a lightweight portable PSU if possible. The disadvantage is zero buyer protection once cash changes hands.
Best for: Buyers in metro areas who can inspect and test in person. Avoid shipping deals on local platforms — you lose the only advantage (in-person verification) while gaining none of the protection eBay or Amazon offer.
What to Check Before Buying
Every used GPU listing should be evaluated against these criteria before you commit.
VRAM Capacity and Type
For LLM inference, VRAM is the single most important spec. Models that fit entirely in VRAM run at 30-200+ tokens per second; models that spill to system RAM crawl at 2-5 tok/s. Partial offloading softens the cliff a little, but speed still falls sharply with every layer that leaves the GPU.
Know exactly how much VRAM you need before shopping. Our how much VRAM for LLMs guide has the full breakdown, but the short version:
- 12 GB: Runs 7B-8B models comfortably at Q4
- 16 GB: Fits 13B models at Q4
- 24 GB: Handles up to 32B at Q4 — the sweet spot
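As a rough rule of thumb, a Q4 model needs about half a gigabyte of VRAM per billion parameters for weights, plus an allowance for KV cache and runtime overhead. A back-of-envelope sketch (the 4 bits/weight and 1.5 GB overhead figures are simplifying assumptions; real usage grows with context length):

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: float = 4.0,
                     overhead_gb: float = 1.5) -> float:
    """Rough VRAM estimate: quantized weights plus a flat allowance
    for KV cache and CUDA context. Treat it as a floor, not a promise."""
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb

for size_b in (8, 13, 32):
    print(f"{size_b}B at Q4: ~{estimate_vram_gb(size_b):.1f} GB")
# → 8B ~5.5 GB, 13B ~8.0 GB, 32B ~17.5 GB
```

Those numbers line up with the tiers above: ~17.5 GB for a 32B model fits a 24 GB card with headroom for context, while 13B sits just above what a 12 GB card can hold comfortably.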
VRAM Junction Temperature History
Ask the seller for GPU-Z screenshots showing VRAM junction temperature under load. On GDDR6X cards (RTX 3090, 3080 Ti), the junction temp should be under 100°C. Cards that consistently ran above 105°C may have degraded VRAM thermal pads.
If the seller can’t provide temperature data, assume the worst and factor in the cost of replacing thermal pads (~$15 in materials, 30-60 minutes of work).
Visual Artifacts
Ask the seller to run a stress test (FurMark or Unigine Heaven) and send a video of the screen. Any colored dots, horizontal lines, or flickering during stress testing indicates VRAM or GPU die degradation. Walk away — this is not fixable.
Mining History
Mining cards are not inherently bad. Miners typically undervolted their GPUs and ran them at steady temperatures, which is actually gentler on the silicon than gaming workloads with constant thermal cycling. The real concerns with mining cards are:
- Fan bearings — 24/7 operation wears fans faster than intermittent use
- Thermal pads — sustained heat compresses pads over time, reducing cooling effectiveness
- Cosmetic condition — mining rigs often skip backplates and cases, leading to dusty cards
A mining card with good temps and working fans is a perfectly fine purchase for LLM inference. Just price in potential fan replacement (~$20-30) or thermal pad replacement.
Warranty and Return Policy
- eBay: 30-day return on most listings (check individual listing terms)
- Amazon Renewed: 90-day warranty
- r/hardwareswap: PayPal Goods & Services gives you 180 days of buyer protection
- Local: None — test before paying
Never buy a used GPU without at least a 30-day return window. Period.
Best Used GPUs Ranked by Value
These five cards represent the best value on the used market for local LLM inference in March 2026. Prices reflect actual sold listings, not aspirational asks.
1. RTX 3090 (~$1,730) — Best Overall for 24 GB VRAM
The used RTX 3090 remains the go-to GPU for home LLM inference when you need 24 GB of VRAM.
Specs: 24 GB GDDR6X · 936 GB/s bandwidth · 10,496 CUDA cores · 350W TDP
LLM Performance:
- Llama 3 8B Q4: ~112 tok/s
- 13B models Q4: ~85 tok/s
- 32B models Q4: Fits in VRAM, ~20-25 tok/s
24 GB of VRAM at ~$1,730 used. The discontinued RTX 4090 with identical VRAM capacity now commands ~$2,755. Used 3090 prices have climbed significantly from the sub-$1,000 range of 2024, driven by sustained AI demand, but it’s still the cheapest path to 24 GB GDDR6X — same model sizes as the 4090, 85% of the inference speed, at roughly 63% of the price.
The RTX 3090 launched during the crypto boom, so the used market has plenty of units available. Look for cards with original backplates, documented temperature history, and sellers offering returns.
The 350W TDP is the main drawback. At ~200W actual draw during inference, running 24/7 costs roughly ~$265/year at $0.15/kWh. Budget for a 750W+ PSU and a UPS that can handle the load.
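The math behind that running-cost estimate is simple, and worth adapting to your own card's measured draw and local electricity rate. A quick sketch:

```python
def annual_power_cost(watts: float, rate_per_kwh: float = 0.15,
                      hours_per_day: float = 24.0) -> float:
    """Yearly electricity cost for a card at a steady average draw."""
    kwh_per_year = watts * hours_per_day * 365 / 1000
    return kwh_per_year * rate_per_kwh

# ~200W average inference draw at $0.15/kWh
print(f"RTX 3090, 24/7: ${annual_power_cost(200):.0f}/year")  # → $263/year
```

Plug in your card's actual draw under inference (measured, not TDP) for a realistic figure; a 3060-class card averaging ~100W comes out to roughly half the 3090's cost.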
For a detailed head-to-head with the current-gen alternative, see RTX 3090 vs 4090 for LLMs.
2. RTX 3080 Ti (~$650) — Best for 7B-13B Models
The used RTX 3080 Ti hits a sweet spot for builders who primarily run 7B-13B models and don’t need the headroom for 32B.
Specs: 12 GB GDDR6X · 912 GB/s bandwidth · 10,240 CUDA cores · 350W TDP
LLM Performance:
- Llama 3 8B Q4: ~105 tok/s
- 13B models Q4: Fits with tight context window, ~55-65 tok/s
- 32B+ models: Does not fit
12 GB of VRAM is enough for 7B-8B models with full context windows and 13B models with reduced context. At ~$650 used, it costs well under half what a 3090 commands and delivers nearly identical bandwidth (912 vs 936 GB/s). If you know you’ll stick with smaller models, the 3080 Ti is a strong option.
The caveat: 12 GB is a tight ceiling. As models grow and 13B becomes the baseline rather than the ceiling, you may find yourself wanting that extra VRAM sooner than expected. If there’s any chance you’ll want to run 32B models, the extra ~$1,080 for a 3090 gets you to 24 GB.
3. RTX 3060 12GB (~$428) — Budget CUDA Entry Point
The used RTX 3060 12GB is the cheapest way to get 12 GB of VRAM on a CUDA card. At ~$428 used, it remains an accessible entry point for local AI — though no longer the impulse buy it once was.
Specs: 12 GB GDDR6 · 360 GB/s bandwidth · 3,584 CUDA cores · 170W TDP
LLM Performance:
- Llama 3 8B Q4: ~40-50 tok/s
- 13B models Q4: Fits with reduced context, ~15-20 tok/s
- 32B+ models: Does not fit
The 3060 12GB has the same VRAM capacity as the 3080 Ti but roughly one-third the bandwidth. That bandwidth gap shows up in raw tok/s numbers — expect about half the speed of a 3080 Ti on equivalent models. But 40-50 tok/s on 8B models is perfectly usable for interactive chat and coding assistants.
The 170W TDP makes this an excellent choice for always-on inference servers. Annual power costs at 24/7 operation run roughly ~$135, about half what a 3090 draws. If you’re building a dedicated Ollama box that runs 7B-8B models, the 3060 12GB is a solid economic choice.
Important: Make sure you’re buying the 12 GB variant, not the 8 GB RTX 3060 Ti. The 3060 Ti has better gaming performance but 4 GB less VRAM — a terrible trade-off for LLM work.
4. Tesla P40 (~$400) — Datacenter Workhorse, 24 GB
The used Tesla P40 is the wildcard pick. 24 GB of VRAM for ~$400 is still the best VRAM-per-dollar in the market, though the price has doubled from its 2024 lows. The trade-offs are real — but for pure LLM inference on a budget, nothing else comes close on VRAM-per-dollar.
Specs: 24 GB GDDR5 · 346 GB/s bandwidth · 3,840 CUDA cores · 250W TDP
LLM Performance:
- Llama 3 8B Q4: ~15-20 tok/s
- 13B models Q4: Fits in VRAM, ~10-15 tok/s
- 32B models Q4: Fits in VRAM, ~5-8 tok/s
The P40 has Pascal-era bandwidth (346 GB/s) which makes it significantly slower than any Ampere card on a per-token basis. But it fits the same model sizes as an RTX 3090 because VRAM capacity, not speed, determines what models load. If your use case tolerates slower generation — batch processing, background summarization, or scenarios where you’re not staring at the screen waiting for tokens — the P40 at less than a quarter the price of a 3090 is compelling.
Critical caveats:
- No video output. The Tesla P40 is a datacenter card with no display connectors. You need a separate GPU (even a cheap GT 710) for display, or run headless.
- No fan. The P40 uses a passive heatsink designed for server airflow. You’ll need a server chassis with front-to-back airflow, or an aftermarket blower-style cooler (~$30-50).
- Older CUDA compute capability. Pascal (compute 6.1) is still supported by llama.cpp and Ollama, but some newer AI frameworks are dropping Pascal support. Verify compatibility with your stack before buying.
- Power connector. The P40 takes an 8-pin EPS (CPU-style) connector, not a standard 8-pin PCIe. Budget for an adapter cable, and verify the listing shows a standard PCIe bracket, since some units have non-standard mounting.
The P40 is best suited for builders who already have a headless server, want to experiment with larger models without spending ~$1,730 on a 3090, or want a second GPU for offloading layers in a multi-GPU setup.
5. Tesla P100 16GB (~$150) — Older but Still Useful
The used Tesla P100 16GB is the absolute bottom of the price curve for a usable LLM card. At ~$150, it’s nearly disposable.
Specs: 16 GB HBM2 · 732 GB/s bandwidth · 3,584 CUDA cores · 250W TDP
LLM Performance:
- Llama 3 8B Q4: ~25-30 tok/s
- 13B models Q4: Fits with reduced context, ~15-20 tok/s
- 32B+ models: Does not fit
The P100’s HBM2 memory gives it surprisingly good bandwidth (732 GB/s) — more than double the P40’s GDDR5. This translates to noticeably faster inference despite an older architecture. The 16 GB capacity fits 13B models at Q4, placing it between the 12 GB consumer cards and the 24 GB options.
Same caveats as the P40: no video output, passive cooling only, datacenter form factor. The P100 also comes in PCIe and SXM2 variants — only buy PCIe unless you have a server with an SXM2 baseboard, which you almost certainly don’t.
At ~$150, the P100 is a great card for learning, experimenting, and running 7B-8B models at acceptable speed. If you find yourself wanting more, sell it for what you paid and upgrade to a 3090.
Red Flags: When to Walk Away
These are deal-breakers regardless of price.
Visual artifacts under stress testing. Colored dots, flickering, horizontal lines, or screen corruption during FurMark or gaming benchmarks indicate dying VRAM or GPU die. This is hardware failure and is not fixable.
Missing backplate with no explanation. The backplate provides structural support and aids cooling. Cards sold without backplates were often racked in mining rigs where backplates were removed for airflow. The missing backplate itself isn’t the problem — the implication about the card’s history and the seller’s transparency is.
Suspiciously low prices. If a listing is 30%+ below the going rate with no obvious reason (like a cosmetic flaw the seller discloses), it’s likely a scam, a card with undisclosed damage, or a bait-and-switch where you receive a different card. Check sold listings on eBay for real price benchmarks.
No return policy. Any seller confident in their card’s condition will offer at least a 14-day return window. A seller who refuses returns is telling you something.
BIOS modifications. Some mining cards had their BIOS flashed to alter power limits or fan curves. Ask the seller if the BIOS is stock. Modified BIOS can cause stability issues and may void RMA eligibility on cards still within manufacturer warranty.
Physical damage to the PCIe connector or power pins. Bent pins, scorch marks, or corrosion on the PCIe slot connector or power connectors indicate electrical damage. This card may work today and die tomorrow.
Testing Procedures After Purchase
You have a limited return window. Use these first 48 hours wisely.
Step 1: Visual Inspection (5 Minutes)
Before installing the card, inspect it physically:
- Check the PCIe connector for bent or discolored pins
- Verify all fans spin freely by hand
- Look for bulging or leaking capacitors on the PCB
- Confirm the backplate is present and properly secured
- Check for signs of liquid damage (white residue, corrosion)
Step 2: Baseline Monitoring (15 Minutes)
Install the card and boot into your OS. Install GPU-Z and record:
- GPU clock speed at idle — should match stock specs
- VRAM junction temperature at idle — should be under 50°C
- Fan speed and noise — listen for clicking or grinding from bearings
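GPU-Z is Windows-only; on a Linux box you can pull equivalent idle readings from nvidia-smi. A minimal sketch (note that memory junction temperature is not exposed by nvidia-smi on most consumer cards, so this covers core temperature, clocks, fan, and power draw):

```python
import subprocess

QUERY = "temperature.gpu,clocks.current.graphics,fan.speed,power.draw"

def parse_smi_line(line: str) -> dict:
    """Parse one line of `nvidia-smi --query-gpu=... --format=csv,noheader`."""
    temp, clock, fan, power = (field.strip() for field in line.split(","))
    return {"temp_c": temp, "clock": clock, "fan": fan, "power": power}

def read_gpu_stats() -> dict:
    """Query the first GPU's idle stats. Requires the NVIDIA driver."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout.strip().splitlines()[0]
    return parse_smi_line(out)

if __name__ == "__main__":
    print(read_gpu_stats())
```

Log these numbers now; they become your baseline for spotting drift during the stress test and burn-in steps below.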
Step 3: Stress Test (30 Minutes)
Run FurMark or Unigine Heaven for at least 30 minutes. During the test:
- GPU core temperature should stay under 85°C with stock cooling
- VRAM junction temperature should stay under 100°C (critical for GDDR6X cards)
- Watch the screen continuously for any artifacts — even a single colored dot is a failure
- Listen for coil whine — some level is normal, but excessive buzzing may indicate capacitor issues
Step 4: LLM Inference Benchmark (15 Minutes)
This is what actually matters. Install Ollama, pull a model (llama3:8b is a good standard test), and run a generation benchmark:
```shell
ollama run llama3:8b "Write a 500-word essay about GPU computing"
```
Compare your tok/s against the benchmarks listed above for your specific card. Results within 10% of published benchmarks are normal. Results more than 20% below indicate a problem — possibly thermal throttling, power delivery issues, or degraded VRAM.
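Rather than eyeballing the console, you can read exact figures from Ollama's /api/generate response, which reports eval_count (tokens generated) and eval_duration (nanoseconds). A sketch, assuming a local Ollama server on the default port:

```python
import json
import urllib.request

def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Compute tok/s from the stats Ollama returns with each generation."""
    return eval_count / (eval_duration_ns / 1e9)

def benchmark(model: str = "llama3:8b",
              host: str = "http://localhost:11434") -> float:
    """Run one non-streaming generation and return its tok/s."""
    payload = json.dumps({
        "model": model,
        "prompt": "Write a 500-word essay about GPU computing",
        "stream": False,
    }).encode()
    req = urllib.request.Request(f"{host}/api/generate", data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        stats = json.load(resp)
    return tokens_per_second(stats["eval_count"], stats["eval_duration"])

if __name__ == "__main__":
    print(f"{benchmark():.1f} tok/s")
```

Run it a few times and take the median; the first run includes model load time, but eval_count and eval_duration reflect generation speed only, which is the number to compare against the benchmarks above.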
Step 5: Extended Burn-In (24-48 Hours)
If the card passes steps 1-4, run continuous inference for 24-48 hours. Use a script that repeatedly prompts the model and logs generation speed. Watch for:
- Speed degradation over time (indicates thermal issues)
- System crashes or driver resets
- Increasing VRAM junction temperatures
If anything fails during the burn-in period, initiate a return immediately. Don’t troubleshoot a used card with a ticking return window — send it back and buy another.
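A burn-in loop along these lines can log speed and bail out the moment it degrades. In this sketch, measure_tps is any function returning the current tok/s (for instance, an Ollama API call like the one in Step 4), and the 85% threshold is an assumption you can tune:

```python
import time
from typing import Callable, List

def burn_in(measure_tps: Callable[[], float], hours: float = 24.0,
            interval_s: float = 300.0, threshold: float = 0.85) -> List[float]:
    """Measure generation speed every few minutes and stop early if it
    falls below a fraction of the first reading, which usually points
    to thermal throttling or a failing cooler."""
    log: List[float] = []
    deadline = time.time() + hours * 3600
    while time.time() < deadline:
        tps = measure_tps()
        log.append(tps)
        if tps < log[0] * threshold:
            print(f"Degraded: {tps:.1f} tok/s vs baseline {log[0]:.1f} tok/s")
            break
        time.sleep(interval_s)
    return log
```

Pair the speed log with periodic temperature readings; a card whose tok/s holds steady for 48 hours while temperatures stay flat has passed the most meaningful test a used GPU can face.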
The Value Ladder: Making Your Decision
Here’s how to think about the used GPU market for LLMs in March 2026:
| Card | Used Price | VRAM | Best For |
|---|---|---|---|
| RTX 3090 | ~$1,730 | 24 GB | Best overall — runs up to 32B models |
| RTX 3080 Ti | ~$650 | 12 GB | 7B-13B models at high speed |
| RTX 3060 12GB | ~$428 | 12 GB | Budget 7B-8B with low power draw |
| Tesla P40 | ~$400 | 24 GB | 24 GB on a budget, headless only |
| Tesla P100 16GB | ~$150 | 16 GB | Entry-level experimenting |
If you can afford ~$1,730, buy the RTX 3090. It’s the most versatile card on this list and will remain relevant as models grow. At 24 GB, you won’t hit the VRAM ceiling until you’re trying to run 70B+ parameter models.
If your budget is ~$650, the RTX 3080 Ti gives you fast inference on the models most people actually run. The 12 GB VRAM is limiting but sufficient for current 7B-13B workloads.
If you’re just getting started and want to spend as little as possible, the RTX 3060 12GB at ~$428 is a reasonable entry point. Low power, full CUDA support, enough VRAM for the models that matter.
The Tesla cards are specialist picks. The P40 is for budget builders who want 24 GB and don’t mind slower inference and no display output. The P100 is for experimenters who want decent bandwidth at minimal cost.
Whatever you buy, use the testing procedures above within your return window. A used GPU that passes a 48-hour burn-in is likely to last for years. One that doesn’t should go back immediately.
For recommendations on new GPUs, see our best GPU for local LLMs guide. To figure out exactly how much VRAM your target models need, check how much VRAM for LLMs. And for a direct comparison of the most popular used vs. new option, read RTX 3090 vs 4090 for LLMs.
Frequently Asked Questions
Is it safe to buy an ex-mining GPU for LLMs?
Generally yes. Miners usually undervolted their cards and ran them at steady temperatures, which is gentler on silicon than gaming's thermal cycling. Insist on temperature data under load, and price in fan (~$20-30) or thermal pad (~$15) replacement.
How much VRAM do I need for local LLMs?
12 GB runs 7B-8B models comfortably at Q4, 16 GB fits 13B, and 24 GB handles up to 32B. See our how much VRAM for LLMs guide for the full breakdown.
Should I buy an RTX 3090 or a Tesla P40 for LLMs?
Both carry 24 GB, but the 3090 is several times faster and works in a normal desktop. The P40 at ~$400 only makes sense for headless budget builds that can tolerate 5-20 tok/s and server-style cooling.
Where is the best place to buy used GPUs?
eBay for most buyers, thanks to its 30-day buyer protection. Amazon Renewed if you want a 90-day warranty, r/hardwareswap for experienced buyers chasing lower prices, and local marketplaces only when you can test in person before paying.
What should I test first after buying a used GPU?
Start with a physical inspection, then a 30-minute stress test watching temperatures and artifacts, then an LLM inference benchmark compared against known tok/s figures for your card. Do all of it inside your return window.