Best Used GPU for Local LLMs: Buying Guide 2026

14 min read

The used GPU market in March 2026 is still a strong source of VRAM for home lab LLM builders. Crypto mining is dead, gamers are upgrading to RTX 40-series and 50-series cards, and there is plenty of used NVIDIA hardware available. But prices have climbed across the board — the days of sub-$1,000 RTX 3090s and $200 Tesla P40s are over.

Buying used comes with real risks, though: dead fans, degraded VRAM, missing thermal pads, no warranty, and sellers who conveniently forget to mention their card spent two years in a mining rig. This guide covers where to buy, what to check, which cards offer the best value for local LLM inference, and how to test your purchase before the return window closes.

If you’re still deciding between new and used, start with our best GPU for local LLMs roundup. If you’ve already decided to buy used, keep reading.


Where to Buy Used GPUs

Not all used GPU marketplaces are equal. Your priority is buyer protection — the ability to return a card that doesn’t work as advertised.

eBay

The default recommendation for most buyers. eBay’s buyer protection program covers you for 30 days on most listings, and the platform sides with buyers in disputes more often than sellers. Look for listings with the “eBay Money Back Guarantee” badge. Filter for sellers with 98%+ positive feedback and at least 100 transactions.

Pricing (March 2026): eBay listings tend to run 5-10% above the absolute cheapest prices you’ll find, but the buyer protection is worth the premium.

r/hardwareswap (Reddit)

The enthusiast marketplace. Prices on r/hardwareswap run 10-20% below eBay because sellers avoid platform fees. The catch: protection depends entirely on using PayPal Goods & Services (never Friends & Family). The subreddit’s reputation system (confirmed trades) helps, but you’re trusting individuals, not a platform.

Best for: Experienced buyers who know exactly what card they want and can evaluate seller history. Worst for first-time used GPU buyers.

Amazon Renewed

Amazon’s refurbished program offers cards with a 90-day warranty and Amazon’s standard return process. The selection is limited and prices run 10-20% above eBay, but the warranty and hassle-free returns make it the safest option if you’re risk-averse.

Best for: Buyers who want warranty coverage and don’t mind paying a premium for peace of mind.

Local Marketplaces (Facebook Marketplace, Craigslist, OfferUp)

The advantage is testing in person before paying. Ask the seller to demo the card in a working system, and bring a USB drive with GPU-Z and FurMark so you can run your own checks on the spot. The disadvantage is zero buyer protection once cash changes hands.

Best for: Buyers in metro areas who can inspect and test in person. Avoid shipping deals on local platforms — you lose the only advantage (in-person verification) while gaining none of the protection eBay or Amazon offer.


What to Check Before Buying

Every used GPU listing should be evaluated against these criteria before you commit.

VRAM Capacity and Type

For LLM inference, VRAM is the single most important spec. Models that fit entirely in VRAM run at 30-200+ tokens per second. Models that spill to system RAM drop to 2-5 tok/s, and partially offloaded models land much closer to the slow end than the fast one.

Know exactly how much VRAM you need before shopping. Our how much VRAM for LLMs guide has the full breakdown, but the short version:

  • 12 GB: Runs 7B-8B models comfortably at Q4
  • 16 GB: Fits 13B models at Q4
  • 24 GB: Handles up to 32B at Q4 — the sweet spot
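
As a rough back-of-envelope check on those tiers: weight memory is roughly parameter count times bytes per parameter, plus headroom for the KV cache and runtime buffers. A minimal Python sketch; the bytes-per-parameter values and the 20% overhead factor are ballpark assumptions, not exact figures for any particular runtime:

# Rough VRAM estimate for a quantized model.
# Bytes-per-parameter and the 20% overhead are ballpark assumptions.
BYTES_PER_PARAM = {"Q4": 0.5, "Q8": 1.0, "FP16": 2.0}

def vram_gb(params_billions: float, quant: str = "Q4", overhead: float = 1.2) -> float:
    # weights plus ~20% headroom for KV cache and runtime buffers
    return params_billions * BYTES_PER_PARAM[quant] * overhead

for size in (8, 13, 32):
    print(f"{size}B at Q4: ~{vram_gb(size):.1f} GB")
# 8B -> ~4.8 GB, 13B -> ~7.8 GB, 32B -> ~19.2 GB, consistent with the
# 12/16/24 GB tiers above once you add context-window headroom.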

VRAM Junction Temperature History

Ask the seller for GPU-Z screenshots showing VRAM junction temperature under load. On GDDR6X cards (RTX 3090, 3080 Ti), the junction temp should be under 100°C. Cards that consistently ran above 105°C may have degraded VRAM thermal pads.

If the seller can’t provide temperature data, assume the worst and factor in the cost of replacing thermal pads (~$15 in materials, 30-60 minutes of work).

Visual Artifacts

Ask the seller to run a stress test (FurMark or Unigine Heaven) and send a video of the screen. Any colored dots, horizontal lines, or flickering during stress testing indicates VRAM or GPU die degradation. Walk away — this is not fixable.

Mining History

Mining cards are not inherently bad. Miners typically undervolted their GPUs and ran them at steady temperatures, which is actually gentler on the silicon than gaming workloads with constant thermal cycling. The real concerns with mining cards are:

  1. Fan bearings — 24/7 operation wears fans faster than intermittent use
  2. Thermal pads — sustained heat compresses pads over time, reducing cooling effectiveness
  3. Cosmetic condition — mining rigs often skip backplates and cases, leading to dusty cards

A mining card with good temps and working fans is a perfectly fine purchase for LLM inference. Just price in potential fan replacement (~$20-30) or thermal pad replacement.

Warranty and Return Policy

  • eBay: 30-day return on most listings (check individual listing terms)
  • Amazon Renewed: 90-day warranty
  • r/hardwareswap: PayPal Goods & Services gives you 180 days of buyer protection
  • Local: None — test before paying

Never buy a used GPU without at least a 30-day return window. Period.


Best Used GPUs Ranked by Value

These five cards represent the best value on the used market for local LLM inference in March 2026. Prices reflect actual sold listings, not aspirational asks.

1. RTX 3090 (~$1,730) — Best Overall for 24 GB VRAM

The used RTX 3090 remains the go-to GPU for home LLM inference when you need 24 GB of VRAM.

Specs: 24 GB GDDR6X · 936 GB/s bandwidth · 10,496 CUDA cores · 350W TDP

LLM Performance:

  • Llama 3 8B Q4: ~112 tok/s
  • 13B models Q4: ~85 tok/s
  • 32B models Q4: Fits in VRAM, ~20-25 tok/s

24 GB of VRAM at ~$1,730 used. The discontinued RTX 4090 with identical VRAM capacity now commands ~$2,755. Used 3090 prices have climbed significantly from the sub-$1,000 range of 2024, driven by sustained AI demand, but it’s still the cheapest path to 24 GB GDDR6X — same model sizes as the 4090, 85% of the inference speed, at roughly 63% of the price.

The RTX 3090 launched during the crypto boom, so the used market has plenty of units available. Look for cards with original backplates, documented temperature history, and sellers offering returns.

The 350W TDP is the main drawback. At ~200W actual draw during inference, running 24/7 costs roughly ~$265/year at $0.15/kWh. Budget for a 750W+ PSU and a UPS that can handle the load.
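
The math behind that figure is easy to rerun for your own electricity rate; a quick sketch:

# Annual electricity cost for 24/7 inference duty.
watts = 200          # measured draw under inference load, not the 350W TDP
rate_per_kwh = 0.15  # $ per kWh
kwh_per_year = watts * 24 * 365 / 1000   # 1,752 kWh
print(f"~${kwh_per_year * rate_per_kwh:.0f}/year")  # ~$263/year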

For a detailed head-to-head with the current-gen alternative, see RTX 3090 vs 4090 for LLMs.

2. RTX 3080 Ti (~$650) — Best for 7B-13B Models

The used RTX 3080 Ti hits a sweet spot for builders who primarily run 7B-13B models and don’t need the headroom for 32B.

Specs: 12 GB GDDR6X · 912 GB/s bandwidth · 10,240 CUDA cores · 350W TDP

LLM Performance:

  • Llama 3 8B Q4: ~105 tok/s
  • 13B models Q4: Fits with tight context window, ~55-65 tok/s
  • 32B+ models: Does not fit

12 GB of VRAM is enough for 7B-8B models with full context windows and 13B models with reduced context. At ~$650 used, it costs well under half what a 3090 commands and delivers nearly identical bandwidth (912 vs 936 GB/s). If you know you’ll stick with smaller models, the 3080 Ti is a strong option.

The caveat: 12 GB is a tight ceiling. As models grow and 13B becomes the baseline rather than the ceiling, you may find yourself wanting that extra VRAM sooner than expected. If there’s any chance you’ll want to run 32B models, the extra ~$1,080 for a 3090 gets you to 24 GB.

3. RTX 3060 12GB (~$428) — Budget CUDA Entry Point

The used RTX 3060 12GB is the cheapest way to get 12 GB of VRAM on a CUDA card. At ~$428 used, it remains an accessible entry point for local AI — though no longer the impulse buy it once was.

Specs: 12 GB GDDR6 · 360 GB/s bandwidth · 3,584 CUDA cores · 170W TDP

LLM Performance:

  • Llama 3 8B Q4: ~40-50 tok/s
  • 13B models Q4: Fits with reduced context, ~15-20 tok/s
  • 32B+ models: Does not fit

The 3060 12GB has the same VRAM capacity as the 3080 Ti but only about 40% of the bandwidth. That bandwidth gap shows up in raw tok/s numbers — expect about half the speed of a 3080 Ti on equivalent models. But 40-50 tok/s on 8B models is perfectly usable for interactive chat and coding assistants.
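
That gap is predictable: single-stream decode is mostly memory-bandwidth-bound, so a crude ceiling on tok/s is bandwidth divided by the bytes read per token (roughly the quantized model size). A sketch; the 0.5 efficiency factor is a loose assumption:

# Crude decode-speed ceiling: each generated token streams
# (approximately) the whole quantized model out of VRAM.
def tok_s_estimate(bandwidth_gb_s: float, model_gb: float, efficiency: float = 0.5) -> float:
    # efficiency is a hand-wavy fudge for kernel and cache overheads
    return bandwidth_gb_s / model_gb * efficiency

model_gb = 4.7  # Llama 3 8B at Q4, roughly
for name, bw in [("RTX 3060 12GB", 360), ("RTX 3080 Ti", 912)]:
    print(f"{name}: ~{tok_s_estimate(bw, model_gb):.0f} tok/s")
# 3060: ~38 tok/s, 3080 Ti: ~97 tok/s, in line with the numbers above.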

The 170W TDP makes this an excellent choice for always-on inference servers. Annual power costs at 24/7 operation run roughly ~$135, about half what a 3090 costs. If you’re building a dedicated Ollama box that runs 7B-8B models, the 3060 12GB is a solid economic choice.

Important: Make sure you’re buying the 12 GB variant, not the 8 GB RTX 3060 Ti. The 3060 Ti has better gaming performance but 4 GB less VRAM — a terrible trade-off for LLM work.
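
Once the card is in hand, confirming which variant you actually received takes one driver query; a sketch using standard nvidia-smi query fields:

import subprocess

# A genuine 3060 12GB should report ~12288 MiB, not 8192 MiB.
out = subprocess.run(
    ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
)
print(out.stdout.strip())  # e.g. "NVIDIA GeForce RTX 3060, 12288 MiB"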

4. Tesla P40 (~$400) — Datacenter Workhorse, 24 GB

The used Tesla P40 is the wildcard pick. 24 GB of VRAM for ~$400 is still the best VRAM-per-dollar in the market, though the price has doubled from its 2024 lows. The trade-offs are real — but for pure LLM inference on a budget, nothing else comes close on VRAM-per-dollar.

Specs: 24 GB GDDR5 · 346 GB/s bandwidth · 3,840 CUDA cores · 250W TDP

LLM Performance:

  • Llama 3 8B Q4: ~15-20 tok/s
  • 13B models Q4: Fits in VRAM, ~10-15 tok/s
  • 32B models Q4: Fits in VRAM, ~5-8 tok/s

The P40 has Pascal-era bandwidth (346 GB/s), which makes it significantly slower than any Ampere card on a per-token basis. But it fits the same model sizes as an RTX 3090 because VRAM capacity, not speed, determines what models load. If your use case tolerates slower generation — batch processing, background summarization, or scenarios where you’re not staring at the screen waiting for tokens — the P40 at less than a quarter the price of a 3090 is compelling.

Critical caveats:

  • No video output. The Tesla P40 is a datacenter card with no display connectors. You need a separate GPU (even a cheap GT 710) for display, or run headless.
  • No fan. The P40 uses a passive heatsink designed for server airflow. You’ll need a server chassis with front-to-back airflow, or an aftermarket blower-style cooler (~$30-50).
  • Older CUDA compute capability. Pascal (compute 6.1) is still supported by llama.cpp and Ollama, but some newer AI frameworks are dropping Pascal support. Verify compatibility with your stack before buying.
  • Power connector. The P40 takes an 8-pin EPS (CPU-style) connector, not a standard 8-pin PCIe; forcing a PCIe cable in can damage the card. Budget for an EPS cable or a PCIe-to-EPS adapter, and verify the listing shows a standard PCIe bracket.

The P40 is best suited for builders who already have a headless server, want to experiment with larger models without spending ~$1,730 on a 3090, or want a second GPU for offloading layers in a multi-GPU setup.

5. Tesla P100 16GB (~$150) — Older but Still Useful

The used Tesla P100 16GB is the absolute bottom of the price curve for a usable LLM card. At ~$150, it’s nearly disposable.

Specs: 16 GB HBM2 · 732 GB/s bandwidth · 3,584 CUDA cores · 250W TDP

LLM Performance:

  • Llama 3 8B Q4: ~25-30 tok/s
  • 13B models Q4: Fits with reduced context, ~15-20 tok/s
  • 32B+ models: Does not fit

The P100’s HBM2 memory gives it surprisingly good bandwidth (732 GB/s) — more than double the P40’s GDDR5. This translates to noticeably faster inference than the P40 despite the same Pascal-era architecture. The 16 GB capacity fits 13B models at Q4, placing it between the 12 GB consumer cards and the 24 GB options.

Same caveats as the P40: no video output, passive cooling only, an EPS power connector, and a datacenter form factor. The P100 also comes in PCIe and SXM2 variants — only buy PCIe unless you have a server with an SXM2 baseboard, which you almost certainly don’t.

At ~$150, the P100 is a great card for learning, experimenting, and running 7B-8B models at acceptable speed. If you find yourself wanting more, sell it for what you paid and upgrade to a 3090.


Red Flags: When to Walk Away

These are deal-breakers regardless of price.

Visual artifacts under stress testing. Colored dots, flickering, horizontal lines, or screen corruption during FurMark or gaming benchmarks indicate dying VRAM or GPU die. This is hardware failure and is not fixable.

Missing backplate with no explanation. The backplate provides structural support and aids cooling. Cards sold without backplates were often racked in mining rigs where backplates were removed for airflow. The missing backplate itself isn’t the problem — the implication about the card’s history and the seller’s transparency is.

Suspiciously low prices. If a listing is 30%+ below the going rate with no obvious reason (like a cosmetic flaw the seller discloses), it’s likely a scam, a card with undisclosed damage, or a bait-and-switch where you receive a different card. Check sold listings on eBay for real price benchmarks.

No return policy. Any seller confident in their card’s condition will offer at least a 14-day return window. A seller who refuses returns is telling you something.

BIOS modifications. Some mining cards had their BIOS flashed to alter power limits or fan curves. Ask the seller if the BIOS is stock. Modified BIOS can cause stability issues and may void RMA eligibility on cards still within manufacturer warranty.

Physical damage to the PCIe connector or power pins. Bent pins, scorch marks, or corrosion on the PCIe slot connector or power connectors indicate electrical damage. This card may work today and die tomorrow.


Testing Procedures After Purchase

You have a limited return window. Use these first 48 hours wisely.

Step 1: Visual Inspection (5 Minutes)

Before installing the card, inspect it physically:

  • Check the PCIe connector for bent or discolored pins
  • Verify all fans spin freely by hand
  • Look for bulging or leaking capacitors on the PCB
  • Confirm the backplate is present and properly secured
  • Check for signs of liquid damage (white residue, corrosion)

Step 2: Baseline Monitoring (15 Minutes)

Install the card and boot into your OS. Install GPU-Z and record:

  • Reported model, VRAM size, and clocks — should match the card you bought; idle clocks dropping to a few hundred MHz is normal, while a mismatched model name suggests a BIOS-flashed fake
  • VRAM junction temperature at idle — should be under 50°C
  • Fan speed and noise — listen for clicking or grinding from bearings
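
If you’d rather log the baseline than screenshot it, NVIDIA’s NVML bindings expose the same idle readings. Note that consumer cards don’t report VRAM junction temperature through NVML, which is why GPU-Z remains the tool for that number. A minimal sketch using the nvidia-ml-py package:

import pynvml  # pip install nvidia-ml-py

pynvml.nvmlInit()
gpu = pynvml.nvmlDeviceGetHandleByIndex(0)

temp = pynvml.nvmlDeviceGetTemperature(gpu, pynvml.NVML_TEMPERATURE_GPU)
clock = pynvml.nvmlDeviceGetClockInfo(gpu, pynvml.NVML_CLOCK_GRAPHICS)
fan = pynvml.nvmlDeviceGetFanSpeed(gpu)  # percent of maximum

print(f"idle: core {temp}°C, graphics clock {clock} MHz, fan {fan}%")
pynvml.nvmlShutdown()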

Step 3: Stress Test (30 Minutes)

Run FurMark or Unigine Heaven for at least 30 minutes. During the test:

  • GPU core temperature should stay under 85°C with stock cooling
  • VRAM junction temperature should stay under 100°C (critical for GDDR6X cards)
  • Watch the screen continuously for any artifacts — even a single colored dot is a failure
  • Listen for coil whine — some level is normal, but excessive buzzing may indicate capacitor issues

Step 4: LLM Inference Benchmark (15 Minutes)

This is what actually matters. Install Ollama, pull a model (llama3:8b is a good standard test), and run a generation benchmark:

ollama run llama3:8b "Write a 500-word essay about GPU computing"

Compare your tok/s against the benchmarks listed above for your specific card. Results within 10% of published benchmarks are normal. Results more than 20% below indicate a problem — possibly thermal throttling, power delivery issues, or degraded VRAM.
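
To get a number rather than eyeballing console output, Ollama’s local REST API returns token counts and timings you can convert to tok/s; a sketch assuming the default endpoint on localhost:11434:

import requests

# /api/generate returns eval_count (tokens generated) and
# eval_duration (nanoseconds) when streaming is disabled.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3:8b",
        "prompt": "Write a 500-word essay about GPU computing",
        "stream": False,
    },
    timeout=600,
)
stats = resp.json()
print(f"{stats['eval_count'] / (stats['eval_duration'] / 1e9):.1f} tok/s")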

Step 5: Extended Burn-In (24-48 Hours)

If the card passes steps 1-4, run continuous inference for 24-48 hours. Use a script that repeatedly prompts the model and logs generation speed (a minimal sketch follows this list). Watch for:

  • Speed degradation over time (indicates thermal issues)
  • System crashes or driver resets
  • Increasing VRAM junction temperatures
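
A minimal burn-in sketch along those lines, reusing the same local Ollama endpoint; the prompt, interval, and log file name are arbitrary choices:

import time
import requests

URL = "http://localhost:11434/api/generate"
PAYLOAD = {"model": "llama3:8b",
           "prompt": "Summarize the history of GPU computing.",
           "stream": False}

# Loop generations and log tok/s. A steady downward drift over hours
# suggests thermal trouble; crashes or driver resets mean start the return.
with open("burnin.log", "a") as log:
    while True:  # stop with Ctrl-C after 24-48 hours
        stats = requests.post(URL, json=PAYLOAD, timeout=600).json()
        tok_s = stats["eval_count"] / (stats["eval_duration"] / 1e9)
        line = f"{time.strftime('%F %T')} {tok_s:.1f} tok/s"
        print(line)
        log.write(line + "\n")
        log.flush()
        time.sleep(10)  # brief cooldown between runs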

If anything fails during the burn-in period, initiate a return immediately. Don’t troubleshoot a used card with a ticking return window — send it back and buy another.


The Value Ladder: Making Your Decision

Here’s how to think about the used GPU market for LLMs in March 2026:

| Card | Used Price | VRAM | Best For |
| --- | --- | --- | --- |
| RTX 3090 | ~$1,730 | 24 GB | Best overall — runs up to 32B models |
| RTX 3080 Ti | ~$650 | 12 GB | 7B-13B models at high speed |
| RTX 3060 12GB | ~$428 | 12 GB | Budget 7B-8B with low power draw |
| Tesla P40 | ~$400 | 24 GB | 24 GB on a budget, headless only |
| Tesla P100 16GB | ~$150 | 16 GB | Entry-level experimenting |

If you can afford ~$1,730, buy the RTX 3090. It’s the most versatile card on this list and will remain relevant as models grow. At 24 GB, you won’t hit the VRAM ceiling until you’re trying to run 70B+ parameter models.

If your budget is ~$650, the RTX 3080 Ti gives you fast inference on the models most people actually run. The 12 GB VRAM is limiting but sufficient for current 7B-13B workloads.

If you’re just getting started and want to spend as little as possible, the RTX 3060 12GB at ~$428 is a reasonable entry point. Low power, full CUDA support, enough VRAM for the models that matter.

The Tesla cards are specialist picks. The P40 is for budget builders who want 24 GB and don’t mind slower inference and no display output. The P100 is for experimenters who want decent bandwidth at minimal cost.

Whatever you buy, use the testing procedures above within your return window. A used GPU that passes a 48-hour burn-in is likely to last for years. One that doesn’t should go back immediately.

For recommendations on new GPUs, see our best GPU for local LLMs guide. To figure out exactly how much VRAM your target models need, check how much VRAM for LLMs. And for a direct comparison of the most popular used vs. new option, read RTX 3090 vs 4090 for LLMs.

Frequently Asked Questions

Is it safe to buy an ex-mining GPU for LLMs?
Yes, with caveats. Mining GPUs ran sustained compute loads, but at fixed temperatures and undervolted power limits — steady operation like that doesn't wear out the silicon. The real risks are degraded fan bearings and elevated VRAM junction temps from thermal pad compression. Check VRAM junction temps with GPU-Z (under 100°C under load), listen for fan bearing noise, and buy from sellers who accept returns.
How much VRAM do I need for local LLMs?
12 GB runs 7B-8B models at Q4 quantization comfortably. 16 GB fits 13B models. 24 GB is the sweet spot — it handles models up to 32B at Q4 and 13B at Q8 quality. For most home lab users buying used, 24 GB (RTX 3090) or 12 GB (RTX 3060 12GB / RTX 3080 Ti) are the targets. See our full breakdown at how much VRAM for LLMs.
Should I buy an RTX 3090 or a Tesla P40 for LLMs?
The RTX 3090 is the better card in every way — faster bandwidth (936 GB/s vs 346 GB/s), newer architecture, and video output for dual use. But the Tesla P40 costs ~$400 for 24 GB of VRAM versus ~$1,730 for the RTX 3090. If you only care about fitting large models and can tolerate slower inference (15-20 tok/s on 8B), the P40 is a remarkable budget option.
Where is the best place to buy used GPUs?
eBay with buyer protection is the safest option for most people — you get 30-day returns on most listings. r/hardwareswap on Reddit offers lower prices but requires PayPal Goods & Services for protection. Amazon Renewed provides warranty coverage but prices run 10-20% higher. Avoid Facebook Marketplace for GPUs unless you can test in person before paying.
What should I test first after buying a used GPU?
Install GPU-Z and check VRAM junction temperature under load — it should stay under 100°C. Run a stress test like FurMark for 15 minutes and watch for visual artifacts (colored dots, lines, or flickering). Then run an actual LLM inference benchmark with Ollama to verify real-world performance matches expected speeds for that card.
