Short answer:
DeepSeek itself has never been one of Hugging Face’s **“serverless inference providers.”**¹ What was (and still is) possible is to call DeepSeek-R1, V2, V2.5, etc. through third-party providers such as Together AI, Fireworks AI, and fal.ai, which Hugging Face integrates behind a unified “serverless” /v1/chat/completions endpoint.² If the particular provider you were using has stopped exposing a DeepSeek model, you’ll no longer see it in the drop-down, or you’ll get 400/404 errors – but that’s a decision on the provider’s side, not DeepSeek “leaving” Hugging Face.
| What’s happening now | Where to look |
|---|---|
| **Models & weights** – all DeepSeek checkpoints (R1, V2, V2.5, V3-0324, distilled variants, Janus-Pro, etc.) remain on the Hub and can be downloaded or self-hosted. | e.g. the deepseek-ai/DeepSeek-R1 model card.³ |
| **Serverless calls** – Hugging Face’s open-beta “Inference Providers” list currently includes fal-ai, Replicate, SambaNova, Together AI, Fireworks AI, and others – not DeepSeek.¹ | HF blog post introducing the feature.¹ |
**How to use DeepSeek over HF’s /chat/completions anyway** – pick a provider that still hosts the model (Together AI and Fireworks AI usually do) and set it in `InferenceClient(provider=...)`, optionally passing your own provider API key.² Example adapted from the HF docs:

```python
from huggingface_hub import InferenceClient

client = InferenceClient(
    model="deepseek-ai/DeepSeek-R1",
    provider="together",          # or "fireworks-ai"
    api_key="YOUR_PROVIDER_KEY",  # optional – if omitted, HF routes it for you
)
response = client.chat_completion(
    messages=[{"role": "user", "content": "Hello"}],
)
```
### Why you might think it “disappeared”
* Some providers throttle or temporarily disable big models (DeepSeek-R1 is a 671 B-parameter MoE, ~37 B active) when GPU capacity is tight; the endpoint then returns 400/404/500 and the model vanishes from the playground.
* HF’s own “hf-inference” provider does **not** serve DeepSeek (it only covers a curated subset of smaller models).
* In early January several users reported 400 errors after a spec change to the serverless beta; this affected DeepSeek calls routed through Together AI.⁴
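Because availability shifts like this, a defensive pattern is to try several providers in order and fall back when one errors out. A minimal sketch – the provider list and the `make_call` callable are placeholders you supply (for instance, a wrapper around an `InferenceClient` chat call), not an official HF API:

```python
def call_with_fallback(providers, make_call):
    """Try each provider in turn; return (provider, result) from the first
    that succeeds. If every provider errors out (e.g. 400/404/500 because
    the model was dropped), re-raise the last exception."""
    last_err = None
    for provider in providers:
        try:
            return provider, make_call(provider)
        except Exception as err:  # provider down or model no longer hosted
            last_err = err
    raise last_err
```

Keeping the provider name a parameter, rather than hard-coding it, is what makes day-to-day availability changes survivable.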
DeepSeek hasn’t pulled its models from Hugging Face, and Hugging Face hasn’t removed them. You simply need to:
1. **Choose a provider that still offers the model** (Together AI, Fireworks AI, fal.ai, etc.; availability can change day-to-day).
2. Pass that provider’s name (and optionally your API key) in the `InferenceClient` or `openai`-compatible call.
3. If you want full control or zero provider dependence, download the weights from the Hub and run them locally with vLLM/TGI.
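For step 2, the raw HTTP shape of an `openai`-compatible call looks roughly like the sketch below. It only builds the request (no network call is made), and the router URL pattern is an assumption based on HF’s Inference Providers docs – verify it against the current documentation before relying on it:

```python
import json

def build_chat_request(model, provider, messages):
    """Return (url, body) for an OpenAI-style /chat/completions request
    routed through a Hugging Face Inference Provider."""
    # Assumed router URL pattern – check the current HF docs.
    url = f"https://router.huggingface.co/{provider}/v1/chat/completions"
    body = json.dumps({"model": model, "messages": messages})
    return url, body

url, body = build_chat_request(
    "deepseek-ai/DeepSeek-R1",
    "together",
    [{"role": "user", "content": "Hello"}],
)
```

POSTing `body` to `url` with your bearer token in the `Authorization` header is all the `openai` client does under the hood, which is why swapping providers is just a string change.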
¹ Hugging Face “Inference Providers” launch post – list of supported providers (fal, Replicate, SambaNova, Together AI; no DeepSeek).
² HF docs example calling *deepseek-ai/DeepSeek-R1* through Together AI.
³ DeepSeek-R1 model card showing the weights still hosted, with a note about serverless providers.
⁴ Forum threads on 400/500 errors after the spec change (generic serverless API issue).