
How to Use Hugging Face with OpenAI-Compatible APIs?

Sep 11, 2025 · 4 Min Read
Written by Dharshan

As large language models become more widely adopted, developers are looking for flexible ways to integrate them without being tied to a single provider. Hugging Face’s newly introduced OpenAI-compatible API offers a practical solution, allowing you to run models like LLaMA, Mixtral, or DeepSeek using the same syntax as OpenAI’s Python client. According to Hugging Face, hundreds of models are now accessible using the OpenAI-compatible client across providers like Together AI, Replicate, and more.

In this article, you’ll learn how to set up and use the OpenAI-compatible interface step by step: from configuring your environment and authenticating your API key, to choosing the right model and provider, and making your first chat completion request. We’ll also look at how to compare different providers based on speed, cost, or availability, all without changing your existing code.

Start building with more flexibility, right from your existing codebase.

What is Hugging Face Inference Providers?

Hugging Face Inference Providers is a system that lets you run AI models from many different backends, such as Hugging Face's own servers, AWS, Azure, or third-party companies, all through a single interface. You don't need to learn a separate API for each provider; one consistent client works across all of them.

This is particularly helpful for developers who want to switch between providers based on performance, cost, or availability, but don’t want to modify their code each time. Combined with OpenAI compatibility, it means you can write OpenAI-style code and run it against models hosted anywhere Hugging Face routes to.

OpenAI Compatibility in Hugging Face

Hugging Face recently introduced support for OpenAI-compatible APIs, allowing you to use calls like chat.completions.create() or embeddings.create() just as you would with the OpenAI Python client. The key difference is that instead of sending your request to OpenAI’s servers, you point it at Hugging Face’s API, which can route the call to a variety of models, both open and third-party. This makes it possible to plug in alternatives like Mixtral, Kimi, or LLaMA with minimal changes to your existing code.
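
As a rough sketch of what that looks like in practice, the only things that change from a typical OpenAI setup are the endpoint and the key (the token value below is a placeholder):

# Same OpenAI client, pointed at Hugging Face's router instead of api.openai.com
from openai import OpenAI

client = OpenAI(
    base_url="https://router.huggingface.co/v1",  # Hugging Face's OpenAI-compatible endpoint
    api_key="hf_your_token_here"                  # a Hugging Face token, not an OpenAI key
)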

How to Set Up OpenAI-Compatible APIs on Hugging Face

To use OpenAI-style code with Hugging Face, you only need to update your API settings and model reference. This section walks you through the exact steps to get started, including how to select specific providers like Together AI or Replicate. 

Unlike with OpenAI, you must also specify which provider will run the model by appending a :provider suffix to the model name; the steps below show exactly how to set it up.

Step 1: Install Required Packages

pip install openai python-dotenv

Step 2: Configure Your API Key

Create a .env file and add your Hugging Face token:

HF_TOKEN=hf_your_token_here

Then, load it in your Python script:

from dotenv import load_dotenv
import os

load_dotenv()  # reads HF_TOKEN from the .env file
api_key = os.getenv("HF_TOKEN")
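
If you'd rather fail fast when the token is missing (for example, when the .env file wasn't found), a small guard like this helps:

# Optional: stop early with a clear message instead of a confusing 401 later
if not api_key:
    raise RuntimeError("HF_TOKEN is not set; add it to your .env file")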

Step 3: Initialize the OpenAI Client

from openai import OpenAI
client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=api_key
)

Step 4: Run a Model (Provider Suffix Required)

You must include the provider in the model name using the :provider format, for example: model-id:provider.

You can explore available models here: https://huggingface.co/models

To check which inference providers support a model:

  1. Open the model page on Hugging Face.
  2. In the top-right corner, click Deploy.
  3. Then click Inference API Providers.
  4. You'll see a list of supported providers for that model.

Once you know which providers support your model, make the request:

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1:together",  # the :provider suffix is required
    messages=[{"role": "user", "content": "Tell me a fun fact."}]
)
print(response.choices[0].message.content)

If you don’t want to specify a provider manually, you can use :auto, which automatically selects a supported provider.
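
For example, the same request with automatic provider selection could look like this (a sketch based on the :auto behaviour described above):

# Let Hugging Face pick a supported provider for the model
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1:auto",
    messages=[{"role": "user", "content": "Tell me a fun fact."}]
)
print(response.choices[0].message.content)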

Exploring Inference Providers on Hugging Face

Hugging Face's Inference Providers system gives you access to a wide range of AI models hosted by different backend providers, all through one unified API. When using the OpenAI-compatible interface, you must specify the provider by adding a suffix like :together or :replicate to the model name. This tells Hugging Face exactly where to route the request.

Each provider offers different strengths: some are optimized for speed, others for specific hardware, and some for cost-efficiency. Here's a list of the most commonly used providers you can access via Hugging Face:

Provider | Suffix | Highlights
Hugging Face | :hf-inference | Models hosted directly by Hugging Face
Together AI | :together | Fast LLM inference with sub-100 ms latency
Replicate | :replicate | Supports both text and image models
fal.ai | :fal-ai | Lightweight, fast response time
SambaNova | :sambanova | Enterprise-grade AI infrastructure
Groq | :groq | High-speed inference on custom silicon
Nscale | :nscale | Scalable inference with private model hosting
Cerebras | :cerebras | AI models running on wafer-scale compute

To use any of these, just append the suffix to your model name. For example:

model="deepseek-ai/DeepSeek-R1:together"

You can browse huggingface.co/models and filter by provider to find out which models are available under each backend. If you use a model that your chosen provider doesn't support, or forget the suffix, the request will fail, so it's important to get this right.
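
If you'd rather handle that failure gracefully than let the script crash, you can catch the client's API errors; here is a minimal sketch, assuming the standard exception types from the openai package:

import openai

try:
    response = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-R1:together",
        messages=[{"role": "user", "content": "Tell me a fun fact."}]
    )
    print(response.choices[0].message.content)
except openai.APIError as e:
    # Typically raised when the model/provider combination isn't available
    print(f"Request failed: {e}")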

This system gives you flexibility to try different models or backends just by changing the provider tag, all without modifying your application logic.
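
One way to take advantage of that is to time the same prompt against two providers and compare; a rough sketch, assuming both providers actually host the model you pick:

import time

def ask(model_with_provider: str, prompt: str) -> float:
    """Send one chat completion and return how long it took in seconds."""
    start = time.perf_counter()
    client.chat.completions.create(
        model=model_with_provider,
        messages=[{"role": "user", "content": prompt}]
    )
    return time.perf_counter() - start

for tag in ("deepseek-ai/DeepSeek-R1:together", "deepseek-ai/DeepSeek-R1:sambanova"):
    print(tag, f"{ask(tag, 'Tell me a fun fact.'):.2f}s")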

Conclusion

Setting up Hugging Face’s OpenAI-compatible API involves just a few key steps: updating the base URL, providing your Hugging Face token, and including the required provider suffix in the model name. This simple setup lets developers access a wide range of models without changing their existing code.

Throughout this blog, we explored how this compatibility works, why specifying a provider is essential, and how it fits into Hugging Face’s broader inference system. It’s a practical and flexible approach for anyone looking to build with language models beyond a single provider. Developers who want to understand alternative transport mechanisms can also check STDIO transport in MCP to see how other protocols handle similar connections.

Dharshan

Passionate AI/ML Engineer with interest in OpenCV, MediaPipe, and LLMs. Exploring computer vision and NLP to build smart, interactive systems.
