Hugging Face provides a wide range of machine learning models that can be easily integrated into your applications. In this guide, we will walk you through using ML models from Hugging Face with the Vercel AI SDK, which provides a set of utilities that make it easy to work with Hugging Face's APIs.
Before you begin, make sure you have the following:
- A Hugging Face API key
- A Vercel account
Create a Next.js application and install `ai` and `@huggingface/inference`:

```bash
pnpm dlx create-next-app my-ai-app
cd my-ai-app
pnpm install ai @huggingface/inference
```
Note: You can use `npm` or `yarn` if you prefer.
Next, add your Hugging Face API key to a `.env.local` file at the root of your project:

```
HUGGINGFACE_API_KEY=xxxxxxxxx
```
- Go to the Hugging Face website at huggingface.co.
- Click on the "Models" tab in the navigation bar.
- On the left-hand side of the models page, you will see a list of task types. Choose the task type that corresponds to your use case. For example, if you want to perform text generation, click on the "Text Generation" option.
- Browse through the available models and select the one that best suits your needs.
- Alternatively, skip the steps above and use https://sdk.vercel.ai/ to select a Hugging Face model.
To use the selected ML model, you need to create a Route Handler.
- Create a new file named `app/api/completion/route.ts` in your project.
- Add your code to the Route Handler. It might look something like the code below. In this example, the Route Handler accepts a `POST` request with a `prompt` string. It then generates a text completion using the `OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5` model, and the response is streamed back to our page.
```ts
import { HfInference } from '@huggingface/inference';
import { HuggingFaceStream, StreamingTextResponse } from 'ai';

// Create a new Hugging Face Inference instance
const Hf = new HfInference(process.env.HUGGINGFACE_API_KEY);

export async function POST(req: Request) {
  // Extract the `prompt` from the body of the request
  const { prompt } = await req.json();

  const response = await Hf.textGenerationStream({
    model: 'OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5',
    inputs: `<|prompter|>${prompt}<|endoftext|><|assistant|>`,
    parameters: {
      max_new_tokens: 200,
      // @ts-ignore (this is a valid parameter specifically in OpenAssistant models)
      typical_p: 0.2,
      repetition_penalty: 1,
      truncate: 1000,
      return_full_text: false,
    },
  });

  // Convert the response into a friendly text-stream
  const stream = HuggingFaceStream(response);

  // Respond with the stream
  return new StreamingTextResponse(stream);
}
```
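You can exercise this route before building any UI. Below is a quick manual test sketch, assuming your dev server is running locally on port 3000 (`pnpm dev`) and you are on Node 18+ so `fetch` is available globally:

```ts
// Quick manual test of the completion route (assumes `pnpm dev` is serving on port 3000).
async function main() {
  const res = await fetch('http://localhost:3000/api/completion', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt: 'What is Hugging Face?' }),
  });

  // The route responds with a plain-text stream, so read it chunk by chunk.
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    process.stdout.write(decoder.decode(value));
  }
}

main();
```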
Note: Replace `OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5` in the code with the model name you wish to use.
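For example, you could read the model name from an environment variable instead of hard-coding it. Here is a minimal sketch of the `textGenerationStream` call, where `HF_MODEL` is a hypothetical variable name; keep in mind that other models often expect a different prompt template than the OpenAssistant format used above:

```ts
// Inside the POST handler: read the model from an env var (HF_MODEL is hypothetical),
// falling back to the model used in this guide.
const model =
  process.env.HF_MODEL ?? 'OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5';

const response = await Hf.textGenerationStream({
  model,
  // Note: this prompt template is specific to OpenAssistant models;
  // other models may expect a different format.
  inputs: `<|prompter|>${prompt}<|endoftext|><|assistant|>`,
  parameters: { max_new_tokens: 200 },
});
```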
The Vercel AI SDK provides two utility helpers that streamline the integration process:
- `HuggingFaceStream`: This helper takes the streaming response received from `Hf.textGenerationStream`, decodes and extracts the text tokens, and re-encodes them for easy consumption.
- `StreamingTextResponse`: This helper extends the Web `Response` class and provides default headers, including the desired `'Content-Type': 'text/plain; charset=utf-8'`.

By utilizing these helpers, you can pass the transformed stream directly to `StreamingTextResponse`, enabling the client to consume the response effortlessly.
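If you want to run logic as tokens are produced (logging, persisting the final completion, and so on), `HuggingFaceStream` also accepts an optional callbacks object. Here is a minimal sketch, assuming the version of `ai` you installed supports these stream callbacks:

```ts
// Inside the POST handler: attach optional callbacks while converting the response.
const stream = HuggingFaceStream(response, {
  onStart: async () => {
    console.log('Stream started');
  },
  onToken: async (token) => {
    console.log('Token received:', token);
  },
  onCompletion: async (completion) => {
    // e.g. persist the final completion to a database here
    console.log('Final completion:', completion);
  },
});

return new StreamingTextResponse(stream);
```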
Now that you have set up the API route, you can fetch data from it in your components by creating a form with an input for the prompt.
To make this process easier, use the `useCompletion` hook, which defaults to the `POST` Route Handler we created earlier. If you want to override this default behavior, pass a custom `api` option: `useCompletion({ api: '...' })`.
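For instance, if your Route Handler lived at a different path, or you wanted to run code after a completion finishes, you could pass extra options to the hook. A short sketch, where `/api/my-completion` and the `onFinish` handler are illustrative and assume your installed version of `ai/react` supports them:

```tsx
// Inside a client component:
const { completion, input, handleInputChange, handleSubmit } = useCompletion({
  // Point the hook at a custom Route Handler path (illustrative).
  api: '/api/my-completion',
  // Run after the stream finishes, e.g. to log the final completion.
  onFinish: (prompt, completion) => {
    console.log('Finished:', completion);
  },
});
```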
Open the file where you want to use the ML model and create a form with the necessary inputs, wiring the hook's `handleSubmit` to the form's `onSubmit`. Your code could look like this:
```tsx
'use client';

import { useCompletion } from 'ai/react';

export default function Completion() {
  const { completion, input, stop, isLoading, handleInputChange, handleSubmit } =
    useCompletion({
      api: '/api/completion',
    });

  return (
    <div className="flex flex-col w-full max-w-md py-24 mx-auto stretch">
      <form onSubmit={handleSubmit}>
        <label>
          Say something...
          <input
            className="fixed bottom-0 w-full max-w-md p-2 mb-8 border border-gray-300 rounded shadow-xl"
            value={input}
            onChange={handleInputChange}
          />
        </label>
        <output>Completion result: {completion}</output>
        <button type="button" onClick={stop}>
          Stop
        </button>
        <button disabled={isLoading} type="submit">
          Send
        </button>
      </form>
    </div>
  );
}
```
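If you want to surface failed requests to the user, the hook also returns an `error` value you can render. Here is a small sketch, assuming the version of `ai/react` you installed exposes it:

```tsx
'use client';

import { useCompletion } from 'ai/react';

export default function CompletionWithError() {
  const { completion, input, error, isLoading, handleInputChange, handleSubmit } =
    useCompletion({ api: '/api/completion' });

  return (
    <form onSubmit={handleSubmit}>
      <input value={input} onChange={handleInputChange} />
      {/* Show a basic error message if the request to the Route Handler fails */}
      {error && <p role="alert">{error.message}</p>}
      <output>{completion}</output>
      <button disabled={isLoading} type="submit">
        Send
      </button>
    </form>
  );
}
```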
Finally, we'll deploy the repo to Vercel.
- First, create a new GitHub repository and push your local changes.
- Deploy it to Vercel. Ensure you add all environment variables that you configured earlier to Vercel during the import process.
After a successful deployment, the Route Handler that calls your Hugging Face model will run at the edge, closer to your users, reducing response latency.
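Running at the edge assumes your Route Handler opts into the Edge runtime. If it doesn't already declare it, you can add the segment config below to `app/api/completion/route.ts` (this assumes you are using the Next.js App Router, as in this guide):

```ts
// app/api/completion/route.ts
// Opt this Route Handler into the Edge runtime so it runs close to your users.
export const runtime = 'edge';
```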
Congratulations! You have successfully integrated an ML model from Hugging Face with Vercel Functions. Users can now interact with your application and get answers to their questions in real time.