Intro
There are three principal ways to get an LLM that produces custom responses and operates on your own (possibly private and confidential) data.
- Train your own LLM
- Fine-tune an existing LLM
- Use a default model, but leverage a long context with information to improve responses for your domain (RAG)
In an ideal world, anyone would simply train their own LLM. However, training a model from scratch costs millions or even billions of dollars today and takes a long time, making it affordable only for companies like Google, Facebook, and Microsoft. It’s not an option for the average Joe (like me).
The two cost-effective alternatives are
- fine-tuning an existing model and
- using a RAG with an existing model.
We have already explored the RAG approach over here.
This article will focus on options to fine-tune an OpenAI ChatGPT model. Let’s go!
Why Fine-Tune a GPT Model?
Fine-tuning allows you to customize a pre-trained model to better align with your specific needs and applications. This process can significantly enhance the quality of results beyond what can be achieved through simple prompting. Here are some key advantages of fine-tuning:
- Higher Quality Results: Fine-tuning can yield more precise and reliable outputs by training on a greater number of examples than what can fit in a single prompt.
- Cost Efficiency: By reducing the need for lengthy prompts (RAG approach), fine-tuned models save on token usage and lower latency. A fine-tuned model can be a bit more expensive upfront (training costs), but it can be much cheaper to run when used frequently. It also potentially allows for the use of cheaper base models (4o-mini).
- Task Specialization: Fine-tuning allows models to handle specific tasks, styles, or tones, making them ideal for niche applications.
Getting Started with Fine-Tuning
Fine-tuning a GPT model involves several key steps, which are straightforward thanks to OpenAI’s intuitive interface. Below is a step-by-step guide to help you navigate the process effectively.
Step 1: Prepare Your Training Data
Start by collecting relevant input-output pairs that reflect how you want the model to respond. For instance, if you’re developing a customer service chatbot, gather typical questions and their best responses. Format this data into a JSON Lines file (.jsonl), ensuring each line follows the chat completions API format.
```jsonl
{"messages": [{"role": "user", "content": "What is your refund policy?"}, {"role": "assistant", "content": "You can request a refund within 90 days of purchase."}]}
{"messages": [{"role": "user", "content": "What is the phone number of the service desk?"}, {"role": "assistant", "content": "The phone number of the service desk is +1-232-256-7420"}]}
```
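Before uploading, it’s worth checking the file programmatically, since a single malformed line can fail the whole fine-tuning job. Here is a minimal validator sketch (the function names are my own, not part of any OpenAI tooling):

```python
import json

def validate_line(line: str) -> bool:
    """Check one JSONL line against the chat-completions training format."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return False
    if not isinstance(record, dict):
        return False
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        return False
    # every message needs a known role and string content
    return all(
        isinstance(m, dict)
        and m.get("role") in {"system", "user", "assistant"}
        and isinstance(m.get("content"), str)
        for m in messages
    )

def invalid_lines(path: str) -> list[int]:
    """Return the 1-based line numbers in the file that fail validation."""
    with open(path, encoding="utf-8") as f:
        return [i for i, line in enumerate(f, 1)
                if line.strip() and not validate_line(line)]
```

Running `invalid_lines("training_data.jsonl")` before uploading gives you a quick list of lines to fix.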
Step 2: Log into OpenAI
Access the OpenAI platform via the Developer Dashboard rather than the ChatGPT interface. This platform provides all the tools necessary for fine-tuning.
Step 3: Upload Your Training File
Navigate to the fine-tuning section and upload your prepared .jsonl file. This step is crucial for setting the stage for fine-tuning.
Step 4: Start a Fine-Tuning Job
After uploading your data, initiate a fine-tuning job by selecting the appropriate base model (e.g., gpt-4o-mini). You can adjust settings like the learning rate, but default settings are typically sufficient for most cases.
Step 5: Monitor the Progress
Use the fine-tuning dashboard to track the status of your job. Once complete, you’ll receive a new model ID that you can use in your API calls.
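Steps 3 through 5 can also be done from code instead of the dashboard. A sketch using the official `openai` Python SDK (v1 style), assuming an `OPENAI_API_KEY` environment variable and a file named `training_data.jsonl`:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Step 3: upload the prepared .jsonl training file
training_file = client.files.create(
    file=open("training_data.jsonl", "rb"),
    purpose="fine-tune",
)

# Step 4: start a fine-tuning job on a base model
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",
)

# Step 5: poll the job; the new model ID appears once it has succeeded
job = client.fine_tuning.jobs.retrieve(job.id)
print(job.status)            # e.g. "running" or "succeeded"
print(job.fine_tuned_model)  # populated after the job succeeds
```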
Step 6: Use Your Fine-Tuned Model
With your fine-tuned model ready, test it with various prompts to evaluate its performance. OpenAI’s playground can be a useful tool to compare responses from different models.
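Calling the tuned model works exactly like calling a base model — you just pass the new model ID (the ID below is a made-up placeholder; use the one your job returned):

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="ft:gpt-4o-mini-2024-07-18:my-org::example123",  # placeholder ID
    messages=[{"role": "user", "content": "What is your refund policy?"}],
)
print(response.choices[0].message.content)
```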
Step 7: Evaluate and Deploy
Assess the model’s performance to ensure it meets your expectations. It’s essential to build a test harness around your tuned model using evals. Evals are to AI what tests are to software engineering. You can skip them, but then you’re flying blind. OpenAI released their evals framework, which is available on GitHub.
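To make the idea concrete, here is a minimal exact-match eval sketch — my own toy harness, not OpenAI’s evals framework:

```python
def run_eval(model_fn, test_cases):
    """Score a model on (prompt, expected_answer) pairs via exact match.

    model_fn is any callable mapping a prompt string to an answer string
    (in practice it would wrap a chat-completions call to your tuned model).
    Returns the fraction of answers that match, case-insensitively.
    """
    hits = 0
    for prompt, expected in test_cases:
        answer = model_fn(prompt)
        if answer.strip().lower() == expected.strip().lower():
            hits += 1
    return hits / len(test_cases)

# usage with a stand-in "model" (a dict lookup) just to show the shape
cases = [
    ("What is your refund policy?",
     "You can request a refund within 90 days of purchase."),
]
canned = dict(cases)
print(run_eval(canned.get, cases))  # 1.0
```

Exact match is the crudest possible metric; real evals often use contains-checks, rubric grading, or a judge model, but the harness shape stays the same.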
If necessary, refine your dataset and fine-tune again. Once satisfied, deploy the model in your application’s environment.
When to Consider Fine-Tuning
Before diving into fine-tuning, it’s crucial to determine whether it’s the right solution for your needs. Fine-tuning is particularly beneficial when:
- You need a model to consistently follow complex prompts.
- You want to establish a specific style or tone.
- You need to handle many edge cases in a specific way.
In cases where prompt engineering or a RAG can achieve the desired results, consider these methods first due to their quicker feedback loops.
Cost
The cost depends on the model you want to fine-tune and the amount of input data. A rule of thumb is that fine-tuning gpt-4o-mini with around 500 book pages of knowledge (question-answer pairs) for 4 epochs will cost you approximately $2 USD. It’s not very expensive, and you can easily see why it might be cheaper in the long run than using a larger model with a RAG. The full pricing is available here. To estimate the number of tokens, you can use OpenAI’s tokenizer tool.
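You can sanity-check that rule of thumb with simple arithmetic. The sketch below assumes roughly 400 tokens per book page and a training price of $3.00 per 1M tokens for gpt-4o-mini — both are assumptions on my part, so verify them against the current pricing page:

```python
def training_cost_usd(dataset_tokens: int, epochs: int,
                      price_per_million: float = 3.00) -> float:
    """Rough fine-tuning cost: the model sees dataset_tokens x epochs tokens.

    price_per_million is an ASSUMED training rate for gpt-4o-mini in USD;
    check OpenAI's pricing page for the current number.
    """
    return dataset_tokens * epochs * price_per_million / 1_000_000

# ~500 book pages at an assumed ~400 tokens per page, trained for 4 epochs
pages, tokens_per_page = 500, 400
print(training_cost_usd(pages * tokens_per_page, epochs=4))  # 2.4
```

That lands in the same low-single-digit-dollar range as the rule of thumb above.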
Fine-Tuning Use Cases
Fine-tuning is suitable for a variety of applications, including:
- Customer Support: Tailoring responses to align with company policies and tone.
- Content Generation: Adapting style and tone to match brand guidelines.
- Data Extraction: Structuring output for specific data fields in a consistent format.
Conclusion
Fine-tuning is surprisingly simple and cost-effective. But starting with a RAG might be the saner and quicker alternative for many use cases.
More
- OpenAI’s docs on fine-tuning: https://platform.openai.com/docs/guides/fine-tuning
- Estimating tokens that will be consumed based on your input data: https://platform.openai.com/tokenizer
- OpenAI’s test framework to ensure you get good results consistently: https://github.com/openai/evals
- Blog post on GPT-4o’s fine-tuning capabilities: https://openai.com/index/gpt-4o-fine-tuning/