Efficacy of OpenAI Fine-tuning API, Cost/Latency Considerations, Experiencing Catastrophic Forgetting, and...Getting Roasted by a Fine-tuned GPT-3.5
Great work! I'm curious about the latency measurements: did you notice how time-to-first-token was affected when streaming responses?
I've also seen some reports that time-to-first-token latency with fine-tuned models is less consistent than with vanilla 3.5-turbo; did you observe this?
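In case it helps clarify what I mean: here's a rough sketch of how I'd measure time-to-first-token with the openai Python client (v1+) using stream=True, timing from request dispatch to the first content-bearing chunk. The fine-tuned model ID below is just a placeholder.

```python
import time
from openai import OpenAI  # assumes openai>=1.0

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def measure_ttft(model: str, prompt: str) -> float:
    """Return seconds from request start to the first streamed content chunk."""
    start = time.perf_counter()
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        # The first chunk carrying actual content marks time-to-first-token.
        if chunk.choices and chunk.choices[0].delta.content:
            return time.perf_counter() - start
    return time.perf_counter() - start

# Compare the base model against a fine-tuned one (placeholder ID):
# print(measure_ttft("gpt-3.5-turbo", "Say hello"))
# print(measure_ttft("ft:gpt-3.5-turbo:my-org::xxxxxxx", "Say hello"))
```

Running each a few dozen times and looking at the spread (not just the mean) would show whether the fine-tuned endpoint's TTFT really is less consistent.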