4 Comments
Sep 25, 2023 · Liked by Barry Z

Great work! I'm curious about the latency measurements: did you notice how time-to-first-token was affected when streaming responses?

I've also seen some reports that time-to-first-token latency with fine-tuned models is less consistent than with vanilla 3.5-turbo; did you observe this?
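
(For anyone wanting to check this themselves, here's a minimal sketch of measuring time-to-first-token on a streamed chat completion. It assumes the openai-python v1+ client; the fine-tuned model name is a placeholder, not one from the post.)

```python
import time
from openai import OpenAI  # assumes openai-python v1+; API key read from OPENAI_API_KEY

client = OpenAI()

def time_to_first_token(model: str, prompt: str) -> float:
    """Seconds from request start until the first streamed chunk with content."""
    start = time.perf_counter()
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        # Skip role-only / empty delta chunks; stop at the first real token.
        if chunk.choices and chunk.choices[0].delta.content:
            return time.perf_counter() - start
    return float("nan")  # stream ended without any content

# Compare vanilla 3.5-turbo against a fine-tuned model (placeholder name).
for m in ["gpt-3.5-turbo", "ft:gpt-3.5-turbo-0613:my-org::abc123"]:
    print(m, round(time_to_first_token(m, "Say hello."), 3), "s")
```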
