Comparing LLMs on "Real-World" Retrieval
The article details a personal evaluation of 8 instruction-tuned LLMs (including GPT-4, Claude, Gemini, and open-source models) on a custom "real-world" retrieval task. The author uses ~85 doctor-patient transcripts to test model performance on three questions of varying difficulty, moving beyond standard benchmarks to assess reasoning on unstructured data that is likely absent from training sets.