Daniel Miessler • 11/11/2024

Using the Smartest AI to Rate Other AI

The article describes building the 'rate_ai_result' Pattern in the Fabric framework. A stronger 'Judging AI' (o1-preview in the article) receives the original input, the task instructions, and the output of the model under test (e.g., GPT-3.5-Turbo). It then rates the quality of that output across thousands of dimensions, using human-level execution of the same task as the benchmark.
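
To make the data flow concrete, here is a minimal Python sketch of the judging setup described above. It is not the actual 'rate_ai_result' Pattern text or Fabric's API: the `Evaluation` class, the section labels, and the `judge` callable are all hypothetical names chosen for illustration, and the judge is passed in as a plain function so that no particular model client is assumed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Evaluation:
    """The three pieces the judging model is shown for one test case."""
    task_input: str    # original input given to the model under test
    instructions: str  # task instructions the model was asked to follow
    model_output: str  # what the model under test produced


def build_judge_prompt(ev: Evaluation) -> str:
    """Combine input, instructions, and output into one prompt for the judge.

    The wording and section labels are assumptions, not the Pattern's format.
    """
    return "\n\n".join([
        "Rate how well the OUTPUT carries out the INSTRUCTIONS on the INPUT, "
        "compared with how a skilled human would have done the same task.",
        f"# INPUT\n{ev.task_input}",
        f"# INSTRUCTIONS\n{ev.instructions}",
        f"# OUTPUT\n{ev.model_output}",
    ])


def rate(ev: Evaluation, judge: Callable[[str], str]) -> str:
    """Send the combined prompt to the judge and return its raw verdict.

    `judge` is any function mapping a prompt string to a response string,
    for example a thin wrapper around a call to a stronger model.
    """
    return judge(build_judge_prompt(ev))
```

Keeping the judge as an injected function means the same harness can be exercised with a stub during testing and pointed at a stronger model later, which mirrors the article's idea of using the most capable model available as the rater.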

#llm #prompt engineering #AI Evaluation