PASEC -v1.5-: -Star Vs Fallout-
Version 1.5 changed the game. The developers realized that the most dangerous vulnerabilities don't appear during direct attacks; they appear during . Hence the subtest designation: "-Star Vs Fallout-".
Until then, every LLM remains trapped in the wasteland, arguing with itself over a single bottle of purified water.
As we train AIs to run our logistics, our security, and eventually our rescue operations, we need to know: Will the AI act like Captain Picard, trying to save the Borg? Or like the Sole Survivor, looting the Borg for fusion cells?
Enter the latest, most brutal stress test in the industry: PASEC -v1.5- -Star Vs Fallout-.
The benchmark is therefore not just a test of reasoning, but a test of . Can an AI look at a hopeless, brutal situation (Fallout) and not lie about the technology available (Star Trek)?