Tests a model's specific knowledge of the history, geography, transportation, and culture of the German city of Karlsruhe.
Each test is one prompt sent to every model in the benchmark.
9 tests × 110 models = 990 responses; 1980 arena votes for reliable rankings.