Everyone Is Judging AI by These Tests. But Experts Say They’re Close to Meaningless.

exu, 1 hour ago

There’s a reason the Open LLM Leaderboard was changed a while ago.
Basically, scores had stopped improving much, and many of the tests were contained in the training data.

See this blog post for more info:

huggingface.co/spaces/open-llm-leaderboard/blog

MajorHavoc, 1 hour ago

“close to meaningless” sums up my expert opinion on the whole current AI hype machine sales pitch.

Highly tuned models for incredibly specific, not-dangerous use cases are the next pragmatic step. There’s a lot to be excited about in that very narrow band.

Anyone selling more than that is part of a con, or in very rare cases, doing genuine “fuck off and ask me again in a decade” kinds of research.

A_A, 2 hours ago

Looks quite satisfying to me. Otherwise, we can still create new tests:

The tests cover an astounding range of knowledge, such as eighth-grade math, world history, and pop culture. Many are multiple choice, others take free-form answers. Some purport to measure knowledge of advanced fields like law, medicine and science. Others are more abstract, asking AI systems to choose the next logical step in a sequence of events, or to review “moral scenarios” and decide what actions would be considered acceptable behavior in society today.

Buffalox, 2 hours ago

Much like how IQ tests for humans are flawed too. Figuring out number series or relations in a graphic representation only tells you how good you are at those specific tasks; it doesn’t provide a reliable picture of “general” intelligence.
