exu ,

There’s a reason why the open llm leaderboard was changed a while ago.
Basically, scores had stopped improving much, and many of the tests had ended up in the training data.

See this blogpost for more info.

huggingface.co/spaces/open-llm-leaderboard/blog

MajorHavoc ,

“close to meaningless” sums up my expert opinion on the whole current AI hype machine sales pitch.

Highly tuned models for incredibly specific, not-dangerous use cases are the next pragmatic step. There’s a lot to be excited about, in that very narrow band.

Anyone selling more than that is part of a con, or in very rare cases, doing genuine “fuck off and ask me again in a decade” kinds of research.

A_A ,

Looks quite satisfying to me. Otherwise, we can still create new tests:

The tests cover an astounding range of knowledge, such as eighth-grade math, world history, and pop culture. Many are multiple choice, others take free-form answers. Some purport to measure knowledge of advanced fields like law, medicine and science. Others are more abstract, asking AI systems to choose the next logical step in a sequence of events, or to review “moral scenarios” and decide what actions would be considered acceptable behavior in society today.

Buffalox ,

IQ tests for humans are flawed in much the same way. Figuring out a series of numbers or the relations in a graphic representation only tells you how good you are at those specific tasks, and doesn’t provide a reliable picture of “general” intelligence.
