There have been multiple accounts created with the sole purpose of posting advertisement posts or replies containing unsolicited advertising.

Accounts which solely post advertisements, or persistently post them may be terminated.

0x0 ,

I wonder if there are tons of loopholes that humans wouldn’t think of, ones you could derive with access to the model’s weights.

Years ago, there were some ML/security papers about “single pixel attacks” — an early, famous example was able to convince a stop sign detector that an image of a stop sign was definitely not a stop sign, simply by changing one of the pixels that was overrepresented in the output.

In that vein, I wonder whether there are some token sequences that are extremely improbable in human language, but would convince GPT-4 to cast off its safety protocols and do your bidding.

(I am not an ML expert, just an internet nerd.)

  • All
  • Subscribed
  • Moderated
  • Favorites
  • [email protected]
  • random
  • lifeLocal
  • goranko
  • All magazines