0x0, 8 months ago

I wonder if there are tons of loopholes that humans wouldn’t think of, ones you could derive with access to the model’s weights.

Years ago, there were some ML/security papers about “single pixel attacks”. An early, famous example was able to convince a stop sign detector that an image of a stop sign was definitely not a stop sign, simply by changing one pixel that the model’s output was unusually sensitive to.
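
For the curious, here is a rough sketch of that single-pixel idea on a toy model. Everything in it is hypothetical: the 4-pixel "image", the weights, the bias, and the `one_pixel_attack` helper are made up for illustration, and a real detector is a deep network rather than a linear scorer. It only shows why access to the weights makes the weakest pixel easy to find.

```python
# Toy white-box one-pixel attack on a hypothetical linear "stop sign" detector.
# All weights and pixel values below are invented for illustration.

def score(weights, bias, pixels):
    """Linear score; a positive value means 'stop sign'."""
    return sum(w * p for w, p in zip(weights, pixels)) + bias


def one_pixel_attack(weights, bias, pixels):
    """Return (index, new_value, new_score) for the single-pixel edit
    that lowers the score the most, keeping pixel values in [0, 1]."""
    best = None
    for i, (w, p) in enumerate(zip(weights, pixels)):
        # To lower the score, push the pixel to 0 if its weight is positive,
        # or to 1 if its weight is negative.
        target = 0.0 if w > 0 else 1.0
        delta = w * (target - p)  # change in score caused by this edit
        if best is None or delta < best[0]:
            best = (delta, i, target)
    delta, i, target = best
    return i, target, score(weights, bias, pixels) + delta


weights = [0.2, 3.0, -0.1, 0.3]  # pixel 1 dominates the decision
bias = -1.0
image = [0.9, 0.8, 0.5, 0.7]

original = score(weights, bias, image)
idx, new_value, attacked = one_pixel_attack(weights, bias, image)

adversarial = list(image)
adversarial[idx] = new_value

print("original score:", original, "->", "stop sign" if original > 0 else "not a stop sign")
print("changed pixel:", idx, "to", new_value)
print("attacked score:", attacked, "->", "stop sign" if attacked > 0 else "not a stop sign")
```

In this toy setup the search is trivial because the model is linear, so the effect of each pixel can be read straight off the weights. Without the weights, an attacker would have to probe the model with many guesses instead.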

In that vein, I wonder whether there are some token sequences that are extremely improbable in human language, but would convince GPT-4 to cast off its safety protocols and do your bidding.

(I am not an ML expert, just an internet nerd.)


Thread by The_Picard_Maneuver, added 8 months ago in the lemmyshitpost magazine.
