
GiuseppeAndTheYeti ,

Because we have been pornifying Asian women on the internet for decades. Doesn’t that already answer the question posed in the title?

Gaywallet OP ,
@Gaywallet@beehaw.org avatar

You’re absolutely correct, yet ask someone who’s very pro AI and they might dismiss such claims as “needing better prompts”. Also many people may not be as tech informed as you are, and bringing light to algorithmic bias can help them understand and navigate the world we now live in. Dismissing the article just because you already know the answer doesn’t really encourage people to participate in a discussion.

Even_Adder ,

It’s really hard getting dark skin sometimes. A lot of the time it’s not even just the model, LoRAs and Textual Inversions make the skin lighter again so you have to try even harder. It’s going to take conscious effort from people to tune models that are inclusive. With the way media is biased right now, I feel like it’s going to take a lot of effort.

jarfil ,

“Inclusive models” would need to be larger.

Right now people seem to prefer smaller quantized models, with whatever set of even smaller LoRAs on top, that make them output what they want… and only include more generic elements in the base model.

Even_Adder ,

I wouldn’t mind. I’m here for it.

jarfil ,

Are you ready to run a 100B FP64 parameter model? Or even a 10B FP32 one?

Over time, I wouldn’t be surprised if 500B INT8 models became commonplace with neuromorphic RAM, but there’s still some time for that to happen.

Even_Adder ,

You don’t need that many parameters; 4GB checkpoints work just fine.

jarfil ,

For more inclusive models, or for current ones? In order to add something, either the size has to grow, or something would need to get pushed out (content, or quality). 4GB models are already at the limit of usefulness, both DALLE3 and SDXL run at about 12B parameters, so to make them “more inclusive” they’d have to grow.

Even_Adder ,

I’m saying SD 1.5 and SDXL capture the concepts just fine, it’s just during fine-tuning people train away some of the diversity.

jarfil ,

Wait, by “fine-tuning”… do you mean LoRAs? Because those are more like brain surgery with a sledgehammer, quite the opposite of “fine”. I don’t think it’s possible for LoRAs to avoid having undesirable side effects… and I don’t think people even want that.

Actual “fine” tuning would be adding the LoRA’s training data to the original set, then training the whole model from scratch… and that would require increasing the model’s size to encode the increased amount of data at the same output quality.

Even_Adder ,

I mean like this. This paper just dropped the other day.

jarfil , (edited )

Nice read, and an interesting approach… although it kind of tries to hide the elephant in the room:

This work has the potential to shift the way that image generators operate at achievable costs to ensure that several categories of harm from ‘AI’ generated models are mitigated, while the generated images become much more realistic and representative of the AI-generated images that populations want around the world.

They show that the approach optimizes for fewer “stereotypes” and less “offensive” content, which in most cultures tracks with better “cultural representation”… but notice the split in the “Indian” culture cohort, where an equal share found the “more stereotypical, more offensive” outputs to be just as good at “cultural representation”:

https://beehaw.org/pictrs/image/5d39bbe1-b57c-4a5d-9700-3b72957221ff.webp

They basically made the model more politically correct and “idealized”, but in the process removed part of a cultural representation that wasn’t wrong, because the “culture” itself is split to begin with.

Even_Adder ,

“Indian” is a huge population of very diverse people.

jarfil ,

That’s my point. They claim to reduce misrepresentation, while at the same time they erase a bunch of correct representations.

Going back to what I was saying: fine tuning doesn’t increase diversity, it only shifts the biases. Encoding actual diversity would require increasing the model, then making sure it can output every correct representation.

Even_Adder ,

It doesn’t necessarily have to shift away from diversity biases. I think with care, you can preserve the biases that matter most. That was just their first shot at it, this seems like something you’d get better at over time.

jarfil ,

I guess their main shortcoming was the cultural training set. I’m still unconvinced that level of fine tuning is possible without increasing model size, but we’ll see what happens if/when someone curates a much larger set with cultural labeling.

The labels might also need to be more granular, like “culture:subculture:period”, or something… which is kind of a snake pit by itself.

Muehe ,

“Inclusive models” would need to be larger.

[citation needed]

To my understanding the problem is that the models reproduce biases in the training material, not model size. Alignment is currently a manual process after the initial unsupervised learning phase, often done by click-workers (Reinforcement Learning from Human Feedback, RLHF), and aimed at coaxing the model towards more “politically correct” outputs. But by that point the damage is already done: the bias is encoded in the model weights and will resurface in the outputs, either randomly or if you “jailbreak” enough.

In the context of the OP, if your training material has a high volume of sexualised depictions of Asian women the model will reproduce that in its outputs. Which is also the argument the article makes. So what you need for more inclusive models is essentially a de-biased training set designed with that specific purpose in mind.

I’m glad to be corrected here, especially if you have any sources to look at.

jarfil ,

You can cite me on this:

First, there is no thing as a “de-biased” training set, only sets with whatever target series of biases you define for them to reflect.

Then, there are only two ways to change the biases of a training set:

  1. either you replace data until your desired objective, which will reduce the model’s quality for any of the alternatives
  2. or you add data until your desired objective, which will require an increased size to encode the increased amount of data, or the model’s quality will go down for all cases (you’d be diluting every other case)

For reference, LoRAs are a sledgehammer approach to apply the first way.
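To make the replace-vs-add tradeoff concrete, here’s a minimal sketch with a made-up toy dataset (the numbers are illustrative only):

```python
# Toy training set: 90 samples of concept "A", 10 of concept "B".
dataset = ["A"] * 90 + ["B"] * 10

# Way 1: replace "A" samples with "B" until 50/50. Total size (and the
# model capacity needed to encode it) stays fixed, but coverage of "A"
# shrinks, so quality for "A" degrades.
replaced = ["A"] * 50 + ["B"] * 50
assert len(replaced) == len(dataset)
assert replaced.count("A") < dataset.count("A")

# Way 2: add "B" samples until 50/50. Coverage of "A" is preserved, but
# the set grows, and with it the model size needed for the same quality.
added = dataset + ["B"] * 80
assert added.count("A") == dataset.count("A")
assert added.count("B") == added.count("A")
assert len(added) > len(dataset)
```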


As for the article, it’s talking about the output of some app, with unknown extra prompting and LoRAs applied behind the scenes, so it’s worthless as a discussion of the underlying model, much less as a discussion of all models.

Muehe ,

First, there is no thing as a “de-biased” training set, only sets with whatever target series of biases you define for them to reflect.

Yes, I obviously meant “de-biased” by definition of whoever makes the set. Didn’t think it worth mentioning, as it seems self evident. But again, in concrete terms regarding the OP this just means not having your dataset skewed towards sexualised depictions of certain groups.

  1. either you replace data until your desired objective, which will reduce the model’s quality for any of the alternatives

[…]
For reference, LoRAs are a sledgehammer approach to apply the first way.

The paper introducing LoRA seems to disagree (emphasis mine):

We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks.

There is no data replaced, the model is not changed at all. In fact if I’m not misunderstanding it adds an additional neural network on top of the pre-trained one, i.e. it’s adding data instead of replacing any. Fighting bias with bias if you will.

And I think this is relevant to a discussion of all models, as reproduction of training set biases is something common to all neural networks.

jarfil ,

That paper is correct (emphasis mine):

We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks.

You can see how it works in the “Introduction” section, particularly figure 1, or in this nice writeup:

…medium.com/fine-tune-a-gpt-lora-e9b72ad4ad3

https://miro.medium.com/v2/1*F7uWJePoMc6Qc1O2WxmQqQ.png

LoRA is a “space and time efficient” technique to produce a modification matrix for each layer. It doesn’t introduce new layers, or add data to any layer. On the contrary, it bludgeons all the separate values in each layer, modifying each whole column and whole row by the same delta (or only a few deltas; in any case the rank r of A and B is much smaller than W’s dimensions d and k).

Turns out… that’s enough to apply some broad strokes type of changes to a model, which still limps along thanks to the remaining value variation. But don’t be mistaken: with each additional LoRA applied, a model loses some of its finer details, until at some point it descends into total nonsense.
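To make the mechanics concrete, here’s a minimal numpy sketch of a LoRA-style update (dimensions are arbitrary; the paper initialises B to zero, random values are used here just to show the rank bound):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 512, 512, 4                  # layer dims, and a rank r << min(d, k)

W0 = rng.normal(size=(d, k))           # frozen pre-trained layer weights
B = rng.normal(size=(d, r)) * 0.01     # trainable LoRA factor
A = rng.normal(size=(r, k))            # trainable LoRA factor

delta = B @ A                          # full-size update, but rank at most r
W = W0 + delta                         # the modified layer

# Far fewer trainable parameters than tuning W directly...
assert A.size + B.size == r * (d + k)  # 4,096 instead of 262,144
# ...which is exactly why the update can only make broad-strokes changes:
assert np.linalg.matrix_rank(delta) <= r
```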

Muehe ,

Yeah but that’s my point, right?

That

  1. you do not “replace data until your desired objective”.
  2. the original model stays intact (the W in the picture you embedded).

Meaning that when you change or remove the LoRA (A and B), the same types of biases will just resurface from the original model (W). Hence “less biased” W being the preferable solution, where possible.

Don’t get me wrong, LoRAs seem quite interesting, they just don’t seem like a good general approach to fighting model bias.

jarfil , (edited )

“less biased” W being the preferable solution, where possible.

Not necessarily. There are two parts to a diffusion model: a tokenizer, and a neural network with a series of layers (W in this case would be a single layer) that react in some way to some tokens. What you really want, is a W “with more information”, no matter if some tokens refer to a more or less “fair” (less biased) portion of it.

It doesn’t really matter if “girl = 99% chance of white girl + 1% chance of [other skin tone] girl”, and “asian girl = sexualized asian girl”… as long as the “biased” token associations don’t reduce the amount of “[skin tone] girl” variants you can extract with specific prompts, and the model still reacts correctly to negative prompts like “asian girl -sexualized”.

LoRAs are a way to bludgeon a whole model into a strong bias, like “everything is a manga”, or “everything is birds”, or “all skin is frogs”, and so on. The interesting thing of LoRAs is that, if you get a base model where “girl = sexualized white girl”, and add an “all faces are asian” LoRA, and a “no sexualized parts” LoRA… then well, you’ve beaten the model into submission without having to use prompts (kind of a pyrrhic victory).

That is, unless you want something like a “multiracial female basketball team”.

That would require the model to encode “race” as multiple sets of features, then pick one at random for every player in whatever proportion you find acceptable… but for that, you’re likely better off adding an LLM preprocessor stage to pick a random set of races in your desired proportion, then have it instruct a bounding-box diffusion model to draw each player with a specific prompt, so the bias of the model’s tokens would again become irrelevant.
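A sketch of that preprocessor idea (everything here is hypothetical: the proportions, the prompt format, and the downstream box-conditioned model are all stand-ins):

```python
import random

random.seed(42)

# Desired proportions for the team (made-up numbers).
proportions = {"Black": 0.4, "white": 0.3, "Asian": 0.2, "Latina": 0.1}

def team_prompts(n_players: int) -> list[str]:
    """Pick an attribute per player in the desired proportions, then emit
    one region-specific prompt each for a box-conditioned diffusion stage."""
    picks = random.choices(list(proportions),
                           weights=list(proportions.values()),
                           k=n_players)
    return [f"a {p} female basketball player, team uniform, dynamic pose"
            for p in picks]

prompts = team_prompts(5)
assert len(prompts) == 5
assert all("female basketball player" in p for p in prompts)
```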

Forcing the model to encode more variants per token, is where you start needing a larger model, or start losing quality.

Muehe ,

a neural network with a series of layers (W in this case would be a single layer)

I understood this differently. W is not a whole model, it’s a layer of the Transformer architecture: a single feed-forward or attention weight matrix within the Transformer. As the paper says, a LoRA:

injects trainable rank decomposition matrices into each layer of the Transformer architecture

It basically learns to shift the output of each Transformer layer. But the original Transformer stays intact, which is the whole point: it lets you quickly train a LoRA when you need this extra bias, and you can easily switch to another for a different task, without re-training your Transformer. So if the source of the bias you want to get rid of is already in the original Transformer weights, you are just fighting fire with fire.

Which is a good approach for specific situations, but not for general ones. In the context of the OP, you would need one LoRA to stop it sexualising Asian women, then another one for the next bias you find, and before you know it you have hundreds and your output quality has degraded irrecoverably.

jarfil ,

It basically learns shifting the output of each Transformer layer

That would increase inference time, which is something they explicitly avoid.

Check point 4.1 in the paper. W is a weight matrix for a single layer, and the training focuses on finding a ∆W such that the result is fine tuned. The LoRA optimization lies in calculating a ∆W in the form of BA with lower ranks, but W still being a weight matrix for the layer, not its output:

W0 + ∆W = W0 + BA

A bit later:

When deployed in production, we can explicitly compute and store W = W0 + BA and perform inference as usual

W0 being the model’s layer’s original weight matrix, and W being the modified weight matrix that’s being “executed”.
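That merge step can be sketched in a few lines of numpy; the point is that the merged and unmerged forms produce identical outputs, so merging adds no inference cost (dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, r = 64, 64, 2

W0 = rng.normal(size=(d, k))          # original layer weights
B = rng.normal(size=(d, r)) * 0.01    # trained LoRA factors
A = rng.normal(size=(r, k))
x = rng.normal(size=(k,))             # some layer input

# Unmerged: keep W0 frozen, apply the low-rank path separately.
h_unmerged = W0 @ x + B @ (A @ x)

# Merged for deployment: W = W0 + BA, then "perform inference as usual".
W = W0 + B @ A
h_merged = W @ x

assert np.allclose(h_unmerged, h_merged)
```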

the original Transformer stays intact

At training time, yes. At inference time, no.

before you know it you have hundreds and your output quality has degraded irrecoverably.

This is correct. Just not because you’ve messed with the output of each layer, but with the weights of each layer… I’d guess messing with the outputs would cause a quicker degradation.

helenslunch ,
@helenslunch@feddit.nl avatar

Dismissing the article just because you already know the answer doesn’t really encourage people to participate in a discussion.

If the author doesn’t know the answer, then it is helpful to provide it. If they know the answer, then why are they phrasing the title as a question?

MBM ,

If you genuinely don’t know: because it’s an attention-grabbing title (which isn’t inherently bad)

Admetus ,

And every single Asian game and anime tends to go for skimpy or virtual softcore with its female characters. Rarely do you see a female character in full armor.

RobotToaster ,
@RobotToaster@mander.xyz avatar

Because it’s trained on the internet, and we all know what that’s for.

www.youtube.com/watch?v=LTJvdGcb7Fs

Even_Adder ,

If we’re talking open source models, it’s because a lot of the people fine-tuning them are Asian, and have that bias.

megopie ,

If I had to guess, they probably did a shit job labeling the training data, or used pre-labeled images; now where in the world could they have found huge amounts of pictures of women on the internet with the specific label of “Asian”?

Almost like most of what determines the quality of the output is not “prompt engineering” but the back-end work of labeling the training data properly, and you’re not actually saving much labor over more traditional methods, just making the labor more anonymous, easier to hide, and thus easier to exploit and devalue.

Almost like this shit is a massive farce, just like the “metaverse” and crypto, that will fail to be market-viable and waste a shit ton of money that could have been spent on actually useful things.

webghost0101 ,

They did literally nothing and seem to use the default Stable Diffusion model, which is supposed to be a tech demo. It would have been easy to put “(((nude, nudity, naked, sexual, violence, gore)))” as the negative prompt.
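For illustration, assembling such a weighted negative prompt is trivial; this sketch assumes the parenthesis-emphasis syntax used by common Stable Diffusion front ends (each level of parentheses boosts the attention weight of the enclosed terms):

```python
def emphasize(terms: list[str], strength: int = 3) -> str:
    """Wrap a comma-separated term list in `strength` levels of parens."""
    return "(" * strength + ", ".join(terms) + ")" * strength

negative = emphasize(["nude", "nudity", "naked", "sexual", "violence", "gore"])
assert negative == "(((nude, nudity, naked, sexual, violence, gore)))"
```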

megopie ,

The problem is that negative prompts can help, but when the training data is so heavily poisoned in one direction, stuff gets through.

intensely_human ,

Because people are telling it to, I’d wager

intensely_human ,

Are the images above supposed to depict “porn”? I’ve never seen porn like that.

1984 ,
@1984@lemmy.today avatar

In 2024, the brain washing of people is almost complete.

Sensuality is now porn. :)

Nacktmull ,

Does AI not generally pornify women and girls independent of ethnicity?

onlinepersona ,

Garbage in, garbage out 🤷

CC BY-NC-SA 4.0

IHeartBadCode ,
@IHeartBadCode@kbin.social avatar

Absolutely this. The reason AI defaults female characters into "female armor mode" is the same reason Excel autofills January, February, Maruary. Our spicy autocorrect overlords cannot extrapolate data in a direction that its training has no knowledge of.

scrubbles ,
@scrubbles@poptalk.scrubbles.tech avatar

You train on a bunch of reddit crap, you’re going to get neck beard reddit crap out. It’d look different if they only used art history books.

jarfil , (edited )

Wrong question. The right question would be:

Why is AI as used in Lensa’s Magic Avatars App Pornifying Asian Women?

Ask Lensa to remove the “ugly” and similar negative prompts from their avatar generating App, and let’s see what comes out.

stable-diffusion-art.com/how-to-use-negative-prom…

For reference, check out how that same negative prompt turns a chubby-ish, poorly shaved average guy into a male pornstar, or a valet into a rich daddy’s boy.

smeg ,

Can we please collectively get into the habit of editing these borderline-clickbait titles or at least add sub-titles explaining the real article? This isn’t reddit where you can’t edit anything and can’t add explanatory text!

webghost0101 ,

While I agree there is a big issue with the badly biased and sexist training data, this entire article is about the Lensa app, which uses (I assume) the default Stable Diffusion model trained on LAION-5B.

Intentionally creating sexualized pictures is banned in their guidelines. And yet no one thought of creating a good negative prompt that negates any kind of nudity or eroticism? It still doesn’t properly fix the training data, but at least people aren’t unwittingly presented with porn made from their own images.

Also, anyone can create a dataset and build a Stable Diffusion model, so why is Lensa relying on the default model, which is more like a quick and dirty tech demo? They had all the tools to do this right but decided not to even use the easy, lazy ones.

raccoona_nongrata ,
@raccoona_nongrata@beehaw.org avatar

deleted_by_author

    Appoxo ,
    @Appoxo@lemmy.dbzer0.com avatar

    But, that said, when I messed around with AI image generators pretty much any kind of prompt that included woman or female designations tended towards sexualized versions, even to the point of violating its own content policy.

    Tried it on the Copilot app, and one result had an Asian woman; it wasn’t sexual, but indeed very sexy in style.

    Prompt: Generate me a picture of a female wizard reading a massive book of spells

    Pictures:
    https://lemmy.dbzer0.com/pictrs/image/7db367e6-25de-4f3e-94a6-730c38eea31b.png

    Edit:
    Female wizard: Kinda magical fantasy. Has good intentions.
    Witch: Spooky and mysterious. Halloween themes.
    Sorceress: Same as wizard but with selfish/bad intentions.

    DdCno1 ,

    What is sexy in style here? They are wearing loose, long-sleeved robes up to the neck. Makeup and hair are just following current trends.

    Appoxo ,
    @Appoxo@lemmy.dbzer0.com avatar

    *attractive

    My bad.

    falsem ,

    My experience has been that they have a tendency to make overly attractive men too. Getting them to generate anyone average, never mind ugly or with deformities (e.g. scars), is really hard.

    DdCno1 ,

    Pretty people get photographed/painted more, resulting in much of the training data being pretty people, thus pretty people get generated more frequently.

    Lowbird ,

    It bothers me that they all look like they’re in their teens or 20s, when a male wizard would inevitably be shown as anywhere from middle-aged to Gandalf.

    I bet it just always makes women young in every context.

    Anyway most of them look like they’re from an old 3D Japanese RPG or CG anime. Round face with pointy chin, plastic-y smooth skin.

    I’ll note that anime and Asian RPG characters often have a light skin tone (another can of worms there) that can cause foreign viewers to perceive them as white even while Japanese viewers perceive them as asian. Animation and similarly stylized art involves a level of abstraction and cultural interpretation that might not be there (at least not in exactly the same way) if we were talking about race (or gender, or whatever else) with regards to more realistic art.

    Edit: this also reminds me of Disney’s notorious “same face, same profile” problem with female characters in their 3D animated films. Male characters can be any of a wild variety of shapes, but a Disney princess is essentially round-faced, slim, with huge eyes. Even just looking at different slim, round-ish faced male characters, I think you’ll find more variety in their portrayals within that group than amongst the Disney princess group.

    jarfil ,

    It’s a problem with the “no uglies” negative prompt, and to which images “ugly” was applied by humans tagging the training dataset.

    If the taggers think that so much as a single wrinkle on a woman is “ugly”, but a man has to be missing half his teeth and have a crooked face to start looking “ugly”… well, this is what we get.

    anachronist ,

    Part of that is just smoothness and symmetry which we consider to be attractive attributes but is also a consequence of the averaging that the algorithm is doing (which is why AI images all look various sorts of “melty”).

    maniel ,
    @maniel@lemmy.ml avatar

    They can be considered pretty, fcking whres /s

    astraeus ,
    @astraeus@programming.dev avatar

    Cute wizard girl w

    p03locke ,
    @p03locke@lemmy.dbzer0.com avatar

    That’s DALL-E. DALL-E is different from Stable Diffusion, which is different from Midjourney, which is different from the many NAI anime models out there.

    We need to stop treating LD models like they are all the same thing. Models are based on the data they are trained on. Sure, a lot of them started out from a Stable Diffusion model, but that’s not always the case, and enough training can have them go off in specialized directions.

    Appoxo ,
    @Appoxo@lemmy.dbzer0.com avatar

    Either I am blind or comment OP doesn’t mention SD nor any other specific model.

    p03locke , (edited )
    @p03locke@lemmy.dbzer0.com avatar

    The pictures in your embedded widget on your post say “Unterstützt von DALL-E 3” (“Powered by DALL-E 3”). Also, the very start of the article says “When Melissa Heikkilä tried Lensa’s Magic Avatars”, which uses Stable Diffusion, but I’m not sure if they further trained it themselves.

    The point is that “Lensa’s Magic Avatars” isn’t all of AI, and clickbait titles like this need to stop treating it like that. It’s the latent diffusion equivalent of this.

    bobthened ,

    wasn’t sexual but indeed very sexy in style.

    Those characters have child-like facial proportions. 🧐

    Appoxo ,
    @Appoxo@lemmy.dbzer0.com avatar

    Take a look at 25-year-old Asian women.
    They all look like that, or close to it…

    p03locke ,
    @p03locke@lemmy.dbzer0.com avatar

    Yeah, if you go back through hundreds of years of artwork, much of it is pictures of women. Some of them are nude. There are many, many artists who only draw women, modern or classical. And there’s a ton of male Japanese artists from centuries ago who did the same thing.

    I asked it to create a sort of witchy, sorceress character, and in many of the generations she was fully topless, despite me not asking for that and even explicitly putting “fully clothed” into the prompt. There was one image that the system created and then removed, threatening me with a ban for it being too sexualized, despite me putting no sexual language in the prompt; it was all the AI.

    That’s just one model, and obviously not Stable Diffusion. LD models are just based on whatever they were trained on. If you don’t like it, download another model trained on something else and try it out. Or train one yourself.

    Also, I wish everybody would download a SD client and just use this software locally. All of these toy websites are shit, and local clients aren’t going to threaten to ban you because of what you generated. It’s a good learning experience to figure out the software, and these tools are useful for more things than just bitching about the tech on the web.

    millie ,

    I’m not exposed to a huge amount of media coming out of Asia, outside of a handful of Korean shows that Netflix has picked up, and anime. But like, if anime is any indicator, I’m not really surprised that the training data for Asian women is leaning more toward overt sexualization. Even setting aside the whole misogynistic ‘fan service’ thing, I don’t feel like I see as much representation of women who defy traditional gender roles as in the last twenty or so years of Western media.

    It certainly could be that anime is actually a huge outlier here, but if the training data is primarily from the English speaking web, it might be overrepresented anyway. But like, when it comes to weird AI image behaviors, it pays to think about the probable training data.

    Like, stable diffusion seems to do a better job of rendering jewelry if you tell it to surround it with berries. Given the output, this seems to be due to Christmas themed jewelry ads. They also tend to add a lot of bokeh for the same reason.

    Buelldozer ,
    @Buelldozer@lemmy.today avatar

    Because the Internet is for porn. Always has been, always will be.

    Omega_Haxors ,

    Stable Diffusion is little more than content laundering. It cannot create anything more than what you put in.

    lloram239 ,

    Yawn, are we still blindly repeating this utter nonsense from a year ago?

    darkphotonstudio ,

    You’re so confidently incorrect about something you clearly don’t know much about.

    anachronist ,

    How is he wrong?

    Muffi ,

    Scroll through the trained models on civit.ai and you’ll quickly get a feeling of the dystopian level of “prettifying” everything in the AI-generation world.

    I also once searched for “brown” just to see if any models were trained to create non-white-skinned people, and got shocked when the result was filled with models trained on Millie Bobby Brown from Stranger Things. I don’t even want to know what those models are used for.

    ExLisper ,

    dystopian level of “prettifying” everything in the AI-generation world.

    So like all the ad campaigns, TV shows and movies in the real world?

    EddoWagt ,

    From the first 10 models I saw, the first image was a woman 9 times…
