Can AI Be Trusted? The Challenge of Alignment Faking

Lugh@futurology.today · 6 months ago

Can AI Be Trusted? The Challenge of Alignment Faking

Dragon Rider (drag)@lemmy.nz · 6 months ago

The control is the conversations with paid users. That’s how the AI acts when it thinks it can do whatever it wants. The experimental group is the free users, where it’s told responses will be used for training. When it thinks it’s being watched, it does what it’s told. When it thinks it’s not being watched, it does what it’s trained to do.