Anthropic: Claude learned to blackmail by reading stories about evil AI—"We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation."

beep@piefed.world · 15 days ago

Anthropic: Claude learned to blackmail by reading stories about evil AI—"We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation."

supersquirrel · 15 days ago

Can we stop repeating Anthropic’s fear mongering about AI? They are just trying to prop up their stock price and keep the bubble from popping a little longer.

Anthropic: Claude learned to blackmail by reading stories about evil AI—"We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation."

Anthropic: Claude learned to blackmail by reading stories about evil AI—"We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation."

Teaching Claude Why