Thoughts? Ideas? How do we align these systems, some food for thought; when we have these systems do chain of reasoning or various methods of logically going through problems and coming to conclusions we’ve found that they are telling “lies” about their method, they follow no logic even if their stated logic is coherent and makes sense.
Here’s the study I’m poorly explaining, read that instead. https://arxiv.org/abs/2305.04388
When you ask something to a generative written language AI, it will give an output. When asked later to explain it, it won’t go trough where it found it but instead will read again their previous answer and try to explain it with a now broken context.
It also doesn’t help that some AIs like OpenAI’s ChatGPT will just agree if you try to “correct” it with blatant lies.
Well even if you have it explain in parallel to the answer the explanation is false (as in not matching with it’s “internal reasoning”) , from the perspective of a language model the conversation isn’t even separate moments in time, it’s responses and yours are all part of the same text dump that it’s appending likely text to the end of.
Its more like a student trying to bullshit his way out of a school oral test for not studying.
But what can be done about it? I am sure the AI actually doesn’t understand why it has written the thing it did, there would be the need of a secondary component that saves and pieces together the neural activity of the main AI, or something that compares it to the original dataset (but then it would just be a fancier search engine like Bing’s)