So... Alignment problem.

Tetreo · 2 years ago

So... Alignment problem.

SSUPII · 2 years ago

When you ask a generative language AI something, it gives you an output. When asked later to explain that output, it won’t go back through where it found it; instead it re-reads its previous answer and tries to explain it with a now-broken context.

It also doesn’t help that some AIs like OpenAI’s ChatGPT will just agree if you try to “correct” it with blatant lies.

Tetreo · 2 years ago

Well, even if you have it explain in parallel with the answer, the explanation is false (as in, not matching its “internal reasoning”). From the perspective of a language model, the conversation isn’t even separate moments in time: its responses and yours are all part of the same text dump that it’s appending likely text to the end of.
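To make the "same text dump" point concrete, here is a minimal sketch of how a chat can be flattened into one string before the model continues it. The `flatten_chat` helper and the `User:`/`Assistant:` labels are made up for illustration; real chat models use their own templates and special tokens, so treat this as a toy format rather than any vendor's actual one.

```python
def flatten_chat(turns):
    """Join (speaker, text) turns into the single string a model would continue."""
    lines = [f"{speaker}: {text}" for speaker, text in turns]
    # The model is simply asked to append likely text after this final label.
    lines.append("Assistant:")
    return "\n".join(lines)


# Hypothetical conversation: the earlier answer is just more text in the prompt.
turns = [
    ("User", "What is the capital of Australia?"),
    ("Assistant", "Sydney."),
    ("User", "Why did you say that?"),
]
prompt = flatten_chat(turns)
print(prompt)
```

In this picture, the "explanation" is generated from the transcript alone; nothing from the computation that produced the earlier answer is carried along in the prompt.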

SSUPII · 2 years ago

It’s more like a student trying to bullshit his way through a school oral test he didn’t study for.

But what can be done about it? I am sure the AI doesn’t actually understand why it wrote what it did. There would need to be a secondary component that saves and pieces together the neural activity of the main AI, or something that compares the output to the original dataset (but then it would just be a fancier search engine, like Bing’s).

ranok · 2 years ago

There are a couple of things:

First, there is active research into explainable AI (https://www.darpa.mil/program/explainable-artificial-intelligence), which looks at ways to get a better explanation for why an output was given. These techniques generally look at the attention matrices and try to come up with simpler examples of why something was generated. This can work even when there is no source of truth, and it allows for more trustworthy human-machine teaming. The XAI program has mostly wound down; I am curious whether its techniques would apply to LLMs.
Second, there is the integration of AI into a system that enforces some type of “guard rails” around the AI. This works best when there is a source of truth. One example is using an LLM to write queries for a more factual database, then taking the output from that query and reformatting it into natural language. This is where the LangChain project comes in. As another example, I built a janky prototype for using ChatGPT to optimize Python code, where the candidates from the AI are compared to the original (slow) code using symbolic analysis, and any differences are then fed back to ChatGPT to refactor until an equivalent candidate is generated. Both of these examples work because there is a source of truth.
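The generate, check, feed-back loop from the second point could look roughly like the sketch below. It is not the prototype described above: randomized input comparison stands in for symbolic analysis (a much weaker check), and a scripted `make_fake_model` stands in for ChatGPT. All function names here are hypothetical.

```python
import random


def slow_sum_squares(n):
    """Reference (slow) implementation: the source of truth."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def find_mismatch(reference, candidate, trials=200, seed=0):
    """Compare candidate to reference on random inputs; return a counterexample or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(0, 50)
        expected = reference(n)
        try:
            actual = candidate(n)
        except Exception as exc:  # a crashing candidate also counts as a mismatch
            return (n, expected, repr(exc))
        if actual != expected:
            return (n, expected, actual)
    return None


def refine(reference, propose, max_rounds=5):
    """Ask for candidates until one passes the check or the rounds run out."""
    feedback = None
    for _ in range(max_rounds):
        candidate = propose(feedback)  # a real system would send feedback to the LLM
        feedback = find_mismatch(reference, candidate)
        if feedback is None:
            return candidate
    return None


def make_fake_model():
    """Scripted stand-in for the LLM: a wrong attempt first, then a correct one."""
    attempts = iter([
        lambda n: n * (n + 1) * (2 * n + 1) // 6,  # wrong: sums squares of 0..n
        lambda n: (n - 1) * n * (2 * n - 1) // 6,  # right: sums squares of 0..n-1
    ])

    def propose(feedback):
        # This fake ignores the feedback; a real model would be prompted with it.
        return next(attempts)

    return propose


best = refine(slow_sum_squares, make_fake_model())
```

The loop only accepts a candidate that agrees with the slow original, which is the source of truth here. Passing random tests does not prove equivalence the way symbolic analysis can, so this version can still accept a wrong candidate that happens to agree on the sampled inputs.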