So... Alignment problem.

Tetreo · 2 years ago

ranok · 2 years ago

There are a couple of things:

First, there is active research into explainable AI (https://www.darpa.mil/program/explainable-artificial-intelligence), which looks at ways to get a better explanation for why an output was given. These techniques generally look at the attention matrices and try to come up with simpler examples of why something was generated. This can work even when there is no source of truth, and it allows for more trustworthy human-machine teaming. The XAI program has mostly wound down; I am curious whether its techniques would apply to LLMs.
Second, there is the integration of AI into a system that enforces some type of “guard rails” around the AI. This works best when there is a source of truth. One example is using an LLM to write queries for a more factual database, then taking the output of that query and reformatting it into natural language. This is where the LangChain project comes in. Another example: I built a janky prototype for using ChatGPT to optimize Python code, where the candidates from the AI are compared to the original (slow) code using symbolic analysis, and any differences are fed back to ChatGPT to refactor until an equivalent candidate is generated. Both of these work because there is a source of truth.
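
To make the second idea concrete, here is a rough sketch of that kind of check-and-retry loop. It is not my prototype: instead of symbolic analysis it just runs both versions on sample inputs (differential testing), which is weaker but easy to show. The names `find_mismatches`, `refine_until_equivalent`, and `fake_llm` are made up for illustration, and `fake_llm` is a stand-in for a real model call.

```python
from typing import Any, Callable, Iterable, List, Optional, Tuple

Mismatch = Tuple[Any, Any, Any]  # (input, expected, actual)


def find_mismatches(reference: Callable, candidate: Callable,
                    inputs: Iterable) -> List[Mismatch]:
    """Run both functions on each input and collect every disagreement."""
    mismatches = []
    for x in inputs:
        expected = reference(x)
        try:
            actual = candidate(x)
        except Exception as exc:  # a crash counts as a mismatch too
            actual = repr(exc)
        if actual != expected:
            mismatches.append((x, expected, actual))
    return mismatches


def refine_until_equivalent(reference: Callable,
                            propose: Callable[[List[Mismatch]], Callable],
                            inputs: Iterable,
                            max_rounds: int = 5) -> Optional[Callable]:
    """Ask `propose` for candidates until one matches the reference.

    `propose` is a hypothetical hook for the LLM: it receives the
    mismatches from the previous round and returns a new candidate.
    Returns None if no candidate passes within `max_rounds`.
    """
    inputs = list(inputs)
    feedback: List[Mismatch] = []
    for _ in range(max_rounds):
        candidate = propose(feedback)
        feedback = find_mismatches(reference, candidate, inputs)
        if not feedback:
            return candidate
    return None


def slow_sum_of_squares(n: int) -> int:
    """The original (slow) code: this is the source of truth."""
    total = 0
    for i in range(n + 1):
        total += i * i
    return total


def fake_llm(feedback: List[Mismatch]) -> Callable[[int], int]:
    """Stand-in for a model: wrong on the first try, right after feedback."""
    if not feedback:
        return lambda n: n * (n + 1) * (2 * n + 1) // 3  # bug: should be // 6
    return lambda n: n * (n + 1) * (2 * n + 1) // 6


fast = refine_until_equivalent(slow_sum_of_squares, fake_llm, range(50))
```

The point is that the model's output is never trusted directly: a candidate is only accepted once it agrees with the original code on every input checked. Testing on samples can miss edge cases that symbolic analysis would catch, so treat this as the shape of the loop rather than a proof of equivalence.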