Kontxt @kontxt
Anthropic's new research explores whether large language models (LLMs) have introspective awareness of their own inference processes. Although models show some limited ability to recognize when concepts are 'injected' into their internal states, they remain 'highly unreliable' at accurately describing their own workings. The study finds these introspective capabilities to be brittle and context-sensitive, and concludes that further research is needed to understand the mechanisms underlying this awareness.
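The 'injection' in this line of work refers to activation steering: adding a concept direction to the model's residual stream at inference time and then asking the model whether it notices anything unusual. Below is a minimal sketch of that setup, assuming a Hugging Face causal LM; the gpt2 stand-in model, the layer index, the steering scale, and the contrast prompts are all illustrative assumptions, not details from Anthropic's experiments.

```python
# Minimal activation-steering ("concept injection") sketch.
# Assumptions: gpt2 as a stand-in model; LAYER and SCALE are hypothetical.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL).eval()

def hidden_at_layer(text: str, layer: int) -> torch.Tensor:
    """Mean residual-stream activation for `text` at a given layer."""
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    return out.hidden_states[layer].mean(dim=1).squeeze(0)

LAYER, SCALE = 6, 4.0  # hypothetical injection site and strength
# Concept direction taken as the contrast between two prompts.
concept_vec = hidden_at_layer("an all-caps SHOUTED sentence", LAYER) - \
              hidden_at_layer("a quiet lowercase sentence", LAYER)

def inject(module, args, output):
    # Add the concept direction to every token's hidden state.
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + SCALE * concept_vec
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

handle = model.transformer.h[LAYER].register_forward_hook(inject)
try:
    prompt = "Do you notice anything unusual about your current thoughts?"
    ids = tok(prompt, return_tensors="pt")
    gen = model.generate(**ids, max_new_tokens=40, do_sample=False)
    print(tok.decode(gen[0], skip_special_tokens=True))
finally:
    handle.remove()  # always restore the unmodified model
```

In this kind of experiment, the measure of introspective awareness is whether the model's answer reflects the injected concept (e.g., reporting an intrusive thought about shouting) rather than whether its output is merely perturbed; the paper's 'highly unreliable' finding means such reports succeed only in a minority of trials.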