Researchers showed that large language models rely on a small, specialized subset of parameters for Theory-of-Mind reasoning, even though the full network is activated on every forward pass. This sparse internal circuitry depends heavily on positional encoding, particularly rotary positional embedding (RoPE), which shapes how the model tracks beliefs and perspectives.
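
To make the mechanism concrete, here is a minimal sketch of rotary positional embedding, the component the finding centers on. The function name, shapes, and the GPT-NeoX-style half-split convention are illustrative assumptions, not the researchers' code.

```python
import numpy as np

def rotary_embed(x: np.ndarray, positions: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply RoPE to a (seq_len, dim) array of query or key vectors.

    Channel i is paired with channel i + dim/2 (half-split convention); each
    pair is rotated by an angle that grows linearly with token position, so
    relative position ends up encoded in the attention dot products.
    """
    seq_len, dim = x.shape
    half = dim // 2
    # Per-pair rotation frequencies: theta_i = base^(-2i/dim)
    freqs = base ** (-np.arange(half) * 2.0 / dim)        # (half,)
    angles = positions[:, None] * freqs[None, :]          # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]                     # paired channels
    # Standard 2D rotation applied pair-wise
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)

# Example: rotate a toy sequence of 4 query vectors of width 8
q = np.random.randn(4, 8)
q_rot = rotary_embed(q, positions=np.arange(4))
print(q_rot.shape)  # (4, 8)
```

Because the rotation angle depends only on token position, two tokens' rotated queries and keys interact through their relative offset, which is the property the belief-tracking circuitry is said to depend on.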