An interpretability tool for understanding
the internal mechanisms of Turing‑LLM
Turing Explorer
Designed to advance the study and understanding of inner mechanisms in LLMs
Turing-LLM
Built to simplify and enhance
comprehension of how LLMs operate
Research
Focused on careful, step-by-step inquiry to broaden our understanding of LLMs
The Mission
By deeply understanding how AI models operate, we have the potential to significantly enhance human intelligence. If we can extract the advanced algorithms found within deep learning models, we can adopt more effective problem-solving strategies, either by learning those algorithms ourselves or by augmenting our brains with them. A comprehensive mechanistic understanding of superintelligent systems opens the door to a coevolutionary path toward a safer and smarter future.
Progress
Generated a 2B Token Synthetic Dataset
Built and Trained Turing-LLM-1.0-254M
Trained Sparse Autoencoders on LLM Activations
Assembled a Tool to Explore Turing-LLM Internals
Developed Novel Interpretability Approaches
Evaluating Novel Interpretability Approaches
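The "Trained Sparse Autoencoders on LLM Activations" step can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the shapes, hyperparameters, random activations, and the plain-NumPy training loop are stand-ins, not the actual Turing-LLM pipeline or its real hidden size. The idea is standard: train an overcomplete ReLU autoencoder on activation vectors with an L1 penalty so that each activation is explained by a sparse set of learned features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for LLM activations (hypothetical sizes, random data).
d_model, d_hidden, n_samples = 32, 128, 2048
acts = rng.normal(size=(n_samples, d_model)).astype(np.float32)

# Sparse autoencoder parameters: overcomplete encoder/decoder plus biases.
W_enc = rng.normal(scale=0.1, size=(d_model, d_hidden)).astype(np.float32)
W_dec = rng.normal(scale=0.1, size=(d_hidden, d_model)).astype(np.float32)
b_enc = np.zeros(d_hidden, dtype=np.float32)
b_dec = np.zeros(d_model, dtype=np.float32)

def forward(x):
    f = np.maximum(x @ W_enc + b_enc, 0.0)   # ReLU feature activations
    return f, f @ W_dec + b_dec              # features, reconstruction

_, x_hat0 = forward(acts)
mse_before = float(np.mean((x_hat0 - acts) ** 2))

l1_coeff, lr = 1e-3, 1e-2
for step in range(200):
    f, x_hat = forward(acts)
    err = x_hat - acts
    # Gradients of the loss: reconstruction MSE + L1 sparsity penalty on f.
    grad_f = err @ W_dec.T + l1_coeff * np.sign(f)
    grad_f *= (f > 0)                        # ReLU gradient mask
    W_dec -= lr * (f.T @ err) / n_samples
    b_dec -= lr * err.mean(axis=0)
    W_enc -= lr * (acts.T @ grad_f) / n_samples
    b_enc -= lr * grad_f.mean(axis=0)

f, x_hat = forward(acts)
mse = float(np.mean((x_hat - acts) ** 2))
sparsity = float((f > 0).mean())  # fraction of features active per sample
```

In a real setting the inputs would be residual-stream or MLP activations collected from Turing-LLM, and each learned decoder row would be a candidate interpretable feature direction; the L1 coefficient trades reconstruction fidelity against sparsity.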
Motivation & Hypothesis
Hypothesis: Solving mechanistic interpretability could allow us to greatly increase human intelligence.

Reasoning:
  • A model that can solve a problem better than any human must contain algorithms for solving that problem that are better than our own.
  • With mechanistic interpretability, we could discover these more competent algorithms within deep learning models.
  • Once we have acquired these algorithms, we could learn them (updating our software) or modify the brain to contain them (updating our hardware).
  • Thus, interpretability can enable us to advance our own intelligence to keep up with the most intelligent models, not through reliance on superintelligent models but through learning and modifying ourselves.

Conclusion: This may be the best path to genuinely ensured safety: not only aligning models, but coevolving with them.