World of Software

Anthropic Open-sources Tool to Trace the “Thoughts” of Large Language Models

News Room
Last updated: 2025/06/08 at 2:03 PM
News Room Published 8 June 2025

Anthropic researchers have open-sourced the tool they used to trace what goes on inside a large language model during inference. It includes a circuit-tracing Python library that can be used with any open-weights model and a frontend hosted on Neuronpedia for exploring the library’s output as a graph.

As InfoQ reported at the time of Anthropic’s original disclosure, the company’s approach to shedding light on an LLM’s internal behavior involves swapping the actual model for a replacement model that uses sparsely active features from cross-layer MLP transcoders instead of the original neurons. These features can often represent interpretable concepts, making it possible to build an attribution graph by pruning away all features that do not influence the output under investigation.
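
To make the pruning idea concrete, here is a minimal toy sketch in plain Python. It is not the circuit tracer library's algorithm or API: the node names, the edge weights, the `influence_on` and `prune` helpers, and the threshold are all invented for illustration. It scores each node by the summed absolute weight of its paths to one output node and drops the nodes that fall below a cutoff.

```python
# Toy attribution graph: edges[src][dst] is a made-up direct-effect weight.
# Node names and values are hypothetical, not taken from a real model.
edges = {
    "tok:Dallas": {"feat:Texas": 0.9, "feat:sports": 0.4},
    "tok:capital": {"feat:say_a_capital": 0.8},
    "feat:Texas": {"logit:Austin": 0.7},
    "feat:say_a_capital": {"logit:Austin": 0.6},
    "feat:sports": {"logit:Austin": 0.01},
}


def influence_on(target, edges):
    """Score every source node by the summed absolute weight of its paths to target.

    Assumes the graph is acyclic, as a feed-forward attribution graph would be.
    """
    cache = {}

    def total(node):
        if node == target:
            return 1.0
        if node not in cache:
            cache[node] = sum(
                abs(w) * total(dst) for dst, w in edges.get(node, {}).items()
            )
        return cache[node]

    return {node: total(node) for node in edges}


def prune(edges, target, threshold):
    """Keep only nodes whose influence on target is at least threshold."""
    scores = influence_on(target, edges)
    keep = {node for node, score in scores.items() if score >= threshold}
    keep.add(target)
    return {
        src: {dst: w for dst, w in out.items() if dst in keep}
        for src, out in edges.items()
        if src in keep
    }


pruned = prune(edges, "logit:Austin", threshold=0.05)
for src, out in pruned.items():
    print(src, "->", out)
```

In this toy graph the `feat:sports` node barely affects the output, so it is pruned, while the two paths that carry most of the weight survive.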

Anthropic’s circuit tracer library can identify replacement circuits and generate attribution graphs from a given model using pre-trained transcoders.

It computes the direct effect that each non-zero transcoder feature, transcoder error node, and input token has on each other non-zero transcoder feature and output logit [Editor’s note: the raw (non-normalized) score a model assigns to each possible output before applying a probability function like softmax].
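
As a small illustration of the editor's note, the sketch below turns raw logits into probabilities with softmax. The candidate tokens and logit values are invented for the example and do not come from any real model.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    # Subtracting the largest logit keeps exp() from overflowing
    # and does not change the resulting probabilities.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three candidate next tokens (illustrative values only).
candidates = ["Austin", "Dallas", "Houston"]
logits = [4.0, 2.0, 1.0]

probs = softmax(logits)
best = candidates[probs.index(max(probs))]

for token, logit, prob in zip(candidates, logits, probs):
    print(f"{token}: logit={logit:.1f} probability={prob:.3f}")
print("most likely token:", best)
```

Because softmax preserves the ordering of the logits, the token with the highest logit is also the most probable one.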

As one of Anthropic’s researchers noted on Hacker News, the graph reveals intermediate computational steps the model took to sample a token, which can provide useful insights. These insights can then be used to manipulate transcoder features and observe how the model’s output changes, for example.
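
The kind of intervention described above can be sketched with a toy linear model. Everything here is an assumption made for illustration: the feature names, the activations, the weights, and the `logits_from` and `top_token` helpers are invented, and the code does not use the circuit tracer library.

```python
# Hypothetical feature activations and feature-to-logit weights (all values invented).
activations = {"Texas": 1.0, "California": 0.0, "say_a_capital": 1.0}
weights = {
    "Texas": {"Austin": 2.0, "Sacramento": -1.0},
    "California": {"Austin": -1.0, "Sacramento": 2.0},
    "say_a_capital": {"Austin": 1.0, "Sacramento": 1.0},
}


def logits_from(acts):
    """Sum each active feature's contribution to every output logit."""
    out = {}
    for feature, activation in acts.items():
        for token, weight in weights[feature].items():
            out[token] = out.get(token, 0.0) + activation * weight
    return out


def top_token(acts):
    """Return the token with the highest logit."""
    logits = logits_from(acts)
    return max(logits, key=logits.get)


baseline = top_token(activations)

# Intervention: switch the "Texas" feature off and the "California" feature on.
patched = dict(activations, Texas=0.0, California=1.0)
after = top_token(patched)

print("baseline:", baseline)
print("after intervention:", after)
```

In this toy setup, swapping one feature for another flips the top-scoring token, which is the pattern of evidence an intervention on a real model would look for.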

Anthropic has already used its circuit tracer to study multi-step reasoning and multilingual representations in Gemma-2-2b and Llama-3.2-1b. Below is an example of the attribution graph generated for the prompt “Fact: The capital of the state containing Dallas is”.

In a lengthy podcast hosted by Dwarkesh Patel featuring Anthropic’s Trenton Bricken and Sholto Douglas, Bricken explained how Anthropic’s research into circuit tracing is a key contribution to LLM mechanistic interpretability, that is, the effort to understand what the core units of computation are inside an LLM. This builds on previous research using toy models, then sparse autoencoders, and eventually circuits.

“Now you’re identifying individual features across the layers of the model that are all working together to perform some complicated task. And you can get a much better idea of how it’s actually doing the reasoning and coming to decisions.”

This is still a very young field, but one that is becoming increasingly critical for the safe use of LLMs:

“Depending on how quickly AI accelerates and where the state of our tools are, we might not be in the place where we can prove from the ground up that everything is safe. But I feel like that’s a very good North Star. It’s a very powerful reassuring North Star for us to aim for, especially when we consider we are part of the broader AI safety portfolio.”

The circuit tracing library can be easily run from Anthropic’s tutorial notebook. Alternatively, you can use it on Neuronpedia or install it locally.
