Elliot Arledge says Anthropic wants to hire people who play with model internals, pointing to mechanistic interpretability researchers like Neel Nanda and techniques like sparse autoencoders.
Anthropic wants to hire people who love to play with the internals of models, aka mechanistic interpretability.
Search up Neel Nanda and sparse autoencoders.
This is what Dario is talking about.
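For readers unfamiliar with the technique being name-dropped: a sparse autoencoder (SAE) learns an overcomplete dictionary of features from a model's internal activations by reconstructing them through a sparse bottleneck. The sketch below is purely illustrative (the dimensions, variable names, and L1 coefficient are arbitrary assumptions, not anyone's actual setup):

```python
import numpy as np

# Minimal sparse autoencoder sketch (illustrative only, not Anthropic's code).
# Encode activations with a ReLU to get sparse feature codes, decode them
# linearly, and penalize the L1 norm of the codes to encourage sparsity.

rng = np.random.default_rng(0)
d_model, d_hidden = 8, 32  # activation dim, dictionary size (overcomplete)
W_enc = rng.normal(0, 0.1, (d_model, d_hidden))
b_enc = np.zeros(d_hidden)
W_dec = rng.normal(0, 0.1, (d_hidden, d_model))
b_dec = np.zeros(d_model)

def sae_loss(x, l1_coeff=1e-3):
    """Reconstruction error plus L1 sparsity penalty on the feature codes."""
    f = np.maximum(0.0, x @ W_enc + b_enc)  # sparse feature activations
    x_hat = f @ W_dec + b_dec               # linear reconstruction
    recon = np.mean((x - x_hat) ** 2)
    sparsity = l1_coeff * np.abs(f).mean()
    return recon + sparsity, f

x = rng.normal(size=(4, d_model))  # stand-in for residual-stream activations
loss, codes = sae_loss(x)
```

In practice these are trained on millions of activation vectors from a real model, and individual dictionary features often turn out to be human-interpretable.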