Polysemanticity w/ Dr. Darryl Wright
Jan 22, 2024
1 min read
Into AI Safety
Darryl and I discuss his background, how he became interested in machine learning, and a project we are currently working on that investigates penalizing polysemanticity during the training of neural networks.
Chapters
01:46 ❙ Interview begins
02:14 ❙ Supernovae classification
08:58 ❙ Penalizing polysemanticity
20:58 ❙ Our "toy model"
30:06 ❙ Task description
32:47 ❙ Addressing hurdles
39:20 ❙ Lessons learned
Links
Links to all articles and papers mentioned throughout the episode are listed below, in order of appearance.
- Zooniverse
- BlueDot Impact
- AI Safety Support
- Zoom In: An Introduction to Circuits
- MNIST dataset on PapersWithCode
- MNIST on Wikipedia
- Clusterability in Neural Networks
- CIFAR-10 dataset
- Effective Altruism Global
- CLIP Blog
- CLIP on GitHub
- Long Term Future Fund
- Engineering Monosemanticity in Toy Models