Jackson Kernion

Now: product research, human feedback at Anthropic. Before: MIT Postdoc, UC Berkeley Philosophy PhD. Built deductivelogic.org.

Jackson Kernion is a Technical Staff member at Anthropic, an artificial intelligence company based in San Francisco. He works on Anthropic's human feedback program, focusing on evals, model training, data collection, and interface design.2 Kernion's role involves using human feedback data to teach large language models how to perform targeted tasks.2

Before joining Anthropic, Kernion worked in academic philosophy. He holds a PhD in Philosophy from the University of California, Berkeley, where his dissertation was titled "Constraining Consciousness".1 He was also a postdoctoral associate at MIT in Spring 2020, where he taught a lecture course with over 70 students.1

Kernion's areas of specialization include Philosophy of Mind, Philosophy of Cognitive Science, and Epistemology.1 His academic work focused on cognitive architecture and methodology in cognitive science, exploring questions about the functional relationships between perceptual modalities, working memory, and central cognition.2

At Anthropic, Kernion has contributed to several AI research papers, including:

"Training a helpful and harmless assistant with reinforcement learning from human feedback" (2022)3
"Constitutional AI: Harmlessness from AI feedback" (2022)3
"Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned" (2022)3

These publications reflect Kernion's involvement in AI safety and alignment research, specifically training language models with human and AI feedback and red teaming them to reduce harms.