Jacob Hilton's Homepage

I am a researcher and the executive director at the Alignment Research Center, where we are working on outperforming random sampling using a mechanistic understanding of neural network behaviors. Prior to joining ARC, I worked at OpenAI on a variety of reinforcement learning-related topics: truthfulness of language models (ChatGPT, WebGPT and TruthfulQA), scaling laws for RL and overoptimization, and interpretability for RL. Before that, I worked at Jane Street, and before that, I was a PhD student in combinatorial set theory.

If you're interested in getting into language model alignment, you may find my curriculum helpful.

My email addresses are [firstname].[lastname]@gmail.com (personal) and [firstname]@alignment.org (work).

Machine learning articles:

Low probability estimation in language models (code, blog)
Iterated expectations for heuristic estimators* (blog)
Backdoor defense, learnability and obfuscation (blog, talk)
Scaling laws for reward model overoptimization (blog)
Verbalized calibration*
InstructGPT* (blog)
WebGPT (blog, forum)
GSM8K* (blog)
Scaling laws for RL (forum)
Batch size-invariance for policy optimization (code, poster)
TruthfulQA* (code)
Understanding RL Vision (code)
Phasic Policy Gradient* (code)
Procgen Benchmark* (blog)
*Secondary contribution

Machine learning notes:

Deep Learning Curriculum
Basis-invariant attribution
KL divergence of max-of-n
Double-GAE
Learning rate warmup for Adam
Preconditioning for SGD

Maths articles:

Combinatorics of countable ordinal topologies (my PhD thesis)
Topological Ramsey numbers and countable ordinals (poster, slides)
The topological pigeonhole principle for ordinals
Any modification of Müller's Markov process is transient
Lebesgue measurability and large cardinals (my master's thesis)
The Hex Factor: The NIST Hash Function Competition

Coding projects:

Poisson timer
Avalon setup tool
Command-line backgammon bot
Recurrent neural network text generator
Find me a task
Stratego Legends
My Giving
SearchBar
Set-theoretical charades
Runeword calculator
Rapid Roll

Music:

Birthday Mix 2 (accordion video)
Birthday Mix (piano video)
Seder songbook
"Little" Fugue in G minor*
Mars*
Toccata and Fugue in D minor*
Adonai Oz*
Variations on "Little Bo Peep"*
*Accordion duet