Rishi Jain

Dueling DQN: splitting state value from action advantage

The Double DQN post changed how the bootstrap target is computed and did not touch the network at all: the same convolutional trunk and the same single head emitting one number per action, with only the arithmetic on top of it rearranged. The dueling architecture is the other half of that...

June 4, 2026

Notes
Simply explained: dueling DQN

This is the gentle companion to the Dueling DQN notes, the same architecture explained from much further back, every term defined as it turns up and every formula taken apart symbol by symbol. The denser version assumed you were already at home with how a DQN agent works, so here...

June 4, 2026

Notes
Double DQN: why the max leans high

I had been watching a DQN agent learn a small control task and noticed that its predicted value for the early states kept climbing well past anything the actual returns could justify, the agent quietly convinced it was sitting on a goldmine of future reward that never arrived, and the...

June 4, 2026

Notes
Simply explained: double DQN

I wrote up Double DQN already, and reading it back I think I leaned on a fair amount of vocabulary you only have if you have already spent a while with Q-learning: replay buffers, target networks, bootstrapping, the Bellman update, all turning up in the first paragraph as though they...

June 4, 2026

Notes
Practicing ML from scratch on Deep-ML

After a few months of working through PyTorch notebooks and first projects I wanted somewhere to practice the smaller pieces in isolation, the kind of thing that hides inside a library call and that you never really implement yourself: a dot product written out by hand, k-fold cross-validation, PCA, a...

June 3, 2026

Projects
