  • Simply explained: dueling DQN

    This is the gentle companion to the Dueling DQN notes, the same architecture explained from much further back, every term defined as it turns up and every formula taken apart symbol by symbol. The denser version assumed you were already at home with how a DQN agent works, so here...

  • Double DQN, and the trouble with taking the max

    I had been watching a DQN agent learn a small control task and noticed that its predicted value for the early states kept climbing well past anything the actual returns could justify, the agent quietly convinced it was sitting on a goldmine of future reward that never arrived, and the...

  • Simply explained: double DQN

    I wrote up Double DQN already, and reading it back I think I leaned on a fair amount of vocabulary you only have if you have already spent a while with Q-learning: replay buffers, target networks, bootstrapping, the Bellman update, all turning up in the first paragraph as though they...

  • Practicing ML on Deep-ML, and fixing one of its problems

    After a few months of working through PyTorch notebooks and first projects I wanted somewhere to practice the smaller pieces in isolation, the kind of thing that hides inside a library call and that you never really implement yourself: a dot product written out by hand, k-fold cross-validation, PCA, a...

  • Quantization notes, from FP32 down to packed 4-bit weights

    The other day I wanted to understand how a 7-billion-parameter model, which in full FP32 precision wants 28 GB just to hold its weights, gets squeezed onto a single consumer GPU. The trick is quantization: store each weight as an 8-bit integer rather than a 32-bit float and that 28...