The Principled Agent

The journey to a better policy.


About

Another person blogging about ML… Ho hum.

Yes, you’re right. And no. Hopefully not.

My name is W. Max Lees. As a Senior Software Engineer, my job is to build robust, predictable systems. But when it came to my passion, Reinforcement Learning (RL), my process was anything but: all guesswork and late-night hyperparameter fiddling. This blog is my public commitment to change that, applying a principled, engineering-first approach to the messy art of RL.

My goal with this blog is to go from an amateur RL enthusiast to a top-tier RL researcher and to document my process along the way. So why will this be different? I’m not going to present easy, step-by-step guides to simple benchmarks. You won’t see me explain how to create a CNN for MNIST or ImageNet classification here. And I’m not going to pretend that I know all the answers.

I’m going to give as honest a look into my RL practice and research as I possibly can. The triumphs. The failures. The lessons learned. When I do something incredibly stupid, you’ll see it. When I figure out something interesting or powerful, you’ll see all the work I did to get there.

Hopefully that also means we’ll all learn together how to do better. We’ll become principled agents of RL research.

I look forward to you joining me in this process!

My Principles

To keep this journey honest and productive, I’m holding myself to a few key principles:

  • Community First: I will encourage feedback and discussion wherever possible. We are all smarter together.
  • Data Over Intuition: I will not make any changes to my models or agents until I have data to back up why I want to make that change.
  • Radical Transparency: I will document and share everything. The good, the bad, and the ugly.
  • AI as a Tool, Not a Crutch: I’ll use AI as a collaborator for brainstorming and polishing, but never for generating the core code or primary insights. The goal is to document my learning process.