Another Day, Another Bug – Breakout Baseline #7
Reinforcement learning is a fickle beast. One minute, you’re on top of the world, having just discovered a novel algorithm that performs amazingly well; the next, you’re down in the dumps, your algorithm shattered and your agent barely functioning. Today is that story. Last time, we deconstructed our “helpful bug” and … Continue reading
Is It Bias or Noise – Breakout Baseline #6
Ladies and gentlemen… today you’ll see the matchup of a lifetime. All the way from RL Canon, you know it, you love it. It’s entropy bonus!!! If you let them, they’ll make your agent explore from sun up to sun down, never settling on a policy. In the other corner, a newcomer to the ring, … Continue reading
Forensic RL: Investigating a Surprisingly Successful Bug – Breakout Baseline #5
The case landed on my desk late. The loss curves were a mess, all volatility and noise, the kind of data that spells trouble. Then the score walked in: 84. A perfect score. Too perfect. I’d been chasing that number for days. Turns out, the whole thing was built on a lie: a bug in … Continue reading
A Whole New Worldview – Breakout Baseline #4
MLPs are the most basic type of deep neural network. They are quick, easy, and cheap. CNNs are quick, easy, and cheap too, but I didn’t want to jump the gun and immediately use them. Today, though, we’re going to make the leap from basic to bespoke, using an architecture that was designed for the … Continue reading
An Agent of Chaos – Breakout Baseline #3
As Shakespeare once wrote:

To exploit or to explore. That is the question.
Whether ’tis nobler for the agent to suffer
The meager rewards of a known, safe policy,
Or to take arms against a sea of unknown states,
And by exploring, discover a better one?

That guy was way ahead of his time, huh! Today we’re taking a … Continue reading
Can You See What I See – Breakout Baseline #2
The biggest problem with tuning my RL agents is that I’m flying blind. How can I possibly turn the right knobs when my only instruments are a scrolling loss value and a final score? Today, we’re upgrading the cockpit with a full instrument panel. Let’s visualize what our agent is actually thinking and use that … Continue reading
Breaking Out – Breakout Baseline #1
The hardest part of reinforcement learning isn’t the math or the complex theory or the ML frameworks. It’s what comes after the code finally runs. For me, that’s when the real challenge begins: tuning the agent. In the past, my process was mostly guesswork: endless fiddling with hyperparameters until, by sheer luck, something worked. More … Continue reading