Policy Gradient

Policy gradient methods are a family of reinforcement learning algorithms that optimize a parameterized policy directly. The policy maps each state to an action or, more commonly, to a probability distribution over actions. Instead of first learning a value function (as Q-learning does) and then deriving a policy from it, these methods adjust the policy's parameters to maximize the expected cumulative reward (the return). They do this by estimating the gradient of the expected return with respect to the policy parameters, usually from sampled experience, and then taking a gradient ascent step in that direction.
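To make the update rule concrete, here is a minimal sketch of REINFORCE, one of the simplest policy gradient methods, on a toy three-armed bandit. It assumes a softmax policy with one logit per action and estimates the gradient from a single sample as the reward times the gradient of the log-probability of the chosen action. The function name `reinforce_bandit`, the reward means, the noise level, the step count, and the learning rate are illustrative assumptions, not taken from any particular library or from the resources below.

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def reinforce_bandit(true_means, steps=5000, lr=0.1, seed=0):
    """Hypothetical helper: REINFORCE on a stateless bandit problem."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(len(true_means))  # policy parameters: one logit per action
    for _ in range(steps):
        probs = softmax(theta)
        action = rng.choice(len(probs), p=probs)        # sample from the policy
        reward = rng.normal(true_means[action], 0.1)    # noisy reward (assumed noise level)
        # For a softmax policy, grad of log pi(action) w.r.t. theta
        # equals one_hot(action) - probs.
        grad_log_pi = -probs
        grad_log_pi[action] += 1.0
        # Gradient ascent step on the sampled estimate of the expected reward.
        theta += lr * reward * grad_log_pi
    return softmax(theta)

final_probs = reinforce_bandit(np.array([0.1, 0.3, 0.9]))
print(final_probs)
```

In this sketch the policy should shift most of its probability toward the arm with the highest mean reward. Practical implementations usually also subtract a baseline from the reward to reduce the variance of the gradient estimate, which is omitted here for brevity.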

Visit the following resources to learn more:

@article@Policy Gradient Theorem Explained: A Hands-On Introduction
@video@An introduction to Policy Gradient methods - Deep Reinforcement Learning