guides/advanced/troubleshooting.md
During deployments or unexpected node restarts, jobs may be left in an executing state
indefinitely. We call these jobs "orphans", but orphaning isn't a bad thing. It means that the job
wasn't lost and it may be retried when the system comes back online.
There are two mechanisms to mitigate orphans:
Increase the shutdown_grace_period to allow the
system more time to finish executing before shutdown. During shutdown each queue stops
fetching more jobs, but executing jobs have up to the grace period to complete. The default
value is 15000ms, or 15 seconds.
Use the Lifeline plugin to automatically move those jobs back to available so they can run again.
config :my_app, Oban,
  plugins: [Oban.Plugins.Lifeline],
  shutdown_grace_period: :timer.seconds(60),
  ...
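As a rough sketch of tuning the Lifeline mechanism, the plugin accepts a `rescue_after` option that controls how long a job may sit in the executing state before it is moved back to available. The 30 minute value, the `:my_app` name, the `default` queue, and `MyApp.Repo` are illustrative assumptions. Pick a period longer than your slowest job so that healthy jobs aren't rescued while they are still running.

```elixir
import Config

# Assumed app name, repo, and queue, for illustration only.
config :my_app, Oban,
  repo: MyApp.Repo,
  queues: [default: 10],
  # Rescue jobs stuck in "executing" for more than 30 minutes (illustrative value).
  plugins: [{Oban.Plugins.Lifeline, rescue_after: :timer.minutes(30)}],
  # Give running jobs up to 60 seconds to finish during shutdown.
  shutdown_grace_period: :timer.seconds(60)
```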
Sometimes Cron or Pruner plugins appear to stop working unexpectedly. Typically, this happens in
systems with multi-node setups where "web" nodes only enqueue jobs while "worker" nodes are
configured to run queues and plugins. Most plugins require leadership to function, so when a "web"
node becomes leader the plugins go dormant.
The solution is to disable leadership with peer: false on any node that doesn't run plugins:
config :my_app, Oban, peer: false, ...
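One way to apply this split is to choose the Oban options at runtime based on the node's role. The sketch below assumes a hypothetical `OBAN_ROLE` environment variable, a `MyApp.Repo` module, and that Oban isn't also configured in another config file (keyword options from multiple files are deep merged, so an empty list wouldn't override an earlier value).

```elixir
# config/runtime.exs
import Config

# Hypothetical environment variable; nodes default to the "worker" role.
worker? = System.get_env("OBAN_ROLE", "worker") == "worker"

if worker? do
  # Worker nodes run queues and plugins, and may become leader.
  config :my_app, Oban,
    repo: MyApp.Repo,
    queues: [default: 10],
    plugins: [Oban.Plugins.Pruner]
else
  # Web nodes only enqueue jobs, so they never take part in leadership.
  config :my_app, Oban,
    repo: MyApp.Repo,
    queues: [],
    plugins: [],
    peer: false
end
```

With this arrangement a "web" node can never become leader, so plugins on the "worker" nodes stay active.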
The @reboot cron expression depends on leadership to prevent duplicate job insertion across
nodes. In development, when you shut down your application (e.g., by exiting IEx), the node may
not cleanly relinquish leadership in the database. This creates a delay before the node can become
leader again on the next startup, making it appear as though @reboot jobs aren't working.
Wait for leadership - The default peer will eventually assume leadership, typically within 30 seconds.
Use the Global peer in development - The Global peer handles restarts more gracefully:
# In config/dev.exs
config :my_app, Oban,
  peer: Oban.Peers.Global,
  ...
Clear leadership manually - If needed, you can clear the oban_peers table in your database to force immediate leadership.
Keep the default peer in production for better reliability and persistence across restarts.
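To confirm whether a delay in leadership is the cause, you can check the node's leadership status from IEx. This is a minimal sketch that assumes the default instance name, `Oban`, and an already started application.

```elixir
# Run inside IEx (iex -S mix) once the application has started.
# Assumes the default Oban instance name, `Oban`.
if Oban.Peer.leader?(Oban) do
  IO.puts("This node is currently the leader.")
else
  IO.puts("This node is not the leader yet; wait a moment and check again.")
end
```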
Using PgBouncer's "Transaction Pooling" setup disables all of PostgreSQL's LISTEN and NOTIFY
activity. Some functionality, such as triggering job execution, scaling queues, and canceling
jobs, relies on those notifications.
There are several options available to ensure functional notifications:
Switch to the Oban.Notifiers.PG notifier. This alternative notifier relies on Distributed
Erlang and exchanges messages within a cluster. The only drawback to the PG notifier is that it
doesn't trigger job insertion events.
Switch PgBouncer to "Session Pooling". Session pooling isn't as resource efficient as
transaction pooling, but it retains all Postgres functionality.
Use a dedicated Repo that connects directly to the database, bypassing PgBouncer.
If none of those options work, Oban's job staging will switch to local polling mode to ensure
that queues keep processing jobs.
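For the first option, switching notifiers is a one line configuration change. The sketch below assumes the `:my_app` application name, a `MyApp.Repo` module, and nodes that are already connected through Distributed Erlang. Without clustering the PG notifier can't exchange messages between nodes.

```elixir
import Config

# Assumed app name, repo, and queue, for illustration only.
config :my_app, Oban,
  repo: MyApp.Repo,
  queues: [default: 10],
  # Exchange notifications over Distributed Erlang instead of LISTEN/NOTIFY.
  notifier: Oban.Notifiers.PG
```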
Without a version comment on the oban_jobs table, Oban will rerun all of the migrations. This can
happen when comments are stripped while restoring from a backup, most commonly during a transition
from one database to another.
The fix is to set the latest migrated version as a comment. To start, search through your previous
migrations and find the last time you ran an Oban migration. Once you've found the latest version,
e.g. version: 10, then you can set that as a comment on the oban_jobs table:
COMMENT ON TABLE public.oban_jobs IS '10';
Once the comment is in place, only the migrations from that version onward will run.
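If you would rather track the fix alongside your other schema changes, the same statement can be wrapped in an Ecto migration. This is a sketch only: the module name is hypothetical, and it assumes the `public` prefix and version `10` from the example above, so substitute the version you actually found.

```elixir
defmodule MyApp.Repo.Migrations.RestoreObanVersionComment do
  use Ecto.Migration

  # Restore the version comment that Oban reads to decide which migrations to run.
  def up do
    execute "COMMENT ON TABLE public.oban_jobs IS '10'"
  end

  # Remove the comment again when rolling back.
  def down do
    execute "COMMENT ON TABLE public.oban_jobs IS NULL"
  end
end
```

Run this migration before any later Oban migration so the version check sees the restored comment.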
If every queue is polling the database every second, job staging has switched from efficient global mode to local mode. In global mode, only the leader queries for jobs and notifies queues via pubsub. In local mode, each queue polls independently.
This typically happens when:
PG notifier without clustering and functional pubsub
Look for these log messages confirming the switch to local mode:
"job staging switched to local mode. local mode polls for jobs for every queue"
Oban.Peers.Global in development