Abstract
We address the challenge of reliable payment routing in the Bitcoin Lightning Network (LN), an emerging technology designed to scale Bitcoin transactions through off-chain channels.
A key bottleneck to LN performance is liquidity placement: ensuring that channels along payment paths hold enough capacity to maximize throughput. We formulate throughput-oriented liquidity placement as a graph reinforcement learning problem and introduce a lightweight agent that combines a message-passing policy network with proximal policy optimization (PPO) and action masking for stable and generalizable learning.
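To illustrate what action masking means in this setting, the sketch below applies a mask to policy logits before normalizing them, so that invalid actions receive exactly zero probability. It is a minimal, standard-library example and not the agent's actual implementation: the function name `masked_softmax`, the example logits, and the interpretation of a masked action as "a peer that cannot be selected" are all assumptions made for illustration.

```python
import math


def masked_softmax(logits, mask):
    """Softmax over `logits` restricted to entries where `mask` is True.

    Hypothetical helper: masked-out actions get probability 0.0, and the
    remaining probabilities are renormalized to sum to 1.
    """
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    if not any(mask):
        raise ValueError("at least one action must be valid")

    # Subtract the largest valid logit for numerical stability.
    peak = max(l for l, m in zip(logits, mask) if m)
    exps = [math.exp(l - peak) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# Assumed example: four candidate peers, the second one is not selectable.
logits = [2.0, 0.5, 1.0, -1.0]
mask = [True, False, True, True]
probs = masked_softmax(logits, mask)
```

Because the masked action can never be sampled, the policy gradient is computed only over valid choices, which is the usual motivation for pairing masking with PPO.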
To provide a theory-grounded yet scalable training reward, we use max flow as a proxy for payment throughput.
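The max-flow proxy can be made concrete with a small example. The sketch below computes the maximum flow between two nodes with the Edmonds-Karp algorithm (one standard max-flow method, chosen here for brevity and not necessarily the solver used in the paper) and shows how adding liquidity to one channel changes the value. The node names, capacities, and the `max_flow` helper are hypothetical, and each channel is simplified to a directed edge carrying its outbound balance.

```python
from collections import defaultdict, deque


def max_flow(channels, source, sink):
    """Edmonds-Karp max flow over directed (u, v, capacity) channels."""
    if source == sink:
        raise ValueError("source and sink must differ")

    # residual[u][v] holds the remaining capacity from u to v.
    residual = defaultdict(lambda: defaultdict(int))
    for u, v, cap in channels:
        residual[u][v] += cap
        residual[v][u] += 0  # make sure the reverse edge exists

    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow  # no augmenting path is left

        # Find the bottleneck capacity along the path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]

        # Push the bottleneck amount along the path.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


# Hypothetical toy graph; capacities are illustrative units.
channels = [
    ("A", "B", 50), ("A", "C", 30),
    ("B", "D", 20), ("B", "C", 25), ("C", "D", 40),
]
before = max_flow(channels, "A", "D")

# Assumed placement: add 30 units of liquidity on the B -> D channel.
after = max_flow(channels + [("B", "D", 30)], "A", "D")
print(before, after)  # prints: 60 80
```

In this toy graph the added liquidity raises the A-to-D max flow from 60 to 80, which is the kind of change a max-flow-based reward would register for a placement decision.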
In extensive experiments on real Lightning Network snapshots, our method consistently outperforms strong heuristic baselines across multiple random seeds and on unseen graphs. The agent has been deployed in production for peer recommendations, where it has facilitated the allocation of over $5,000,000 worth of BTC in liquidity across 200 channels.
Our results highlight the potential of graph-based reinforcement learning for adaptive resource allocation in decentralized financial systems.
Read the full published paper here.