An Open Access Journal
From: Optimization of a physical internet based supply chain using reinforcement learning
Hyper-parameter name | Value
Batch size | 32
Discounting factor | 0.99
Optimizer | Adam
Learning rate | 5e-4
Synchronization freq. (ε) | 1e-2 (16)
Experience replay mem. size | 1000
Policy for exploration | Boltzmann sampling (τ = 1.0), eq. 15
Training length | 10000
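
As an illustration of how these settings might be collected and used, the following minimal Python sketch stores the tabulated values in a configuration object and shows two of the mechanisms the table names. It rests on several assumptions: the names `Hyperparameters`, `boltzmann_probabilities`, `sample_action` and `soft_update` are hypothetical, the exploration policy is taken to be the standard softmax over action values with temperature τ, and ε is interpreted as the mixing rate of a soft target-network update. The paper's own equations 15 and 16 are not reproduced here, so the authors' exact formulation may differ.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperparameters:
    """Values copied from the table; field names are illustrative."""
    batch_size: int = 32
    discount_factor: float = 0.99
    optimizer: str = "Adam"
    learning_rate: float = 5e-4
    sync_rate: float = 1e-2          # epsilon in the table (assumed soft-update rate)
    replay_memory_size: int = 1000
    temperature: float = 1.0         # tau for Boltzmann sampling
    training_length: int = 10000     # unit (steps or episodes) is not stated in the table


def boltzmann_probabilities(q_values, temperature):
    """Softmax over action values: p_i is proportional to exp(q_i / temperature)."""
    scaled = [q / temperature for q in q_values]
    shift = max(scaled)              # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(q_values, temperature, rng=random):
    """Draw one action index according to the Boltzmann probabilities."""
    probs = boltzmann_probabilities(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def soft_update(target_weights, online_weights, rate):
    """Assumed synchronization rule: move each target weight a fraction `rate` toward the online weight."""
    return [(1.0 - rate) * t + rate * o
            for t, o in zip(target_weights, online_weights)]


if __name__ == "__main__":
    hp = Hyperparameters()
    q = [1.0, 2.0, 3.0]              # made-up action values for demonstration
    print(boltzmann_probabilities(q, hp.temperature))
    print(sample_action(q, hp.temperature))
    print(soft_update([0.0, 0.0], [1.0, -1.0], hp.sync_rate))
```

With τ = 1.0 the sampling is neither greedy nor uniform: higher-valued actions are preferred, but every action keeps a non-zero probability. Under the soft-update reading, ε = 1e-2 would mean the target network moves 1% of the way toward the online network at each synchronization.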