DRL is different from regular DL in that it tends to be CPU-heavy, not GPU-heavy. It's hard to saturate even a single GPU/TPU, since you're running tiny NNs and only occasionally updating them based on long episodes through the environment.
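To make that concrete, here's a toy sketch of a DRL rollout loop (pure numpy, all names made up, not any particular paper's code): the environment stepping is serial CPU work, and each per-step forward pass through a tiny 2-layer policy is far too small to keep a GPU busy:

    import numpy as np

    # Toy stand-in for an environment (hypothetical; any Gym-style env
    # behaves the same way for this argument).
    class ToyEnv:
        def reset(self):
            return np.random.randn(4)
        def step(self, action):
            # The simulation itself runs on the CPU and is usually the bottleneck.
            obs = np.random.randn(4)
            reward = float(action == 0)
            done = np.random.rand() < 0.01
            return obs, reward, done

    # A "tiny NN": one hidden layer, a few hundred parameters. A forward
    # pass on a single observation would leave a GPU almost entirely idle.
    W1, b1 = np.random.randn(4, 64), np.zeros(64)
    W2, b2 = np.random.randn(64, 2), np.zeros(2)

    def policy(obs):
        h = np.tanh(obs @ W1 + b1)
        logits = h @ W2 + b2
        p = np.exp(logits - logits.max())
        p /= p.sum()
        return np.random.choice(2, p=p)

    env = ToyEnv()
    obs = env.reset()
    episode = []
    for _ in range(10_000):              # long stretches of pure CPU work...
        action = policy(obs)             # ...punctuated by tiny forward passes
        obs, reward, done = env.step(action)
        episode.append((obs, action, reward))
        if done:
            # Only after many such steps would you batch the episode up
            # and do a single (small) NN update.
            obs = env.reset()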
It might not be using GPUs/TPUs at all! If you look at the algorithm that the DM paper is based on, PPO, the original OpenAI paper & implementation (https://blog.openai.com/openai-baselines-ppo/) doesn't use GPUs; it's pure CPU. (They have a second version which adds GPU support.)
Or, in a DM vein, look at their latest agent, IMPALA, which you might've noticed on the front page a few days ago: https://arxiv.org/pdf/1802.01561.pdf Look at the computational resources listed for the various agents in Table 1 (pg. 5): note how many of them use no GPUs whatsoever. Even the largest configuration, 500 CPUs, saturates only a single Nvidia P100 GPU.
(So 'worker' could hypothetically refer to a server with X cores and 1 GPU processing them locally, but this is almost certainly not the case, since it would imply scaling up to thousands of CPUs, which is actually quite difficult and requires careful engineering, as in IMPALA.)
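For a rough picture of what that actor/learner split looks like, here's a sketch of the pattern (just the pattern, not IMPALA itself, which adds V-trace off-policy correction and far more careful engineering): many cheap CPU actor processes generate trajectories and feed them through a queue to a single learner, which is the only component where a GPU would even help, and even hundreds of actors may not produce data fast enough to saturate one:

    import multiprocessing as mp
    import numpy as np

    def actor(actor_id, traj_queue):
        # Each actor is a plain CPU process: it steps its own copy of the
        # environment and ships finished trajectories to the learner.
        rng = np.random.default_rng(actor_id)
        while True:
            trajectory = rng.standard_normal((100, 4))  # placeholder rollout
            traj_queue.put((actor_id, trajectory))

    def learner(traj_queue, batch_size=32, num_updates=10):
        # One learner drains the queue and does batched updates; in a real
        # system this is where the GPU (if any) would sit.
        for step in range(num_updates):
            batch = np.stack([traj_queue.get()[1] for _ in range(batch_size)])
            # ... compute gradients on `batch` and update the policy here ...
            print(f"update {step}: batch shape {batch.shape}")

    if __name__ == "__main__":
        q = mp.Queue(maxsize=256)
        actors = [mp.Process(target=actor, args=(i, q), daemon=True)
                  for i in range(8)]    # IMPALA runs hundreds of these
        for p in actors:
            p.start()
        learner(q)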