miniRL is a reinforcement learning (RL) proxy application derived from the Easily eXtendable Architecture for Reinforcement Learning (EXARL) framework, which is being developed by the ExaLearn Control project. The EXARL framework is designed for researchers who want to use RL for control and optimization of their applications or experiments without worrying about the details of the RL implementations. Any RL problem consists of an agent (controller) and an environment (system to be controlled). EXARL uses an extension of the OpenAI Gym framework, which not only allows existing benchmark environments in Gym to be used but also makes it easy to integrate new scientific environments. The agent is a collection of RL algorithms with an associated state table or neural network architecture. EXARL also includes distributed learning workflows, which define how the agent and environment interact with each other.
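The agent/environment split described above can be illustrated with a minimal sketch that follows the classic Gym reset/step convention. ToyEnv, RandomAgent, and run_episode are hypothetical names invented for this illustration; they are not part of EXARL or Gym, and the toy dynamics stand in for a real scientific environment.

```python
import random


class ToyEnv:
    """Hypothetical 1-D environment using the classic Gym reset/step convention."""

    def __init__(self, max_steps=10):
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def reset(self):
        # Return the initial observation.
        self.position = 0
        self.steps = 0
        return self.position

    def step(self, action):
        # Action 1 moves right; any other action moves left.
        self.position += 1 if action == 1 else -1
        self.steps += 1
        reward = 1.0 if self.position > 0 else 0.0
        done = self.steps >= self.max_steps
        return self.position, reward, done, {}


class RandomAgent:
    """Hypothetical agent that picks actions uniformly at random."""

    def __init__(self, n_actions, seed=0):
        self.n_actions = n_actions
        self.rng = random.Random(seed)

    def act(self, state):
        return self.rng.randrange(self.n_actions)


def run_episode(env, agent):
    """Run one episode; return the (s, a, r, s', done) transitions."""
    transitions = []
    state = env.reset()
    done = False
    while not done:
        action = agent.act(state)
        next_state, reward, done, _ = env.step(action)
        transitions.append((state, action, reward, next_state, done))
        state = next_state
    return transitions


if __name__ == "__main__":
    episode = run_episode(ToyEnv(max_steps=5), RandomAgent(n_actions=2))
    print(len(episode), "transitions collected")
```

Because the loop only relies on reset, step, and act, either side can be swapped without touching the other, which is the property the framework description emphasizes.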
Los Alamos National Laboratory
DOE NNSA and Office of Science
Scalable Reinforcement Learning, RL framework
Git/SVN Repository URL
Source Code (tar/zip) URL
The architecture of EXARL is separated into a learner and actors. A simple round-robin scheduling scheme is used to distribute work from the learner to the actors. The learner holds a target model that is trained using experiences collected by the actors. Each actor holds a model replica, which receives the updated weights from the learner. This replica is used to infer the next action given a state of the environment. The environment is then rendered/simulated to update the state using this action. In contrast to other architectures such as IMPALA and SEED, each actor in EXARL independently stores experiences and runs the Bellman equation to generate training data. These training data are sent back to the learner once enough have been collected. Because each actor runs the Bellman equation locally and in parallel, the load is distributed evenly among all actor processes. The learner distributes work by parallelizing across episodes, and actors request work in a round-robin fashion. Each actor runs all of the steps in an episode to completion before requesting more work from the learner. This process is repeated until the learner has gathered experiences from all episodes.
miniRL uses the well-known inverted pendulum, or 'CartPole', environment along with a DQN agent available in EXARL. In addition, it uses the asynchronous workflow distribution scheme to collect experiences from multiple environments running in parallel. miniRL acts as a benchmark application not only for the RL algorithms (agents) but also for different workflow distribution schemes. With the EXARL framework, it is easy to swap environments, agents, and learning and workflow distribution schemes to test performance. miniRL also has CANDLE functionality built in, which allows for hyperparameter optimization using the CANDLE Supervisor.
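The two ideas in the architecture description, actor-side Bellman updates and round-robin distribution of episodes, can be sketched as follows. This is a simplified illustration under stated assumptions, not EXARL's implementation: bellman_targets and assign_episodes are hypothetical helpers, the target uses the standard DQN form y = r + gamma * max Q(s', a'), the discount factor is an assumed value, and the static assignment stands in for actors requesting work at run time.

```python
GAMMA = 0.99  # assumed discount factor, not taken from miniRL


def bellman_targets(batch, q_values_fn, gamma=GAMMA):
    """Turn stored transitions into (state, target Q-values) training pairs.

    batch:       list of (state, action, reward, next_state, done)
    q_values_fn: callable mapping a state to a list of Q-values,
                 one per action (stands in for the actor's model replica)
    """
    training_data = []
    for state, action, reward, next_state, done in batch:
        target = list(q_values_fn(state))
        if done:
            # Terminal transition: no future return to bootstrap from.
            target[action] = reward
        else:
            # Standard DQN target: r + gamma * max_a' Q(s', a').
            target[action] = reward + gamma * max(q_values_fn(next_state))
        training_data.append((state, target))
    return training_data


def assign_episodes(n_episodes, n_actors):
    """Assign episode indices to actors in round-robin order."""
    assignment = {actor: [] for actor in range(n_actors)}
    for episode in range(n_episodes):
        assignment[episode % n_actors].append(episode)
    return assignment


if __name__ == "__main__":
    # Hypothetical replica that returns fixed Q-values for two actions.
    replica = lambda state: [0.0, 1.0]
    batch = [(0, 0, 1.0, 1, False), (1, 1, 2.0, 2, True)]
    print(bellman_targets(batch, replica, gamma=0.5))
    print(assign_episodes(n_episodes=5, n_actors=2))
```

In this sketch each actor would call bellman_targets on its own stored experiences and send only the resulting training pairs to the learner, so the cost of computing targets is spread across the actors rather than concentrated in the learner.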