
Once you train a reinforcement learning agent, you can generate code to deploy the optimal policy. You can generate:

CUDA® code for deep neural network policies using GPU Coder™

C/C++ code for table, deep neural network, or linear basis function policies using MATLAB® Coder™

Code generation is supported for agents using feedforward neural networks in any of the input paths, provided that all the used layers are supported. Code generation is not supported for continuous-action PG, AC, PPO, and SAC agents that use a recurrent neural network (RNN).

For more information on training reinforcement learning agents, see Train Reinforcement Learning Agents.

To create a policy evaluation function that selects an action based on a given observation, use the `generatePolicyFunction` command. This command generates a MATLAB script, which contains the policy evaluation function, and a MAT-file, which contains the optimal policy data.

You can generate code to deploy this policy function using GPU Coder or MATLAB Coder.
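The overall workflow can be sketched as follows. This is a minimal sketch, assuming a trained agent is available in the variable `agent` and that its observation is a four-element vector, as in the cart-pole examples later in this topic:

```matlab
% Minimal sketch, assuming a trained agent in "agent" whose observation
% is a four-element vector (as in the cart-pole examples below).
generatePolicyFunction(agent);       % creates evaluatePolicy.m and agentData.mat
action = evaluatePolicy(ones(4,1));  % query the generated policy directly in MATLAB
```

Calling the generated function in MATLAB first is a convenient way to confirm its behavior before generating code from it.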

If your trained optimal policy uses a deep neural network, you can generate CUDA code for the policy using GPU Coder. For more information on supported GPUs, see GPU Support by Release (Parallel Computing Toolbox). There are several required and recommended prerequisite products for generating CUDA code for deep neural networks. For more information, see Installing Prerequisite Products (GPU Coder) and Setting Up the Prerequisite Products (GPU Coder).

Not all deep neural network layers support GPU code generation. For a list of supported layers, see Supported Networks, Layers, and Classes (GPU Coder). For more information and examples on GPU code generation, see Deep Learning with GPU Coder (GPU Coder).

As an example, generate GPU code for the policy gradient agent trained in Train PG Agent to Balance Cart-Pole System.

Load the trained agent.

load('MATLABCartpolePG.mat','agent')

Create a policy evaluation function for this agent.

generatePolicyFunction(agent)

This command creates the `evaluatePolicy.m` file, which contains the policy function, and the `agentData.mat` file, which contains the trained deep neural network actor. For a given observation, the policy function evaluates a probability for each potential action using the actor network. Then, the policy function randomly selects an action based on these probabilities.
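The random selection step can be sketched as follows. The probability vector and action set here are made-up example values, not output from the trained actor:

```matlab
% Hypothetical sketch of sampling an action from a probability vector,
% with made-up probabilities and a made-up discrete action set.
p = [0.7 0.3];                        % action probabilities from the actor (example values)
actionSet = [-10 10];                 % discrete action set (example values)
idx = find(rand() <= cumsum(p), 1);   % inverse-CDF sampling
action = actionSet(idx);
```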

You can generate code for this network using GPU Coder. For example, you can generate a CUDA-compatible MEX function.

Configure the `codegen` function to create a CUDA-compatible C++ MEX function.

cfg = coder.gpuConfig('mex');
cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');

Set an example input value for the policy evaluation function. To find the observation dimension, use the `getObservationInfo` function. In this case, the observations are in a four-element vector.

`argstr = '{ones(4,1)}';`

Generate code using the `codegen` function.

codegen('-config','cfg','evaluatePolicy','-args',argstr,'-report');

This command generates the MEX function `evaluatePolicy_mex`.
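Once built, the MEX function has the same interface as the original MATLAB function. For example:

```matlab
% Call the generated MEX function with a sample observation.
obs = ones(4,1);
action = evaluatePolicy_mex(obs);  % same interface as evaluatePolicy
```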

You can generate C/C++ code for table, deep neural network, or linear basis function policies using MATLAB Coder.

Using MATLAB Coder, you can generate:

C/C++ code for policies that use Q tables, value tables, or linear basis functions. For more information on general C/C++ code generation, see Generating Code (MATLAB Coder).

C++ code for policies that use deep neural networks. Note that code generation is not supported for continuous-action PG, AC, PPO, and SAC agents that use a recurrent neural network (RNN). For a list of supported layers, see Networks and Layers Supported for Code Generation (MATLAB Coder). For more information, see Prerequisites for Deep Learning with MATLAB Coder (MATLAB Coder) and Deep Learning with MATLAB Coder (MATLAB Coder).
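Before generating code, you can optionally run the MATLAB Coder Code Generation Readiness tool on the policy function to flag unsupported constructs. This sketch assumes `evaluatePolicy.m` has already been created by `generatePolicyFunction` and is on the MATLAB path:

```matlab
% Optional readiness check on the generated policy function.
coder.screener('evaluatePolicy')
```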

As an example, generate C code without dependencies on third-party libraries for the policy gradient agent trained in Train PG Agent to Balance Cart-Pole System.

Load the trained agent.

load('MATLABCartpolePG.mat','agent')

Create a policy evaluation function for this agent.

generatePolicyFunction(agent)

This command creates the `evaluatePolicy.m` file, which contains the policy function, and the `agentData.mat` file, which contains the trained deep neural network actor. For a given observation, the policy function evaluates a probability for each potential action using the actor network. Then, the policy function randomly selects an action based on these probabilities.

Configure the `codegen` function to generate code suitable for building a MEX file.

`cfg = coder.config('mex');`

On the configuration object, set the target language to C, and set `DeepLearningConfig` to `'none'`. This option generates code without using any third-party library.

cfg.TargetLang = 'C';
cfg.DeepLearningConfig = coder.DeepLearningConfig('none');

Set an example input value for the policy evaluation function. To find the observation dimension, use the `getObservationInfo` function. In this case, the observations are in a four-element vector.

`argstr = '{ones(4,1)}';`

Generate code using the `codegen` function.

codegen('-config','cfg','evaluatePolicy','-args',argstr,'-report');

This command generates the C code for the policy gradient agent containing the deep neural network actor.

As an example, generate C++ code for the policy gradient agent trained in Train PG Agent to Balance Cart-Pole System using the Intel Math Kernel Library for Deep Neural Networks (MKL-DNN).

Load the trained agent.

load('MATLABCartpolePG.mat','agent')

Create a policy evaluation function for this agent.

generatePolicyFunction(agent)

This command creates the `evaluatePolicy.m` file, which contains the policy function, and the `agentData.mat` file, which contains the trained deep neural network actor. For a given observation, the policy function evaluates a probability for each potential action using the actor network. Then, the policy function randomly selects an action based on these probabilities.

Configure the `codegen` function to generate code suitable for building a MEX file.

`cfg = coder.config('mex');`

On the configuration object, set the target language to C++, and set `DeepLearningConfig` to the target library `'mkldnn'`. This option generates code using the Intel Math Kernel Library for Deep Neural Networks (Intel MKL-DNN).

cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = coder.DeepLearningConfig('mkldnn');

Set an example input value for the policy evaluation function. To find the observation dimension, use the `getObservationInfo` function. In this case, the observations are in a four-element vector.

`argstr = '{ones(4,1)}';`

Generate code using the `codegen` function.

codegen('-config','cfg','evaluatePolicy','-args',argstr,'-report');

This command generates the C++ code for the policy gradient agent containing the deep neural network actor.

As an example, generate C code for the Q-learning agent trained in Train Reinforcement Learning Agent in Basic Grid World.

Load the trained agent.

load('basicGWQAgent.mat','qAgent')

Create a policy evaluation function for this agent.

generatePolicyFunction(qAgent)

This command creates the `evaluatePolicy.m` file, which contains the policy function, and the `agentData.mat` file, which contains the trained Q table value function. For a given observation, the policy function looks up the value function for each potential action using the Q table. Then, the policy function selects the action for which the value function is greatest.
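The greedy lookup described above can be sketched as follows, using a random stand-in for the trained Q table rather than the actual learned values:

```matlab
% Hypothetical sketch of the greedy lookup, with a random stand-in Q table
% (25 states by 4 actions, as in a 5-by-5 grid world with 4 moves).
QTable = rand(25,4);                 % stand-in for the trained Q table
obs = 1;                             % observation: current state index
[~,actionIdx] = max(QTable(obs,:));  % choose the highest-valued action
```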

Set an example input value for the policy evaluation function. To find the observation dimension, use the `getObservationInfo` function. In this case, there is a single one-dimensional observation (belonging to a discrete set of possible values).

`argstr = '{[1]}';`

Configure the `codegen` function to generate embeddable C code suitable for targeting a static library, and set the output folder to `buildFolder`.

cfg = coder.config('lib');
outFolder = 'buildFolder';

Generate C code using the `codegen` function.

codegen('-c','-d',outFolder,'-config','cfg',...
    'evaluatePolicy','-args',argstr,'-report');
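To move the generated sources to another development environment, you can optionally package them with the MATLAB Coder `packNGo` function. The `buildInfo.mat` location shown here is an assumption about the codegen output layout; adjust the path to match your build folder:

```matlab
% Optional: bundle the generated C sources for relocation.
% Assumes codegen saved buildInfo.mat in the output folder (hypothetical path).
load(fullfile(outFolder,'buildInfo.mat'),'buildInfo')
packNGo(buildInfo,'fileName','policyCode.zip')
```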