rai_toolbox.optim.L2ProjectedOptim#
- class rai_toolbox.optim.L2ProjectedOptim(params, InnerOpt=<class 'torch.optim.sgd.SGD'>, *, epsilon=<required parameter>, param_ndim=-1, grad_scale=1.0, grad_bias=0.0, defaults=None, div_by_zero_eps=1.1754943508222875e-38, **inner_opt_kwargs)[source]#
A gradient-transforming optimizer that constrains the updated parameters to lie within an \(\epsilon\)-sized ball in \(L^2\) space centered on the origin.
A step with this optimizer normalizes the gradient by its \(L^2\)-norm prior to using InnerOpt.step to update the corresponding parameter. Each parameter is then projected into the constraint set. The transformation/projection is applied to the gradient/parameter in accordance with param_ndim.
- __init__(params, InnerOpt=<class 'torch.optim.sgd.SGD'>, *, epsilon=<required parameter>, param_ndim=-1, grad_scale=1.0, grad_bias=0.0, defaults=None, div_by_zero_eps=1.1754943508222875e-38, **inner_opt_kwargs)[source]#
- Parameters:
- params : Sequence[Tensor] | Iterable[Mapping[str, Any]]
Iterable of parameters or dicts defining parameter groups.
- InnerOpt : Type[Optimizer] | Partial[Optimizer], optional (default=`torch.optim.SGD`)
The optimizer that updates the parameters after their gradients have been transformed.
- epsilon : float
Specifies the size of the L2-space ball that all parameters will be projected into, post-optimization step.
- param_ndim : Union[int, None], optional (default=-1)
Determines how a parameter and its gradient are temporarily reshaped prior to being passed to both _pre_step_transform_ and _post_step_transform_. By default, the transformation broadcasts over the tensor’s first dimension in a batch-like style. This can be specified per param-group.
A positive number determines the dimensionality of the tensor that the transformation will act on. A negative number indicates the ‘offset’ from the dimensionality of the tensor (see “Notes” for examples). None means that the transformation will be applied directly to the tensor without any broadcasting.
See ParamTransformingOptimizer for more details and examples. An additional sketch of the default behavior is included at the end of the Examples section below.
- grad_scale : float, optional (default=1.0)
Multiplies each gradient in-place after the in-place transformation is performed. This can be specified per param-group (see the sketch at the end of the Examples section below).
- grad_bias : float, optional (default=0.0)
Added to each gradient in-place after the in-place transformation is performed. This can be specified per param-group.
- defaults : Optional[Dict[str, Any]]
Specifies default parameters for all parameter groups.
- div_by_zero_eps : float, optional (default=`torch.finfo(torch.float32).tiny`)
A lower bound used to clamp the normalization factor to prevent div-by-zero.
- **inner_opt_kwargs : Any
Named arguments used to initialize InnerOpt.
Examples
Let’s create an optimizer that normalizes all parameter gradients using their \(L^2\)-norm, and then updates the parameters with a standard SGD-step with a learning rate of 1.0. After the step, each parameter will be projected into an \(L^2\)-ball of radius 0.8.
>>> import torch as tr
>>> from rai_toolbox.optim import L2ProjectedOptim
Creating a parameter for our optimizer to update, and our optimizer. We want the norm to be computed over the entire gradient tensor – without broadcasting – so we specify param_ndim=None. This also controls the projection behavior.
>>> x = tr.tensor([-1.0, 1.0], requires_grad=True)
>>> optim = L2ProjectedOptim([x], param_ndim=None, InnerOpt=tr.optim.SGD, lr=1.0, epsilon=0.8)
Performing a simple calculation with x and performing backprop to create a gradient.
>>> (tr.tensor([2.0, 2.0]) * x).sum().backward()
>>> x.grad # the un-normed gradient
tensor([2., 2.])
Performing a step with our optimizer transforms the gradient in-place, updates the parameter using SGD([x], lr=1.0).step(), and then projects the parameter into the constraint set.
>>> optim.step()
>>> x.grad # the normalized gradient
tensor([0.7071, 0.7071])
>>> x # the updated parameter
tensor([-0.7885, 0.1353], requires_grad=True)
>>> x.norm(p=2).item() # `x` lies on the L2-ball of radius 0.8
0.800000011920929
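To make the three stages above concrete (normalize the gradient, take the inner SGD step, project onto the \(\epsilon\)-ball), here is a minimal manual sketch that reproduces the same update by hand. It is purely illustrative and not part of the optimizer’s API; the final check is what we expect given the documented behavior.
>>> y = tr.tensor([-1.0, 1.0])        # same starting point as `x` above
>>> g = tr.tensor([2.0, 2.0])         # the raw gradient
>>> y = y - 1.0 * (g / g.norm(p=2))   # normalize the gradient, then take an SGD step with lr=1.0
>>> y = y * (0.8 / y.norm(p=2))       # the norm exceeds 0.8 here, so projection is a rescale onto the ball
>>> tr.allclose(y, x.detach())        # agrees with the optimizer's update above
True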
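As a sketch of the default param_ndim=-1 behavior described above (the values here are chosen for illustration and are not taken from the toolbox’s documentation), the transformation and projection broadcast over a 2D parameter’s first dimension: each row of the gradient is normalized independently, and each row of the parameter is projected into its own \(\epsilon\)-ball.
>>> w = tr.ones(2, 3, requires_grad=True)   # 2D parameter; rows are treated independently
>>> optim = L2ProjectedOptim([w], InnerOpt=tr.optim.SGD, lr=1.0, epsilon=1.0)
>>> (w * tr.arange(6.0).view(2, 3)).sum().backward()
>>> optim.step()
>>> tr.allclose(w.grad.norm(dim=1, p=2), tr.ones(2))    # each row of the gradient has unit L2-norm
True
>>> bool((w.detach().norm(dim=1, p=2) <= 1.0 + 1e-6).all())  # each row lies within the radius-1 ball
True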
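Finally, a small sketch of grad_scale (again, the numbers are illustrative rather than taken from the library’s docs): per the parameter description above, the scaling is applied to the gradient after the \(L^2\) normalization, so the stored gradient ends up with norm equal to grad_scale rather than 1.
>>> p = tr.tensor([3.0, 4.0], requires_grad=True)
>>> optim = L2ProjectedOptim([p], param_ndim=None, InnerOpt=tr.optim.SGD,
...                          lr=1.0, epsilon=10.0, grad_scale=2.0)
>>> p.sum().backward()          # raw gradient: tensor([1., 1.])
>>> optim.step()
>>> p.grad                      # normalized to unit L2-norm, then scaled in-place by grad_scale=2.0
tensor([1.4142, 1.4142])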
Methods
__init__(params[, InnerOpt, epsilon, ...])