rai_toolbox.optim.L2NormedGradientOptim
- class rai_toolbox.optim.L2NormedGradientOptim(params, InnerOpt=<class 'torch.optim.sgd.SGD'>, *, param_ndim=-1, defaults=None, grad_scale=1.0, grad_bias=0.0, div_by_zero_eps=1.1754943508222875e-38, **kwargs)
A gradient-transforming optimizer that normalizes the gradient by its \(L^2\)-norm prior to using InnerOpt.step to update the corresponding parameter. The transformation is applied to the gradient in accordance with param_ndim.

Examples
Let’s create an optimizer that normalizes all parameter gradients using their \(L^2\)-norm, and then updates the parameters with a standard SGD-step with a learning rate of 1.0.

>>> import torch as tr
>>> from rai_toolbox.optim import L2NormedGradientOptim
Creating a parameter for our optimizer to update, and our optimizer. We want the norm to be computed over the entire gradient tensor – without broadcasting – so we specify param_ndim=None.

>>> x = tr.tensor([-1.0, 1.0], requires_grad=True)
>>> optim = L2NormedGradientOptim([x], param_ndim=None, InnerOpt=tr.optim.SGD, lr=1.0)
Performing a simple calculation with x and performing backprop to create a gradient.

>>> (tr.tensor([2.0, 2.0]) * x).sum().backward()
>>> x.grad  # the un-normed gradient
tensor([2., 2.])
Performing a step with our optimizer transforms the gradient in-place, and then updates the parameter using SGD([x], lr=1.0).step().

>>> optim.step()
>>> x.grad  # the normalized gradient
tensor([0.7071, 0.7071])
>>> x  # the updated parameter
tensor([-1.7071, 0.2929], requires_grad=True)
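The example above used param_ndim=None, so a single norm was computed over the whole tensor. With the default param_ndim=-1, the norm is instead computed per slice along the leading dimension. The following is a minimal sketch of that behavior for a hypothetical 2D parameter; the tensor values are illustrative, and the expected gradients are inferred from the param_ndim documentation below rather than copied from verified output.

import torch as tr
from rai_toolbox.optim import L2NormedGradientOptim

# A 2D parameter; with the default param_ndim=-1 the L2 normalization
# broadcasts over the leading (batch-like) dimension, i.e. row-by-row.
x = tr.ones(2, 2, requires_grad=True)
optim = L2NormedGradientOptim([x], InnerOpt=tr.optim.SGD, lr=1.0)

# Create a gradient of [[3., 4.], [0., 2.]] for x.
(tr.tensor([[3.0, 4.0], [0.0, 2.0]]) * x).sum().backward()
optim.step()

# Each row should be normalized by its own L2-norm: the first row (norm 5)
# becomes ~[0.6, 0.8] and the second row (norm 2) becomes ~[0.0, 1.0].
print(x.grad)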
- __init__(params, InnerOpt=<class 'torch.optim.sgd.SGD'>, *, param_ndim=-1, defaults=None, grad_scale=1.0, grad_bias=0.0, div_by_zero_eps=1.1754943508222875e-38, **kwargs)
- Parameters:
- params : Sequence[Tensor] | Iterable[Mapping[str, Any]]
Iterable of parameters or dicts defining parameter groups.
- InnerOpt : Type[Optimizer] | Partial[Optimizer], optional (default=`torch.optim.SGD`)
The optimizer that updates the parameters after their gradients have been transformed.
- param_ndim : Optional[int]
Determines how a parameter and its gradient are temporarily reshaped prior to being passed to both _pre_step_transform_ and _post_step_transform_. By default, the transformation broadcasts over the tensor’s first dimension in a batch-like style. This can be specified per param-group.
  - A positive number determines the dimensionality of the tensor that the transformation will act on.
  - A negative number indicates the ‘offset’ from the dimensionality of the tensor (see “Notes” for examples).
  - None means that the transformation will be applied directly to the tensor without any broadcasting.
See ParamTransformingOptimizer for more details and examples.
- grad_scale : float, optional (default=1.0)
Multiplies each gradient in-place after the in-place transformation is performed. This can be specified per param-group. (A brief sketch of grad_scale and grad_bias in action follows this parameter list.)
- grad_bias : float, optional (default=0.0)
Added to each gradient in-place after the in-place transformation is performed. This can be specified per param-group.
- defaults : Optional[Dict[str, Any]]
Specifies default parameters for all parameter groups.
- div_by_zero_eps : float, optional (default=`torch.finfo(torch.float32).tiny`)
A lower bound used to clamp the normalization factor to prevent div-by-zero.
- **inner_opt_kwargs : Any
Named arguments used to initialize InnerOpt.
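As noted above, grad_scale and grad_bias modify each gradient in-place after the norm-based transformation, before InnerOpt takes its step. The following is a minimal sketch of their combined effect, reusing the setup from the Examples; it assumes the scale is applied before the bias, and the stated values are approximate rather than verified output.

import torch as tr
from rai_toolbox.optim import L2NormedGradientOptim

x = tr.tensor([-1.0, 1.0], requires_grad=True)

# Normalize the whole gradient (param_ndim=None), then scale it by 2.0 and
# add 0.5 to it in-place, before SGD([x], lr=1.0) performs the update.
optim = L2NormedGradientOptim(
    [x],
    param_ndim=None,
    InnerOpt=tr.optim.SGD,
    lr=1.0,
    grad_scale=2.0,
    grad_bias=0.5,
)

(tr.tensor([2.0, 2.0]) * x).sum().backward()
optim.step()

# The raw gradient [2., 2.] normalizes to ~[0.7071, 0.7071]; assuming scale
# is applied before bias, the final gradient is roughly [1.9142, 1.9142].
print(x.grad)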
Methods

- __init__(params[, InnerOpt, param_ndim, ...])