rai_toolbox.optim.L2NormedGradientOptim
- class rai_toolbox.optim.L2NormedGradientOptim(params, InnerOpt=<class 'torch.optim.sgd.SGD'>, *, param_ndim=-1, defaults=None, grad_scale=1.0, grad_bias=0.0, div_by_zero_eps=1.1754943508222875e-38, **kwargs)
A gradient-transforming optimizer that normalizes the gradient by its \(L^2\)-norm prior to using InnerOpt.step to update the corresponding parameter. The transformation is applied to the gradient in accordance with param_ndim.

Examples
Let’s create an optimizer that normalizes all parameter gradients using their \(L^2\)-norm, and then updates the parameters with a standard SGD-step with a learning rate of 1.0.

>>> import torch as tr
>>> from rai_toolbox.optim import L2NormedGradientOptim
Creating a parameter for our optimizer to update, and our optimizer. We want the norm to be computed over the entire gradient tensor – without broadcasting – so we specify param_ndim=None.

>>> x = tr.tensor([-1.0, 1.0], requires_grad=True)
>>> optim = L2NormedGradientOptim([x], param_ndim=None, InnerOpt=tr.optim.SGD, lr=1.0)
Performing a simple calculation with x and performing backprop to create a gradient.

>>> (tr.tensor([2.0, 2.0]) * x).sum().backward()
>>> x.grad  # the un-normed gradient
tensor([2., 2.])
Performing a step with our optimizer transforms the gradient in-place, and then updates the parameter using SGD([x], lr=1.0).step().

>>> optim.step()
>>> x.grad  # the normalized gradient
tensor([0.7071, 0.7071])
>>> x  # the updated parameter
tensor([-1.7071, 0.2929], requires_grad=True)
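The example above used param_ndim=None, so a single norm was computed over the whole tensor. With the default param_ndim=-1, the norm is instead computed per slice along the leading dimension. The following is a minimal sketch of that behavior for a hypothetical 2D parameter; the tensor values are illustrative, and the expected gradients are inferred from the param_ndim documentation below rather than copied from verified output.

import torch as tr
from rai_toolbox.optim import L2NormedGradientOptim

# A 2D parameter; with the default param_ndim=-1 the L2 normalization
# broadcasts over the leading (batch-like) dimension, i.e. row-by-row.
x = tr.ones(2, 2, requires_grad=True)
optim = L2NormedGradientOptim([x], InnerOpt=tr.optim.SGD, lr=1.0)

# Create a gradient of [[3., 4.], [0., 2.]] for x.
(tr.tensor([[3.0, 4.0], [0.0, 2.0]]) * x).sum().backward()
optim.step()

# Each row should be normalized by its own L2-norm: the first row (norm 5)
# becomes ~[0.6, 0.8] and the second row (norm 2) becomes ~[0.0, 1.0].
print(x.grad)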
- __init__(params, InnerOpt=<class 'torch.optim.sgd.SGD'>, *, param_ndim=-1, defaults=None, grad_scale=1.0, grad_bias=0.0, div_by_zero_eps=1.1754943508222875e-38, **kwargs)
- Parameters:
- params : Sequence[Tensor] | Iterable[Mapping[str, Any]]
Iterable of parameters or dicts defining parameter groups.
- InnerOpt : Type[Optimizer] | Partial[Optimizer], optional (default=`torch.optim.SGD`)
The optimizer that updates the parameters after their gradients have been transformed.
- param_ndim : Optional[int]
Determines how a parameter and its gradient are temporarily reshaped prior to being passed to both _pre_step_transform_ and _post_step_transform_. By default, the transformation broadcasts over the tensor’s first dimension in a batch-like style. This can be specified per param-group.
  - A positive number determines the dimensionality of the tensor that the transformation will act on.
  - A negative number indicates the ‘offset’ from the dimensionality of the tensor (see “Notes” for examples).
  - None means that the transformation will be applied directly to the tensor without any broadcasting.
See ParamTransformingOptimizer for more details and examples.
- grad_scale : float, optional (default=1.0)
Multiplies each gradient in-place after the in-place transformation is performed. This can be specified per param-group. (A brief sketch of grad_scale and grad_bias in action follows this parameter list.)
- grad_bias : float, optional (default=0.0)
Added to each gradient in-place after the in-place transformation is performed. This can be specified per param-group.
- defaults : Optional[Dict[str, Any]]
Specifies default parameters for all parameter groups.
- div_by_zero_eps : float, optional (default=`torch.finfo(torch.float32).tiny`)
A lower bound used to clamp the normalization factor to prevent div-by-zero.
- **inner_opt_kwargs : Any
Named arguments used to initialize InnerOpt.
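As noted above, grad_scale and grad_bias modify each gradient in-place after the norm-based transformation, before InnerOpt takes its step. The following is a minimal sketch of their combined effect, reusing the setup from the Examples; it assumes the scale is applied before the bias, and the stated values are approximate rather than verified output.

import torch as tr
from rai_toolbox.optim import L2NormedGradientOptim

x = tr.tensor([-1.0, 1.0], requires_grad=True)

# Normalize the whole gradient (param_ndim=None), then scale it by 2.0 and
# add 0.5 to it in-place, before SGD([x], lr=1.0) performs the update.
optim = L2NormedGradientOptim(
    [x],
    param_ndim=None,
    InnerOpt=tr.optim.SGD,
    lr=1.0,
    grad_scale=2.0,
    grad_bias=0.5,
)

(tr.tensor([2.0, 2.0]) * x).sum().backward()
optim.step()

# The raw gradient [2., 2.] normalizes to ~[0.7071, 0.7071]; assuming scale
# is applied before bias, the final gradient is roughly [1.9142, 1.9142].
print(x.grad)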
Methods

- __init__(params[, InnerOpt, param_ndim, ...])