rai_toolbox.optim.L1qNormedGradientOptim#

class rai_toolbox.optim.L1qNormedGradientOptim(params, InnerOpt=<class 'torch.optim.sgd.SGD'>, *, q=<required parameter>, dq=0.0, param_ndim=-1, grad_scale=1.0, grad_bias=0.0, defaults=None, div_by_zero_eps=1.1754943508222875e-38, generator=<torch._C.Generator object>, **inner_opt_kwargs)[source]#

A gradient-transforming optimizer that sparsifies a parameter’s gradient and normalizes the gradient to have an \(L^1\)-norm of grad_scale, prior to updating the parameter using InnerOpt.step.

The sparsification process zeroes the gradient’s smallest-magnitude elements and retains only the signs (i.e., \(\pm 1\)) of the surviving elements. The transformation is applied to the gradient in accordance with param_ndim.
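To make the transform concrete, here is a minimal sketch of the combined sparsify/sign/normalize step acting on a bare tensor (i.e., the param_ndim=None case). The helper name l1q_sketch and its rounding of the percentile are illustrative assumptions; this is not rai_toolbox’s internal implementation.

>>> import torch as tr
>>> def l1q_sketch(grad, q, grad_scale, eps=1.1754943508222875e-38):
...     # Illustrative only -- not rai_toolbox's internal code.
...     flat = grad.flatten().clone()
...     num_zero = round(q * flat.numel())  # how many smallest-|g| entries to zero
...     if num_zero > 0:
...         drop = flat.abs().topk(num_zero, largest=False).indices
...         flat[drop] = 0.0
...     signed = flat.sign()                      # retain only the signs (+/- 1)
...     norm = signed.abs().sum().clamp_min(eps)  # L1 norm, guarded against div-by-zero
...     return (grad_scale * signed / norm).reshape(grad.shape)
>>> l1q_sketch(tr.tensor([0.0, 1.0, 2.0]), q=0.30, grad_scale=1.8)
tensor([0.0000, 0.9000, 0.9000])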

__init__(params, InnerOpt=<class 'torch.optim.sgd.SGD'>, *, q=<required parameter>, dq=0.0, param_ndim=-1, grad_scale=1.0, grad_bias=0.0, defaults=None, div_by_zero_eps=1.1754943508222875e-38, generator=<torch._C.Generator object>, **inner_opt_kwargs)[source]#
Parameters:
params : Sequence[Tensor] | Iterable[Mapping[str, Any]]

Iterable of parameters or dicts defining parameter groups.

InnerOpt : Type[Optimizer] | Partial[Optimizer], optional (default=`torch.optim.SGD`)

The optimizer that updates the parameters after their gradients have been transformed.

q : float

Specifies the (fractional) percentile threshold used to sparsify the gradient: only the elements whose magnitudes fall at or above the q-th percentile are retained, and the remaining elements are zeroed. E.g., q=0.9 means that only the top 10% of elements (by magnitude) will be retained.

Must be within [0.0, 1.0]. The sparsification is applied to the gradient in accordance with param_ndim.

dq : float, optional (default=0.0)

If specified, the sparsity factor for each gradient transformation will be drawn from a uniform distribution over \([q - dq, q + dq] \subseteq [0.0, 1.0]\).

param_ndim : Union[int, None], optional (default=-1)

Determines how a parameter and its gradient are temporarily reshaped prior to being passed to both _pre_step_transform_ and _post_step_transform_. By default, the transformation broadcasts over the tensor’s first dimension in a batch-like style. This can be specified per param-group; a reshaping sketch is provided after this parameter list.

  • A positive number determines the dimensionality of the tensor that the transformation will act on.

  • A negative number indicates the ‘offset’ from the dimensionality of the tensor (see “Notes” for examples).

  • None means that the transformation will be applied directly to the tensor without any broadcasting.

See ParamTransformingOptimizer for more details and examples.

grad_scale : float, optional (default=1.0)

Multiplies each gradient in-place after the in-place transformation is performed. This can be specified per param-group.

grad_bias : float, optional (default=0.0)

Added to each gradient in-place after the in-place transformation is performed. This can be specified per param-group.

defaults : Optional[Dict[str, Any]]

Specifies default parameters for all parameter groups.

div_by_zero_eps : float, optional (default=`torch.finfo(torch.float32).tiny`)

A lower bound used to clamp the normalization factor to prevent div-by-zero.

generator : torch.Generator, optional (default=`torch.default_generator`)

Controls the RNG source.

**inner_opt_kwargs : Any

Named arguments used to initialize InnerOpt.
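The reshaping convention behind param_ndim can be pictured with the following hypothetical helper (view_for_transform is not part of the library’s API): the trailing param_ndim dimensions are kept intact and all leading dimensions are folded into a single batch dimension that the transformation broadcasts over. See ParamTransformingOptimizer for the authoritative behavior.

>>> import torch as tr
>>> def view_for_transform(x, param_ndim):
...     # Illustrative sketch of the param_ndim convention; not library code.
...     if param_ndim is None:
...         return x                          # transform sees the tensor as-is
...     if param_ndim < 0:
...         param_ndim = x.ndim + param_ndim  # e.g. -1 -> ndim - 1
...     # fold leading dims into one batch dim; keep the trailing `param_ndim` dims
...     return x.reshape(-1, *x.shape[x.ndim - param_ndim:])
>>> t = tr.zeros(2, 3, 4)
>>> view_for_transform(t, -1).shape    # default: broadcast over the first dimension
torch.Size([2, 3, 4])
>>> view_for_transform(t, 1).shape     # transform acts on 1D slices of the tensor
torch.Size([6, 4])
>>> view_for_transform(t, None).shape  # no broadcasting
torch.Size([2, 3, 4])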

Examples

Let’s use L1qNormedGradientOptim along with a standard SGD step with a learning rate of 1.0. We’ll sparsify the gradient so that only the top 70% of its elements (by magnitude) are retained, and we’ll normalize the sparse gradient to have an \(L^1\)-norm of 1.8.

>>> import torch as tr
>>> from rai_toolbox.optim import L1qNormedGradientOptim

We create a parameter for our optimizer to update, along with the optimizer itself. We specify param_ndim=None so that the sparsification/normalization is applied to the gradient as a whole, without any broadcasting.

>>> x = tr.tensor([1.0, 1.0, 1.0], requires_grad=True)
>>> optim = L1qNormedGradientOptim(
...     [x],
...     q=0.30,
...     grad_scale=1.8,
...     InnerOpt=tr.optim.SGD,
...     lr=1.0,
...     param_ndim=None,
... )

We populate x.grad by back-propagating an explicit gradient through x.

>>> x.backward(gradient=tr.tensor([0.0, 1.0, 2.0]))
>>> x.grad # the original gradient
tensor([0., 1., 2.])

Performing a step with our optimizer sparsifies and normalizes the gradient in-place, and then updates the parameter using SGD([x], lr=1.0).step().

>>> optim.step()
>>> x.grad # the signed, sparsified, and normalized gradient
tensor([0.0000, 0.9000, 0.9000])
>>> x  # the updated parameter
tensor([1.0000, 0.1000, 0.1000], requires_grad=True)
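As a sanity check on these numbers (continuing the session above): the two retained elements are reduced to their signs and scaled by grad_scale / 2 = 0.9, so the transformed gradient’s \(L^1\)-norm equals 1.8, and the SGD step with lr=1.0 then subtracts that gradient from the original parameter values.

>>> x.grad.abs().sum()  # L1-norm of the transformed gradient equals grad_scale
tensor(1.8000)
>>> tr.tensor([1.0, 1.0, 1.0]) - 1.0 * x.grad  # SGD update: x_new = x_old - lr * grad
tensor([1.0000, 0.1000, 0.1000])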

Methods

__init__(params[, InnerOpt, q, dq, ...])
