rai_toolbox.optim.L1qNormedGradientOptim

class rai_toolbox.optim.L1qNormedGradientOptim(params, InnerOpt=<class 'torch.optim.sgd.SGD'>, *, q=<required parameter>, dq=0.0, param_ndim=-1, grad_scale=1.0, grad_bias=0.0, defaults=None, div_by_zero_eps=1.1754943508222875e-38, generator=<torch._C.Generator object>, **inner_opt_kwargs)[source]

A gradient-transforming optimizer that sparsifies a parameter’s gradient and normalizes the gradient to have an L1-norm of grad_scale, prior to updating the parameter using InnerOpt.step.

The sparsification process retains only the signs (i.e., ±1) of the gradient’s elements. The transformation is applied to the gradient in accordance with param_ndim.
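
As a rough, stand-alone sketch of this transform in plain PyTorch (a hypothetical helper, not the toolbox's in-place implementation), assume that q gives the fraction of smallest-magnitude elements to zero out and that the surviving entries are replaced by their signs and rescaled to an L1-norm of grad_scale; the exact rounding and tie-breaking used internally may differ.

>>> import torch as tr
>>> def l1q_sign_normalize(grad, q, grad_scale=1.0, eps=tr.finfo(tr.float32).tiny):
...     num_zeroed = int(round(q * grad.numel()))  # zero out the q-fraction of smallest-magnitude entries
...     sparse = tr.sign(grad.flatten())           # retain only the signs (i.e., ±1)
...     if num_zeroed > 0:
...         _, idx = tr.topk(grad.flatten().abs(), k=num_zeroed, largest=False)
...         sparse[idx] = 0.0
...     # rescale so the L1-norm equals grad_scale (clamped to avoid div-by-zero)
...     return (grad_scale * sparse / sparse.abs().sum().clamp_min(eps)).reshape(grad.shape)
>>> l1q_sign_normalize(tr.tensor([0.0, 1.0, 2.0]), q=0.30, grad_scale=1.8)
tensor([0.0000, 0.9000, 0.9000])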

__init__(params, InnerOpt=<class 'torch.optim.sgd.SGD'>, *, q=<required parameter>, dq=0.0, param_ndim=-1, grad_scale=1.0, grad_bias=0.0, defaults=None, div_by_zero_eps=1.1754943508222875e-38, generator=<torch._C.Generator object>, **inner_opt_kwargs)[source]
Parameters:
params : Sequence[Tensor] | Iterable[Mapping[str, Any]]

Iterable of parameters or dicts defining parameter groups.

InnerOpt : Type[Optimizer] | Partial[Optimizer], optional (default=`torch.optim.SGD`)

The optimizer that updates the parameters after their gradients have been transformed.

q : float

Specifies the (fractional) percentile, by magnitude, below which gradient elements are zeroed out during sparsification. E.g., q=0.9 means that only the absolute-largest 10% of the gradient's elements (those at or above the 90th percentile) are retained.

Must be within [0.0, 1.0]. The sparsification is applied to the gradient in accordance with param_ndim.

dq : float, optional (default=0.0)

If specified, the sparsity factor for each gradient transformation will be drawn from a uniform distribution over [q - dq, q + dq] ∩ [0.0, 1.0].
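
For illustration only (this is not the toolbox's internal sampling code, and the helper name sample_sparsity is hypothetical), that documented behavior amounts to a uniform draw that is then clipped to the valid range:

>>> import torch as tr
>>> def sample_sparsity(q, dq, generator=None):
...     u = tr.rand(1, generator=generator).item()   # uniform in [0, 1)
...     q_step = (q - dq) + 2 * dq * u               # uniform over [q - dq, q + dq]
...     return min(max(q_step, 0.0), 1.0)            # clip to [0.0, 1.0]
>>> q_step = sample_sparsity(q=0.9, dq=0.2, generator=tr.Generator().manual_seed(0))  # lies in [0.7, 1.0]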

param_ndim : Union[int, None], optional (default=-1)

Determines how a parameter and its gradient are temporarily reshaped prior to being passed to both _pre_step_transform_ and _post_step_transform_. By default, the transformation broadcasts over the tensor's first dimension in a batch-like style (see the sketch after this entry). This can be specified per param-group.

  • A positive number determines the dimensionality of the tensor that the transformation will act on.

  • A negative number indicates the ‘offset’ from the dimensionality of the tensor (see “Notes” for examples).

  • None means that the transformation will be applied directly to the tensor without any broadcasting.

See ParamTransformingOptimizer for more details and examples.
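
As a brief illustration of this broadcasting (the gradient values below are made up, and the exact sparsified entries are not asserted), the default param_ndim=-1 applies the transform to each row of a 2-D parameter independently, so each row should end up with an L1-norm of grad_scale:

>>> import torch as tr
>>> from rai_toolbox.optim import L1qNormedGradientOptim
>>> x = tr.ones(2, 3, requires_grad=True)  # two batch-like rows of three elements
>>> x.backward(gradient=tr.tensor([[0.0, 1.0, 2.0], [5.0, 0.1, 0.2]]))
>>> optim = L1qNormedGradientOptim([x], q=0.30, InnerOpt=tr.optim.SGD, lr=1.0, param_ndim=-1)
>>> optim.step()
>>> x.grad.abs().sum(dim=1)  # each row was sparsified and normalized independently
tensor([1., 1.])

With param_ndim=None the same step would instead treat the full (2, 3) gradient as a single vector and normalize its overall L1-norm to grad_scale.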

grad_scale : float, optional (default=1.0)

Multiplies each gradient in-place after the in-place transformation is performed. This can be specified per param-group.

grad_bias : float, optional (default=0.0)

Added to each gradient in-place after the in-place transformation is performed. This can be specified per param-group.
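
The resulting order of operations can be sketched directly with a made-up, already-transformed gradient (assuming the scale is applied before the bias):

>>> import torch as tr
>>> grad = tr.tensor([0.0, 0.5, 0.5])  # a gradient already sparsified and normalized to an L1-norm of 1.0
>>> grad.mul_(1.8).add_(0.05)          # grad_scale=1.8 multiplies in-place, then grad_bias=0.05 is added
tensor([0.0500, 0.9500, 0.9500])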

defaults : Optional[Dict[str, Any]]

Specifies default parameters for all parameter groups.

div_by_zero_eps : float, optional (default=`torch.finfo(torch.float32).tiny`)

A lower bound used to clamp the normalization factor to prevent div-by-zero.

generator : torch.Generator, optional (default=`torch.default_generator`)

Controls the RNG source.

**inner_opt_kwargs : Any

Named arguments used to initialize InnerOpt.
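
Putting the optional arguments together, a hypothetical construction might look like the following (the specific values are arbitrary): lr and momentum are forwarded to SGD through **inner_opt_kwargs, and the seeded generator makes the dq-based sparsity draws reproducible.

>>> import torch as tr
>>> from rai_toolbox.optim import L1qNormedGradientOptim
>>> x = tr.ones(3, requires_grad=True)
>>> optim = L1qNormedGradientOptim(
...     [x],
...     q=0.90,
...     dq=0.05,                                   # per-step sparsity drawn near 0.90
...     grad_scale=1.0,
...     param_ndim=None,
...     generator=tr.Generator().manual_seed(42),  # reproducible RNG source
...     InnerOpt=tr.optim.SGD,
...     lr=0.1,
...     momentum=0.9,                              # forwarded to SGD via **inner_opt_kwargs
... )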

Examples

Let's use L1qNormedGradientOptim along with a standard SGD step with a learning rate of 1.0. We'll sparsify the gradient so that only the top 70% of its elements (by magnitude) are retained, and we'll normalize the sparse gradient to have an L1-norm of 1.8.

>>> import torch as tr
>>> from rai_toolbox.optim import L1qNormedGradientOptim

We create a parameter for our optimizer to update, along with the optimizer itself. We specify param_ndim=None so that the sparsification/normalization is applied to the whole gradient without any broadcasting.

>>> x = tr.tensor([1.0, 1.0, 1.0], requires_grad=True)
>>> optim = L1qNormedGradientOptim(
...     [x],
...     q=0.30,
...     grad_scale=1.8,
...     InnerOpt=tr.optim.SGD,
...     lr=1.0,
...     param_ndim=None,
... )

Backpropagating an explicit gradient through x to populate x.grad.

>>> x.backward(gradient=tr.tensor([0.0, 1.0, 2.0]))
>>> x.grad # the original gradient
tensor([0., 1., 2.])

Performing a step with our optimizer sparsifies and normalizes the gradient in-place, and then updates the parameter using SGD([x], lr=1.0).step().

>>> optim.step()
>>> x.grad # the signed, sparsified, and normalized gradient
tensor([0.0000, 0.9000, 0.9000])
>>> x  # the updated parameter
tensor([1.0000, 0.1000, 0.1000], requires_grad=True)
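
These numbers can be reproduced by hand from the documented behavior (a sketch of the arithmetic only, not of the optimizer's in-place implementation):

>>> g = tr.tensor([0.0, 1.0, 2.0])
>>> sparse = tr.sign(g)                        # retain only the signs: tensor([0., 1., 1.])
>>> sparse[g.abs().argmin()] = 0.0             # q=0.30 zeroes out the smallest-magnitude element
>>> g_new = 1.8 * sparse / sparse.abs().sum()  # rescale to an L1-norm of grad_scale=1.8
>>> g_new
tensor([0.0000, 0.9000, 0.9000])
>>> tr.tensor([1.0, 1.0, 1.0]) - 1.0 * g_new   # the SGD update: x - lr * grad
tensor([1.0000, 0.1000, 0.1000])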

Methods

__init__(params[, InnerOpt, q, dq, ...])
