rai_toolbox.optim.L1FrankWolfe#

class rai_toolbox.optim.L1FrankWolfe(params, *, epsilon, lr=2.0, use_default_lr_schedule=True, param_ndim=-1, div_by_zero_eps=1.1754943508222875e-38)[source]#

A Frank-Wolfe [1] optimizer that constrains each updated parameter to fall within an \(\epsilon\)-sized ball in \(L^1\) space, centered on the origin.

Notes

The method L1FrankWolfe._pre_step_transform_ is responsible for computing the negative linear minimization oracle (LMO) for a parameter and storing it on param.grad.
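
As a rough illustration (not the toolbox's internal implementation), the linear minimization oracle over an \(L^1\) ball of radius \(\epsilon\) places all of its mass on the single coordinate of largest gradient magnitude. The following doctest-style sketch uses only plain PyTorch and the gradient from the Examples section below; the variable names are purely illustrative.

>>> import torch as tr
>>> grad = tr.tensor([1.0, 2.0])  # gradient from the Examples section
>>> epsilon = 1.8
>>> i = grad.abs().argmax()  # coordinate with the largest |gradient|
>>> s = tr.zeros_like(grad)
>>> s[i] = -epsilon * tr.sign(grad[i])  # LMO: s = -epsilon * sign(grad[i]) * e_i
>>> s
tensor([ 0.0000, -1.8000])

The negation of s is what _pre_step_transform_ stores on param.grad, so the subsequent step moves the parameter toward s.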

References

__init__(params, *, epsilon, lr=2.0, use_default_lr_schedule=True, param_ndim=-1, div_by_zero_eps=1.1754943508222875e-38)[source]#
Parameters:
params : Sequence[Tensor] | Iterable[Mapping[str, Any]]

Iterable of parameters or dicts defining parameter groups.

epsilon : float

The radius of the \(L^1\) ball to which each updated parameter will be constrained. Can be specified per parameter-group.

lr : float, optional (default=2.0)

Indicates the weight with which the LMO contributes to the parameter update. See use_default_lr_schedule for additional details. If use_default_lr_schedule=False, then lr must be in the domain [0, 1].

use_default_lr_schedule : bool, optional (default=True)

If True, then the per-parameter “learning rate” is scaled by \(\hat{l_r} = l_r / (l_r + k)\), where k is the update index for that parameter, which starts at 0. A brief numerical sketch of this schedule follows the parameter list.

param_ndim : Union[int, None], optional (default=-1)

Determines how a parameter and its gradient are temporarily reshaped prior to being passed to both _pre_step_transform_ and _post_step_transform_. By default, the transformation broadcasts over the tensor’s first dimension in a batch-like style. This can be specified per param-group.

  • A positive number determines the dimensionality of the tensor that the transformation will act on.

  • A negative number indicates the ‘offset’ from the dimensionality of the tensor (see “Notes” for examples).

  • None means that the transformation will be applied directly to the tensor without any broadcasting.

See ParamTransformingOptimizer for more details and examples; a per-row illustration also appears in the Examples below.

div_by_zero_eps : float, optional (default=`torch.finfo(torch.float32).tiny`)

Prevents division-by-zero errors in the learning rate schedule.
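
To make the default schedule concrete, here is a small, purely illustrative computation (plain Python, not toolbox code) of the effective weight \(l_r / (l_r + k)\) for the default lr=2.0; this recovers the classic Frank-Wolfe step size \(2 / (k + 2)\).

>>> lr = 2.0
>>> [round(lr / (lr + k), 3) for k in range(4)]  # k = 0, 1, 2, 3
[1.0, 0.667, 0.5, 0.4]

Because the weight at k=0 is 1.0, the first step replaces each parameter entirely with its linear minimization oracle, as seen in the Examples below.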

Examples

Using L1FrankWolfe, we’ll constrain the updated parameter to fall within an \(L^1\)-ball of radius 1.8.

>>> import torch as tr
>>> from rai_toolbox.optim import L1FrankWolfe

Creating a parameter for our optimizer to update, and creating the optimizer itself. We specify param_ndim=None so that the constraint is applied to the parameter without any broadcasting.

>>> x = tr.tensor([1.0, 1.0], requires_grad=True)
>>> optim = L1FrankWolfe([x], epsilon=1.8, param_ndim=None)

Performing a simple calculation with x and backpropagating to create a gradient.

>>> (tr.tensor([1.0, 2.0]) * x).sum().backward()

Performing a step with our optimizer uses the Frank-Wolfe algorithm to update its parameters. Note that the updated parameter falls within, or on the boundary of, the \(L^1\)-ball of radius 1.8.

>>> optim.step()
>>> x
tensor([ 0.0000, -1.8000], requires_grad=True)
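
As an added sanity check (not part of the original example), the parameter’s \(L^1\) norm confirms that it lies on the ball of radius 1.8.

>>> tr.linalg.vector_norm(x.detach(), ord=1)
tensor(1.8000)

Assuming the default param_ndim=-1 batch-style broadcasting described above, each row of a 2D parameter should be constrained to its own \(L^1\) ball. The following is a hypothetical extension of the example (the names w and optim2 are illustrative):

>>> w = tr.ones(2, 3, requires_grad=True)
>>> optim2 = L1FrankWolfe([w], epsilon=1.8)  # default: param_ndim=-1
>>> (w * tr.tensor([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]])).sum().backward()
>>> optim2.step()
>>> tr.linalg.vector_norm(w.detach(), ord=1, dim=1)  # per-row L1 norms
tensor([1.8000, 1.8000])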

Methods

__init__(params, *, epsilon[, lr, ...])

Parameters: