Optimizers#

Our optimizers are designed to compose with off-the-shelf torch.optim.Optimizer implementations by adding the ability to modify parameters – and their gradients – before and after the optimizer’s step process. This is facilitated by ParamTransformingOptimizer, which is able to compose with any PyTorch optimizer (referred to as InnerOpt throughout the reference docs) and add parameter/gradient-transforming capabilities to it.
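The pattern is simple to picture in plain PyTorch. The following is a minimal, illustrative sketch of what a parameter-transforming optimizer does – it is not the library's implementation, and the class name and the particular pre-/post-step transforms are invented for illustration:

```python
import torch

# Illustrative only: a hand-rolled wrapper showing the pre-step / post-step
# pattern that ParamTransformingOptimizer generalizes.
class SketchedParamTransformingOptim:
    def __init__(self, params, InnerOpt=torch.optim.SGD, **inner_kwargs):
        self.params = list(params)
        self.inner = InnerOpt(self.params, **inner_kwargs)

    def step(self):
        for p in self.params:       # pre-step: transform gradients in-place
            if p.grad is not None:
                p.grad.sign_()      # e.g., a signed-gradient transform
        self.inner.step()           # the wrapped optimizer performs the update
        with torch.no_grad():
            for p in self.params:   # post-step: constrain parameters in-place
                p.clamp_(-1.0, 1.0) # e.g., project onto [-1, 1]

x = torch.tensor([2.0, -3.0], requires_grad=True)
opt = SketchedParamTransformingOptim([x], lr=0.5)
(x ** 2).sum().backward()
opt.step()  # x -> clamp(x - 0.5 * sign(grad), -1, 1) = [1.0, -1.0]
```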

The capabilities and implementations provided here are particularly useful for solving for data perturbations. Popular adversarial training and evaluation methods often involve normalizing parameter gradients prior to applying updates, as well as constraining (or projecting) the updated parameters, so our optimizers are well-suited to such applications. For example, SignedGradientOptim implements the fast gradient sign method, and can encapsulate any other optimizer (e.g., torch.optim.Adam), which performs the actual gradient-based step, as sketched below.
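As a concrete sketch, here is a single FGSM-style step. The import path (rai_toolbox.optim) and the InnerOpt/lr keywords are assumptions based on the signatures listed below; consult the SignedGradientOptim reference page for the exact API:

```python
import torch
from rai_toolbox.optim import SignedGradientOptim  # assumed import path

# A toy tensor standing in for a data perturbation.
x = torch.tensor([-1.0, 2.0], requires_grad=True)

# Wrapping SGD: the gradient is replaced by its elementwise sign before the
# inner SGD step, yielding an FGSM-style update of size lr.
optim = SignedGradientOptim([x], InnerOpt=torch.optim.SGD, lr=0.1)

loss = (x ** 2).sum()
loss.backward()
optim.step()  # x -> x - 0.1 * sign(grad) = [-0.9, 1.9]
```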

Because these optimizers are frequently used to update perturbations of data, and not model weights, it is often necessary to control how the parameter-transformations performed by ParamTransformingOptimizer are broadcast over each tensor. For example, we may be solving for a single perturbation (e.g., a “universal” perturbation), or for a batch of perturbations. In the latter case our parameter transformations ought to broadcast over the leading batch dimension. param_ndim is exposed throughout our optimizer APIs to control this behavior. Refer to ParamTransformingOptimizer for more details.
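For instance, per-sample versus whole-tensor normalization might look as follows. The import path and the param_ndim=None setting for the no-broadcasting case are our assumptions here; see the ParamTransformingOptimizer reference page for the precise semantics:

```python
import torch
from rai_toolbox.optim import L2NormedGradientOptim  # assumed import path

# A batch of four 2-D perturbations.
delta = torch.zeros(4, 2, requires_grad=True)

# Batch-style behavior: the L2 normalization broadcasts over the leading
# (batch) dimension, so each row's gradient is normalized independently.
per_sample = L2NormedGradientOptim(
    [delta], InnerOpt=torch.optim.SGD, lr=0.1, param_ndim=-1
)

# For a single "universal" perturbation, param_ndim=None (our assumption for
# the no-broadcasting setting) applies the normalization to the whole tensor.
universal = L2NormedGradientOptim(
    [delta], InnerOpt=torch.optim.SGD, lr=0.1, param_ndim=None
)
```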

All of our reference documentation features detailed Examples sections; scroll to the bottom of any given reference page to see them. For instructions on creating your own parameter-transforming optimizer, please refer to our How-To guide.

Base Parameter-Transforming Optimizers#

ParamTransformingOptimizer([params, ...])

An optimizer that performs an in-place transformation to each parameter, both before and after performing the gradient-based update on each parameter via InnerOpt.step.

ChainedParamTransformingOptimizer(...[, ...])

Chains together an arbitrary number of parameter-transforming optimizers, composing their pre- and post-step transformation functions to modify the parameters (and their gradients) in-place.
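The exact signature of ChainedParamTransformingOptimizer is abbreviated above; conceptually, chaining applies each optimizer's transforms in sequence around a single inner step. A minimal plain-PyTorch illustration of that composition (not the library's API):

```python
import torch

def apply_chained_grad_transforms(transforms, params):
    # Apply each in-place gradient transform in order -- the composition that
    # a chained parameter-transforming optimizer performs before the inner step.
    for transform in transforms:
        for p in params:
            if p.grad is not None:
                transform(p.grad)

x = torch.tensor([3.0, -4.0], requires_grad=True)
inner = torch.optim.SGD([x], lr=0.1)

(x ** 2).sum().backward()
apply_chained_grad_transforms(
    [torch.Tensor.sign_, lambda g: g.mul_(0.5)], [x]
)
inner.step()  # x is updated using 0.5 * sign(grad)
```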

Optimizers with Normed Gradients#

L1NormedGradientOptim(params[, InnerOpt, ...])

A gradient-transforming optimizer that normalizes the gradient by its \(L^1\)-norm prior to using InnerOpt.step to update the corresponding parameter.

L2NormedGradientOptim(params[, InnerOpt, ...])

A gradient-transforming optimizer that normalizes the gradient by its \(L^2\)-norm prior to using InnerOpt.step to update the corresponding parameter.

SignedGradientOptim(params[, InnerOpt, ...])

A gradient-transforming optimizer that takes the elementwise sign of a parameter's gradient prior to using InnerOpt.step to update the corresponding parameter.

L1qNormedGradientOptim(params[, InnerOpt, ...])

A gradient-transforming optimizer that sparsifies a parameter's gradient and normalizes the gradient to have an \(L^1\)-norm of grad_scale, prior to updating the parameter using InnerOpt.step.
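For example, a hedged sketch of L1qNormedGradientOptim usage. The import path is an assumption, and the q/grad_scale keyword names are inferred from the summaries above and from L1qFrankWolfe's signature; check the reference page for the actual parameters and defaults:

```python
import torch
from rai_toolbox.optim import L1qNormedGradientOptim  # assumed import path

x = torch.zeros(10, requires_grad=True)

# Hypothetically: zero all but the top 10% of gradient elements by magnitude
# (q=0.9), then rescale the surviving entries to have an L1-norm of 1.0.
optim = L1qNormedGradientOptim(
    [x], InnerOpt=torch.optim.SGD, lr=0.1, q=0.9, grad_scale=1.0
)
```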

Miscellaneous Gradient-Transforming Optimizers#

TopQGradientOptimizer(params[, InnerOpt, q, ...])

A gradient-transforming optimizer that zeros the elements of a parameter's gradient whose absolute magnitudes fall below the qth percentile.

ClampedGradientOptimizer([params, InnerOpt, ...])

A gradient-transforming optimizer that clamps the elements of a gradient to fall within user-specified bounds prior to using InnerOpt.step to update the corresponding parameter.

ClampedParameterOptimizer([params, ...])

A parameter-transforming optimizer that clamps the elements of a parameter to fall within user-specified bounds after InnerOpt.step has updated the parameter.
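A hedged sketch of gradient clamping with ClampedGradientOptimizer. The import path and the clamp_min/clamp_max keyword names are assumptions based on the summaries above:

```python
import torch
from rai_toolbox.optim import ClampedGradientOptimizer  # assumed import path

x = torch.tensor([0.5, -0.5], requires_grad=True)

# Each gradient element is clamped to [-1, 1] before the inner SGD step;
# the clamp_min/clamp_max keyword names here are assumptions.
optim = ClampedGradientOptimizer(
    [x], InnerOpt=torch.optim.SGD, lr=0.1, clamp_min=-1.0, clamp_max=1.0
)

loss = (10.0 * x).sum()  # constant gradient of 10 per element
loss.backward()
optim.step()  # the update uses the clamped gradient: x -> x - 0.1 * 1.0
```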

Optimizers with Projections Onto Constraint Sets#

L2ProjectedOptim(params[, InnerOpt, ...])

A gradient-transforming optimizer that constrains the updated parameters to lie within an \(\epsilon\)-sized ball in \(L^2\) space centered on the origin.

LinfProjectedOptim(params[, InnerOpt, ...])

A gradient-transforming optimizer that constrains the updated parameter values to fall within \([-\epsilon, \epsilon]\).
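These projections make PGD-style loops straightforward. A sketch, assuming the rai_toolbox.optim import path and an epsilon keyword mirroring the Frank-Wolfe signatures below:

```python
import torch
from rai_toolbox.optim import L2ProjectedOptim  # assumed import path

delta = torch.zeros(4, 2, requires_grad=True)  # a batch of perturbations
optim = L2ProjectedOptim([delta], InnerOpt=torch.optim.SGD, lr=0.5, epsilon=1.0)

for _ in range(5):
    optim.zero_grad()
    loss = -delta.sum()  # toy objective: push every element upward
    loss.backward()
    optim.step()  # inner SGD step; each perturbation is then constrained
                  # to the L2 ball of radius epsilon
```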

Frank-Wolfe Optimizers#

FrankWolfe(params, *[, lr, ...])

Implements the Frank-Wolfe minimization algorithm [1].

L1FrankWolfe(params, *, epsilon[, lr, ...])

A Frank-Wolfe [1] optimizer that constrains each updated parameter to fall within an \(\epsilon\)-sized ball in \(L^1\) space, centered on the origin.

L2FrankWolfe(params, *, epsilon[, lr, ...])

A Frank-Wolfe [1] optimizer that constrains each updated parameter to fall within an \(\epsilon\)-sized ball in \(L^2\) space, centered on the origin.

LinfFrankWolfe(params, *, epsilon[, lr, ...])

A Frank-Wolfe [1] optimizer that constrains each updated parameter to fall within an \(\epsilon\)-sized ball in \(L^\infty\) space, centered on the origin.

L1qFrankWolfe(params, *, q, epsilon[, dq, ...])

A Frank-Wolfe [1] optimizer that, when computing the LMO, sparsifies a parameter's gradient.
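For instance, a hedged sketch with L1FrankWolfe, using the epsilon and lr parameters from its signature above (the import path is an assumption):

```python
import torch
from rai_toolbox.optim import L1FrankWolfe  # assumed import path

delta = torch.zeros(8, requires_grad=True)

# Frank-Wolfe iterates stay inside the L1 ball of radius epsilon by
# construction, so no separate projection step is needed.
optim = L1FrankWolfe([delta], epsilon=1.0, lr=0.1)

for _ in range(10):
    optim.zero_grad()
    loss = -(delta * torch.linspace(-1.0, 1.0, 8)).sum()  # toy linear objective
    loss.backward()
    optim.step()
```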
