# Project Overview

## Overview

A symphony in three parts, this line of work explores welfare-centric group-fair machine learning, sampling, and statistical estimation problems, wherein the goal is to establish a cardinal objective function over per-group average utility or disutility (risk) values, and then to optimize this objective, or estimate its value, from data.
In this work, I generally adopt the philosophy of “flexible and non-judgmental algorithmics,” wherein we seek to show what is possible for a broad class of objectives, with the understanding that fairness concepts vary interpersonally and across cultures, and it is not necessarily our place as algorithm designers to make normative claims as to how one should view fairness.
However, I do see axiomatic reasoning as a path to better understanding the space of reasonable fairness concepts: should one accept certain precepts, one can at least limit consideration to *particular classes* of fairness concepts.
In this work, the axiomatic class of interest is usually the *weighted power-mean*, defined for real-valued $p$ as
\[
\operatorname{M}_{p}(\boldsymbol{u}; \boldsymbol{w}) \doteq \sqrt[p]{\sum_{i=1}^{g} \boldsymbol{w}_{i} \boldsymbol{u}_{i}^{p}} \enspace.
\]
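As a concrete illustration, the weighted power-mean can be sketched in a few lines of NumPy. The helper below is hypothetical (not from any paper's codebase); it also handles the standard limiting cases $p \to 0$ (weighted geometric mean) and $p = \pm\infty$ (maximum and minimum over groups):

```python
import numpy as np

def power_mean(u, w, p):
    """Weighted power-mean M_p(u; w) = (sum_i w_i * u_i^p)^(1/p).

    u : per-group (dis)utility values (positive reals).
    w : group weights, assumed to sum to 1.
    p : real exponent; p = 0 and p = +/-inf are limiting cases.
    """
    u, w = np.asarray(u, dtype=float), np.asarray(w, dtype=float)
    if p == 0.0:                       # limit p -> 0: weighted geometric mean
        return float(np.exp(np.sum(w * np.log(u))))
    if p == np.inf:                    # limit p -> +inf: worst-off group
        return float(np.max(u))
    if p == -np.inf:                   # limit p -> -inf: best-off group
        return float(np.min(u))
    return float(np.sum(w * u ** p) ** (1.0 / p))

risks, weights = [0.1, 0.4], [0.5, 0.5]
print(power_mean(risks, weights, 1))       # utilitarian average: 0.25
print(power_mean(risks, weights, np.inf))  # egalitarian worst case: 0.4
```

For disutility (malfare) aggregation, larger $p$ places more emphasis on the worst-off group; the power-mean is nondecreasing in $p$, interpolating from the minimum at $p = -\infty$ to the maximum at $p = \infty$.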

Welfare-centric learning methods contrast with constraint-based fairness methods, which are generally intuitively motivated rather than axiomatically justified, and frequently run afoul of statistical quandaries (overfitting to fairness).
In particular, it is difficult to ensure that fairness constraints hold with high probability over the underlying distribution while also ensuring that the learned function approximately optimizes the objective subject to these constraints. Intuitively put, even selecting between two functions from data can be impossible: if one *exactly satisfies* a fairness constraint, while the other satisfies the constraint more stringently but obtains a worse objective value, empirical estimates of the fairness constraint will not converge to a decisive answer. Cardinal fairness methods resolve this dilemma by encoding the objective and the fairness concept in a single function; one then need only show that this function can be efficiently optimized and estimated from data.
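To make the contrast concrete, here is a toy numerical sketch (the per-group risk values are hypothetical, not drawn from the papers): rather than checking a fragile constraint like "group risks differ by at most $\epsilon$," a cardinal method simply compares scalar power-mean values of candidate models, with the exponent $p$ encoding how much inequality is penalized.

```python
import numpy as np

def power_mean(u, w, p):
    """Weighted power-mean of per-group risks u with group weights w."""
    u, w = np.asarray(u, dtype=float), np.asarray(w, dtype=float)
    return float(np.sum(w * u ** p) ** (1.0 / p))

w = np.array([0.5, 0.5])
model_a = np.array([0.20, 0.20])  # equal group risks, higher average
model_b = np.array([0.10, 0.25])  # unequal group risks, lower average

# Utilitarian aggregation (p=1) prefers model B (lower average risk);
# a more egalitarian exponent (p=3) prefers the equitable model A.
print(power_mean(model_a, w, 1), power_mean(model_b, w, 1))
print(power_mean(model_a, w, 3), power_mean(model_b, w, 3))
```

Both candidates are ranked by a single scalar, so standard uniform-convergence arguments on that scalar suffice; no separate constraint-satisfaction test is needed.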

## The Core Papers

Part one [NeurIPS 2021] finds that the power-mean family for $p \geq 1$ is the only axiomatically justified class for aggregating loss values, hence termed *malfare functions*. It then shows that methods from convex optimization suffice to optimize this class in polynomial time, and that Rademacher averages and tools from statistical learning theory suffice to show generalization bounds, not just on the per-group risk values, but also on the cardinal objective function itself, despite its nonlinear dependence on per-group samples.

The second movement [FAccT 2022] extends this theory, giving sharper generalization guarantees under broader conditions, and exploring dynamic (adaptive) sampling strategies for more realistic sampling models that consider the intricacies of populations consisting of multiple groups (e.g., sampling random sets, or sampling conditionally from various groups under some cost model). It also gives similar treatment to a generalized class of regret-based objectives, wherein each group considers the *difference between* the optimal (dis)utility it could obtain in an unshared model and the (dis)utility it obtains in a shared model, and we optimize the malfare of this quantity.

Finally, the third part [AIStats 2023] extends this analysis to welfare functions, i.e., weighted power-means with $p \leq 1$, and shows that, while not always Lipschitz continuous, this class is locally Hölder continuous. As a result, we obtain finite sample complexity bounds for bounded learning problems where uniform convergence holds (as in the malfare case), and polynomial sample complexity is preserved so long as $|p|$ is bounded away from $0$ and no group's weight is too small.
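The empirical-malfare-minimization recipe can be sketched end to end on synthetic data. The setup below is entirely hypothetical (generated data, a linear model, logistic loss, and $p = 2$); it illustrates that for $p \geq 1$ the malfare of convex, nonnegative per-group risks is itself a single convex objective, amenable to off-the-shelf solvers such as `scipy.optimize.minimize`:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Synthetic two-group binary classification data (hypothetical setup):
# the minority group is smaller and shifted, so its risk differs.
def make_group(n, shift):
    X = rng.normal(shift, 1.0, size=(n, 2))
    y = (X[:, 0] + X[:, 1] + rng.normal(0, 0.5, n) > 2 * shift).astype(float)
    return X, y

groups = [make_group(200, 0.0), make_group(50, 1.0)]
w, p = np.array([0.5, 0.5]), 2.0  # group weights and malfare exponent

def group_risk(theta, X, y):
    """Average logistic loss of the linear model theta on one group."""
    z = X @ theta
    return np.mean(np.logaddexp(0.0, z) - y * z)

def empirical_malfare(theta):
    """M_p of the per-group empirical risks: one scalar, one objective."""
    risks = np.array([group_risk(theta, X, y) for X, y in groups])
    return np.sum(w * risks ** p) ** (1.0 / p)

result = minimize(empirical_malfare, x0=np.zeros(2))
print(result.x, empirical_malfare(result.x))
```

Because the objective is a scalar function of the parameters, any smooth-optimization routine applies for finite $p$; the generalization theory then bounds the gap between this empirical malfare and its population counterpart.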

## Extensions

Many practical student projects and collaborations have also developed from this line of work; of note are projects in fair learning with unknown group IDs, robust fair learning, and fair reinforcement learning, which have all produced workshop papers. I have also further explored these themes, with the goal of *sharper per-group generalization bounds* using more sophisticated techniques, in subsequent work. In ongoing work, I am studying more efficient convex optimization algorithms for power-mean optimization: SGD is difficult due to the nonlinear objective and biased gradient estimators, and first-order methods suffer from a general lack of smoothness, even under smooth losses, as the $p=\infty$ power-mean $\operatorname{M}_{\infty}(\boldsymbol{u}; \boldsymbol{w})$ is the *maximum* over per-group risk values. I am also working to apply these methods in practical large-scale machine learning settings, such as facial recognition, verification, and classification, as well as recommender systems, wherein sampling models have additional subtlety: we generally see multiple datapoints associated with each individual, and must aggregate these data across all individuals in a group.
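The per-individual aggregation subtlety can be sketched as follows (the record layout and loss values are hypothetical): averaging losses over raw datapoints overweights individuals with many records, so a two-stage scheme first averages within each individual, then across individuals in the group.

```python
import numpy as np

# Hypothetical records within one group: (individual_id, loss) pairs.
# Individual 0 contributes three records; individuals 1 and 2 fewer.
ids = np.array([0, 0, 0, 1, 2, 2])
losses = np.array([0.9, 0.9, 0.9, 0.1, 0.3, 0.5])

# Naive datapoint average: overweights the heavily-sampled individual 0.
naive = losses.mean()

# Two-stage aggregation: mean per individual, then mean over individuals.
per_individual = [losses[ids == i].mean() for i in np.unique(ids)]
group_risk = np.mean(per_individual)

print(naive, group_risk)  # the two estimates disagree
```

Under a sampling model where individuals (rather than datapoints) are drawn i.i.d. from the group, the two-stage estimate is the appropriate notion of per-group risk.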