DP-FTRL and matrix factorization mechanism (III)
In this series of blog posts, I plan to write down some of my personal understanding (of course shaped by many wonderful papers) of DP-FTRL and matrix factorization mechanism.
As motivated at the end of the previous post, in this post, our goal is to give a general way to compute the total noise (error) in prefix sum for any factorization . As a result, one can then optimize this error (objective) under privacy constraints to search for an even better mechanism than the tree-based algorithm (i.e., better pair ). From this, we will see that one needs to be careful in choosing and to balance the (i) sensitivity (hence the variance for each noise, determined by ) and (ii) the total number of accumulated noise (determined by ). The key behind tree-based algorithm is that its pair of guarantees that both (i) and (ii) are on the log order, which is in sharp contrast to Simple I and II.
The calculation for the error in the matrix factorization mechanism is standard ( e.g.,[LMHMR15]). Here, we first slightly deviate from existing works by explicitly focusing on the task of prefix sum and aiming to bound the maximal error with high probability rather than expectation. Another difference is that for clarity, we explicitly consider the fact that the input data is a matrix rather than vector in standard previous works. Later, we will turn to the general way to bound the expected error.
To start with, we can easily see that the max error across prefix outputs (here is the all-ones lower triangular matrix) is given by
where is the -th row of and each element of is sampled from , which is used to provide -DP. To this end, (assume norm of is bounded by again) with and is the -sensitivity of (here it is Frobenious norm since is a matrix) 1. Let us first compute the high-probability max-error for a general and then plug in the proper value of it for privacy. For each , we have each element of the -dimensional row vector is distributed according to . Hence, by the standard concentration of Gaussian vector and union bound over all , we have with high probability
It remains to determine the value of , i.e., . To this end, we can write as a linear combination of the columns in
where is the -th column of and . Thus, considering an arbitrary neighboring sequence of , the maximal Frobenious norm change is given by . By the fact that the Frobenious norm is sub-multiplicative, we have
where in the last step we use the bounded norm of gradients. Combining the above with (3.1) and the fact together, we conclude that for -DP under for the task of prefix sum, the max-error across all noisy prefix sum is upper bounded by the following with high probability
Remark 3.1: From the above bound, one can now see that a good choice of such that is the one that minimizes the norm of columns in and the norm of rows in , simultaneously. Moreover, for any factorization, one can simply plug the bound in (3.2) into (1.2) to obtain the performance of DP-FTRL with the corresponding mechanism instead of the tree-based algorithm.
We now turn to a general task (instead of prefix sum task above) and discuss the standard optimization objective in the matrix factorization mechanism. In particular, the standard goal in the literature is to minimize the expected error given by
where one can think of as the total expected error in all outputs and hence normalized by to have the same unit as the maximal expected error. In most existing works, the bound for (3.3) is used directly in the regret of DP-FTRL (i.e., (1.2)). However, in this case, one can at most get an expected regret and hence an expected population excess risk. Let us first put this aside and focus on how to bound the term in (3.3).
For -DP, the value of for is the same, i.e., . To bound the error, we have
With this, we have that under -DP
Hence, in most works, the objective is to optimize the following objective
This problem is well-studied and can be solved using numerical optimization algorithms (e.g.,[YYZH16], [MMHM18]).
Remark 3.2: One may wonder if we can get a high probability bound on . It turns out that with some use of concentration inequality, one can get it done. One straightforward way is to apply the (high-dimensional) Hanson-Wright inequality (since is a matrix rather than vector), see Exercise 6.2.7 in [Ver18].
Summary. Let us summarize what we have covered so far in this series: In (I), we introduce DP-FTRL, tree-based algorithm and general DP-FTRL regret bound as a function of the maximal error in prefix sum (i.e., (1.2)). Then, in (II), we view existing private mechanisms for prefix sum as special cases of matrix factorization mechanism, which motivates us to find better pair . Hence, in this post, we derive optimization problems over to minimize the total error under privacy guarantee. As a result, we can now substitute the best error into (1.2) to obtain better regret upper bound.
But, as hinted by Q2 in the end of (I), it is still unclear whether minimizing the maximal error in prefix sum leads to tight final actual regret guarantee. This leads us to the topic of the next post:)
THE END
Now, it’s time to take a break by appreciating the masterpiece of Monet.
Water Lilies
courtesy of https://www.wikiart.org/