Private Frequency Estimation Via Residue Number Systems

Generative AI & LLMs
Published: arXiv: 2511.11569v1

Abstract

We present \textsf{ModularSubsetSelection} (MSS), a new algorithm for locally differentially private (LDP) frequency estimation. Given a universe of size $k$ and $n$ users, our $\varepsilon$-LDP mechanism encodes each input via a Residue Number System (RNS) over $\ell$ pairwise-coprime moduli $m_0, \ldots, m_{\ell-1}$, and reports a randomly chosen index $j \in [\ell]$ along with the perturbed residue using the statistically optimal \textsf{SubsetSelection}~(SS) (Wang et al. 2016). This design reduces the user communication cost from $Θ\bigl(ω\log_2(k/ω)\bigr)$ bits required by standard SS (with $ω\approx k/(e^\varepsilon+1)$) down to $\lceil \log_2 \ell \rceil + \lceil \log_2 m_j \rceil$ bits, where $m_j < k$. Server-side decoding runs in $Θ(n + r k \ell)$ time, where $r$ is the number of LSMR (Fong and Saunders 2011) iterations. In practice, with well-conditioned moduli (\textit{i.e.}, constant $r$ and $\ell = Θ(\log k)$), this becomes $Θ(n + k \log k)$. We prove that MSS achieves worst-case MSE within a constant factor of state-of-the-art protocols such as SS and \textsf{ProjectiveGeometryResponse} (PGR) (Feldman et al. 2022), while avoiding the algebraic prerequisites and dynamic-programming decoder required by PGR. Empirically, MSS matches the estimation accuracy of SS, PGR, and \textsf{RAPPOR} (Erlingsson, Pihur, and Korolova 2014) across realistic $(k, \varepsilon)$ settings, while offering faster decoding than PGR and shorter user messages than SS. Lastly, by sampling from multiple moduli and reporting only a single perturbed residue, MSS achieves the lowest reconstruction-attack success rate among all evaluated LDP protocols.

Paper Summary

Problem
Describe the main problem or challenge the paper addresses. The main problem addressed in this paper is how to **privately estimate the frequencies of items in a universe**. This is a crucial challenge for various applications, including recommender systems, data mining, and privacy-preserving statistics. In these applications, it's essential to ensure that individual user data remains private while still allowing for accurate frequency estimates.
Key Innovation
Explain what is new or unique about this work. The key innovation of this paper is the **ModularSubsetSelection (MSS) algorithm**, which uses Residue Number Systems (RNS) to reduce the user communication cost and achieve the statistically optimal sample complexity for locally differentially private (LDP) frequency estimation.
Practical Impact
Describe how this research could be applied in the real world, or why it matters. This research has significant practical implications for various applications, including: * **Recommender systems**: By privately estimating item frequencies, recommender systems can provide personalized recommendations without compromising user privacy. * **Data mining**: Private frequency estimation enables data miners to identify trends and patterns in user data without revealing individual user information. * **Privacy-preserving statistics**: This research contributes to the development of privacy-preserving statistical methods, which are essential for ensuring user privacy in various applications.
Analogy / Intuitive Explanation
Explain the core idea using a simple analogy or metaphor, if possible. Imagine you're at a party with many people, and you want to know which songs are most popular among the attendees. However, you don't want to reveal which songs each person likes. The ModularSubsetSelection (MSS) algorithm works like a **randomized song survey**, where each person reports a randomly chosen song they like, along with a perturbed (or "noisy") version of the song's popularity. By analyzing these reports, the algorithm can estimate the most popular songs without compromising individual user preferences. The MSS algorithm works as follows: 1. Each user generates a random subset of size ℓ from their input. 2. Each user encodes their input via a Residue Number System (RNS) over ℓ pairwise-coprime moduli m0, . . . , mℓ−1. 3. Each user reports a randomly chosen index j ∈ [ℓ] along with the perturbed residue. 4. The aggregator computes the estimated frequencies using the reported residues and indices. The key insight behind MSS is that by using RNS, users can encode their input in a way that reduces the user communication cost while achieving the statistically optimal sample complexity. The authors evaluated the performance of MSS on synthetic and real-world datasets and compared it with existing LDP frequency estimation algorithms. The results show that MSS achieves the statistically optimal sample complexity of O( k/n) and significantly reduces the user communication cost from Θ (k + n) to O(ℓlog2 k). In conclusion, the ModularSubsetSelection (MSS)
Paper Information
Categories:
cs.CR cs.AI
Published Date:

arXiv ID:

2511.11569v1

Quick Actions