Speaker: Lisa Zhang
Affiliation: Alcatel-Lucent Bell Labs
Title: Analysis of k-Anonymity Algorithms for Streaming Location Data
We propose and analyze algorithms to achieve k-anonymity for streaming location data. We consider a framework motivated by European Union privacy requirements, in which location information arrives online into a buffer of fixed size m. Whenever the buffer fills, some data must move to permanent storage in a k-anonymized fashion. This notion of anonymity refers to recording a coarse common region containing at least k points instead of separate exact locations. One primary goal is to minimize the recorded region size so that the anonymized location data is as accurate as possible.
We observe that under competitive analysis, any online algorithm can be arbitrarily bad in terms of the recorded region size. We therefore assume a more benign model in which the location distribution is known. For a uniform distribution, we analyze a simple, natural algorithm that partitions the space into m/k identical regions to ensure k-anonymity, and picks the region with the largest occupancy whenever the buffer fills. Our detailed analysis shows
that the largest occupancy converges to 2k. This implies, perhaps somewhat unintuitively, that it is sufficient to achieve k-anonymity by partitioning space into $2m/k$ regions, which reduces and thereby improves the recorded region size by a factor of 2. We also present an almost matching lower bound of 2m/k. Finally, we discuss generalizations to nonuniform distributions by partitioning the space to match the given distribution.