A Public and Reproducible Assessment of the Topics API on Real Data
Abstract: The Topics API for the web is Google's privacy-enhancing proposal to replace third-party cookies. Results of prior work have led to an ongoing discussion between Google and research communities about the ability of Topics to balance utility and privacy. The central point of contention is the realism of the datasets used in these analyses and the reproducibility of the results: researchers have used data collected from small samples of users or generated synthetic datasets, whereas Google's results are inferred from a private dataset. In this paper, we complement prior research by performing a reproducible assessment of the latest version of the Topics API on the largest publicly available dataset of real browsing histories. First, we measure how unique and stable real users' interests are over time. Then, we evaluate whether Topics can be used to fingerprint users from these real browsing traces by adapting methodologies from prior privacy studies. Finally, we call on web actors to perform and enable reproducible evaluations by releasing anonymized distributions. We find that for the 1207 real users in this dataset, the probability of being re-identified across websites is 2%, 3%, and 4% after 1, 2, and 3 observations of their topics by advertisers, respectively. This paper shows on real data that Topics does not provide the same privacy guarantees to all users and that the information leakage worsens over time, further highlighting the need for public and reproducible evaluations of the claims made by new web proposals.
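To make the two measurements named in the abstract concrete, here is a minimal sketch of how the uniqueness and stability of users' topic profiles could be computed. It is an illustration under stated assumptions, not the paper's method: the user identifiers, the topic names, the per-epoch profiles, and the helper names `unique_fraction` and `jaccard` are all hypothetical, and Jaccard similarity is only one possible stand-in for "stability".

```python
from collections import Counter

def unique_fraction(profiles):
    """Fraction of users whose topic set is shared with no other user.

    `profiles` maps a (hypothetical) user id to that user's set of top topics.
    """
    counts = Counter(frozenset(topics) for topics in profiles.values())
    unique = sum(1 for topics in profiles.values()
                 if counts[frozenset(topics)] == 1)
    return unique / len(profiles)

def jaccard(a, b):
    """Jaccard similarity of two topic sets (1.0 = identical, 0.0 = disjoint)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Toy data (assumed for illustration only): top topics per user in two epochs.
epoch1 = {
    "u1": {"News", "Sports"},
    "u2": {"News", "Sports"},
    "u3": {"Travel", "Cooking"},
    "u4": {"Music", "News"},
}
epoch2 = {
    "u1": {"News", "Sports"},
    "u2": {"News", "Travel"},
    "u3": {"Travel", "Cooking"},
    "u4": {"Games", "Film"},
}

# Uniqueness: u1 and u2 share a profile, u3 and u4 do not, so 2 of 4 are unique.
uniq = unique_fraction(epoch1)

# Stability: how much each user's profile overlaps between consecutive epochs.
stability = {user: jaccard(epoch1[user], epoch2[user]) for user in epoch1}

print(f"unique profiles in epoch 1: {uniq:.2f}")
for user, score in stability.items():
    print(f"{user}: stability {score:.2f}")
```

In this toy setting, a user with a unique and stable profile (such as `u3`) is the kind of user for whom repeated observations would be most identifying, which matches the abstract's point that the privacy guarantees are not the same for all users.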