
A Public and Reproducible Assessment of the Topics API on Real Data

Published 28 Mar 2024 in cs.CR | (2403.19577v3)

Abstract: The Topics API for the web is Google's privacy-enhancing alternative to third-party cookies. Results of prior work have led to an ongoing discussion between Google and the research community about the ability of Topics to balance utility and privacy. The central point of contention is largely the realism of the datasets used in these analyses and their reproducibility: researchers have used data collected from small samples of users or generated synthetic datasets, while Google's results are inferred from a private dataset. In this paper, we complement prior research by performing a reproducible assessment of the latest version of the Topics API on the largest publicly available dataset of real browsing histories. First, we measure how unique and stable real users' interests are over time. Then, we evaluate whether Topics can be used to fingerprint users from these real browsing traces by adapting methodologies from prior privacy studies. Finally, we call on web actors to perform and enable reproducible evaluations by releasing anonymized distributions. We find that, for the 1207 real users in this dataset, the probability of being re-identified across websites is 2%, 3%, and 4% after 1, 2, and 3 observations of their topics by advertisers, respectively. This paper shows on real data that Topics does not provide the same privacy guarantees to all users and that information leakage worsens over time, further highlighting the need for public and reproducible evaluations of the claims made by new web proposals.
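To make the first two measurements in the abstract concrete (how unique and how stable users' interests are over time), here is a minimal sketch of two simple metrics one could compute. It is not the paper's methodology. It assumes, hypothetically, that each user is represented by a list of per-epoch sets of topic IDs (for instance, the top topics a browser computes for each weekly epoch). Uniqueness is measured as the share of users whose topic set in a given epoch is held by no other user, and stability as the mean Jaccard similarity between consecutive epochs. The names `jaccard`, `stability`, and `fraction_unique` are illustrative only.

```python
from collections import Counter
from collections.abc import Hashable
from typing import Dict, FrozenSet, List, Set

# Hypothetical representation: user ID -> list of per-epoch topic-ID sets.
Profiles = Dict[Hashable, List[Set[int]]]


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard similarity of two topic sets (defined as 1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def stability(epochs: List[Set[int]]) -> float:
    """Mean Jaccard similarity between consecutive epochs for one user."""
    if len(epochs) < 2:
        return 1.0  # a single epoch is trivially stable
    sims = [jaccard(a, b) for a, b in zip(epochs, epochs[1:])]
    return sum(sims) / len(sims)


def fraction_unique(profiles: Profiles, epoch: int) -> float:
    """Share of users whose topic set in `epoch` is held by no other user."""
    # frozenset makes each topic set hashable so identical sets can be counted.
    keys: Dict[Hashable, FrozenSet[int]] = {
        user: frozenset(epochs[epoch]) for user, epochs in profiles.items()
    }
    counts = Counter(keys.values())
    unique = sum(1 for key in keys.values() if counts[key] == 1)
    return unique / len(keys) if keys else 0.0


if __name__ == "__main__":
    # Toy data, purely illustrative (not drawn from the paper's dataset).
    profiles: Profiles = {
        "u1": [{1, 2, 3}, {1, 2, 4}],
        "u2": [{1, 2, 3}, {5, 6, 7}],
        "u3": [{8, 9, 10}, {8, 9, 10}],
    }
    print(fraction_unique(profiles, 0))  # u1 and u2 share a set, so 1 of 3 is unique
    print(stability(profiles["u1"]))     # 2 shared topics out of 4 in the union
```

A user with a unique profile has an anonymity set of size one in that epoch, which is the intuition behind fingerprinting; a high stability score means repeated observations keep pointing at the same profile. Real analyses would also need to model the API's noise and per-site filtering, which this sketch deliberately omits.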


Authors (2)