FreqyWM: Frequency Watermarking for the New Data Economy

Published 27 Dec 2023 in cs.CR and cs.DB | (2312.16547v1)

Abstract: We present a novel technique for modulating the appearance frequency of a few tokens within a dataset to encode an invisible watermark that can be used to protect ownership rights over the data. We develop optimal as well as fast heuristic algorithms for creating and verifying such watermarks. We also demonstrate the robustness of our technique against various attacks and derive analytical bounds for the false positive probability of erroneously detecting a watermark on a dataset that does not carry it. Our technique is applicable to both single-dimensional and multidimensional datasets, is independent of token type, allows fine control of the introduced distortion, and can be used in a variety of use cases that involve buying and selling data in contemporary data marketplaces.
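
To make the idea of "modulating the appearance frequency of a few tokens" concrete, here is a minimal Python sketch. It is not the authors' algorithm: the pairing rule, the hash-derived modulus, the add-only embedding, and the names `pair_modulus`, `embed`, `verify`, `secret`, and `z` are all assumptions chosen for illustration. The sketch assumes the chosen token pairs are disjoint and nudges each pair's frequency difference until it is divisible by a secret-dependent modulus, which a verifier holding the same secret can then check.

```python
import hashlib
from collections import Counter

def pair_modulus(a, b, secret, z):
    """Hypothetical helper: secret-dependent modulus in [2, z + 1] for a token pair."""
    digest = hashlib.sha256(f"{secret}|{a}|{b}".encode()).digest()
    return 2 + int.from_bytes(digest[:8], "big") % z

def embed(tokens, pairs, secret, z=7):
    """Append copies of the more frequent token of each pair until the
    frequency difference is divisible by that pair's modulus.
    Assumes the pairs are disjoint (no token appears in two pairs)."""
    counts = Counter(tokens)
    out = list(tokens)
    for a, b in pairs:
        hi, lo = (a, b) if counts[a] >= counts[b] else (b, a)
        m = pair_modulus(a, b, secret, z)
        diff = counts[hi] - counts[lo]
        add = (-diff) % m          # fewer than m extra occurrences per pair
        out.extend([hi] * add)
        counts[hi] += add
    return out

def verify(tokens, pairs, secret, z=7):
    """Return the fraction of pairs whose frequency difference is
    divisible by the secret-dependent modulus."""
    counts = Counter(tokens)
    hits = sum(
        1 for a, b in pairs
        if abs(counts[a] - counts[b]) % pair_modulus(a, b, secret, z) == 0
    )
    return hits / len(pairs)

if __name__ == "__main__":
    data = ["x"] * 10 + ["y"] * 7 + ["u"] * 5 + ["v"] * 5
    chosen = [("x", "y"), ("u", "v")]
    marked = embed(data, chosen, secret="owner-key")
    print(len(marked) - len(data), "tokens added")
    print("verified fraction:", verify(marked, chosen, secret="owner-key"))
```

In this toy version the distortion is bounded by the parameter `z`, since each pair receives at most `z` extra occurrences. A dataset that was never marked can still satisfy some pairs by chance, which is why a real scheme needs a bound on the false positive probability of the kind the abstract mentions.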
