FreqyWM: Frequency Watermarking for the New Data Economy
Abstract: We present a novel technique that modulates the appearance frequency of a few tokens within a dataset to encode an invisible watermark, which can be used to protect ownership rights over the data. We develop both optimal and fast heuristic algorithms for creating and verifying such watermarks. We also demonstrate the robustness of our technique against various attacks and derive analytical bounds on the false positive probability, that is, the probability of erroneously detecting a watermark in a dataset that does not carry it. Our technique applies to both single-dimensional and multidimensional datasets, is independent of token type, allows fine control of the introduced distortion, and supports a variety of use cases that involve buying and selling data in contemporary data marketplaces.