Deep Dependency Networks and Advanced Inference Schemes for Multi-Label Classification
Abstract: We present a unified framework called deep dependency networks (DDNs) that combines dependency networks and deep learning architectures for multi-label classification, with a particular emphasis on image and video data. The primary advantage of dependency networks is their ease of training, in contrast to other probabilistic graphical models like Markov networks. In particular, when combined with deep learning architectures, they provide an intuitive, easy-to-use loss function for multi-label classification. A drawback of DDNs compared to Markov networks is their lack of advanced inference schemes, necessitating the use of Gibbs sampling. To address this challenge, we propose novel inference schemes based on local search and integer linear programming for computing the most likely assignment to the labels given observations. We evaluate our novel methods on three video datasets (Charades, TACoS, Wetlab) and three image datasets (MS-COCO, PASCAL VOC, NUS-WIDE), comparing their performance with (a) basic neural architectures and (b) neural architectures combined with Markov networks equipped with advanced inference and learning techniques. Our results demonstrate the superiority of our new DDN methods over the two competing approaches.
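To make the idea of local-search inference concrete, the following is a minimal sketch and not the authors' algorithm. It assumes each label has a logistic conditional distribution given the other labels, with a per-label bias standing in for the neural network's output on the observation, and it assumes the objective is the sum of log conditional probabilities. The names `score`, `local_search`, `weights`, and `biases` are hypothetical and chosen only for illustration.

```python
import math
import random


def log_sigmoid(z):
    # Numerically stable log(sigmoid(z)).
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))


def score(x, weights, biases):
    """Sum over labels of log P(x_i | x_-i) under assumed logistic conditionals."""
    n = len(x)
    total = 0.0
    for i in range(n):
        # biases[i] is a stand-in for the network's evidence for label i (assumption).
        z = biases[i] + sum(weights[i][j] * x[j] for j in range(n) if j != i)
        total += log_sigmoid(z) if x[i] == 1 else log_sigmoid(-z)
    return total


def local_search(weights, biases, restarts=5, seed=0):
    """Greedy single-label flips with random restarts; returns a local optimum."""
    rng = random.Random(seed)
    n = len(biases)
    best_x, best_s = None, -math.inf
    for _ in range(restarts):
        x = [rng.randint(0, 1) for _ in range(n)]
        s = score(x, weights, biases)
        improved = True
        while improved:
            improved = False
            for i in range(n):
                x[i] ^= 1  # tentatively flip label i
                s_new = score(x, weights, biases)
                if s_new > s + 1e-12:
                    s = s_new  # keep the flip
                    improved = True
                else:
                    x[i] ^= 1  # undo the flip
        if s > best_s:
            best_x, best_s = list(x), s
    return best_x, best_s


# Hypothetical three-label example: labels 0 and 1 support each other.
example_weights = [[0.0, 1.5, 0.0], [1.5, 0.0, -0.5], [0.0, -0.5, 0.0]]
example_biases = [0.2, -0.1, -1.0]
assignment, value = local_search(example_weights, example_biases)
```

Greedy flipping only guarantees a local optimum of the assumed objective, which is one reason an exact method such as integer linear programming can be attractive when the problem is small enough to solve.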