
BiSHop: Bi-Directional Cellular Learning for Tabular Data with Generalized Sparse Modern Hopfield Model

Published 4 Apr 2024 in cs.LG, cs.AI, and stat.ML | arXiv:2404.03830v2

Abstract: We introduce the Bi-Directional Sparse Hopfield Network (BiSHop), a novel end-to-end framework for deep tabular learning. BiSHop handles two major challenges of deep tabular learning: the non-rotationally invariant structure and the feature sparsity of tabular data. Our key motivation comes from the recently established connection between associative memory and attention mechanisms. Consequently, BiSHop uses a dual-component approach, sequentially processing data both column-wise and row-wise through two interconnected directional learning modules. Computationally, these modules are built from generalized sparse modern Hopfield layers, a sparse extension of the modern Hopfield model with adaptable sparsity. Methodologically, BiSHop facilitates multi-scale representation learning, capturing both intra-feature and inter-feature interactions, with adaptive sparsity at each scale. Empirically, through experiments on diverse real-world datasets, we demonstrate that BiSHop surpasses current SOTA methods with significantly fewer HPO runs, marking it as a robust solution for deep tabular learning.
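To make the abstract's description concrete, below is a minimal NumPy sketch of the two ingredients it names: a sparse Hopfield-style retrieval step and a column-wise pass followed by a row-wise pass over an embedded table. This is not the authors' implementation; as simplifying assumptions, the sparsity is fixed to the alpha = 2 (sparsemax) member of the entmax family rather than the paper's learnable sparsity, learned projections and multi-scale pooling are omitted, and all function names and shapes are invented for illustration.

```python
import numpy as np

def sparsemax(z):
    """Sparse probability map (Martins & Astudillo, 2016); used here as a
    stand-in for the adaptable sparsity of the generalized sparse Hopfield model."""
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    support = 1 + k * z_sorted > cumsum
    k_max = k[support][-1]
    tau = (cumsum[support][-1] - 1.0) / k_max
    return np.maximum(z - tau, 0.0)

def sparse_hopfield_retrieve(query, memory, beta=1.0):
    """One retrieval step: sparse weights over pattern similarities,
    then a convex combination of the stored patterns."""
    scores = beta * memory @ query          # similarity of the query to each stored pattern
    weights = sparsemax(scores)             # sparse attention over the memory
    return memory.T @ weights               # retrieved (refined) representation

# Toy tabular embedding: 4 rows (samples) x 5 columns (features), 8 dims each.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 5, 8))

# Column-wise pass: within each sample, refine every feature embedding
# against the sample's other features (intra-sample interactions).
col_out = np.stack([
    np.stack([sparse_hopfield_retrieve(X[i, j], X[i]) for j in range(X.shape[1])])
    for i in range(X.shape[0])
])

# Row-wise pass: for each feature, refine every sample's embedding against
# the other samples' embeddings of that feature (inter-sample interactions),
# mirroring the second directional module.
row_out = np.stack([
    np.stack([sparse_hopfield_retrieve(col_out[i, j], col_out[:, j])
              for i in range(X.shape[0])])
    for j in range(X.shape[1])
]).transpose(1, 0, 2)

print(row_out.shape)  # (4, 5, 8)
```

The sequential column-then-row arrangement is what the abstract calls the two interconnected directional learning modules; with beta -> large or a learnable alpha between 1 and 2, the same retrieval step interpolates between dense softmax attention and hard, sparse pattern lookup.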
