Optimizing Layout of Recursive Datatypes with Marmoset
Abstract: While programmers know that the low-level memory representation of data structures can have significant effects on performance, compiler support to optimize the layout of those structures is an under-explored field. Prior work has optimized the layout of individual, non-recursive structures without considering how collections of those objects in linked or recursive data structures are laid out. This work introduces Marmoset, a compiler that optimizes the layouts of algebraic datatypes (ADTs), with a special focus on producing highly optimized, packed data layouts where recursive structures can be traversed with minimal pointer chasing. Marmoset analyzes how a recursive ADT is used across functions to choose a global layout that promotes simple, strided access for that ADT in memory. It does so by building and solving a constraint system to minimize an abstract cost model, yielding a predicted efficient layout for the ADT. Marmoset then builds on top of Gibbon, a prior compiler for packed, mostly-serial representations, to synthesize optimized ADTs. We show experimentally that Marmoset is able to choose optimal layouts across a series of microbenchmarks and case studies, outperforming both Gibbon's baseline approach and MLton, a Standard ML compiler that uses traditional pointer-heavy representations.
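To make the idea of a packed layout concrete, the following is a minimal sketch of why field order matters when a recursive value is serialized into one flat buffer. It is an illustration only and is not Marmoset's or Gibbon's actual encoding: the list type, the field names `key`, `payload`, and `tail`, and the helpers `pack` and `sum_keys` are all hypothetical. The traversal reads only the keys, and the sketch counts how many payload bytes it has to jump over under each field order.

```python
import struct

# Hypothetical ADT: a list is Nil or Cons(key, payload, tail).
NIL, CONS = 0, 1

def pack(cells, order):
    """Serialize (key, payload) cells into one flat buffer, writing Cons fields in `order`."""
    if not cells:
        return bytes([NIL])
    (key, payload), rest = cells[0], cells[1:]
    fields = {
        "key": struct.pack("<q", key),                          # fixed 8 bytes
        "payload": struct.pack("<I", len(payload)) + payload,   # length prefix + data
        "tail": pack(rest, order),                              # recursive field, stored inline
    }
    return bytes([CONS]) + b"".join(fields[name] for name in order)

def sum_keys(buf, order):
    """Sum all keys; also return how many payload bytes were skipped on the way."""
    assert order.index("key") < order.index("tail"), "sketch assumes key precedes tail"
    pos, total, skipped = 0, 0, 0
    while buf[pos] == CONS:
        pos += 1  # step past the constructor tag
        for name in order:
            if name == "key":
                total += struct.unpack_from("<q", buf, pos)[0]
                pos += 8
            elif name == "payload":
                (n,) = struct.unpack_from("<I", buf, pos)
                pos += 4 + n
                skipped += 4 + n
            else:  # "tail": the next cell starts right here
                break
    return total, skipped

cells = [(1, b"aaaa"), (2, b"bb"), (3, b"")]
payload_first = ("key", "payload", "tail")
tail_first = ("key", "tail", "payload")

print(sum_keys(pack(cells, payload_first), payload_first))  # (6, 18)
print(sum_keys(pack(cells, tail_first), tail_first))        # (6, 0)
```

Both buffers hold the same bytes in a different order. Placing the recursive field before the unused payload lets this particular traversal walk the keys at a fixed stride, which is the kind of access pattern the abstract describes; a traversal that needed the payloads would favor the other order.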