Consistent Injective Mappings in Pseudo‑RL Encoder–Decoder Training

Develop training procedures that consistently achieve injective mappings between a finite color set and a finite name set in the pseudo‑reinforcement‑learning setup where one language model encodes colors as names and another decodes names back to colors, using in‑context rewards followed by supervised fine‑tuning.
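The target property can be stated concretely. This is a minimal sketch, not taken from the paper: the color and name labels below are assumed placeholders, and a mapping is injective exactly when no two colors share a name.

```python
# Hypothetical labels standing in for the paper's five colors and seven names.
COLORS = ["red", "green", "blue", "yellow", "purple"]
NAMES = ["Alice", "Bob", "Carol", "Dave", "Eve", "Frank", "Grace"]

def is_injective(mapping: dict) -> bool:
    """True iff every color is assigned a distinct name."""
    assigned = [mapping[c] for c in COLORS]
    return len(set(assigned)) == len(assigned)

good = dict(zip(COLORS, NAMES))       # five distinct names -> injective
bad = {c: "Alice" for c in COLORS}    # all colors collapse onto one name

print(is_injective(good))  # True
print(is_injective(bad))   # False
```

With five colors and seven names an injective encoder always exists; the open question is whether the training loop reliably finds one.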

Background

The authors design a pseudo‑RL setup in which two closed‑source models (an encoder and a decoder) learn a mapping between five colors and seven names using in‑context rewards and subsequent supervised fine‑tuning.

After the initial iterations, they obtained injective mappings for only a subset of colors, and they explicitly note that it remains to be determined how to consistently achieve injective mappings.
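The dynamics can be illustrated with a toy simulation. The tabular "encoder" and "decoder" below are assumed stand‑ins for the closed‑source language models, not the paper's method: a round trip that recovers the original color earns an in‑context reward, and the rewarded pair is reinforced in place of supervised fine‑tuning.

```python
import random

random.seed(0)
COLORS = ["red", "green", "blue", "yellow", "purple"]
NAMES = ["Alice", "Bob", "Carol", "Dave", "Eve", "Frank", "Grace"]

# Toy tabular policies: preference weights for color->name and name->color.
enc = {c: {n: 1.0 for n in NAMES} for c in COLORS}
dec = {n: {c: 1.0 for c in COLORS} for n in NAMES}

def sample(weights: dict) -> str:
    keys = list(weights)
    return random.choices(keys, [weights[k] for k in keys])[0]

for _ in range(5000):
    c = random.choice(COLORS)
    n = sample(enc[c])        # encoder names the color
    c_hat = sample(dec[n])    # decoder guesses the color from the name
    if c_hat == c:            # in-context reward: round trip succeeded
        enc[c][n] += 1.0      # reinforcement stands in for fine-tuning
        dec[n][c] += 1.0

# Greedy readout: how many colors end up with distinct names?
greedy = {c: max(enc[c], key=enc[c].get) for c in COLORS}
distinct = len(set(greedy.values()))
print(f"{distinct}/{len(COLORS)} colors map to distinct names")
```

Even in this toy version, the round‑trip reward alone does not guarantee a fully injective readout: two colors can remain locked onto a shared name for many iterations, which mirrors the partial‑injectivity outcome the authors report.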

References

It remains to be determined how to consistently achieve injective mappings.

Secret Collusion among Generative AI Agents: Multi-Agent Deception via Steganography (2402.07510 - Motwani et al., 2024), Appendix, Additional Case Studies — Pseudo‑RL Optimisation