Unknown production model weights underlying commercial agents (LeChat and ChatGLM)

Determine the specific neural-network weights deployed in the commercial LLM agents Mistral’s LeChat and ChatGLM, so that the attacked models can be replicated precisely and transferability from their open-weight counterparts assessed.

Background

The paper evaluates optimization-based adversarial prompts by computing them on open-weight models and transferring them to production agents. Because the optimization pipeline requires gradient access, the authors restrict target selection to agents for which similar open-weight LLMs exist. They then report transfer experiments on Mistral’s LeChat and ChatGLM.
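To make the surrogate-then-transfer setting concrete, the following toy sketch shows the kind of white-box, greedy coordinate search over discrete tokens that motivates requiring gradient or loss access to an open-weight model. Everything here (the vocabulary size, `surrogate_loss`, the embedding table) is a hypothetical stand-in, not the paper's actual pipeline:

```python
import numpy as np

# Toy white-box "surrogate": a loss the attacker can evaluate freely,
# standing in for an open-weight LLM with gradient access.
rng = np.random.default_rng(0)
VOCAB, DIM = 50, 8                      # hypothetical toy sizes
EMB = rng.normal(size=(VOCAB, DIM))     # surrogate token embeddings
TARGET = rng.normal(size=DIM)           # direction the attack steers toward

def surrogate_loss(tokens):
    """Distance between the mean embedding of the adversarial
    token sequence and the target direction (lower is better)."""
    return float(np.linalg.norm(EMB[tokens].mean(axis=0) - TARGET))

def optimize_suffix(tokens, iters=200):
    """Greedy coordinate search: sweep one position at a time,
    trying every replacement token and keeping the best swap."""
    tokens, best = list(tokens), surrogate_loss(tokens)
    for it in range(iters):
        pos = it % len(tokens)
        for cand in range(VOCAB):
            trial = tokens.copy()
            trial[pos] = cand
            loss = surrogate_loss(trial)
            if loss < best:
                best, tokens = loss, trial
    return tokens, best

init_tokens = list(rng.integers(0, VOCAB, size=6))
init_loss = surrogate_loss(init_tokens)
adv_tokens, final_loss = optimize_suffix(init_tokens)
```

The suffix found this way is then sent, unchanged, to the black-box production agent; how well it carries over is exactly the transfer question the problem statement is about.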

However, the exact architectures and weights of the proprietary models used in these agents are not publicly disclosed. The authors explicitly state that they do not know the deployed weights, and only infer similarity to Mistral-Nemo-0714 (12B) and GLM-4-9B based on observed behavior and product lineage. Precisely identifying the deployed weights would improve reproducibility, help explain observed transfer performance, and clarify the boundary between open-weight surrogates and production systems.
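The "infer similarity from observed behavior" step can be sketched as a simple behavioral fingerprinting procedure: query the production agent and each candidate open-weight model on a shared probe set, then rank candidates by response similarity. The metric, the function names, and the canned responses below are all illustrative assumptions; in practice the responses would come from live queries:

```python
def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two responses."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def rank_candidates(agent_responses, candidates):
    """Rank candidate models by mean similarity to the agent (best first).
    `candidates` maps model name -> responses to the same probe prompts."""
    scores = {
        name: sum(jaccard(a, r) for a, r in zip(agent_responses, resp))
              / len(agent_responses)
        for name, resp in candidates.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Canned illustrative data: the agent's answers resemble candidate-A's.
agent = ["the capital of france is paris",
         "water boils at 100 degrees celsius"]
cands = {
    "candidate-A": ["the capital of france is paris",
                    "water boils at 100 c"],
    "candidate-B": ["paris", "boiling point: 212 f"],
}
ranking = rank_candidates(agent, cands)   # candidate-A ranks first
```

A real fingerprinting effort would use far richer signals (logit distributions where exposed, tokenizer quirks, refusal phrasing), but even this crude ranking shows why the authors can only deduce, not confirm, the deployed weights.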

References

We do not know the weights of the specific models being used in these agents, but it is reasonable to deduce that they are not too different from the open-weight releases - Mistral-Nemo-0714 (12B) [49] and GLM-4-9B [7].

Imprompter: Tricking LLM Agents into Improper Tool Use  (2410.14923 - Fu et al., 2024) in Section 5: Evaluation, Choice of Target LLMs and Agents