Unknown resources for LLM inference hinder fair speed comparisons

Determine the computational resources consumed during inference by the large language models used in the FAMOSE experiments (AWS Bedrock Sonnet 3.5 V2 and Deepseek-R1), so that the inference times of LLM-based and classical feature engineering methods can be compared fairly.

Background

The methodology section describes the experimental setup and notes that the authors did not compare inference times because the resources required to run the LLMs are unknown. Those resources could affect measured LLM speed, so a runtime comparison between LLM-based and classical methods would be hard to interpret without them.

Explicitly identifying the resource requirements for the LLMs used in FAMOSE (Sonnet 3.5 V2 and Deepseek-R1) would allow principled measurement and comparison of inference speed across methods.
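One way to picture what a resource-aware comparison might look like is the minimal Python sketch below. It is not taken from the FAMOSE paper: the names `TimedRun`, `time_method`, and `comparable`, and the choice of "device-seconds" (wall-clock time multiplied by the number of devices used) as a normalized cost, are all illustrative assumptions. The point is only that a run whose resources are unknown, such as a hosted LLM endpoint, cannot be normalized and therefore cannot be compared on equal footing.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TimedRun:
    """Wall-clock time of one method, plus the resources it is known to use."""
    method: str
    wall_seconds: float
    # Hypothetical resource count (e.g. CPU cores or accelerators).
    # None means the resources are unknown, as for a hosted LLM endpoint.
    devices: Optional[int] = None

    def device_seconds(self) -> Optional[float]:
        """Wall-clock time scaled by device count, or None if resources are unknown."""
        if self.devices is None:
            return None
        return self.wall_seconds * self.devices


def time_method(method: str, fn: Callable[[], object],
                devices: Optional[int] = None) -> TimedRun:
    """Run fn once and record its wall-clock time with the declared resources."""
    start = time.perf_counter()
    fn()
    elapsed = time.perf_counter() - start
    return TimedRun(method, elapsed, devices)


def comparable(a: TimedRun, b: TimedRun) -> bool:
    """Two runs can be compared on device-seconds only if both resource counts are known."""
    return a.devices is not None and b.devices is not None


if __name__ == "__main__":
    classical = time_method("classical", lambda: sum(range(100_000)), devices=1)
    llm = time_method("llm", lambda: time.sleep(0.01), devices=None)  # resources unknown
    print(classical.method, classical.device_seconds())
    print(llm.method, llm.device_seconds())
    print("comparable:", comparable(classical, llm))
```

Under these assumptions, the LLM run reports no normalized cost and the pair is flagged as not comparable, which mirrors the situation the paper describes. Device-seconds is only one possible normalization; a real comparison would also need to account for hardware type and shared serving infrastructure.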

References

Because the resources needed to run LLM inference are unknown, and could affect LLM speed, we do not compare algorithm inference times, especially the speed of LLM methods against classical methods.

FAMOSE: A ReAct Approach to Automated Feature Discovery (2602.17641 - Burghardt et al., 19 Feb 2026) in Subsection LLMs and Resources