Open questions in designing AGI benchmarks
Determine whether Artificial General Intelligence (AGI) evaluation benchmarks should take human values as the starting point for test design or adopt alternative, non-human-centric perspectives, so that the resulting benchmarks meaningfully assess AGI capabilities.
References
For instance, does it make sense to use human values as a starting point for test construction, or should alternative perspectives be considered? Developing suitable AGI benchmarks presents many open questions demanding further exploration.
— A Survey on Evaluation of Large Language Models
(arXiv:2307.03109, Chang et al., 2023), in Subsection “Designing AGI Benchmarks,” Section 7 (Grand Challenges and Opportunities for Future Research)