Designing AGI Benchmarks
Identify tasks that can truly measure Artificial General Intelligence (AGI) capabilities, and determine whether AGI benchmark tests should be built on human values or on alternative perspectives, in order to guide the development of suitable AGI benchmarks.
References
As we discussed earlier, while all tasks can potentially serve as evaluation tools for LLMs, the question remains as to which of them can truly measure AGI capabilities. Many related issues are also unresolved. For instance, does it make sense to use human values as a starting point for test construction, or should alternative perspectives be considered? Developing suitable AGI benchmarks raises many open questions that demand further exploration.