Decide whether to normalise ChatGPT scores for abstract length in addition to field and year
Ascertain whether residual length-related bias exists in ChatGPT 4o-mini REF-style research quality scores after accounting for second-order effects (such as journal abstract policies and short-form article types), and determine whether normalising scores for abstract length, in addition to field and year, is warranted.
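One way to probe for residual length bias is to regress scores on abstract length while controlling for the known confounders. The sketch below is a minimal illustration, not the source's method: the DataFrame columns `score`, `abstract_len`, `field`, `year`, and `short_form` (a flag for short-form article types) are all hypothetical names standing in for whatever the actual dataset provides.

```python
# Hypothetical check for residual length bias in ChatGPT quality scores.
# Column names (score, abstract_len, field, year, short_form) are assumed.
import pandas as pd
import statsmodels.formula.api as smf

def residual_length_bias(df: pd.DataFrame) -> tuple[float, float]:
    """Regress score on abstract length with field, year, and a
    short-form article-type flag as controls; return the length
    coefficient and its p-value."""
    model = smf.ols(
        "score ~ abstract_len + C(field) + C(year) + C(short_form)",
        data=df,
    ).fit(cov_type="HC1")  # heteroskedasticity-robust standard errors
    # A length coefficient that stays significant after these controls
    # would point to a residual bias beyond the second-order effects.
    return model.params["abstract_len"], model.pvalues["abstract_len"]
```

If the length coefficient shrinks towards zero once the article-type and journal-policy controls are added, the second-order explanation would be supported and length normalisation would likely be unnecessary.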
References
Finally, the abstract length factor found here may indicate another ChatGPT bias, for example against articles in journals with stricter abstract length limits. From the discussion, however, it seems more likely that weaker articles tend to appear in journals that allow short abstracts, or tend to be shorter contributions, which would be an acceptable second-order effect. Again, more research is needed to investigate this and to decide whether it would ever be appropriate to normalise ChatGPT scores for abstract length in addition to field and year.
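If normalisation did prove warranted, it could extend the existing field-and-year rescaling with length bins. The sketch below assumes a mean-based rescaling (in the spirit of mean-normalised citation indicators); the column names and the quartile binning are illustrative choices, not details taken from the source.

```python
# Hypothetical field/year score normalisation, optionally extended
# with abstract-length quartile bins. Column names are assumed.
import pandas as pd

def normalise(df: pd.DataFrame, by_length: bool = False) -> pd.Series:
    """Divide each score by the mean score of its field-year cell,
    optionally subdivided by abstract-length quartile."""
    keys = ["field", "year"]
    if by_length:
        # Quartile bins of abstract length across the whole sample;
        # per-field binning would be an obvious alternative.
        df = df.assign(len_bin=pd.qcut(df["abstract_len"], 4, labels=False))
        keys.append("len_bin")
    return df["score"] / df.groupby(keys)["score"].transform("mean")
```

The open question above is precisely whether adding `len_bin` to the grouping keys would correct a genuine bias or instead erase a real quality signal carried by shorter contributions.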