Game-theory behaviour of large language models: The case of Keynesian beauty contests

Authors

Lu, S. E.

DOI:

https://doi.org/10.18559/ebr.2025.2.2182

Keywords:

economic games, large language models, strategic interactions

Abstract

The growing adoption of large language models (LLMs) offers a route to a deeper understanding of human behaviour within game-theoretic frameworks. This paper examines strategic interactions among multiple types of LLM-based agents in a classical beauty contest game. The LLM-based agents demonstrate depths of reasoning in the level-0 to level-1 range, lower than the results of experiments with human subjects reported in previous literature, but they display a similar pattern of convergence towards the Nash equilibrium choice in repeated settings. Through simulations that vary the group composition of agent types, I find that environments with lower strategic uncertainty enhance convergence for LLM-based agents, and that environments with mixed strategic types accelerate convergence for all. Results with simulated agents not only convey insights into potential human behaviour in competitive settings; they also offer a valuable understanding of strategic interactions among algorithms.
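To make the level-0 to level-1 range concrete, the following sketch computes the guesses implied by level-k reasoning in a p-beauty contest (guess closest to p times the group average wins). The parameters are illustrative assumptions, not the paper's: p = 2/3, choices in [0, 100], and a level-0 mean of 50.

```python
# Level-k reasoning in a p-beauty contest: a level-k player best-responds
# to a population assumed to consist of level-(k-1) players.
# Assumptions (not taken from the paper): p = 2/3, range [0, 100],
# level-0 players choose uniformly at random, so their mean guess is 50.

def level_k_guess(k: int, p: float = 2 / 3, level0_mean: float = 50.0) -> float:
    """Guess of a level-k player: p^k times the level-0 mean."""
    return level0_mean * p**k

# Depth 0 to 1 (where the LLM-based agents fall) versus deeper levels:
guesses = {k: round(level_k_guess(k), 2) for k in range(4)}
# level 0 -> 50.0, level 1 -> 33.33, level 2 -> 22.22, level 3 -> 14.81
# As k grows, guesses shrink towards the Nash equilibrium of 0, which is
# the same direction repeated play converges in.
```

Under these assumptions, agents reasoning at level 0 to 1 would guess between roughly 33 and 50 in the first round, while deeper human-like reasoning (levels 1 to 3 in the cognitive hierarchy literature) produces lower guesses.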

JEL Classification

Computational Techniques • Simulation Modeling (C63)
General (C70)
General (C90)

References

Aher, G. V., Arriaga, R. I., & Kalai, A. T. (2023). Using large language models to simulate multiple humans and replicate human subject studies. International Conference on Machine Learning, 337–371. https://proceedings.mlr.press/v202/aher23a.html

Akata, E., Schulz, L., Coda-Forno, J., Oh, S. J., Bethge, M., & Schulz, E. (2023). Playing repeated games with large language models. arXiv preprint arXiv:2305.16867.

Argyle, L. P., Busby, E. C., Fulda, N., Gubler, J. R., Rytting, C., & Wingate, D. (2023). Out of one, many: Using language models to simulate human samples. Political Analysis, 31(3), 337–351. https://doi.org/10.1017/pan.2023.2

Bauer, K., Liebich, L., Hinz, O., & Kosfeld, M. (2023). Decoding GPT's hidden 'rationality' of cooperation. https://doi.org/10.2139/ssrn.4576036

Bosch-Domenech, A., Montalvo, J. G., Nagel, R., & Satorra, A. (2002). One, two, (three), infinity, …: Newspaper and lab beauty-contest experiments. American Economic Review, 92(5), 1687–1701. https://doi.org/10.1257/000282802762024737

Brown, Z. Y., & MacKay, A. (2023). Competition in pricing algorithms. American Economic Journal: Microeconomics, 15(2), 109–156. https://doi.org/10.1257/mic.20210158

Camerer, C. F., Ho, T.-H., & Chong, J.-K. (2004). A cognitive hierarchy model of games. The Quarterly Journal of Economics, 119(3), 861–898. https://doi.org/10.1162/0033553041502225

Chen, L., Mislove, A., & Wilson, C. (2016). An empirical analysis of algorithmic pricing on Amazon Marketplace. Proceedings of the 25th International Conference on World Wide Web, 1339–1349. https://doi.org/10.1145/2872427.2883089

Chen, Y., Liu, T. X., Shan, Y., & Zhong, S. (2023). The emergence of economic rationality of GPT. Proceedings of the National Academy of Sciences, 120(51), e2316205120. https://doi.org/10.1073/pnas.2316205120

Coricelli, G., & Nagel, R. (2009). Neural correlates of depth of strategic reasoning in medial prefrontal cortex. Proceedings of the National Academy of Sciences, 106(23), 9163–9168. https://doi.org/10.1073/pnas.0807721106

Costa-Gomes, M. A., & Weizsäcker, G. (2008). Stated beliefs and play in normal-form games. The Review of Economic Studies, 75(3), 729–762. https://doi.org/10.1111/j.1467-937X.2008.00498.x

Devetag, G., Di Guida, S., & Polonio, L. (2016). An eye-tracking study of feature-based choice in one-shot games. Experimental Economics, 19, 177–201. https://doi.org/10.1007/s10683-015-9432-5

Dillion, D., Tandon, N., Gu, Y., & Gray, K. (2023). Can AI language models replace human participants? Trends in Cognitive Sciences. https://doi.org/10.1016/j.tics.2023.04.008

Fan, C., Chen, J., Jin, Y., & He, H. (2023). Can large language models serve as rational players in game theory? A systematic analysis. arXiv preprint arXiv:2312.05488. https://doi.org/10.1609/aaai.v38i16.29751

Guo, F. (2023). GPT in game theory experiments. arXiv preprint arXiv:2305.05516.

Guo, S., Bu, H., Wang, H., Ren, Y., Sui, D., Shang, Y., & Lu, S. (2024). Economics arena for large language models. arXiv preprint arXiv:2401.01735.

Hamill, L., & Gilbert, N. (2015). Agent-based modelling in economics. John Wiley & Sons. https://doi.org/10.1002/9781118945520

Horton, J. J. (2023). Large language models as simulated economic agents: What can we learn from homo silicus? (Tech. rep.). National Bureau of Economic Research. https://doi.org/10.3386/w31122

HuggingFace. (2022). Illustrating reinforcement learning from human feedback (RLHF). Retrieved January 31, 2024, from https://huggingface.co/blog/rlhf

Huijzer, R., & Hill, Y. (2023, January). Large language models show human behavior. https://doi.org/10.31234/osf.io/munc9

Ireson, J., & Hallam, S. (1999). Raising standards: Is ability grouping the answer? Oxford Review of Education, 25(3), 343–358. https://doi.org/10.1080/030549899104026

Kalton, G., & Schuman, H. (1982). The effect of the question on survey responses: A review. Journal of the Royal Statistical Society Series A: Statistics in Society, 145(1), 42–57. https://doi.org/10.2307/2981421

Keynes, J. M. (1936). The general theory of employment, interest and money. Macmillan and Co., Limited.

Kosinski, M. (2023). Theory of mind may have spontaneously emerged in large language models. arXiv preprint arXiv:2302.02083.

Liem, G. A. D., Marsh, H. W., Martin, A. J., McInerney, D. M., & Yeung, A. S. (2013). The big-fish-little-pond effect and a national policy of within-school ability streaming: Alternative frames of reference. American Educational Research Journal, 50(2), 326–370. https://doi.org/10.3102/0002831212464511

Mauersberger, F., & Nagel, R. (2018). Levels of reasoning in Keynesian beauty contests: A generative framework. In Handbook of computational economics (Vol. 4, pp. 541–634). Elsevier. https://doi.org/10.1016/bs.hescom.2018.05.002

Mei, Q., Xie, Y., Yuan, W., & Jackson, M. O. (2024). A Turing test of whether AI chatbots are behaviorally similar to humans. Proceedings of the National Academy of Sciences, 121(9), e2313925121. https://doi.org/10.1073/pnas.2313925121

Nagel, R. (1995). Unraveling in guessing games: An experimental study. The American Economic Review, 85(5), 1313–1326. https://www.jstor.org/stable/2950991

Nagel, R., Bühren, C., & Frank, B. (2017). Inspired and inspiring: Hervé Moulin and the discovery of the beauty contest game. Mathematical Social Sciences, 90, 191–207. https://doi.org/10.1016/j.mathsocsci.2016.09.001

OpenAI. (2024). How ChatGPT and our language models are developed. Retrieved January 18, 2024, from https://help.openai.com/en/articles/7842364-how-chatgpt-and-our-language-models-are-developed

Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., et al. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35, 27730–27744. https://proceedings.neurips.cc/paper_files/paper/2022/file/b1efde53be364a73914f58805a001731-Paper-Conference.pdf

Phelps, S., & Russell, Y. I. (2023). Investigating emergent goal-like behaviour in large language models using experimental economics. arXiv preprint arXiv:2305.07970.

Sclar, M., Choi, Y., Tsvetkov, Y., & Suhr, A. (2023). Quantifying language models' sensitivity to spurious features in prompt design or: How I learned to start worrying about prompt formatting. arXiv preprint arXiv:2310.11324.

Strachan, J. W. A., Albergo, D., Borghini, G., Pansardi, O., Scaliti, E., Gupta, S., Saxena, K., Rufo, A., Panzeri, S., Manzi, G., et al. (2024). Testing theory of mind in large language models and humans. Nature Human Behaviour, 8(7), 1285–1295. https://doi.org/10.1038/s41562-024-01882-z

Trality. (2024). Crypto trading bots: The ultimate beginner's guide. Retrieved January 23, 2024, from https://www.trality.com/blog/crypto-trading-bots

Tversky, A., & Kahneman, D. (1981). The framing of decisions and the psychology of choice. Science, 211(4481), 453–458. https://doi.org/10.1126/science.7455683

Webb, T., Holyoak, K. J., & Lu, H. (2023). Emergent analogical reasoning in large language models. Nature Human Behaviour, 7(9), 1526–1541. https://doi.org/10.1038/s41562-023-01659-w

Published

2025-07-01

Issue

Section

Research article (regular issue)

How to Cite

Lu, S. E. (2025). Game-theory behaviour of large language models: The case of Keynesian beauty contests. Economics and Business Review, 11(2), 119–148. https://doi.org/10.18559/ebr.2025.2.2182
