Judgements of research co-created by generative AI: experimental evidence
DOI: https://doi.org/10.18559/ebr.2023.2.744

Keywords: experiment, generative AI, large language models, ChatGPT, metascience, trust in science

Abstract
The introduction of ChatGPT has fuelled a public debate on the appropriateness of using generative AI (large language models; LLMs) in work, including how researchers might use (and abuse) such models. In the current work, we test whether delegating parts of the research process to LLMs leads people to distrust researchers and devalue their scientific work. Participants (N = 402) considered a researcher who delegates elements of the research process to a PhD student or an LLM and rated three aspects of such delegation. First, they rated whether the delegation is morally appropriate. Second, they judged whether they would trust the delegating scientist to oversee future projects. Third, they rated the expected accuracy and quality of the output from the delegated research process. Our results show that people judged delegating to an LLM as less morally acceptable than delegating to a human (d = -0.78). Delegation to an LLM also decreased trust in the scientist to oversee future research projects (d = -0.80), and people expected the results to be less accurate and of lower quality (d = -0.85). We discuss how this devaluation might translate into underreporting of generative AI use.
License
Copyright (c) 2023 Paweł Niszczota, Paul Conway
This work is licensed under a Creative Commons Attribution 4.0 International License.