Evaluation of the Progress of Generative Artificial Intelligence in Creating Higher Mathematics Tasks: A Comparative Study
Synopsis
Generative artificial intelligence (GenAI) is changing the educational process significantly, including by automating the preparation of knowledge assessments. In this article, we discuss the use of GenAI tools for designing tasks in higher mathematics, where accuracy is of key importance. Although GenAI can save educators time when compiling exams, its output requires critical judgment because of its tendency to "hallucinate", that is, to generate convincing but incorrect information. In our study, we compared five leading artificial intelligence tools (ChatGPT, DeepSeek, Gemini, Copilot, Grok) on five complex mathematics problems. The longitudinal design adds particular value to the study: we analyzed the answers obtained in the first phase in April 2025 and compared them with the results from the same tools in January 2026. The aim of the paper is to show how far the tools have progressed and to justify the need for a "human-in-the-loop" approach in pedagogical practice.
Categories
- Economics
- Logistics
- Mathematics
- Entrepreneurship
- Business
- Computer Science and Informatics
- Sociology
- Mechanical Engineering
- Tourism
- Organizational Sciences
- Criminal Justice and Security
- Ecology
- Educational sciences
- Health Sciences
- 2026
- Conference proceedings
- Open Access
- University of Maribor, Faculty of Organizational Sciences
- Slovene language
- English language
- Multilingual






