Paper “StatBot.Swiss: Bilingual Open Data Exploration in Natural Language” Accepted in “Findings of the ACL 2024” - The Major International Conference on Natural Language Processing
The paper “StatBot.Swiss: Bilingual Open Data Exploration in Natural Language” by ZHAW researchers Farhad Nooralahzadeh, Yi Zhang and Kurt Stockinger has been accepted in “Findings of the ACL 2024” (Association for Computational Linguistics). ACL is considered the most prestigious international conference on natural language processing research.
The paper is a collaboration between the ZHAW Institute of Computer Science, the Data Science Competence Center of the Swiss Federal Statistical Office and the Swiss Data Science Center. It releases the StatBot.Swiss dataset, the first bilingual benchmark for evaluating Text-to-SQL systems based on real-world applications. The dataset contains 455 natural language/SQL pairs over 35 large databases, with varying levels of complexity, in both English and German. Experimental analysis shows that current Large Language Models struggle to generalize when generating SQL queries on this novel bilingual dataset. The paper thus lays the foundation for new research efforts in the broader field of generative AI.
This publication is the output of the INODE4Statbot project, funded by the Swiss Federal Statistical Office.