Cross-Lingual False Friend Classification via LLM-based Vector Embedding Analysis

Authors

Abstract

In this paper, we propose a novel approach to exploring cross-linguistic connections, with a focus on false friends, using Large Language Model embeddings and graph databases. Using BERT and a multi-layer perceptron neural network, we achieve an F1 score of 83.81% on the Spanish-Portuguese false friend dataset. Furthermore, using advanced translation models to match words between vocabularies, we construct a ground-truth false friends dataset for Slovenian and Macedonian, two languages with significant historical and cultural ties. We then build a graph-based representation in a Neo4j database, in which nodes correspond to words and different edge types capture the semantic relationships between them.
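
As a rough illustration of the classification setup described above, the sketch below turns the embeddings of a candidate word pair into one feature vector that a classifier such as a multi-layer perceptron could consume. The abstract does not say how the two embeddings are combined, so the concatenation, absolute difference, and cosine similarity used here are assumptions. The function names `cosine_similarity` and `pair_features` and the three-dimensional toy vectors are hypothetical stand-ins for real BERT embeddings.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pair_features(u: Sequence[float], v: Sequence[float]) -> List[float]:
    """Build one feature vector for a word pair (assumed layout, not the
    paper's): [u, v, |u - v|, cosine(u, v)]."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    abs_diff = [abs(a - b) for a, b in zip(u, v)]
    return list(u) + list(v) + abs_diff + [cosine_similarity(u, v)]


# Toy vectors standing in for the embeddings of a Spanish and a Portuguese word.
spanish_vec = [1.0, 0.0, 2.0]
portuguese_vec = [0.0, 1.0, 2.0]

features = pair_features(spanish_vec, portuguese_vec)
print(len(features))  # 3 + 3 + 3 + 1 feature values
```

With real embeddings, each labelled pair (false friend or not) would yield one such vector, and the resulting matrix would serve as training input for the classifier.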

Published

30.10.2024

How to Cite

Cross-Lingual False Friend Classification via LLM-based Vector Embedding Analysis. (2024). In Proceedings of the 10th Student Computing Research Symposium (SCORES’24) (pp. 33-36). Univerzitetna založba Univerze v Mariboru. https://press.um.si/index.php/ump/catalog/book/886/chapter/147