Cross-Lingual False Friend Classification via LLM-based Vector Embedding Analysis

Authors

Synopsis

In this paper, we propose a novel approach to exploring cross-linguistic connections, with a focus on false friends, using Large Language Model embeddings and graph databases. Using BERT embeddings and a multi-layer perceptron neural network, we achieve an F1 score of 83.81% on the Spanish-Portuguese false friend dataset. Furthermore, using advanced translation models to match words between vocabularies, we construct a ground-truth false friends dataset for Slovenian and Macedonian, two languages with significant historical and cultural ties. Finally, we build a graph-based representation in a Neo4j database, in which nodes correspond to words and edges of various types capture the semantic relationships between them.
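As a rough illustration of the embedding-based setup described in the synopsis, the sketch below shows one way features for a word pair might be built from two embedding vectors before being passed to a classifier. It is not the authors' pipeline: the feature layout (both vectors, their absolute difference, and cosine similarity), the function names, and the similarity threshold are all assumptions made for illustration, and the threshold rule is a simple stand-in for the multi-layer perceptron, not a replacement for it.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pair_features(u, v):
    """Hypothetical feature layout for a word pair.

    Concatenates both embeddings, their element-wise absolute difference,
    and their cosine similarity. A classifier such as an MLP would take
    this vector as input; the layout itself is an assumption.
    """
    diff = [abs(a - b) for a, b in zip(u, v)]
    return list(u) + list(v) + diff + [cosine_similarity(u, v)]


def is_false_friend_candidate(u, v, threshold=0.5):
    """Stand-in baseline: flag a similar-looking pair whose embeddings diverge.

    The threshold value is arbitrary and chosen only for illustration.
    """
    return cosine_similarity(u, v) < threshold
```

In practice the vectors `u` and `v` would come from an embedding model such as BERT, and the threshold rule would be replaced by a trained classifier operating on `pair_features(u, v)`.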

Published

October 30, 2024

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

How to Cite

Cross-Lingual False Friend Classification via LLM-based Vector Embedding Analysis. (2024). In Proceedings of the 10th Student Computing Research Symposium (SCORES’24) (pp. 33-36). University of Maribor Press. https://press.um.si/index.php/ump/catalog/book/886/chapter/147