Methods of solving the problem of coreference and searching for noun phrases in natural languages

A. A. Kozlova, I. D. Kudinov, D. V. Lemtyuzhnikova

doi:10.31857/S0002338825010122

Authors: A. A. Kozlova¹, I. D. Kudinov¹, D. V. Lemtyuzhnikova¹
Affiliations:
1. V. A. Trapeznikov Institute of Control Sciences, Russian Academy of Sciences
Issue: No. 1 (2025)
Pages: 149-162
Section: ARTIFICIAL INTELLIGENCE
URL: https://permmedjournal.ru/0002-3388/article/view/684564
DOI: https://doi.org/10.31857/S0002338825010122
EDN: https://elibrary.ru/AIJIND
ID: 684564

Abstract

Coreference resolution is a natural language processing task that links the words and phrases in a text that refer to the same real-world entity. It is used in text summarization, question answering, information retrieval, and dialogue systems. The paper reviews existing methods for coreference resolution and proposes an approach based on a two-stage machine learning model. First, a language model converts the tokens of the text into vector representations. Then, for each pair of tokens, their vector representations are used to estimate the probability that the two tokens belong either to the same noun phrase or to two coreferent noun phrases. The method thus finds noun phrases and predicts the coreference links between them at the same time.
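The pairwise scoring step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the bilinear scorer, the bias, the 0.5 threshold, the union-find decoding, and all names (`pair_probabilities`, `W_same`, `W_coref`, `group`) are assumptions made for illustration, and the hand-written 3-dimensional vectors stand in for the output of a real language model.

```python
import math
from itertools import combinations

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def bilinear(u, W, v) -> float:
    """Return u^T W v for plain Python lists."""
    return sum(u[i] * W[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))

def pair_probabilities(embeddings, W_same, W_coref, bias=-2.0):
    """For every token pair (i, j) with i < j, return two probabilities:
    (same noun phrase, two coreferent noun phrases)."""
    probs = {}
    for i, j in combinations(range(len(embeddings)), 2):
        p_same = sigmoid(bilinear(embeddings[i], W_same, embeddings[j]) + bias)
        p_coref = sigmoid(bilinear(embeddings[i], W_coref, embeddings[j]) + bias)
        probs[(i, j)] = (p_same, p_coref)
    return probs

def group(n, links):
    """Union-find: merge indices connected by links and return sorted groups."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)
    groups = {}
    for x in range(n):
        groups.setdefault(find(x), []).append(x)
    return sorted(groups.values())

# Toy input: hypothetical token vectors (a real system would take them from a language model).
tokens = ["the", "dog", "barked", "it"]
embeddings = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 0]]
# Hypothetical "trained" weights: one matrix per decision.
W_same = [[4, 0, 0], [0, 0, 0], [0, 0, 0]]
W_coref = [[0, 0, 0], [0, 4, 0], [0, 0, 0]]

probs = pair_probabilities(embeddings, W_same, W_coref)
same_links = [pair for pair, (p_same, _) in probs.items() if p_same > 0.5]
coref_links = [pair for pair, (_, p_coref) in probs.items() if p_coref > 0.5]

# Tokens joined by same-phrase links form candidate noun phrases;
# adding coreference links merges those phrases into clusters.
mentions = group(len(tokens), same_links)
clusters = group(len(tokens), same_links + coref_links)
print(mentions)
print(clusters)
```

In this toy setup "the" and "dog" are grouped into one noun phrase, and that phrase is linked to "it". Tokens with no links stay as singletons; how a real system separates them from genuine one-token mentions is not stated in the abstract and is left out of the sketch.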

Keywords

coreference, natural language processing, machine learning, language models
