Application of boosting in recommender systems

Abstract

Recommender systems have become firmly established in today's digital landscape as an important tool for managing information flows. Demand for them is driven largely by information overload and the need to personalize data. As recommendation algorithms spread to new domains, non-standard cases arise for which classical approaches are less effective. This paper examines one such case: a small number of items and a comparatively large number of users, with high correlation between some of the items. For modeling, it is proposed to use gradient boosting, a machine learning algorithm based on an ensemble of decision trees.
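As a rough illustration of the setting described above, the sketch below frames the few-items/many-users case as a tabular classification problem and fits a gradient-boosted tree ensemble with scikit-learn. The synthetic data, feature layout, and hyperparameters are illustrative assumptions, not the pipeline actually studied in the paper.

    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n_users, n_items, n_pairs = 10_000, 20, 50_000  # many users, few items

    # One row per (user, item) pair; the target is whether the user
    # accepted the item. The preference feature and the interaction
    # rule below are synthetic stand-ins for real user/item attributes.
    users = rng.integers(0, n_users, size=n_pairs)
    items = rng.integers(0, n_items, size=n_pairs)
    pref = rng.normal(size=n_pairs)
    y = (pref + 0.1 * items > 0.5).astype(int)

    X = np.column_stack([users, items, pref])
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # Gradient boosting: an ensemble of shallow decision trees fit
    # sequentially, each one correcting the residual errors of the sum
    # of its predecessors.
    model = GradientBoostingClassifier(n_estimators=200, max_depth=3)
    model.fit(X_tr, y_tr)
    print("held-out accuracy:", model.score(X_te, y_te))

In practice a library such as LightGBM, XGBoost, or CatBoost would typically stand in for the scikit-learn estimator, and recommendations for a user can be obtained by ranking predict_proba scores over the candidate items.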

About the authors

M. A. Zharova

Moscow Institute of Physics and Technology (MIPT)

Author for correspondence.
Email: zharova.ma@phystech.edu
Russian Federation, Dolgoprudny, Moscow oblast

V. I. Tsurkov

Federal Research Center “Computer Science and Control”, Russian Academy of Sciences

Email: v.tsurkov@frccsc.ru
Russian Federation, Moscow

Copyright (c) 2024 Russian Academy of Sciences