Yandex is the largest IT company in Russia. CatBoost is a popular machine learning library that implements gradient boosting on decision trees. It lets you train models on tabular data with different kinds of features (numeric, categorical, text, and embeddings) and provides good quality even with default parameters.
In this presentation, we introduce CatBoost distributed training on Spark.
We will discuss its key features and overall architecture, and present some benchmarks.
Join our talk if you:
• have a lot of data on hand;
• are using or planning to start using Spark clusters for data processing;
• need distributed training for your tasks.
The talk will consist of an approximately 30-minute video presentation followed by a live chat discussion. The presentation videos will be published in advance on our YouTube channel on March 18 and 19.
The event will take place online on March 23 at 10am PST (9pm GMT+3 in Moscow). Registration is required to attend.