TrimTuner: Efficient Optimization of Machine Learning Jobs in the Cloud via Sub-Sampling

9 March 2021

Pedro Mendes IST / INESC-ID

This work introduces TrimTuner, the first system for optimizing machine learning jobs in the cloud that exploits sub-sampling techniques to reduce the cost of the optimization process while taking user-specified constraints into account. TrimTuner jointly optimizes the cloud and application-specific parameters and, unlike state-of-the-art approaches to cloud optimization, eschews the need to train the model with the full training set every time a new configuration is sampled. By leveraging sub-sampling techniques and datasets up to 60x smaller than the original one, TrimTuner reduces the cost of the optimization process by up to 50x. Further, TrimTuner speeds up the recommendation process by 65x with respect to state-of-the-art techniques for hyperparameter optimization that use sub-sampling. The reasons for this improvement are twofold: i) a novel domain-specific heuristic that reduces the number of configurations for which the acquisition function has to be evaluated; ii) the adoption of an ensemble of decision trees that boosts the speed of the recommendation process by an additional order of magnitude.
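To make the recommendation loop concrete, the following is a minimal, self-contained sketch of the two ideas named above: a heuristic that filters the candidate configurations before the acquisition function is evaluated, and an ensemble of predictors whose per-member disagreement supplies the uncertainty estimate needed by an Expected Improvement acquisition function. All names, constants, and the toy "ensemble" below are illustrative assumptions, not TrimTuner's actual models or parameters.

```python
import math
import random

# Assumption: a toy stand-in for an ensemble of decision trees. Each "tree"
# is a bootstrap-perturbed predictor of the loss of a configuration
# (learning rate + number of VMs); real systems would fit trees to observed runs.
random.seed(0)
K = 10
trees = [
    lambda cfg, b=random.random(): (cfg["lr"] - 0.1) ** 2 + 0.01 * cfg["vms"] + 0.005 * b
    for _ in range(K)
]

def ensemble_stats(cfg):
    """Mean and std of the ensemble's predictions for one configuration."""
    preds = [t(cfg) for t in trees]
    mu = sum(preds) / K
    var = sum((p - mu) ** 2 for p in preds) / K
    return mu, math.sqrt(var)

def expected_improvement(mu, sigma, best):
    """Standard EI for minimization, using the ensemble's mean/std."""
    if sigma == 0.0:
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    pdf = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return sigma * (z * cdf + pdf)

# Candidate joint (application, cloud) configurations.
candidates = [{"lr": lr, "vms": v} for lr in (0.01, 0.05, 0.1, 0.2) for v in (1, 2, 4)]

# Illustrative filtering heuristic: only evaluate the acquisition function on
# configurations whose predicted cost respects a (made-up) budget constraint.
feasible = [c for c in candidates if c["vms"] * 0.05 <= 0.15]

best_so_far = 0.02  # best loss observed so far (assumed)
scores = [(expected_improvement(*ensemble_stats(c), best_so_far), c) for c in feasible]
best_score, best_cfg = max(scores, key=lambda s: s[0])
print(f"evaluated {len(feasible)}/{len(candidates)} configs; next: {best_cfg}")
```

The point of the sketch is structural: the heuristic shrinks the set of configurations on which the (relatively expensive) acquisition function runs, and the ensemble replaces a costlier probabilistic surrogate, since its mean and spread are obtained from cheap per-member predictions.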

Pedro Mendes is a doctoral student in Computer Science and Engineering at Instituto Superior Técnico (IST), Universidade de Lisboa, advised by Prof. Paolo Romano. His research interests include distributed systems, cloud computing, virtualization, optimization, machine learning, computer networks, and artificial intelligence (AI). This work was developed during his master's thesis and presented last year at the international conference MASCOTS 2020. Currently, Pedro is working on a research project that aims to improve the efficiency of AI platforms while ensuring compliance with real-time constraints during the training and inference phases of machine learning models in the cloud. This work is developed in the context of the CAMELOT project.