The cost of AI/ML experiments has a significant impact on a project's ROI. Serverless deployment of machine learning applications can significantly reduce a project's running cost. Some ML models are large (250 MB+), and serving them typically requires specialised hardware such as a GPU; on a CPU server, inference latency grows sharply.
Aakash Gupta explores the concept of quantization, which can shrink a 250 MB model to under 10 MB. The model can then be deployed as a serverless function (AWS Lambda or Cloud Function), cutting the project's running cost by up to 90% and saving thousands of dollars while improving its ROI.
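The core idea behind quantization can be illustrated with a minimal sketch: mapping float32 weights to int8 with a scale and zero point, which alone cuts storage by 4x (toolchains like the one discussed in the talk add further compression on top). The function names and bit width below are illustrative assumptions, not the speaker's implementation.

```python
import numpy as np

def quantize(weights, num_bits=8):
    """Affine quantization: map float32 weights onto the int8 range."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    scale = (weights.max() - weights.min()) / (qmax - qmin)
    zero_point = int(round(qmin - weights.min() / scale))
    q = np.clip(np.round(weights / scale) + zero_point, qmin, qmax)
    return q.astype(np.int8), scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float32 weights at inference time."""
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)
q, scale, zp = quantize(w)
print(w.nbytes, q.nbytes)  # int8 storage is 4x smaller than float32
```

In practice one would use a framework utility (e.g. PyTorch's dynamic quantization or TFLite conversion) rather than hand-rolling this, but the size/accuracy trade-off is the same: each weight loses at most about half a quantization step of precision.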
Aakash gives an overview of the points in this presentation:
- What serverless deployment of ML models is
- How AWS Lambda can be used to deploy large SOTA models
- Quantization of ML models
- Pros and cons of the approach
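To make the Lambda point above concrete, here is a hedged sketch of what a serverless inference handler might look like. The `load_model` stub, payload fields, and returned prediction are placeholder assumptions for illustration; the talk's actual deployment details may differ.

```python
import json

MODEL = None  # loaded once per container, reused across warm invocations

def load_model():
    # Placeholder: in practice, load a quantized ONNX/TFLite model bundled
    # with the deployment package or fetched from S3 at cold start.
    def predict(text):
        return {"label": "positive", "score": 0.98}  # stubbed prediction
    return predict

def lambda_handler(event, context):
    """AWS Lambda entry point: parse the request, run inference, respond."""
    global MODEL
    if MODEL is None:       # cold start: pay the model-load cost once
        MODEL = load_model()
    body = json.loads(event.get("body", "{}"))
    result = MODEL(body.get("text", ""))
    return {"statusCode": 200, "body": json.dumps(result)}
```

Keeping the model in a module-level variable is the standard Lambda pattern: the expensive load happens only on cold starts, and subsequent invocations on a warm container reuse it, which is what makes the serverless cost model attractive for small quantized models.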
About Aakash Gupta
Aakash Gupta has rich and diversified experience in building and mentoring Data Science teams, and helps design innovative data strategies for new-age startups. He is CEO and Co-Founder of Think Evolve Consultancy, which enables startups and enterprises to make data-driven business decisions. Prior to his entrepreneurship journey, he was part of the leadership team and VP (Data Science) at Edelweiss (an NBFC based out of Mumbai, India), where he was instrumental in setting up their CoE in AI/ML and evangelised Data Science-based products across different industry verticals. He has also held various senior roles in the BFSI sector. He is an alumnus of IIM-Indore (Planet-I), a winner of multiple hackathons and a Kaggle Award, and an open-source advocate.