Automating Machine Learning Model Training
Submitted By Jenkins User Abhinav Dubey
MLOps & DevOps intern designs a system that can train machine learning & deep learning models to achieve the desired accuracy
Organization: DevOps Internship with a Software Company
Industry: Machine Learning
Programming Language: Shell Scripting
Platform: Docker or Kubernetes, Linux, macOS, Windows
Version Control System: GitHub
Build Tool: Git
Community Support: Spoke with colleagues and peers
Automating MLOps accuracy training using Jenkins and a "containerization within a container" concept.
Background: In my first few months of working with machine learning, the biggest challenge I faced was training a model to reach a required accuracy. To get there, I had to train the model again and again, changing the number of epochs, adding layers, and so on. I felt that almost every machine learning / deep learning engineer would run into the same problem, especially because training a model can take up all of your system resources. If your system is not powerful enough, diverting all of its resources to training may cause serious overheating and performance lag.
Goals: Simplify the tasks that machine learning engineers typically face while training larger models.
Solution & Results:
The project that I built is an Accuracy Achiever. With the help of Jenkins, I built a setup in which I only need to commit my code to GitHub from my development environment, and Jenkins does everything else with one click. The automation is written so that it automatically modifies the code of my training model, and it does so inside Docker containers as well.
I achieved “containerization within a container” with a handful of Docker commands. Before launching another container inside a running container, I need to install and configure Docker inside that container. I could do this manually by running the Docker installation commands through docker exec, but I automated it with Jenkins instead.
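As a rough illustration of that step, here is a minimal sketch of installing Docker inside an already-running container through docker exec. It is not the project's actual script. It assumes the outer container was started with --privileged (a nested Docker daemon generally needs this), that its image has curl and is supported by Docker's convenience installer at get.docker.com, and the function name setup_inner_docker is hypothetical.

```shell
#!/bin/sh
# Sketch only: install Docker inside a running "outer" container and start
# its daemon. Assumptions: the outer container runs with --privileged and
# its image ships curl; setup_inner_docker is an illustrative name.

# $1 = name of the outer container
setup_inner_docker() {
    outer=$1

    # Skip the installation when the docker CLI already exists inside.
    if docker exec "$outer" sh -c 'command -v docker' >/dev/null 2>&1; then
        echo "docker already installed in $outer"
    else
        # Run Docker's convenience install script inside the container.
        docker exec "$outer" sh -c 'curl -fsSL https://get.docker.com | sh' || return 1
        echo "docker installed in $outer"
    fi

    # Start the inner daemon in the background unless it already answers.
    docker exec "$outer" docker info >/dev/null 2>&1 ||
        docker exec -d "$outer" dockerd
}
```

In a Jenkins "Execute shell" build step this could be called as setup_inner_docker followed by the outer container's name, after which further docker exec calls can launch the inner training container.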
I used Docker containers to train my model in isolation, so that the setup could be made portable and training would not exhaust my system resources. I used Jenkins to integrate tools such as Git, Docker, and my local development environment into something that any developer can use, whether a company employee or a beginner in the ML domain.
With the help of Jenkins, I also built a monitoring system into the same project. It continuously checks whether the models training in containers are still running or have failed. If one has failed, it automatically launches another container and sends an email to notify the developer.
Jenkins is a vast solution with many features and plugins, but if you know the core concepts of a technology, the basics alone let you do a lot in this domain. I used no plugins beyond the Git plugin, the build pipeline, and the email plugin. That, along with shell scripting and automatic job triggers, was all it took to make the process completely automated.
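The monitoring check could look something like the sketch below. It is a sketch under assumptions, not the project's actual job: the function names is_running and ensure_running and the container and image names are hypothetical. It also assumes the email is sent by Jenkins itself, for example by configuring the email plugin to notify on a failed build, so the script signals a restart by returning a non-zero status.

```shell
#!/bin/sh
# Sketch only: restart a training container that has stopped.
# is_running / ensure_running and their arguments are illustrative names.

# Succeeds when the container named $1 exists and is currently running.
is_running() {
    [ "$(docker inspect -f '{{.State.Running}}' "$1" 2>/dev/null)" = "true" ]
}

# $1 = container name, $2 = image to relaunch from
ensure_running() {
    name=$1
    image=$2
    if is_running "$name"; then
        echo "$name is running"
        return 0
    fi
    # Remove any stopped leftover with the same name, then launch a new one.
    docker rm -f "$name" >/dev/null 2>&1 || true
    docker run -d --name "$name" "$image" >/dev/null || return 2
    echo "$name was restarted"
    # Non-zero status marks the Jenkins build as failed, which (by assumption)
    # triggers the email notification configured on the job.
    return 1
}
```

A Jenkins job scheduled with a periodic build trigger could call ensure_running with the training container's name and image on each run.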
The results? I first achieved an accuracy level of 84%, but my desired accuracy was above 85%. So the system automatically added more layers and trained my model until it achieved an accuracy of 89%. On top of that, I walked away with:
- a fully automated workflow
- easier model training
- a more efficient method that makes better use of my system resources
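The retrain-until-target behavior described in the results can be sketched as a small shell loop. This is a minimal sketch under assumptions rather than the project's code: accuracy_met and train_until are hypothetical names, and the training command is assumed to accept the number of extra layers as its last argument and to print the achieved accuracy as its last line of output.

```shell
#!/bin/sh
# Sketch only: retrain with more layers until the target accuracy is passed.
# accuracy_met / train_until and the training-command contract are assumed.

# Succeeds when accuracy $1 is strictly greater than threshold $2.
accuracy_met() {
    awk -v a="$1" -v t="$2" 'BEGIN { exit !(a + 0 > t + 0) }'
}

# $1 = threshold, $2 = maximum number of attempts,
# remaining arguments = training command (extra layer count is appended).
train_until() {
    threshold=$1
    max_attempts=$2
    shift 2
    extra_layers=0
    while [ "$extra_layers" -lt "$max_attempts" ]; do
        # The last line printed by the training command is the accuracy.
        accuracy=$("$@" "$extra_layers" | tail -n 1)
        echo "extra_layers=$extra_layers accuracy=$accuracy"
        if accuracy_met "$accuracy" "$threshold"; then
            return 0
        fi
        extra_layers=$((extra_layers + 1))
    done
    # Target not reached within the allowed attempts.
    return 1
}
```

For example, train_until 85 5 followed by a training command would rerun that command with 0, 1, 2, ... extra layers and stop at the first run whose accuracy exceeds 85. Capping the attempts keeps a model that cannot reach the target from occupying the build agent indefinitely.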