Build, Train, Deploy Machine Learning Models using AWS SageMaker
Introduction to AWS SageMaker.
Amazon SageMaker is a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy machine learning (ML) models quickly. SageMaker removes the heavy lifting from each step of the machine learning process to make it easier to develop high-quality models.
It can train models on data stored in S3 buckets or arriving from a streaming source such as Kinesis. Once a model is trained, SageMaker lets us deploy it to production with very little effort.
- The Build step connects to other AWS services such as S3 and transforms data in Amazon SageMaker notebooks.
- The Train step uses SageMaker’s built-in algorithms and frameworks, or our own, for distributed training.
- Once training is completed, models can be deployed to Amazon SageMaker endpoints, for real-time or batch predictions.
Let’s get started; I will explain things as we go!
Step 1: To create an account and get your login information, please refer to my blog on Build and Deploy a Machine Learning Model using AWS.
Step 2: Open the AWS Management Console and search for Amazon SageMaker.
Step 3: Open Notebook Instances and click on Create notebook instance.
- It will prompt for Notebook Instance Settings. Give it a good name and choose the ml.t2.medium instance type. It is the smallest option and is enough for this tutorial. We can choose larger ones when working on “real” models. Don’t worry too much about the cost: AWS bills on a pay-per-use basis.
- Leave the remaining settings at their defaults and move on.
- Next, we need to create an IAM role for our Notebook. Remember that our data and models need to live in an S3 bucket, so the Notebook must have access to that bucket. We do not need any other particular privilege for the Notebook.
- AWS takes some time to get the Notebook ready. We can see on the console that the Notebook Instance status is “Pending”.
- Give it a few minutes while AWS provisions the underlying instance and prepares it for our purpose. Once it is ready, the console shows the Notebook instance status as “InService”.
- On the Notebook instance console, we can see two links in the Actions column — Open Jupyter and Open JupyterLab.
- Now we can create a new file or upload files as needed.
- Let’s create a notebook (.ipynb) file: click ‘New’ → select ‘conda_python3’, since we will be building a simple ML project for this demo.
- After creating it, rename the notebook to something meaningful.
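The bucket-only access described for the Notebook's IAM role could be written as a policy document like the one below. This is a minimal sketch covering only the S3 part of the role: the bucket name `my-sagemaker-demo-bucket` is a placeholder, and the list of actions is an assumption about what this tutorial needs (create, list, read, write, delete), not the exact policy the console generates.

```python
import json


def bucket_policy(bucket_name):
    """Build an IAM policy document limited to a single S3 bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions apply to the bucket ARN itself.
                # CreateBucket is included because the bucket is created
                # from the notebook later in this tutorial.
                "Effect": "Allow",
                "Action": ["s3:CreateBucket", "s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Object-level actions apply to the keys inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


# Placeholder bucket name, used only for illustration.
policy = bucket_policy("my-sagemaker-demo-bucket")
print(json.dumps(policy, indent=2))
```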
Let’s start coding!
Step 1: Importing necessary libraries
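The libraries are not listed in the text, so the package list below is an assumption about what the later steps need: `boto3` for S3, the `sagemaker` SDK for training and deployment, and `pandas` and `numpy` for data handling. The sketch first checks that each package is importable and only then imports it; on a `conda_python3` kernel of a SageMaker notebook they are expected to be installed already.

```python
import importlib.util

# Assumed package list for this tutorial (not taken from the original post).
REQUIRED_PACKAGES = ["boto3", "sagemaker", "pandas", "numpy"]


def missing_packages(names):
    """Return the packages from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED_PACKAGES)
if missing:
    print("Install before continuing:", ", ".join(missing))
else:
    import boto3
    import sagemaker
    import numpy as np
    import pandas as pd
    print("All libraries imported.")
```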
Step 2: Creating S3 Bucket
- SageMaker writes its model artifacts to an S3 bucket as it works. It is also convenient to keep the training data in the same bucket.
- We can view the newly created S3 bucket in the AWS Console too.
- We can also see that the S3 bucket is initially empty.
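Creating the bucket from the notebook might look like the following sketch, which assumes `boto3` is available. The bucket name is a placeholder (bucket names must be globally unique), and the two helper functions are illustrative names, not part of any library. The one detail worth showing is that `us-east-1` is the default location and is not passed as a `LocationConstraint`, while every other region has to be named explicitly.

```python
def create_bucket_kwargs(bucket_name, region):
    """Build the keyword arguments for the S3 create_bucket call."""
    kwargs = {"Bucket": bucket_name}
    if region != "us-east-1":
        # Outside us-east-1 the target region must be stated explicitly.
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_bucket(bucket_name):
    """Create the bucket in the region the notebook session is running in."""
    import boto3  # imported lazily so the helper above works without the SDK

    region = boto3.Session().region_name
    boto3.resource("s3").create_bucket(**create_bucket_kwargs(bucket_name, region))
    return region


# Placeholder name; run this inside the notebook with a unique bucket name.
# create_bucket("my-sagemaker-demo-bucket")
```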
Step 3: Setting the output path where the trained model will be saved
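The output path is simply an S3 URI that the training job will write to. A minimal sketch, in which both the bucket name and the `xgboost-demo` prefix are placeholders:

```python
def model_output_path(bucket_name, prefix):
    """Return the S3 URI under which training jobs store model artifacts."""
    return f"s3://{bucket_name}/{prefix}/output"


# Placeholder bucket and prefix, used only for illustration.
output_path = model_output_path("my-sagemaker-demo-bucket", "xgboost-demo")
print(output_path)
```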
Step 4: Downloading the Dataset and Storing it in S3 Bucket.
- In this tutorial we focus on building, training, and deploying with AWS SageMaker rather than on the dataset and EDA.
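Since the dataset itself is not the focus, the sketch below only shows the two mechanical steps: fetching a file and copying it into the bucket. The URL, file name, bucket name, and key in the commented usage are all placeholders, and the helper names are made up for this example.

```python
import urllib.request


def download_dataset(url, local_path):
    """Download the file at `url` and save it as `local_path`."""
    urllib.request.urlretrieve(url, local_path)
    return local_path


def upload_to_s3(local_path, bucket_name, key):
    """Upload a local file to s3://bucket_name/key and return that URI."""
    import boto3  # imported lazily; only needed when actually uploading

    boto3.resource("s3").Bucket(bucket_name).upload_file(local_path, key)
    return f"s3://{bucket_name}/{key}"


# Placeholder values; substitute the real dataset URL and bucket name.
# download_dataset("https://example.com/dataset.csv", "dataset.csv")
# upload_to_s3("dataset.csv", "my-sagemaker-demo-bucket", "xgboost-demo/raw/dataset.csv")
```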
Step 5: Train-Test Split and Mapping the Path of the Models in S3 [We always have to set the path to the bucket]
- Also remember that when SageMaker’s built-in algorithms are trained on CSV data, the dependent feature (target) must be the first column.
- So take the dependent feature, drop it from the dataset, and concatenate the two with the dependent feature on the left. This way the dependent feature ends up as the first column.
- We can view the resulting files in the Jupyter instance.
- We can view the folders created, along with the train and test data, in the AWS Console too.
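The reordering and split described above can be sketched on plain Python lists so that it runs without any extra packages. With a pandas DataFrame the same reordering is `pd.concat([df[target], df.drop([target], axis=1)], axis=1)`. The 70/30 split ratio and the function names are assumptions for illustration. The files are written without a header row because the built-in XGBoost algorithm expects headerless CSV.

```python
import csv
import random


def target_first(header, rows, target):
    """Move the `target` column to position 0 in the header and every row."""
    idx = header.index(target)

    def reorder(row):
        return [row[idx]] + row[:idx] + row[idx + 1:]

    return reorder(header), [reorder(row) for row in rows]


def split_rows(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def write_headerless_csv(path, rows):
    """Write rows to `path` as CSV with no header line."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)


# Tiny made-up example: "y" is the target column.
header, rows = target_first(["age", "y", "job"], [[30, 1, "a"], [40, 0, "b"]], "y")
print(header, rows)
```

The resulting `train.csv` and `test.csv` would then be uploaded to separate folders in the bucket, and the training job is pointed at those S3 paths.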
Step 6: Building Machine Learning Models.
- XGBoost is a built-in algorithm in Amazon SageMaker, so we will use it in our demo.
- After training completes successfully, we can see that the trained model has been stored at the specified path.
- Every time we run a new training job, its model is saved in our S3 bucket under a new timestamp.
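A training run with the built-in XGBoost image might look like the sketch below. It assumes version 2.x of the SageMaker Python SDK, a binary classification target, and the container version `1.5-1`; the hyperparameter values are ordinary starting points rather than tuned settings, and the helper names are invented for this example.

```python
def xgboost_hyperparameters(num_round=50):
    """Starting hyperparameters for a binary classifier (assumed values)."""
    return {
        "max_depth": "5",
        "eta": "0.2",
        "gamma": "4",
        "min_child_weight": "6",
        "subsample": "0.7",
        "objective": "binary:logistic",
        "num_round": str(num_round),
    }


def train_xgboost(train_uri, validation_uri, output_path,
                  instance_type="ml.m4.xlarge", version="1.5-1"):
    """Launch a SageMaker training job and return the fitted estimator."""
    # Imported lazily so the hyperparameter helper works without the SDK.
    import sagemaker
    from sagemaker.estimator import Estimator
    from sagemaker.inputs import TrainingInput

    session = sagemaker.Session()
    # Look up the built-in XGBoost container image for the current region.
    image_uri = sagemaker.image_uris.retrieve(
        framework="xgboost", region=session.boto_region_name, version=version
    )
    estimator = Estimator(
        image_uri=image_uri,
        role=sagemaker.get_execution_role(),
        instance_count=1,
        instance_type=instance_type,
        output_path=output_path,
        sagemaker_session=session,
        hyperparameters=xgboost_hyperparameters(),
    )
    # Both channels point at headerless CSV files with the target in column 0.
    estimator.fit({
        "train": TrainingInput(train_uri, content_type="csv"),
        "validation": TrainingInput(validation_uri, content_type="csv"),
    })
    return estimator
```

Each call to `fit` starts a separate training job, which is why a new timestamped folder appears under the output path after every run.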
Step 7: Deployment using Amazon SageMaker.
- Here we use an ml.m4.xlarge instance. It is a fairly powerful instance, so don’t keep it running longer than needed.
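Deployment is a single call on the trained estimator. The sketch below assumes SDK 2.x and a CSV serializer so that rows can be sent to the endpoint as comma-separated text; `deploy_model` is a hypothetical wrapper name.

```python
def deploy_model(estimator, instance_type="ml.m4.xlarge", serializer=None):
    """Deploy a trained estimator to a real-time endpoint and return the predictor."""
    if serializer is None:
        # Imported lazily; only needed when no serializer is supplied.
        from sagemaker.serializers import CSVSerializer
        serializer = CSVSerializer()
    return estimator.deploy(
        initial_instance_count=1,
        instance_type=instance_type,
        serializer=serializer,
    )


# In the notebook, after training (variable names are placeholders):
# xgb_predictor = deploy_model(estimator)
```

The endpoint is billed for as long as it exists, whether or not it receives requests, which is why the cleanup step matters.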
Step 8: Prediction of Test Data.
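Predictions come back from the endpoint as text, so they have to be parsed into numbers. The sketch below assumes the predictor was deployed with a CSV serializer and that the response is a list of scores separated by commas or newlines; the feature rows must not include the target column. Sending the test set in batches keeps each request small. Both function names are made up for this example.

```python
def parse_scores(raw):
    """Turn the endpoint's CSV response (bytes or str) into a list of floats."""
    text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
    tokens = text.replace("\n", ",").split(",")
    return [float(token) for token in tokens if token.strip()]


def predict_in_batches(predictor, feature_rows, batch_size=500):
    """Send feature rows to the endpoint in batches and collect all scores."""
    scores = []
    for start in range(0, len(feature_rows), batch_size):
        batch = feature_rows[start:start + batch_size]
        scores.extend(parse_scores(predictor.predict(batch)))
    return scores


# In the notebook (names are placeholders):
# scores = predict_in_batches(xgb_predictor, test_features)
```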
Step 9: Evaluating the model using Confusion Matrix
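For a binary classifier the confusion matrix is just four counts. The sketch below assumes the model returns a probability per row and that 0.5 is the decision threshold; both are assumptions, and the sample labels and scores are made up.

```python
def confusion_matrix(y_true, scores, threshold=0.5):
    """Count true/false positives and negatives for 0/1 labels."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for actual, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and actual == 1:
            counts["tp"] += 1
        elif predicted == 1 and actual == 0:
            counts["fp"] += 1
        elif predicted == 0 and actual == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def accuracy(counts):
    """Share of rows that were classified correctly."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total if total else 0.0


# Made-up labels and scores, only to show the call.
counts = confusion_matrix([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.7])
print(counts, accuracy(counts))
```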
Step 10: Delete the endpoint and empty the S3 bucket
- Deleting the endpoint stops the charge for the hosting instance. Emptying the bucket removes the objects and folders that were created for the data and model files, so we are no longer charged for storing them either.
- After the cleanup we can see that the bucket is empty again.
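The cleanup can be wrapped in one small helper that deletes the endpoint and then removes every object from the bucket. This is a sketch assuming an SDK 2.x predictor and a `boto3` bucket resource; the helper name and the names in the commented usage are placeholders.

```python
def clean_up(predictor, bucket):
    """Delete the endpoint, then delete every object in the bucket."""
    predictor.delete_endpoint()    # stops the charge for the hosting instance
    bucket.objects.all().delete()  # removes the stored data and model artifacts


# In the notebook (names are placeholders):
# import boto3
# clean_up(xgb_predictor, boto3.resource("s3").Bucket("my-sagemaker-demo-bucket"))
```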
- AWS SageMaker is a great fit for data scientists who want a truly end-to-end ML solution. It abstracts away much of the software engineering the task would otherwise require, while remaining effective, flexible, and cost-effective.
- Machine learning is a very powerful tool that can create a lot of value when used the right way: to experiment and find the best business ideas.
- But having the right idea and the right data is not enough. We also need the ability to scale quickly when we find a winning case.
- AWS and SageMaker provide the ability to experiment and find the right idea quickly and cheaply, and then to deploy and scale it, all in a single package.
Just follow the steps above to build, train, and deploy your machine learning model using AWS SageMaker.