Data Science With Azure: Do You Really Need It? This Will Help You Decide!

Azure is Microsoft’s cloud platform, competing with Google Cloud and Amazon Web Services. It gives us the freedom to build, manage, and deploy applications on a massive global network using our favorite tools and frameworks.

Azure provides over 100 services that enable us to do everything from running our existing applications on virtual machines to exploring new software paradigms such as intelligent bots and mixed reality. It also provides storage solutions that dynamically grow to accommodate massive amounts of data.

Azure services enable solutions that are simply not feasible without the power of the cloud. Here we are going to mainly focus on the Azure Machine Learning services.

Azure Machine Learning services are a relatively new addition to Microsoft Azure, released publicly in December 2018. They contain many advanced capabilities designed to simplify and accelerate the process of building, training, and deploying machine learning models. Support for popular open-source frameworks such as PyTorch, TensorFlow, and scikit-learn allows data scientists to use the tools of their choice.

Azure ML provides multiple services, and we will look at a few of them here: Notebook VMs, which provide a Jupyter notebook interface for experimenting, and the Visual Interface, which provides a simple drag-and-drop interface suitable for rapid prototyping.

At the end, we will look at its Visual Studio Code extension and Python SDK for creating experiments and pipelines. We will explain, step by step, how the ML services are used in an actual project.

Various Azure Technologies:

Azure Machine Learning Studio: It is a completely free tool that lets you quickly try out various machine learning algorithms on your data.

We all know how important a quick-and-dirty first implementation of a basic algorithm can be in providing crucial insights into the data. It is an entirely GUI-based tool: you drag and drop, and you are done!

Azure Machine Learning Service: When I started my project, one thing that constantly bugged me was the various libraries and their dependencies, which cropped up from time to time.

Downloading these heavy libraries was a very tedious job. That is what I like best about Azure Machine Learning Service: it lets you focus on the things that actually matter.

Azure Data Science Virtual Machine:

The Data Science Virtual Machine (DSVM) is a customized VM image on Microsoft’s Azure cloud built specifically for doing data science. It has many popular data science and other tools pre-installed and pre-configured to jump-start building intelligent applications for advanced analytics. It is available for Windows Server 2016 and Ubuntu 16.04 LTS. Windows Server 2012 and CentOS versions are also offered, although Windows Server 2016 and Ubuntu are the recommended options.

The Data Science Virtual Machine for Linux is an Ubuntu-based virtual machine image that makes it easy to get started with deep learning on Azure. The Microsoft Cognitive Toolkit, TensorFlow, MXNet, Caffe, Caffe2, Chainer, NVIDIA DIGITS, Deep Water, Keras, Theano, Torch, and PyTorch are built, installed, and configured so they are ready to run immediately.

The NVIDIA driver, CUDA 9, and cuDNN 7 are also included. All frameworks are the GPU versions but work on the CPU as well. Many sample Jupyter notebooks are included. TensorFlow Serving, MXNet Model Server, and TensorRT are included to test inferencing.

Types of Azure Data Science Virtual Machines:

Each type of DSVM is designed to support different types of machine learning, for example, general predictive modeling, deep learning, and geospatial AI.

Currently, there are two different types of Azure DSVM: Windows DSVMs and Linux DSVMs. Windows DSVMs come in Windows Server 2016 and Windows Server 2012 versions. Linux DSVMs come in Ubuntu 16.04 LTS and CentOS 7.4 versions.

  • Deep learning is a popular approach when trying to tackle questions involving neural networks with large datasets. Deep learning requires specific tools to manage requirements for the problem at hand. The Deep Learning DSVM comes preconfigured and preinstalled with many of these tools.

Training deep learning models is computationally intensive. For better training performance, select a high-speed GPU-based machine, which will significantly speed up model training.
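Before starting a long training run on a GPU-based machine, it is worth confirming that the framework can actually see the GPU. The snippet below is a minimal sketch, assuming PyTorch is the framework in use; the helper name `pick_device` is hypothetical and not part of the DSVM or of PyTorch. It falls back to the CPU when PyTorch is missing or no CUDA device is visible.

```python
import importlib.util


def pick_device():
    """Return "cuda" if PyTorch is installed and sees a GPU, otherwise "cpu"."""
    # Assumption: PyTorch may or may not be installed on this machine.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    # torch.cuda.is_available() reports whether a usable CUDA device exists.
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print("Training will run on:", pick_device())
```

If this prints `cpu` on a machine that was meant to have a GPU, the VM size or the driver setup is the first thing to check.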

Data Lake Analytics:

Azure Data Lake Analytics (ADLA) is one of the three main components of Microsoft’s Azure Data Lake. It is an on-demand job service built on Apache YARN, offered by Microsoft to simplify big data by eliminating the need to deploy, configure, and maintain hardware environments for heavy analytics workloads. Not only does this allow data consumers to focus on what matters, it also allows them to do so in the most cost-effective way, thanks to ADLA’s pay-per-use pricing model.

Azure Data Lake is a service offered by Microsoft Azure to store and analyze huge amounts of data. It can be divided into three parts:

  • Azure Data Lake Storage (Gen1 & Gen2)
  • Azure Data Lake Analytics
  • Azure HDInsight (managed Hadoop and Spark clusters)

Pricing for Azure Data Lake is dependent upon numerous variables, including storage capacity, the number of analytics units (AUs) per minute, the number of completed jobs, and the cost of managed Hadoop and Spark clusters. As of this writing, the Azure Data Lake Store service is priced at $0.039 per GB per month for pay as you go, with capacity-based discounts up to 33% for monthly commitments. The Azure Pricing Calculator can help customers determine exact data lake costs.
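To make the storage figure concrete, here is a rough back-of-the-envelope sketch. It uses only the two numbers quoted above ($0.039 per GB per month and a maximum 33% commitment discount) and ignores analytics units, job counts, and cluster costs. The function name `estimate_storage_cost` is made up for illustration, and actual prices may differ, so treat the Azure Pricing Calculator as the source of truth.

```python
# Storage price quoted in this article (USD per GB per month, pay-as-you-go).
PRICE_PER_GB_MONTH = 0.039
# Largest capacity-commitment discount quoted in this article.
MAX_COMMITMENT_DISCOUNT = 0.33


def estimate_storage_cost(stored_gb, discount=0.0):
    """Return an estimated monthly storage cost in USD, rounded to cents."""
    if stored_gb < 0:
        raise ValueError("stored_gb must be non-negative")
    if not 0.0 <= discount <= MAX_COMMITMENT_DISCOUNT:
        raise ValueError("discount must be between 0 and 0.33")
    return round(stored_gb * PRICE_PER_GB_MONTH * (1.0 - discount), 2)


if __name__ == "__main__":
    print(estimate_storage_cost(1000))
    print(estimate_storage_cost(1000, MAX_COMMITMENT_DISCOUNT))
```

For 1,000 GB this works out to about $39.00 per month at the pay-as-you-go rate, and about $26.13 if the full 33% discount applied.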

Streamlining the research to production lifecycle with Azure Machine Learning:

One of the benefits of using PyTorch 1.3 in Azure Machine Learning is Machine Learning Operations (MLOps). MLOps streamlines the end-to-end machine learning (ML) lifecycle so you can frequently update models, test new models, and continuously roll out new ML models alongside your other applications and services. MLOps provides:

  • Reproducible training with powerful ML pipelines that stitch together all the steps involved in training your PyTorch model, from data preparation, to feature extraction, to hyperparameter tuning, to model evaluation.
  • Asset tracking with dataset and model registries so you know who is publishing PyTorch models, why changes are being made, and when your PyTorch models were deployed or used in production.
  • Packaging, profiling, validation, and deployment of PyTorch models anywhere from the cloud to the edge.
  • Monitoring and management of your PyTorch models at scale in an enterprise-ready fashion with eventing and notification of business impacting issues like data drift.
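The data drift mentioned in the last point can be illustrated with a very simple check. This is only a sketch of the idea, not how Azure Machine Learning detects drift: it assumes a single numeric feature and flags drift when the mean of recent values moves more than a chosen number of baseline standard deviations away from the training mean. The names `drift_score` and `has_drifted` and the default threshold of 2.0 are assumptions made for this example.

```python
from statistics import mean, stdev


def drift_score(baseline, recent):
    """Shift of the recent mean from the baseline mean, in baseline standard deviations."""
    spread = stdev(baseline)
    if spread == 0:
        raise ValueError("baseline has zero variance")
    return abs(mean(recent) - mean(baseline)) / spread


def has_drifted(baseline, recent, threshold=2.0):
    """Return True when the recent data has moved past the threshold."""
    return drift_score(baseline, recent) > threshold


if __name__ == "__main__":
    training_values = [10, 11, 9, 10, 10]      # values seen at training time
    print(has_drifted(training_values, [10, 11, 10]))  # close to the baseline
    print(has_drifted(training_values, [14, 15, 13]))  # shifted well away
```

A production monitor would compare whole distributions across many features, but the principle is the same: keep a baseline from training time and alert when live data stops looking like it.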

The Azure Machine Learning pipeline does the following tasks:

  • Train model
  • Evaluate model
  • Register model
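The three tasks above can be sketched in plain Python to show how they chain together. This does not use the Azure Machine Learning SDK: the "model" is a one-variable least-squares line, the "registry" is an ordinary dictionary, and every function name and the `max_mse` threshold are assumptions chosen for illustration.

```python
from statistics import mean


def train_model(xs, ys):
    """Train: fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": y_bar - slope * x_bar}


def evaluate_model(model, xs, ys):
    """Evaluate: return the mean squared error on held-out data."""
    errors = [(model["slope"] * x + model["intercept"] - y) ** 2
              for x, y in zip(xs, ys)]
    return mean(errors)


def register_model(registry, name, model, mse, max_mse):
    """Register: store the model as a new version only if it is good enough."""
    if mse > max_mse:
        return None  # model rejected, nothing is registered
    versions = registry.setdefault(name, [])
    versions.append({"version": len(versions) + 1, "model": model, "mse": mse})
    return versions[-1]["version"]


def run_pipeline(registry, name, train_data, test_data, max_mse=1.0):
    """Run the three steps in order and return the new version (or None)."""
    model = train_model(*train_data)
    mse = evaluate_model(model, *test_data)
    return register_model(registry, name, model, mse, max_mse)


if __name__ == "__main__":
    registry = {}
    version = run_pipeline(registry, "demo-model",
                           ([1, 2, 3, 4], [2, 4, 6, 8]), ([5, 6], [10, 12]))
    print("Registered version:", version)
```

Gating registration on the evaluation result is the design choice worth noting: a model that scores badly on held-out data never reaches the registry, so it cannot be deployed by accident.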

Reference:

Data Science Virtual Machines: https://azure.microsoft.com/en-in/services/virtual-machines/data-science-virtual-machines/

Azure ML: https://azure.microsoft.com/en-in/services/machine-learning/

Data Lake Analytics: https://azure.microsoft.com/en-in/services/data-lake-analytics/

 
