Tag Archives: Microsoft R Server

Build & Deploy Machine Learning Apps on Big Data Platforms with Microsoft Linux Data Science Virtual Machine

This post is authored by Gopi Kumar, Principal Program Manager in the Data Group at Microsoft.

This post covers our latest additions to the Microsoft Linux Data Science Virtual Machine (DSVM), a custom VM image on Azure, purpose-built for data science, deep learning and analytics. Offered in both Microsoft Windows and Linux editions, DSVM includes a rich collection of tools, seen in the picture below, and makes you more productive when it comes to building and deploying advanced machine learning and analytics apps.

The central theme of our latest Linux DSVM release is to enable the development and testing of ML apps for deployment to distributed scalable platforms such as Spark, Hadoop and Microsoft R Server, for operating on data at a very large scale. In addition, with this release, DSVM also offers Julia Computing’s JuliaPro on both Linux and Windows editions.


Here’s more on the new DSVM components you can use to build and deploy intelligent apps to big data platforms:

Microsoft R Server 9.0

Version 9.0 of Microsoft R Server (MRS) is a major update to enterprise-scale R from Microsoft, supporting parallel and distributed computation. MRS 9.0 supports analytics execution in the Spark 2.0 context. There’s a new architecture and simplified interface for deploying R models and functions as web services via a new library called mrsdeploy, which makes it easy to consume models from other apps using the open Swagger framework.
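
Because these web services are described with Swagger, a client written in any language can discover the available endpoints from the service description. The Python sketch below shows the idea using only the standard library. It assumes a hypothetical, hand-written Swagger 2.0 document: the host, base path, `/predict` path and operation name are invented for illustration and are not the output of mrsdeploy.

```python
import json

# Hypothetical Swagger 2.0 description of a deployed model service.
# Host, basePath and paths are illustrative assumptions only.
SWAGGER_JSON = """
{
  "swagger": "2.0",
  "host": "mrs.example.com",
  "basePath": "/api/exampleService/v1.0",
  "schemes": ["http"],
  "paths": {
    "/predict": {
      "post": {"operationId": "predict", "consumes": ["application/json"]}
    }
  }
}
"""

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options"}


def list_operations(swagger_text):
    """Return (METHOD, full URL, operationId) for each operation in the spec."""
    spec = json.loads(swagger_text)
    scheme = spec.get("schemes", ["https"])[0]
    base = f"{scheme}://{spec['host']}{spec.get('basePath', '')}"
    operations = []
    for path, methods in sorted(spec.get("paths", {}).items()):
        for method, details in sorted(methods.items()):
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters"
            operations.append(
                (method.upper(), base + path, details.get("operationId"))
            )
    return operations


for method, url, op_id in list_operations(SWAGGER_JSON):
    print(method, url, op_id)
```

In practice you would load the Swagger document that your own deployed service exposes instead of the inline string used here.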

Local Spark Standalone Instance

Spark is one of the premier platforms for highly scalable big data analytics and machine learning. Spark 2.0 launched in mid-2016 and brings several improvements, such as the revised machine learning library (MLlib), scaling and performance optimization, better ANSI SQL compliance and unified APIs. The Linux DSVM now offers a standalone Spark instance (based on the Apache Spark distribution) and a PySpark kernel in Jupyter to help you build and test applications on the DSVM and deploy them on large-scale clusters like Azure HDInsight Spark or your own on-premises Spark cluster. You can develop your code in Jupyter notebooks, in the included Community Edition of the PyCharm IDE for Python, or in RStudio for R.
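
The build-locally, deploy-to-a-cluster workflow can be sketched in PySpark as follows. This is a minimal sketch rather than a sample shipped with the DSVM: the `TARGET_MASTERS` mapping, the `local[*]` and `yarn` master values and the function names are assumptions chosen for illustration, and your cluster may need a different master setting (often supplied at submit time instead of in code). The `pyspark` import is deferred so the target selection can be checked without Spark installed.

```python
# Assumed mapping from a deployment target to a Spark master setting.
TARGET_MASTERS = {
    "dsvm": "local[*]",  # use all cores of the local VM (assumption)
    "cluster": "yarn",   # a YARN-managed cluster (assumption)
}


def choose_master(target):
    """Return the Spark master for a named target, failing loudly on typos."""
    try:
        return TARGET_MASTERS[target]
    except KeyError:
        raise ValueError(
            f"unknown target {target!r}; expected one of {sorted(TARGET_MASTERS)}"
        ) from None


def build_session(target, app_name="dsvm-sketch"):
    """Create a SparkSession for the target; requires pyspark to be installed."""
    from pyspark.sql import SparkSession  # deferred import

    return (
        SparkSession.builder
        .master(choose_master(target))
        .appName(app_name)
        .getOrCreate()
    )


def count_by_length(spark, words):
    """Job logic that stays the same on every target: count words by length."""
    rdd = spark.sparkContext.parallelize(words)
    pairs = rdd.map(lambda w: (len(w), 1)).reduceByKey(lambda a, b: a + b)
    return dict(pairs.collect())
```

On the DSVM, `spark = build_session("dsvm")` followed by `count_by_length(spark, ["spark", "hadoop", "r"])` exercises the job locally. Switching the target to `"cluster"` leaves the job code unchanged, which is the point of testing on the local instance first.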

Single Node Local Hadoop (HDFS and YARN) Instance

To make it easier to develop Hadoop programs and use HDFS storage locally for development and testing, a single-node Hadoop installation is built into the VM. Also, if you are developing on Microsoft R Server for execution in remote Hadoop or Spark contexts, you can first test things locally on the Linux DSVM and then deploy the code to a remote scaled-out Hadoop or Spark cluster or to Microsoft R Server. These DSVM additions are designed to help you iterate rapidly when developing and testing your apps, before they get deployed into large-scale production big data clusters.

The DSVM is also a great environment for self-learning and running training classes on big data technologies. We provide sample code and notebooks to help you get started quickly on the different data science tools and technologies offered.

DSVM Resources

New to DSVM? Here are resources to get you started:

Linux Edition

Windows Edition

The goal of DSVM is to make data scientists and developers highly productive in their work and provide a broad array of popular tools. We hope you find it useful to have these new big data tools pre-installed with the DSVM.

We always appreciate feedback, so please send in your comments below or share your thoughts with us at the DSVM community forum.

Gopi

New Year & New Updates to the Windows Data Science Virtual Machine

This post is authored by Gopi Kumar, Principal Program Manager in the Data Group at Microsoft.

First of all, a big thank you to all users of the Data Science Virtual Machine (DSVM) for your tremendous response to our offering in 2016. We’re looking forward to a similarly great year in 2017.

The new year also brings some interesting new tools to our DSVM users, to help you be more productive with data science. In this post, we summarize the key recent changes to the Windows Server edition of the DSVM.

  1. Microsoft R Server 9.0.1 (MRS9) developer edition, a major update to the enterprise scalable R extension from Microsoft, is now available on the VM. This version brings many changes, including several fast ML and deep learning algorithms developed by Microsoft in a new library called MicrosoftML. There’s a new architecture and interface for deploying R models and functions as web services, which follows a paradigm and interface library very similar to Azure ML operationalization; the library is called mrsdeploy. We have R deployment samples for notebooks, R Tools for Visual Studio (RTVS) and RStudio. The olapR package in Microsoft R Server lets you run MDX queries and connect directly to OLAP cubes on SQL Server 2016 Analysis Services from your R solution. SQL Server 2016 Developer edition and the associated Microsoft R In-DB analytics are also updated to Service Pack 1.
  2. RStudio Desktop open source edition is now preinstalled on the VM, by popular demand.
  3. R Tools for Visual Studio is now updated to version 0.5, bringing in multi-window plotting and SQL tooling to run R code on SQL Server 2016.
  4. Microsoft Cognitive Toolkit (formerly called CNTK) is now on Version 2 Beta 6, and features several improvements and sample notebooks to perform fast deep learning using the Python interface or the CNTK BrainScript interface.
  5. Apache Drill, a SQL-based query tool that can work with various data sources and formats (e.g. JSON, CSV), was part of our previous update. We now prepackage and configure drivers to access various Azure data services such as Blobs, SQLDW/Azure SQL, HDI and DocumentDB. See this tutorial in our gallery for information on how to query data in various Azure data sources from within the Drill SQL query language.
  6. JuliaPro is available to DSVM users and is now pre-installed and pre-configured on the VM, thanks to Julia Computing (a company founded by the creators of the Julia programming language). JuliaPro is a curated distribution of the open source Julia language along with a set of popular packages for scientific computing, data science, AI and optimization. The JuliaPro distribution comes with an Atom-based IDE, Jupyter notebooks and several sample notebooks on the DSVM Jupyter instance to help you get started. Julia Computing also provides an Enterprise edition with commercial support.
  7. The Deep Learning Toolkit for the Windows DSVM is an extension to help you jump start deep learning on Azure GPU VMs, without having to spend time installing GPU framework dependencies and drivers or configuring the various deep learning tools. This extension has been updated to include the latest versions of CNTK 2 and MXNet for GPU, along with new samples. It also features the Windows version of TensorFlow.
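
To illustrate item 5, Drill can also be queried programmatically through its REST interface. The sketch below is a hedged example rather than one of the DSVM samples: it assumes a Drill instance listening on `localhost:8047` and a hypothetical file `/data/sample.json` reachable through the `dfs` storage plugin. It only builds the request; `run_drill_query` sends it and needs a running Drill instance.

```python
import json
import urllib.request

# Assumed address of a locally running Drill web interface.
DRILL_URL = "http://localhost:8047/query.json"


def build_drill_request(sql, url=DRILL_URL):
    """Build (but do not send) a POST request for Drill's REST query endpoint."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_drill_query(sql, url=DRILL_URL):
    """Send the query and return the parsed JSON reply (needs a running Drill)."""
    with urllib.request.urlopen(build_drill_request(sql, url)) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical file path; backticks quote the path in Drill SQL.
SAMPLE_SQL = "SELECT * FROM dfs.`/data/sample.json` LIMIT 5"
request = build_drill_request(SAMPLE_SQL)
print(request.get_method(), request.full_url)
```

Keeping request construction separate from sending makes it easy to inspect the SQL and payload before pointing the code at a real data source.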

We also offer a Linux edition of the Data Science Virtual Machine, and there will be a separate post on major updates there.

Meanwhile, here are some resources to get you started with the DSVM.

Windows Edition

Linux Edition

Webinar

I’d like to end this post with a graphical summary of the DSVM, showing a [non-exhaustive] list of the various tools that are preinstalled. DSVM helps you focus more on data science and spend less time on installing, configuring and administering tools, thereby making you more productive. Give DSVM a shot today and send us feedback on how we can make it even better for your data science needs.


Gopi