Intro to R

R is Microsoft’s Language for the Big Data Age

R is more than just a programming language. An entire online ecosystem has grown up around it. What is R all about, and why is it so important?

Big data analytics is among the hottest topics in technology, and the statistical computing language R lies right at the heart of it. Yet R is more than a programming language in the traditional sense.

Microsoft has launched a wide range of tools that form part of what data analytics circles call the R ecosystem. These can be downloaded from a variety of sources: many work with the free, open source version of R, while others require a Microsoft-specific R distribution.

Here, we take a look at what is available and where to find it.

Using R from within your existing Microsoft products

If you have already invested in software from Microsoft, the chances are that R is already at your fingertips. The following products all support R:

  • SQL Server 2016/17

If you use SQL Server, you can call R from within the database, or you can publish R functions to the server, where your database administrators can then call them from SQL.

  • Visual Studio

R Tools for Visual Studio is an open source extension for Visual Studio 2017. It also works with Visual Studio 2015, provided you have Update 3 or higher.

  • Power BI

If you use Microsoft Power BI, you can run R scripts directly within Power BI Desktop and import the resulting data sets. You will first need to download and install R separately, however (see the open source options below).

  • Azure

R is supported in Azure Machine Learning Studio (AzureML), the Data Science Virtual Machine, and a number of the other cloud-based services that you can find in Azure. You can also use the AzureML package to publish R functions to Azure, and then call them from Excel.
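To make the Power BI item above a little more concrete, here is a minimal sketch of the kind of script you might run there. It uses only base R. The `sales` data and the `sales_by_region` name are made up for illustration, and we are assuming that the host tool picks up whatever data frames the script leaves behind as importable data sets.

```r
# Hypothetical input data, standing in for whatever your script would load.
sales <- data.frame(
  region = c("North", "South", "North", "South"),
  amount = c(120, 80, 60, 40)
)

# Aggregate to one row per region. The result is itself a data frame,
# which is the kind of object we assume the host tool can import.
sales_by_region <- aggregate(amount ~ region, data = sales, FUN = sum)

print(sales_by_region)
```

The same script runs unchanged in a plain R session, which makes it easy to test before you paste it into another product.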

Open source options

The release of a free, open source version of Microsoft R raised more than a few eyebrows. As well as being able to download R itself for free, you will find numerous add-ons and packages available from sources like the CRAN repository and GitHub. These include the following resources:

  • MRAN – the Microsoft R Application Network, a download repository that hosts anything and everything from CRAN.
  • checkpoint package – ties your project to the package versions that were available on a given date, so that the same code keeps using the same packages.
  • Programming packages – foreach provides a looping construct that can run in parallel, and iterators supplies the iterator objects it loops over.
  • RStudio – an integrated development environment that provides a user-friendly platform for using R.
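As a rough illustration of the programming packages in the list above, the sketch below loops over the rows of a small matrix with foreach and iterators. It assumes both packages are installed from CRAN, and the matrix `m` is made-up example data. The loop uses `%do%`, which runs sequentially. To run the same loop in parallel, you would register a parallel backend (the doParallel package is one option, assumed here rather than shown) and switch the operator to `%dopar%`.

```r
library(foreach)
library(iterators)

# Made-up example data: a 2 x 3 matrix whose rows we process one at a time.
m <- matrix(1:6, nrow = 2)

# iter(m, by = "row") hands the loop one row per iteration.
# .combine = c collects the per-row results into a single vector.
row_sums <- foreach(r = iter(m, by = "row"), .combine = c) %do% {
  sum(r)
}

print(row_sums)
```

Because each iteration is independent of the others, this is the kind of loop that can be spread across several cores without changing the body.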

Machine learning algorithms

Finally, at the top of the tree, Microsoft offers a suite of machine learning and statistical algorithms. In general, these are only available as closed source, and cannot be found on CRAN or GitHub.

Two of the most important for the future of data analytics and machine learning are MicrosoftML and RevoScaleR. The former contains machine learning algorithms such as neural networks and random forests, while the latter is designed for data sets too large to handle comfortably in memory, and uses parallel computing methods to process them.