CIO

What Open Source Hadoop Coming to Windows Means to IT

Big Data analysis has exploded onto the business stage over the last 12 months or so, and one of the most important Big Data analysis platforms is the open source Apache Hadoop project. It's generally run on Linux, and it's used by some big-name companies, including Yahoo!, Facebook and Twitter.

Hortonworks Data Platform (HDP) for Windows

What's about to change over the next few months is that Hadoop is coming to Windows in the form of Hortonworks Data Platform (HDP) for Windows, a fully supported open source Hadoop distribution that runs on Windows Server. (Hortonworks, a California-based company, is a sponsor of and contributor to the Apache Hadoop project, and it already offers its Linux-based HDP distribution on a commercial basis.)


This will open up Hadoop to a large number of organizations that have no in-house Linux skills. Shaun Connolly, vice president of Corporate Strategy at Hortonworks, explains the thinking behind moving HDP to Windows in this way: "Essentially it's a market-driven decision," he says. "Hadoop is built for the scale-out commodity hardware market, and the commodity hardware market is 70% Windows by install base and expertise."

Employees in Windows-only companies will be able to make use of Hadoop easily because Excel can be used as a business intelligence tool to view the results of Hadoop Big Data analysis (whether Hadoop is running on Windows or Linux). "Ideally we want Microsoft users to be oblivious to the fact that everything is coming from Hadoop," says Connolly. "If end users can consume data without any learning curve, thanks to tools like Excel, then they get more value."


Windows shops will also be able to benefit from Hadoop on Windows because IT staff with Windows skills will be able to write Hadoop applications using Microsoft's Visual Studio and .NET Framework, without the need for any Linux expertise. (As an aside, both Hortonworks' and Microsoft's Windows offerings are 100% Apache Hadoop -- there have been no tweaks to the code -- so any Linux Hadoop app could easily be ported to Windows, Connolly says.)

But it turns out that HDP for Windows is not the only way that Hadoop is coming to Windows. Microsoft has been working behind the scenes with Hortonworks since late 2011, and the Redmond giant is about to release its own distribution of Hadoop, which it calls HDInsight. This will be available as a service running in the company's Azure cloud, or as a product that's intended to be used as the basis of an on-premises private cloud Hadoop installation.

A decade or so ago Microsoft was resolutely anti-open-source software, and ironically it may be that its support for Hadoop stems from this old animosity, according to Wes Miller, an analyst at Directions on Microsoft. "I think part of the reason that Microsoft wants Hadoop on Windows is out of concern about the competition Linux poses," he says.


But there's another reason too, he says. "The company also wants to ensure that if you do use Hadoop, you can also use SQL's BI stack for the business intelligence part."

3 Ways Businesses Will Buy and Use Hadoop Capabilities on Windows

The quick and easy option is to use the Azure service, according to Eron Kelly, general manager of product marketing for Microsoft's data platform. "This is an ideal way to consume Hadoop technology as it can be complicated to run using open source projects," he says. "With Azure this can be done in a couple of clicks and you then pay for what you use."

Companies do have other options for accessing Hadoop in the cloud -- by running Hadoop in Amazon or Rackspace clouds, for example. But for companies that already use Azure and have large amounts of data (and logs) in that cloud, the HDInsight service certainly would appear to make sense.

For companies that want to manage and run their own Hadoop installation, HDP for Windows is probably the option to go for -- as long as they have their own Windows servers and are prepared to install the software and get it going. Support is available from Hortonworks. "This could be appealing to companies that have generally had a no open source software (on Linux) policy. The sort that may have wanted Hadoop, but only wanted it on Windows," Wes Miller points out.

HDInsight Server for Windows will allow larger enterprises to take advantage of their existing investment in Microsoft's software stack -- particularly the cloud management capabilities of System Center -- to incorporate it into a private cloud. And it needn't be expensive -- the product is available as a free download, Kelly explains.

"There's no incremental fee for using HDInsight Server for Windows -- we will monetize it by customers having to buy Windows Server to use it -- and maybe from them using our data warehousing or BI environments as well," he says. "We may also monetize it by selling newer versions of Excel," he adds. (Microsoft offers an Excel 2013 and 2010 add-in called Data Explorer which can connect to HDInsight instances in Azure or on-premises.)

The big question mark over HDInsight Server for Windows is whether any but the largest companies will actually want to use it. Miller fears it may be too complicated. "Will smaller sized businesses really want to run Hadoop in a private cloud? Maybe if it is all automated, but I don't think that Microsoft is going to be doing that," he says.

There's no doubt that Hadoop's arrival on Windows is significant, because it will bring Big Data analysis within easy reach of the vast Windows market. (Hortonworks' Shaun Connolly estimates that by releasing HDP for Windows, the company will double the potential market for HDP immediately.) And because it lowers the barriers to entry for making use of Hadoop, smaller organizations and even business departments will be able to benefit from insights gained from the analysis of Big Data.

Paul Rubens is a technology journalist based in England. Contact him at paul@rubens.org.
