Four factors for comparing the top Hadoop distributions – Business Intelligence Info

Although the software components that constitute the Hadoop ecosystem stack are open source technologies, there are numerous benefits to paying a vendor for a subscription to use its commercial Hadoop platform. For example, a subscription provides technical support and training, as well as access to enterprise features not available to the open source community.

While the enterprise editions of vendor Hadoop distributions all provide the core components of the Hadoop ecosystem stack, the key differentiators are what these vendors offer beyond the openly accessible functionality.

Recent changes in the market have thinned the ranks of Hadoop vendors. Just this month, for example, Pivotal Software pulled the plug on its own Hadoop distribution and said it would start reselling Hortonworks instead. But there's still a diverse group of suppliers to consider, including independent Hadoop specialists, cloud providers and two of the largest IT vendors.

To help you determine which Hadoop provider is right for your organization, this article distinguishes the top Hadoop distributions based on several key characteristics: deployment models, enterprise-class features, security and data protection features, and support services.

Note that while the Hadoop big data management ecosystem is engineered to support scalable data storage and high-performance distributed computing, your actual performance may vary for several reasons, including the software implementation. But many performance issues are dependent on the planned applications themselves. To address this, we'll further examine how the Hadoop product distributions are targeted to meet the business needs of user organizations.

1. Hadoop deployment models

Most of the Hadoop vendors support a mix of deployment methods, but Hadoop offerings from Microsoft and Amazon Web Services are deployed solely in cloud environments. Microsoft leverages its Azure cloud infrastructure for HDInsight, a managed service based on the Hortonworks Data Platform (HDP), the same Hadoop distribution that Pivotal is now reselling.

AWS uses its Amazon Elastic Compute Cloud (EC2) platform and S3 data store to underpin Amazon Elastic MapReduce (EMR), which bundles its Hadoop distribution with various other tools and technologies. In addition, Amazon EMR provides the option of using MapR's Hadoop distribution instead of the Amazon one.

The cloud deployment model provides a rapid yet low-effort means of provisioning a Hadoop cluster, and both Microsoft and AWS enable users to resize their environments on demand to handle dynamic computing and storage capacity needs. This elasticity is desirable for organizations with computational and storage needs that may vary over time.

While the other major Hadoop vendors (Cloudera, Hortonworks, IBM and MapR) all offer cloud-based deployments, they aren't limited to that model. They allow users to download distributions that can be deployed on-premises or in private clouds on a variety of servers, including Linux and Windows systems. In addition, Cloudera and MapR also provide sandbox versions that can be run in a virtual environment such as VMware.

The bottom line: Consider whether your organization prefers to manage its big data environment in-house or use a hosted service. In-house management implies oversight and maintenance of the software environment and continuous monitoring of the system, whether that environment is a physical platform on premises or housed using a cloud-based service. The on-premises option may be preferable if you have experienced staff and know the proper system sizing characteristics, or if security concerns warrant managing the system behind a trusted firewall.

The alternative is to use a vendor with a hosted services platform that will help configure, launch, manage and monitor your operations. This may be preferable if you aren't sure what size system you will need or expect that the system size will grow based on increasing demand. The benefit of working with a cloud or hosted service is that it will provide the necessary elasticity for both storage and processing resources.
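To make the sizing question concrete, here is a rough back-of-the-envelope sketch in Python. It is not a vendor sizing method. The function name, the 25% headroom, the 24 TB of usable disk per node and the growth figures are all assumptions chosen for illustration; only the replication factor of 3 reflects the usual HDFS default.

```python
import math


def estimate_data_nodes(raw_data_tb, replication_factor=3, headroom=0.25,
                        usable_tb_per_node=24.0, annual_growth=0.0, years=0):
    """Roughly estimate the number of data nodes needed to store a data set.

    All defaults except replication_factor are illustrative assumptions.
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in the range [0, 1)")
    # Project the data volume forward using compound annual growth.
    projected_tb = raw_data_tb * (1 + annual_growth) ** years
    # Every block is stored replication_factor times across the cluster.
    replicated_tb = projected_tb * replication_factor
    # Keep a fraction of each disk free for temporary and intermediate data.
    required_tb = replicated_tb / (1 - headroom)
    # Round up, because a cluster cannot contain a fraction of a node.
    return math.ceil(required_tb / usable_tb_per_node)


# 100 TB today versus 100 TB growing 50% a year for two years (hypothetical).
nodes_today = estimate_data_nodes(100)
nodes_in_two_years = estimate_data_nodes(100, annual_growth=0.5, years=2)
print(nodes_today, nodes_in_two_years)
```

If the two estimates differ widely, or if the growth rate itself is a guess, that uncertainty is an argument for the elastic hosted model over a fixed on-premises purchase.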

2. Enterprise-class features of the top Hadoop distributions

There are some notable differences in the development approaches of the three independent Hadoop vendors. Cloudera often augments the Hadoop core with internally developed add-on technologies: for example, its Impala SQL-on-Hadoop query engine; Cloudera Manager administration tools; and Kudu, an alternative data store to the Hadoop Distributed File System (HDFS) for use in real-time analytics applications. Typically, the company now open sources such technologies after doing the initial development work itself.

Hortonworks, on the other hand, promotes that it's innovating 100% of its software in the Apache Hadoop community, and there are no proprietary extensions. Add-on technologies that it's the driving force behind, such as the Ambari provisioning and management software, are launched as open source projects from the outset. In addition, Hortonworks has banded together with IBM and other companies to form the Open Data Platform Initiative (ODPi), an organization devoted to creating a common set of core technical specifications for Hadoop platforms. ODPi members claim that will improve interoperability and minimize vendor lock-in.

MapR has taken a third path by developing its own file system, MapR-FS, instead of using HDFS, as well as its own NoSQL database, MapR-DB, and other foundational technologies in an effort to support deployments of large clusters with enterprise-class performance needs. MapR also is increasingly focusing on real-time and stream processing applications. In late 2015, the company rebranded its product as the MapR Converged Data Platform, which combines Hadoop and the MapR file system and database with the Apache Spark processing engine and a new event streaming technology called MapR Streams in order to handle both batch and real-time jobs.

From a features standpoint, the enterprise version of the Cloudera CDH distribution provides tools for operational management and reporting and for supporting business continuity. This includes such items as configuration history and rollbacks, rolling updates and service restarts, and automated disaster recovery. MapR's enterprise offering provides tools to better manage and ensure the resiliency and reliability of data in Hadoop clusters, as well as multi-tenancy and high availability capabilities. Hortonworks provides proactive monitoring and maintenance with its HDP support subscriptions.

IBM, meanwhile, has adopted an analytics-oriented strategy on its BigInsights for Apache Hadoop distribution, in keeping with its broader focus on selling business intelligence and advanced analytics tools. IBM offers different value-add modules with enterprise-grade features as part of BigInsights, including separate Analyst and Data Scientist modules. Its Analyst module provides Big SQL for federated SQL access to Hadoop and other data sources. BigSheets, which is part of the Analyst module, allows users to explore, transform and perform visualizations on large data sets stored in Hadoop, using an intuitive spreadsheet-like interface. The BigInsights Data Scientist module includes a version of the R language, text analytics and a machine learning library called SystemML that has been contributed to the open source community.

While its cloud platform is AWS' primary calling card for Amazon EMR, it also offers tools for monitoring and managing clusters and enabling application and cluster interoperability as part of the Hadoop service. Amazon EMR collects metrics that are used to track progress and measure the health of a cluster. Cluster health metrics can be accessed through the command line interface, software developer kits or APIs and can be viewed through the EMR management console. Additionally, Amazon's CloudWatch monitoring service can be used along with its implementation of the Apache Ganglia performance monitoring component to check the cluster and set alarms on events triggered by these metrics.
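As one illustration of setting an alarm on an EMR metric, the Python sketch below builds the arguments for a CloudWatch alarm that fires when a cluster has sat idle. It is a minimal example under stated assumptions, not AWS's recommended configuration. The function name, alarm name, cluster ID and 30-minute window are hypothetical; the `AWS/ElasticMapReduce` namespace, the `IsIdle` metric and the `JobFlowId` dimension are the names EMR publishes to CloudWatch. The code only builds a dictionary and does not contact AWS.

```python
def idle_cluster_alarm_params(cluster_id, idle_minutes=30):
    """Build keyword arguments for a CloudWatch alarm on an idle EMR cluster."""
    # Assumption: EMR metrics arrive in CloudWatch at five-minute intervals.
    period_seconds = 300
    return {
        "AlarmName": f"emr-idle-{cluster_id}",  # hypothetical naming scheme
        "Namespace": "AWS/ElasticMapReduce",
        "MetricName": "IsIdle",                 # 1 while no work is running
        "Dimensions": [{"Name": "JobFlowId", "Value": cluster_id}],
        # Minimum >= 1 in every period means the cluster was idle throughout.
        "Statistic": "Minimum",
        "Period": period_seconds,
        "EvaluationPeriods": max(1, (idle_minutes * 60) // period_seconds),
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
    }


# "j-EXAMPLE12345" is a placeholder cluster ID, not a real cluster.
params = idle_cluster_alarm_params("j-EXAMPLE12345")
print(params["AlarmName"], params["EvaluationPeriods"])
```

With AWS credentials configured, the dictionary could be passed to `boto3.client("cloudwatch").put_metric_alarm(**params)`; an `AlarmActions` entry would also be needed for the alarm to notify anyone.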

The bottom line: Choosing a vendor that provides value-add components as part of its enterprise subscription may mean committing to a long-term relationship, especially if these components are tightly integrated with its standard stack distribution. If you're concerned about vendor lock-in, consider those vendors that are participating in the ODPi.

3. Security and protection offerings from the Hadoop vendors

Despite the expanding use of open source software for enterprise-class applications, there remain suspicions about its suitability for production use from a security and protection perspective. Several Hadoop vendors have taken steps to alleviate some of this anxiety. For example, Hortonworks has teamed up with other vendors and customers to launch a Data Governance Initiative for Hadoop, with an initial focus on a new Apache project called Atlas for managing shared metadata, data classification, auditing, and security and policy management for data protection. It's also working to integrate Atlas with Ranger, an open source security tool for enforcing data access policies.

Cloudera provides tools that enable users to manage data security and governance for the CDH platform, supporting an organization's need to meet compliance and regulatory requirements. In addition, Hortonworks, Cloudera, MapR and IBM all provide data encryption. Both Hortonworks and Cloudera support encryption of data at rest. MapR provides encryption of data transmitted to, from and within a cluster. IBM offers the product InfoSphere Guardium, which enforces data privacy as well as provides encryption and masking of confidential data.

The bottom line: The Hadoop vendors provide different approaches to authentication, role-based access control, security policy management and data encryption. Carefully specify your security and protection requirements and review how each vendor addresses those needs.

4. Support subscriptions for the top Hadoop distributions

The fundamental value proposition for the open source software model is the bundling and simplification of system deployment with support and services. One alternative for deploying Hadoop involves downloading the source code for each component from the open source repository and then building and integrating all the parts together. This takes both skill and effort, and is likely to be an iterative process. Open source vendors have already done the heavy lifting, providing preconfigured distributions and maintaining an up-to-date integrated stack.

What differentiates the vendors to a large degree is their support models.

Hortonworks provides several models, ranging from its Jumpstart edition with Web-based support during business hours and one-day response time to its Enterprise edition with 24/7 support and much shorter response times depending on the severity of the issue.

Cloudera offers a support subscription with one-hour and 24/7 support options for enterprise license holders. It also offers premium support for organizations with the Flex or Data Hub edition licenses that include a 15-minute response time for critical issues.

All AWS accounts include basic support, which provides 24/7 customer service, access to community forums and documentation, as well as access to the AWS Trusted Advisor application. Developer support includes one-hour response for severe issues with 12- or 24-hour response times for most issues. Business-level support provides 24/7 email access to cloud support engineers as well as shortened response times based on severity. Enterprise-level support adds less than 15-minute response time for critical issues as well as a dedicated technical account manager, plus additional launch and operation support benefits.

MapR offers a Premium support service that adds Web and email support, custom portal, training, urgent bug fixes, follow-the-sun support and 24/7 phone support for priority issues. The company's Premium+ Support adds priority queuing of tickets and single point of contact support, and offers options for on-site or remote dedicated support.

IBM provides support for organizations that purchase the licensed components, also referred to as their value-add modules, that extend their Open Platform with Apache Hadoop.

The bottom line: If support services are the source of added value from the vendor, the costs for the different support subscriptions should be aligned with customer expectations. Subscriptions providing one-hour or even 15-minute response times on a 24/7 basis with dedicated support staff will cost a lot more than 24-hour response time from a Web-based interface during business hours.

Hadoop has transformed the business intelligence and analytics industry during the past 10 years. But, as we've examined, the open source Hadoop framework offers only so much, and companies that need more robust performance and functionality capabilities, as well as maintenance and support, are turning to commercial Hadoop software distributions. Hopefully, this information will help you make a more informed choice when purchasing a Hadoop distribution.


SearchBusinessAnalytics: BI, CPM and analytics news, tips and resources

