Pentaho Labs Unleashes Apache Spark(TM) Integration

ORLANDO, Fla., May 12, 2015 (GLOBE NEWSWIRE) -- Delivering the future of analytics, Pentaho Corporation today announced the native integration of Pentaho Data Integration (PDI) with Apache Spark, enabling orchestration of Spark jobs. A development effort initiated by Pentaho Labs, this integration will enable customers to increase productivity, reduce maintenance costs, and dramatically lower the skill sets required as Spark is incorporated into big data projects.

Pentaho Labs drives innovation in big data integration and analytics through incubation of new breakthrough advanced technologies.

Spark is a powerful open source processing engine built around speed, ease of use, and machine learning. Engineered from the bottom up for performance, Spark is a next-generation big data technology to store, blend, and govern data at entirely new levels of speed, scale and simplicity. Building on complementary open source foundations allowed Pentaho to innovate early with this emerging big data technology.

“For two years, we experimented with possible use cases based on our big data blueprints and sized the enterprise market opportunity for Spark. Our customers now benefit from that work with simplified, real-time analytic capabilities,” said James Dixon, Chief Technology Officer at Pentaho. “Our open-source heritage and modern, extensible platform allow us to quickly evolve our capabilities, keeping our customers’ big data technology options open, reducing risk and saving considerable development time while taking advantage of the latest innovations in popular big data stores.”

As big data technologies evolve at breakneck speed, the Pentaho Labs team continues to leverage and drive innovation in big data integration and analytics, allowing customers to advance their big data deployments without risk. Today’s integration with Spark follows other Labs efforts that have led to support for YARN and the Adaptive Big Data Layer. Following the native support of YARN alone, enterprise customers like RichRelevance, edo Interactive and MultiPlan have been able to innovate and drive greater value from Hadoop.

“Apache Spark couples high-performance, in-memory data processing and multiple computation models that make it well-suited to power next-generation data processing platforms,” said Matt Aslett, Research Director, Data Platforms and Analytics, 451 Research. “The integration with Spark illustrates how Pentaho’s open source approach enables it to respond as emerging technologies rise to prominence in the ever-evolving big data market, and to integrate them with its data management and analytics platform.”

Pentaho Data Integration for Apache Spark is currently available in Pentaho Labs. It will be generally available (GA) in June 2015. To learn more about the innovation in Pentaho Labs, visit:

Attend the webinar, Emerging Big Data Technologies: Pentaho Labs Presents Apache Spark, on Tuesday, June 2, 2015 at 10 a.m. PT. Register at

About Pentaho Labs
Pentaho Labs, led by Pentaho founders Richard Daley and James Dixon, is staffed with top industry experts to incubate breakthrough advanced analytic capabilities driven by big data. Pentaho Labs encourages seeding of new approaches and technologies that can over time be merged into the Pentaho roadmap based on market demand.

About Pentaho Corporation
Pentaho is delivering the future of business analytics. Pentaho's open source heritage drives our continued innovation in a modern, integrated, embeddable platform built for the future of analytics, including diverse and big data requirements. Powerful business analytics are made easy with Pentaho's cost-effective suite for data access, visualization, integration, analysis and mining. For a free evaluation, download Pentaho Business Analytics at

Pentaho Media Contact, US & Worldwide
Rebecca Shomair
Director of Corporate Communications
+1 646-484-8150

Source: Pentaho Corporation