Jay Kreps, Apache Kafka Architect, Visits Cloudera
It was good to see Jay Kreps (@jaykreps), the LinkedIn engineer who is the tech lead for that company’s online data infrastructure, visit Cloudera Engineering yesterday to spread the good word about...
Building Lambda Architecture with Spark Streaming
The versatility of Apache Spark’s API for both batch/ETL and streaming workloads brings the promise of lambda architecture to the real world. Few things help you concentrate like a last-minute change...
Apache Kafka for Beginners
When used in the right way and for the right use case, Kafka has unique attributes that make it a highly attractive option for data integration. Apache Kafka is creating a lot of buzz these days. While...
Flafka: Apache Flume Meets Apache Kafka for Event Processing
The new integration between Flume and Kafka offers sub-second-latency event processing without the need for dedicated infrastructure. In this previous post you learned some Apache Kafka basics and...
How Cerner Uses CDH with Apache Kafka
Our thanks to Micah Whitacre, a senior software architect on Cerner Corp.’s Big Data Platforms team, for the post below about Cerner’s use case for CDH + Apache Kafka. (Kafka integration with CDH is...
How-to: Deploy and Configure Apache Kafka in Cloudera Enterprise
With Kafka now formally integrated with, and supported as part of, Cloudera Enterprise, what’s the best way to deploy and configure it? Earlier today, Cloudera announced that, following an incubation...
How-to: Do Real-Time Log Analytics with Apache Kafka, Cloudera Search, and Hue
Cloudera recently announced formal support for Apache Kafka. This simple use case illustrates how to make web log analysis, powered in part by Kafka, one of your first steps in a pervasive analytics...
Exactly-once Spark Streaming from Apache Kafka
Thanks to Cody Koeninger, Senior Software Engineer at Kixer, for the guest post below about Apache Kafka integration points in Apache Spark 1.3. Spark 1.3 will ship in CDH 5.4. The new release of...
Architectural Patterns for Near Real-Time Data Processing with Apache Hadoop
Evaluating which streaming architectural pattern is the best match to your use case is a precondition for a successful production deployment. The Apache Hadoop ecosystem has become a preferred platform...
Deploying Apache Kafka: A Practical FAQ
This post contains answers to common questions about deploying and configuring Apache Kafka as part of a Cloudera-powered enterprise data hub. Cloudera added support for Apache Kafka, the open standard...
Designing Fraud-Detection Architecture That Works Like Your Brain Does
To design effective fraud-detection architecture, look no further than the human brain (with some help from Spark Streaming and Apache Kafka). At its core, fraud detection is about detecting whether...
Inside Santander’s Near Real-Time Data Ingest Architecture
Learn about the near real-time data ingest architecture for transforming and enriching data streams using Apache Flume, Apache Kafka, and RocksDB at Santander UK. Cloudera Professional Services has...
How-to: Build a Complex Event Processing App on Apache Spark and Drools
Combining CDH with a business execution engine can serve as a solid foundation for complex event processing on big data. Event processing involves tracking and analyzing streams of data from events to...
How Cigna Tuned Its Spark Streaming App for Real-time Processing with Apache...
Explore the configuration changes that Cigna’s Big Data Analytics team has made to optimize the performance of its real-time architecture. Real-time stream processing with Apache Kafka as a backbone...
How-to: Build a Real-Time Search System using StreamSets, Apache Kafka, and...
Thanks to Jonathan Natkins, a field engineer from StreamSets, for the guest post below about using StreamSets Data Collector—open source, GUI-driven ingest technology for developing and operating data...
What’s New in Cloudera’s Distribution of Apache Kafka?
Cloudera’s distribution (now on release 2.0) of Kafka is based on Apache Kafka 0.9 and includes various new features (especially for security and usability), enhancements, and bug fixes. Kafka is...
Building, Benchmarking, and Tuning Syslog Ingest Architecture
A large UK telco’s new SIEM system relies on Apache Flume and Apache Kafka to ingest nearly 1 million events per second. In this post, learn about the architecture and performance-tuning techniques and...
Inside Santander’s Near Real-Time Data Ingest Architecture (Part 2)
Thanks to Pedro Boado and Abel Fernandez Alfonso from Santander’s engineering team for their collaboration on this post about how Santander UK is using Apache HBase as a near real-time serving engine...
New in Cloudera Labs: Envelope (for Apache Spark Streaming)
As a warm-up to Spark Summit West in San Francisco (June 6-8), we’ve added a new project to Cloudera Labs that makes building Spark Streaming pipelines considerably easier. Spark Streaming is the...
How-to: Ingest Email into Apache Hadoop in Real Time for Analysis
Apache Hadoop is a proven platform for long-term storage and archiving of structured and unstructured data. Related ecosystem tools, such as Apache Flume and Apache Sqoop, allow users to easily ingest...