Channel: Kafka – Cloudera Engineering Blog
Browsing all 40 articles

Jay Kreps, Apache Kafka Architect, Visits Cloudera

It was good to see Jay Kreps (@jaykreps), the LinkedIn engineer who is the tech lead for that company’s online data infrastructure, visit Cloudera Engineering yesterday to spread the good word about...

Building Lambda Architecture with Spark Streaming

The versatility of Apache Spark’s API for both batch/ETL and streaming workloads brings the promise of lambda architecture to the real world. Few things help you concentrate like a last-minute change...

Apache Kafka for Beginners

When used in the right way and for the right use case, Kafka has unique attributes that make it a highly attractive option for data integration. Apache Kafka is creating a lot of buzz these days. While...

Flafka: Apache Flume Meets Apache Kafka for Event Processing

The new integration between Flume and Kafka offers sub-second-latency event processing without the need for dedicated infrastructure. In this previous post you learned some Apache Kafka basics and...

How Cerner Uses CDH with Apache Kafka

Our thanks to Micah Whitacre, a senior software architect on Cerner Corp.’s Big Data Platforms team, for the post below about Cerner’s use case for CDH + Apache Kafka. (Kafka integration with CDH is...

How-to: Deploy and Configure Apache Kafka in Cloudera Enterprise

With Kafka now formally integrated with, and supported as part of, Cloudera Enterprise, what’s the best way to deploy and configure it? Earlier today, Cloudera announced that, following an incubation...

How-to: Do Real-Time Log Analytics with Apache Kafka, Cloudera Search, and Hue

Cloudera recently announced formal support for Apache Kafka. This simple use case illustrates how to make web log analysis, powered in part by Kafka, one of your first steps in a pervasive analytics...

Exactly-once Spark Streaming from Apache Kafka

Thanks to Cody Koeninger, Senior Software Engineer at Kixer, for the guest post below about Apache Kafka integration points in Apache Spark 1.3. Spark 1.3 will ship in CDH 5.4. The new release of...

Architectural Patterns for Near Real-Time Data Processing with Apache Hadoop

Evaluating which streaming architectural pattern is the best match to your use case is a precondition for a successful production deployment. The Apache Hadoop ecosystem has become a preferred platform...

Deploying Apache Kafka: A Practical FAQ

This post contains answers to common questions about deploying and configuring Apache Kafka as part of a Cloudera-powered enterprise data hub. Cloudera added support for Apache Kafka, the open standard...

Designing Fraud-Detection Architecture That Works Like Your Brain Does

To design effective fraud-detection architecture, look no further than the human brain (with some help from Spark Streaming and Apache Kafka). At its core, fraud detection is about detecting whether...

Inside Santander’s Near Real-Time Data Ingest Architecture

Learn about the near real-time data ingest architecture for transforming and enriching data streams using Apache Flume, Apache Kafka, and RocksDB at Santander UK. Cloudera Professional Services has...

How-to: Build a Complex Event Processing App on Apache Spark and Drools

Combining CDH with a business execution engine can serve as a solid foundation for complex event processing on big data. Event processing involves tracking and analyzing streams of data from events to...

How Cigna Tuned Its Spark Streaming App for Real-time Processing with Apache...

Explore the configuration changes that Cigna’s Big Data Analytics team has made to optimize the performance of its real-time architecture. Real-time stream processing with Apache Kafka as a backbone...

How-to: Build a Real-Time Search System using StreamSets, Apache Kafka, and...

Thanks to Jonathan Natkins, a field engineer from StreamSets, for the guest post below about using StreamSets Data Collector—open source, GUI-driven ingest technology for developing and operating data...

What’s New in Cloudera’s Distribution of Apache Kafka?

Cloudera’s distribution (now on release 2.0) of Kafka is based on Apache Kafka 0.9 and includes various new features (especially for security and usability), enhancements, and bug fixes. Kafka is...

Building, Benchmarking, and Tuning Syslog Ingest Architecture

A large UK telco’s new SIEM system relies on Apache Flume and Apache Kafka to ingest nearly 1 million events per second. In this post, learn about the architecture and performance-tuning techniques and...

Inside Santander’s Near Real-Time Data Ingest Architecture (Part 2)

Thanks to Pedro Boado and Abel Fernandez Alfonso from Santander’s engineering team for their collaboration on this post about how Santander UK is using Apache HBase as a near real-time serving engine...

New in Cloudera Labs: Envelope (for Apache Spark Streaming)

As a warm-up to Spark Summit West in San Francisco (June 6-8), we’ve added a new project to Cloudera Labs that makes building Spark Streaming pipelines considerably easier. Spark Streaming is the...

How-to: Ingest Email into Apache Hadoop in Real Time for Analysis

Apache Hadoop is a proven platform for long-term storage and archiving of structured and unstructured data. Related ecosystem tools, such as Apache Flume and Apache Sqoop, allow users to easily ingest...
