harman-04/spring-boot-kafka-messaging-demo

Spring Boot: Apache Kafka Integration

Project Overview

In standard APIs, if the server is busy, the user waits. With Kafka, the server "publishes" a message to a topic and moves on. A separate "consumer" processes that message whenever it’s ready. This project demonstrates this Pub-Sub (Publisher-Subscriber) model.


Apache Kafka is more than just a "messaging queue." It is a distributed event streaming platform designed to handle trillions of events a day. Think of it as a giant, unbreakable digital ledger where data is written as it happens and can be read by anyone authorized to see it.


![Kafka architecture overview](img.png)

1. The Core Architecture (The 10,000 Foot View)

At its simplest level, Kafka consists of three main players:

  1. Producers: Applications that send (write) data to Kafka.
  2. Brokers (The Cluster): The servers that store the data. Kafka usually runs as a cluster of multiple brokers for safety.
  3. Consumers: Applications that read (process) data from Kafka.

2. Internal Data Structure: Topics, Partitions, and Segments

This is where the "magic" happens. Kafka doesn't just throw all data into one big bucket.

Topics

A Topic is like a folder in a filesystem. If you are building an e-commerce app, you might have a topic for orders, one for payments, and one for shipping.

Partitions

A Topic is split into Partitions. This is the key to Kafka's massive scale.

  • Parallelism: Different consumers can read different partitions at the same time.
  • Order: Within a single partition, messages are strictly ordered by offset (First-In, First-Out).
  • Immutability: Once a message is written to a partition, it can never be changed.

Offsets

Every message in a partition gets an ID number called an Offset. Consumers use this number to "bookmark" where they left off reading.
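The partition-and-offset idea can be sketched in a few lines of plain Java. This is an illustrative model, not Kafka's actual implementation: a partition behaves like an append-only list, and an offset is simply a record's position in that list.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a single partition: records are only ever appended,
// and the offset is the record's index in the log.
public class PartitionModel {
    private final List<String> log = new ArrayList<>();

    // Appending returns the offset assigned to the new record.
    public long append(String record) {
        log.add(record);
        return log.size() - 1;
    }

    // A consumer reads from its bookmarked offset and advances it afterwards.
    public String read(long offset) {
        return log.get((int) offset);
    }
}
```

Appending "order-1" and then "order-2" yields offsets 0 and 1; a consumer that bookmarked offset 1 resumes exactly at "order-2" after a restart.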


3. How it Works Internally: The Log

Unlike a traditional database that uses complex "B-Trees" to store data, Kafka uses a Distributed Commit Log.

  1. Append Only: When a Producer sends a message, Kafka simply sticks it at the very end of the partition file. This is incredibly fast because the disk head doesn't have to move around to find a spot.
  2. Sequential I/O: Reading and writing sequentially is orders of magnitude faster than random access.
  3. Zero-Copy: Kafka uses a trick in the Linux kernel called sendfile to move data directly from the disk to the network card without the CPU having to touch the data. This is why Kafka is so fast.

4. Fault Tolerance: Replication

Kafka is designed to stay alive even if a server (Broker) catches fire.

  • Leader and Followers: Each partition has one Leader broker and several Followers.
  • Writes: All writes go to the Leader.
  • Replication: The Followers automatically copy the data from the Leader.
  • Failover: If the Leader broker dies, one of the Followers is instantly "promoted" to be the new Leader.
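The failover behavior above can be modeled with a toy Java class. This is a simplification (in real Kafka the cluster controller elects a new leader from the in-sync replica set): here the head of the replica list is the leader, and removing it promotes the next follower.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy leader/follower model: the first in-sync replica is the leader;
// when a broker dies, the next replica in line automatically takes over.
public class ReplicaSet {
    private final Deque<String> inSyncReplicas = new ArrayDeque<>();

    public ReplicaSet(String... brokers) {
        for (String b : brokers) {
            inSyncReplicas.addLast(b);
        }
    }

    // All writes are directed at this broker.
    public String leader() {
        return inSyncReplicas.peekFirst();
    }

    // Simulate a broker failure; if it was the leader,
    // the next in-sync follower becomes the new leader.
    public void brokerDied(String broker) {
        inSyncReplicas.remove(broker);
    }
}
```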

5. Consumer Groups: Scaling the Read

If you have millions of messages, one consumer app might be too slow. You can put multiple consumers into a Consumer Group.

  • Kafka will automatically assign different partitions to different consumers in the group.
  • If one consumer crashes, Kafka re-assigns its partitions to the remaining healthy consumers. This is called Rebalancing.
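The assignment-and-rebalancing idea can be sketched as a small Java method. Kafka's real assignors are more elaborate, but the core effect is the same: each partition goes to exactly one consumer in the group, and rerunning the assignment with the surviving members after a crash is, in effect, a rebalance.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy round-robin partition assignment for a consumer group.
public class GroupAssignment {
    public static Map<String, List<Integer>> assign(List<String> consumers, int partitionCount) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String c : consumers) {
            assignment.put(c, new ArrayList<>());
        }
        for (int p = 0; p < partitionCount; p++) {
            // Spread partitions round-robin across the group members.
            assignment.get(consumers.get(p % consumers.size())).add(p);
        }
        return assignment;
    }
}
```

With 4 partitions and consumers c1 and c2, c1 gets partitions [0, 2] and c2 gets [1, 3]; if c2 crashes, reassigning to c1 alone gives it all four.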

6. Why Use Kafka? (The "So What?")

  • Decoupling: Your "Order Service" doesn't need to know that the "Email Service" exists. It just sends a message to Kafka, and the Email Service picks it up whenever it's ready.
  • Buffering: If your database is slow, Kafka can hold onto the messages (for days or weeks) until the database catches up.
  • Real-time Processing: You can analyze data as it flies through Kafka (e.g., detecting credit card fraud in milliseconds).

Technical Concepts

1. The Producer (KafkaProducerService)

The Producer uses KafkaTemplate<String, String> to send data to a Topic. Think of a Topic like a category or a folder in a filing cabinet.

  • Key Point: The Producer doesn't care who is listening or if the listener is even online. It just delivers to the Kafka Broker.

2. The Consumer (KafkaConsumerService)

The Consumer uses the @KafkaListener annotation.

  • Topic: It "subscribes" to my_topic.
  • Group ID: gfg-group. Kafka uses Group IDs to manage which consumer gets which message, allowing you to scale up by adding more consumers to the same group.

3. Serialization & Deserialization

Since data travels over the network, it must be converted:

  • Serializer: Converts Java Strings into bytes (Producer side).
  • Deserializer: Converts bytes back into Java Strings (Consumer side). We configured these in the application.yml to keep our Java code clean.
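What the configured StringSerializer and StringDeserializer amount to can be shown in plain Java: Kafka only ever transports bytes, and the string codecs are essentially a UTF-8 round trip on either side of the network.

```java
import java.nio.charset.StandardCharsets;

// The essence of Kafka's string serializer/deserializer pair.
public class StringCodec {
    // Producer side: Java String -> bytes on the wire.
    public static byte[] serialize(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // Consumer side: bytes off the wire -> Java String.
    public static String deserialize(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```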

Component Reference

KafkaController.java

Provides a REST endpoint (/api/kafka/publish) so we can trigger a message push using a simple URL parameter. This acts as the entry point for our event-driven flow.

application.yml

Instead of writing dozens of lines of Java configuration (as seen in the commented-out KafkaConfig classes), we use Spring Boot's Auto-Configuration. By defining spring.kafka.bootstrap-servers: localhost:9092, Spring automatically builds the KafkaTemplate for us.
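A minimal application.yml along these lines would drive that auto-configuration (the property names are Spring Boot's standard spring.kafka.* keys; the exact file in this repo may differ):

```yaml
spring:
  kafka:
    bootstrap-servers: localhost:9092
    producer:
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      value-serializer: org.apache.kafka.common.serialization.StringSerializer
    consumer:
      group-id: gfg-group
      auto-offset-reset: earliest
      key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      value-deserializer: org.apache.kafka.common.serialization.StringDeserializer
```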


Key Concepts Explained Simply

application.yaml

  • Producer: Sends messages to Kafka topics. Serializers convert Java objects → bytes before sending. Here, both key and value are plain strings.
  • Consumer: Reads messages from Kafka topics. Deserializers convert bytes → Java objects. group-id ensures multiple consumers can share work. auto-offset-reset=earliest makes the consumer start from the beginning if no committed offset exists.
  • bootstrap-servers: Entry point to the Kafka cluster. In production, you'd list multiple brokers for fault tolerance.

KafkaProducerService

  • KafkaTemplate: Spring Boot abstraction for sending messages to Kafka. Handles serialization (String → bytes) and broker communication.
  • Topic: A logical channel in Kafka where producers publish and consumers subscribe. Here, "my_topic" is the topic name.
  • Logging with @Slf4j: Provides a logger (log.info) to track message flow. Useful for debugging and monitoring.
  • Producer flow: sendMessage("Hello Kafka") → logs the message → sends it to the Kafka broker → stored in topic "my_topic".

KafkaConsumerService

  • @KafkaListener: Makes the method a Kafka consumer, triggered automatically when new messages arrive in the topic.
  • Topic ("my_topic"): Logical channel where producers publish and consumers subscribe.
  • Consumer group ("gfg-group"): Ensures multiple consumers can share work. Kafka distributes messages among consumers in the same group.
  • Logging with @Slf4j: Provides a logger (log.info) to track incoming messages.

KafkaController

  • @RestController: Makes this a REST API controller, so methods return JSON/text directly.
  • @RequestMapping("/api/kafka"): Sets the base path; all endpoints start with /api/kafka.
  • @PostMapping("/publish"): Defines the POST endpoint for publishing messages.
  • @RequestParam("message"): Extracts message from the request query string.
  • Service delegation: The controller calls KafkaProducerService.sendMessage(), keeping the controller lightweight.

SpringbootKafkaExampleApplication

  • @SpringBootApplication: Marks this as the main Spring Boot app and triggers auto-configuration.
  • SpringApplication.run(): Starts the application, loading beans, controllers, services, and Kafka configuration.
  • Integration: KafkaController exposes the REST endpoint /api/kafka/publish; KafkaProducerService sends messages; KafkaConsumerService listens and processes them.

KafkaProducerConfig (commented out)

  • ProducerFactory: Creates Kafka producer instances with the given configuration.
  • KafkaTemplate: Simplifies sending messages to Kafka topics.
  • Serializers: Convert Java objects → bytes before sending to Kafka.
  • Manual config vs application.yml: The old way defines these beans manually in a config class; the new way configures everything in application.yml and lets Spring Boot auto-configure.

KafkaConsumerConfig (commented out)

  • ConsumerFactory: Creates Kafka consumer instances with the given configuration.
  • ConcurrentKafkaListenerContainerFactory: Manages listener containers for @KafkaListener methods.
  • @EnableKafka: Enables Kafka listener support in Spring Boot.
  • Deserializers: Convert Kafka message bytes → Java objects (here, Strings).

Overall Flow

This is a classic Spring Boot + Apache Kafka integration example representing a basic Producer-Consumer pattern.

Here is a complete breakdown of the application flow using architectural and sequence diagrams based on the code provided.


1. High-Level Architecture Diagram

This diagram shows the main components and how data moves between them at a system level.

![High-level architecture diagram](Gemini_Generated_Image_ehd7e5ehd7e5ehd7.png)

Component Breakdown:

  1. REST Client: The external trigger (like Postman or a browser) that sends an HTTP POST request.
  2. KafkaController: The entry point into the Spring Boot app. It exposes the REST endpoint.
  3. KafkaProducerService: The service layer responsible for business logic related to sending messages. It holds the topic name definition (my_topic).
  4. KafkaTemplate: A Spring-provided helper class that abstracts away the low-level details of the Apache Kafka Producer client (serialization, connection management, sending asynchronously).
  5. Apache Kafka Broker: The external messaging middleware running on localhost:9092. It receives messages, stores them in the my_topic topic, and serves them to consumers.
  6. Spring MessageListenerContainer (Internal): You don't see this class in your code, but Spring Boot creates it automatically because it sees the @KafkaListener annotation. It runs in the background, constantly polling the Kafka broker for new messages.
  7. KafkaConsumerService: Your service containing the specific method decorated with @KafkaListener that executes only when a message is successfully read from Kafka.

2. Detailed Sequence Diagram

This diagram shows the precise order of method calls and the asynchronous nature of the communication.

![Detailed sequence diagram](Gemini_Generated_Image_ctg7rvctg7rvctg7.png)


3. Technical Flow Explanation

Phase 1: Configuration (Startup)

Before any messages flow, the Spring Boot application starts up.

  1. Reading application.yaml: Spring Boot detects the spring.kafka.* properties.
  2. Auto-Configuration:
  • Because it sees bootstrap-servers, it knows how to connect to Kafka.
  • It creates a default ProducerFactory and the KafkaTemplate bean, configuring it to use string serializers as defined in the YAML.
  • It creates a default ConsumerFactory and a ConcurrentKafkaListenerContainerFactory.
  3. Annotation Scanning:
  • Spring finds the @KafkaListener annotation in KafkaConsumerService.
  • It uses the container factory to create a background message listener container that immediately starts trying to connect to localhost:9092, subscribe to my_topic as part of group gfg-group, and poll for messages.

Note on commented code: The KafkaProducerConfig and KafkaConsumerConfig classes you commented out are indeed not needed because Spring Boot's auto-configuration handles all that boilerplate code based on the settings in application.yaml.

Phase 2: The Producer Flow (Steps 1-7 in Sequence Diagram)

  1. A client (e.g., a browser or Postman) sends a POST request: http://localhost:8080/api/kafka/publish?message=HelloWorld.
  2. The KafkaController receives this request. It extracts "HelloWorld" from the query parameter.
  3. The controller calls kafkaProducerService.sendMessage("HelloWorld").
  4. The KafkaProducerService takes the message and passes it to the injected KafkaTemplate, specifying the hardcoded topic "my_topic".
  5. The KafkaTemplate handles the heavy lifting: it converts the Java String "HelloWorld" into raw bytes using the configured StringSerializer and sends it over the network to the Kafka Broker.
  6. Crucial: The call to Kafka is usually asynchronous. The Java thread doesn't wait for Kafka to acknowledge storage to disk before moving on.
  7. The controller immediately returns the string "Message send to kafka successfully!" to the client with an HTTP 200 status. The HTTP interaction finishes here.
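The fire-and-forget hand-off in steps 6 and 7 can be sketched with a plain CompletableFuture. This is an analogy, not the actual KafkaTemplate API surface (though recent spring-kafka versions do return a future from send()): the work happens on a background thread, so the caller's thread can return the HTTP response without waiting for the broker's acknowledgement.

```java
import java.util.concurrent.CompletableFuture;

// Analogy for an asynchronous send: the caller gets a future back
// immediately and is free to return, while the "delivery" completes
// on a background thread.
public class AsyncSendDemo {
    public static CompletableFuture<String> send(String message) {
        return CompletableFuture.supplyAsync(() -> "stored:" + message);
    }
}
```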

Phase 3: The Consumer Flow (Steps 8-12 in Sequence Diagram)

This happens asynchronously, completely separate from the HTTP request thread.

  1. The background Spring MessageListenerContainer (which has been polling Kafka constantly) notices a new message in my_topic.
  2. It retrieves the raw bytes from Kafka.
  3. It uses the configured StringDeserializer to convert bytes back into the Java String "HelloWorld".
  4. It identifies that the method KafkaConsumerService.consume() is decorated with @KafkaListener for this topic.
  5. It invokes that method, passing "HelloWorld" as the argument.
  6. Inside consume(), the log.info(...) line runs, printing the message to your console.
  7. Once the method exits successfully, Spring tells Kafka that the message has been processed successfully (commits the offset) so it isn't read again by this consumer group.

How to Setup

Since you have Kafka 4.1.1, you are using the most modern version of Kafka. This version has completely removed the need for Zookeeper. It uses KRaft (Kafka Raft), which is faster, more secure, and easier to manage.

Here is the exact setup and usage guide for Kafka 4.x on Windows using the Command Prompt as Administrator.


1. Initial Setup (One-Time Only)

Kafka 4.x requires you to "format" your storage with a unique Cluster ID before you can start the server. This replaces the old Zookeeper registration.

  1. Open Command Prompt as Administrator.
  2. Navigate to your Kafka folder:
     cd C:\kafka_2.13-4.1.1
  3. Generate a Cluster ID:
     .\bin\windows\kafka-storage.bat random-uuid
     Copy the long string it produces (e.g., 764c12b6-abcd-1234...).
  4. Format the storage directory, replacing YOUR_CLUSTER_ID with the string you just copied:
     .\bin\windows\kafka-storage.bat format --standalone -t YOUR_CLUSTER_ID -c .\config\server.properties

The --standalone flag is crucial in Kafka 4.x: it tells the tool you are running a single-node cluster.


2. Starting the Kafka Environment

In Kafka 4.x, you only need one window to run the entire server.

  1. Start the Broker/Server:
     .\bin\windows\kafka-server-start.bat .\config\server.properties

Keep this window open. When you see [KafkaRaftServer nodeId=1] Kafka Server started, your environment is ready.


3. Basic Usage (Topic, Producer, Consumer)

Open new Command Prompt windows for each of these steps.

| Step | Command (run from C:\kafka_2.13-4.1.1) |
|------|----------------------------------------|
| 1. Create topic | `.\bin\windows\kafka-topics.bat --create --topic my-first-topic --bootstrap-server localhost:9092` |
| 2. List topics | `.\bin\windows\kafka-topics.bat --list --bootstrap-server localhost:9092` |
| 3. Start producer | `.\bin\windows\kafka-console-producer.bat --topic my-first-topic --bootstrap-server localhost:9092` |
| 4. Start consumer | `.\bin\windows\kafka-console-consumer.bat --topic my-first-topic --from-beginning --bootstrap-server localhost:9092` |

4. Understanding the Internal Architecture

  • KRaft Mode: The Broker now acts as its own "Controller." It manages its own metadata (like partition leaders and topic configurations) without needing a separate Zookeeper service.
  • Log Storage: In your server.properties, the log.dirs setting (default C:\tmp\kraft-combined-logs) is where the actual data resides.
  • Metadata Log: Kafka stores its internal state in a special topic called __cluster_metadata. This is how it remembers your topics even after a restart.

5. Troubleshooting Common Windows Errors

  • NoSuchFileException: Ensure you are not trying to use a config\kraft subfolder. In 4.1.1, the server.properties is usually directly inside the config folder.
  • The system cannot find the path specified: This usually means you are in the wrong directory or there is a typo in the path. Always run from the root C:\kafka_2.13-4.1.1.
  • Reconfiguration failed Error: You might see a log error regarding Log4j2. This is a known Windows bug in 4.x. You can ignore it as long as the server says "Started."

How to Test

Step 1: Start Kafka

You must have Apache Kafka running on your local machine (usually at localhost:9092). With Kafka 4.x in KRaft mode, Zookeeper is no longer required (see the setup section above).

Step 2: Run the App

Start SpringbootKafkaExampleApplication. You will see logs indicating that the Kafka Consumer has successfully joined the group gfg-group.

Step 3: Publish a Message

Open Postman or your browser and hit: POST http://localhost:8080/api/kafka/publish?message=HelloKafka

Step 4: Check the Logs

Check your IDE console. You will see:

  1. INFO: Sending message to Kafka : HelloKafka (From Producer)
  2. INFO: Successfully received message from Kafka : HelloKafka (From Consumer)

About

A Spring Boot integration with Apache Kafka. Features a Producer/Consumer model using KafkaTemplate and @KafkaListener with auto-configuration via application.yml.
