Static Factories are Great!

Every now and then I come across classes with multiple constructors, or classes that are cumbersome to work with. Worse, some of their components cannot be mocked, and in the end you are forced to use reflection for testing (Mockito based, old school, you choose).

Imagine a Producer class that you use for Kafka, a class that provides you with some abstraction over sending messages.

package com.gkatzioura.kafka.producer;

import java.util.Properties;
import java.util.concurrent.ExecutionException;

import lombok.extern.slf4j.Slf4j;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

@Slf4j
public class StrMessageProducer {

	private Producer<String,String> producer;
	private String topic = "test-topic";
	
	StrMessageProducer() {
		var kafkaProperties = new Properties();
		kafkaProperties.put("bootstrap.servers",System.getProperty("bootstrap.servers"));
		kafkaProperties.put("key.serializer",System.getProperty("key.serializer"));
		kafkaProperties.put("value.serialize",System.getProperty("value.serializer"));
		var kafkaProducer = new KafkaProducer<String,String>(kafkaProperties);
		this.producer = kafkaProducer;
	}

	public void send(String message) {
		var producerRecord = new ProducerRecord<String,String>(topic,null, message);
		try {
			var metadata = producer.send(producerRecord).get();
			log.info("Submitted {}",metadata.offset());
		}
		catch (InterruptedException |ExecutionException e) {
			log.error("Could not send record",e);
		}
	}
}

Apart from being an ugly class, it is also very hard to change some of its components.

For example

  • I cannot use this class to post to another topic
  • I cannot use this class with a different server configuration apart from the one coming from the system properties
  • It’s difficult to test the functionality of the class since crucial components are created through the constructor

It’s obvious that the constructor in this case serves the purpose of creating a Kafka producer based on the system properties. But the responsibility of the class is to use that producer in order to send messages in a specific way. Thus I will move the creation of the Producer out of the constructor. Also, because we might want to swap the topic used, I will inject the topic instead of having it hardcoded.
By doing so we encourage dependency injection. We make it easy to swap the ingredients of the class while the execution stays the same.

package com.gkatzioura.kafka.producer;

import java.util.Properties;
import java.util.concurrent.ExecutionException;

import lombok.extern.slf4j.Slf4j;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

@Slf4j
public class StrMessageProducer {

	private final Producer<String,String> producer;
	private final String topic;

	StrMessageProducer(Producer<String,String> producer, String topic) {
		this.producer = producer;
		this.topic = topic;
	}

	public void send(String message) {
		var producerRecord = new ProducerRecord<String,String>(topic,null, message);
		try {
			var metadata = producer.send(producerRecord).get();
			log.info("Submitted {}",metadata.offset());
		}
		catch (InterruptedException |ExecutionException e) {
			log.error("Could not send record",e);
		}
	}
}

But we still need the producer to be created somehow. This is where the factory pattern kicks in.

We shall add static factories in order to have instances of the StrMessageProducer class with different configurations.
Let’s add two factory methods: the first based on system properties and the second on environment variables.

package com.gkatzioura.kafka.producer;

import java.util.Properties;
import java.util.concurrent.ExecutionException;

import lombok.extern.slf4j.Slf4j;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

@Slf4j
public class StrMessageProducer {

	private final Producer<String,String> producer;
	private final String topic;

	StrMessageProducer(Producer<String,String> producer, String topic) {
		this.producer = producer;
		this.topic = topic;
	}

	public void send(String message) {
		var producerRecord = new ProducerRecord<String,String>(topic,null, message);
		try {
			var metadata = producer.send(producerRecord).get();
			log.info("Submitted {}",metadata.offset());
		}
		catch (InterruptedException |ExecutionException e) {
			log.error("Could not send record",e);
		}
	}

	public static StrMessageProducer createFromSystemProps() {
		var kafkaProperties = new Properties();
		kafkaProperties.put("bootstrap.servers",System.getProperty("bootstrap.servers"));
		kafkaProperties.put("key.serializer",System.getProperty("key.serializer"));
		kafkaProperties.put("value.serialize",System.getProperty("value.serializer"));
		var kafkaProducer = new KafkaProducer<String,String>(kafkaProperties);
		return new StrMessageProducer(kafkaProducer, System.getProperty("main.topic"));
	}

	public static StrMessageProducer createFromEnv() {
		var kafkaProperties = new Properties();
		kafkaProperties.put("bootstrap.servers",System.getenv("BOOTSTRAP_SERVERS"));
		kafkaProperties.put("key.serializer",System.getenv("KEY_SERIALIZER"));
		kafkaProperties.put("value.serialize",System.getenv("VALUE_SERIALIZER"));
		var kafkaProducer = new KafkaProducer<String,String>(kafkaProperties);
		return new StrMessageProducer(kafkaProducer, System.getenv("MAIN_TOPIC"));
	}
}

You can already see the benefits. You have a clean class ready to use as is, and you have some factory methods for convenience. Later on you can add more static factories, some of them taking arguments, for example the topic, as shown in the sketch below.
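
For illustration only, a hypothetical overload that accepts the topic as an argument and otherwise mirrors the system-properties factory could look like this:

	public static StrMessageProducer createFromSystemProps(String topic) {
		// the Kafka connection settings still come from system properties
		var kafkaProperties = new Properties();
		kafkaProperties.put("bootstrap.servers", System.getProperty("bootstrap.servers"));
		kafkaProperties.put("key.serializer", System.getProperty("key.serializer"));
		kafkaProperties.put("value.serializer", System.getProperty("value.serializer"));
		var kafkaProducer = new KafkaProducer<String,String>(kafkaProperties);
		// the topic is now provided by the caller instead of a property
		return new StrMessageProducer(kafkaProducer, topic);
	}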

We can also go one step further when we want to have multiple implementations of message producers and utilise an interface. So we are going to introduce the MessageProducer interface, which our StrMessageProducer class will implement, and we are going to move the static factories to the interface.

So this will be our interface with the static factories.

package com.gkatzioura.kafka.producer;

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;

public interface MessageProducer {
	
	void send(String message);

	static MessageProducer createFromSystemProps() {
		var kafkaProperties = new Properties();
		kafkaProperties.put("bootstrap.servers",System.getProperty("bootstrap.servers"));
		kafkaProperties.put("key.serializer",System.getProperty("key.serializer"));
		kafkaProperties.put("value.serialize",System.getProperty("value.serializer"));
		var kafkaProducer = new KafkaProducer<String,String>(kafkaProperties);
		return new StrMessageProducer(kafkaProducer, System.getProperty("main.topic"));
	}

	static MessageProducer createFromEnv() {
		var kafkaProperties = new Properties();
		kafkaProperties.put("bootstrap.servers",System.getenv("BOOTSTRAP_SERVERS"));
		kafkaProperties.put("key.serializer",System.getenv("KEY_SERIALIZER"));
		kafkaProperties.put("value.serialize",System.getenv("VALUE_SERIALIZER"));
		var kafkaProducer = new KafkaProducer<String,String>(kafkaProperties);
		return new StrMessageProducer(kafkaProducer, System.getenv("MAIN_TOPIC"));
	}

}

And this would be our new StrMessageProducer class.

package com.gkatzioura.kafka.producer;

import java.util.concurrent.ExecutionException;

import lombok.extern.slf4j.Slf4j;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

@Slf4j
public class StrMessageProducer implements MessageProducer {

	private final Producer<String,String> producer;
	private final String topic;

	StrMessageProducer(Producer<String,String> producer, String topic) {
		this.producer = producer;
		this.topic = topic;
	}

	@Override
	public void send(String message) {
		var producerRecord = new ProducerRecord<String,String>(topic,null, message);
		try {
			var metadata = producer.send(producerRecord).get();
			log.info("Submitted {}",metadata.offset());
		}
		catch (InterruptedException |ExecutionException e) {
			log.error("Could not send record",e);
		}
	}

}

Let’s check the benefits

  • We can have various implementations of the MessageProducer interface
  • We can add as many factories as we want to serve our purpose
  • We can easily test the MessageProducer implementations by passing mocks to the constructors (see the sketch after this list)
  • We keep our codebase cleaner
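
To illustrate the testing point, here is a minimal sketch of such a test. It is not part of the original post; it assumes JUnit 5 and the MockProducer test double that ships with kafka-clients on the test classpath (a Mockito mock would work just as well).

package com.gkatzioura.kafka.producer;

import org.apache.kafka.clients.producer.MockProducer;
import org.apache.kafka.common.serialization.StringSerializer;
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.Test;

public class StrMessageProducerTests {

	@Test
	void sendPublishesToTheConfiguredTopic() {
		// autoComplete=true makes every send complete immediately with fake metadata
		var mockProducer = new MockProducer<String,String>(true, new StringSerializer(), new StringSerializer());
		var messageProducer = new StrMessageProducer(mockProducer, "test-topic");

		messageProducer.send("hello");

		Assertions.assertEquals(1, mockProducer.history().size());
		Assertions.assertEquals("test-topic", mockProducer.history().get(0).topic());
		Assertions.assertEquals("hello", mockProducer.history().get(0).value());
	}

}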


Kafka & Zookeeper for Development: Connecting Clients to the Cluster

Previously we managed to have our Kafka brokers connect to a ZooKeeper ensemble. We also brought down some brokers, checked the leader election and produced/consumed some messages.

Now we want to make sure that we will be able to connect to those brokers from our local machine. The problem with connecting to the setup we created previously is that it is located inside the container network. When a client interacts with one of the brokers and receives the full list of brokers, it will receive a list of IPs that are not accessible to it.

So the initial handshake of a client will be successful, but then the client will try to interact with some unreachable hosts.

In order to tackle this we will use a combination of workarounds.

The first one would be to bind the port of each Kafka broker to a different local ip.

kafka-1 will be mapped to 127.0.0.1:9092
kafka-2 will be mapped to 127.0.0.2:9092
kafka-3 will be mapped to 127.0.0.3:9092

So let’s create the aliases of those addresses

sudo ifconfig lo0 alias 127.0.0.2
sudo ifconfig lo0 alias 127.0.0.3

Now it’s possible to do the IP binding. Let’s also put those entries in our /etc/hosts. By doing this, our local network and our Docker network agree on which broker they should access.

127.0.0.1	kafka-1
127.0.0.2	kafka-2
127.0.0.3	kafka-3

The next step is to change KAFKA_ADVERTISED_LISTENERS on each broker. We will adapt it to the DNS entry of each broker. By setting KAFKA_ADVERTISED_LISTENERS, clients from the outside can connect to the broker on an address that is reachable to them, and not on an address of the internal network. Further explanations can be found on this blog.

  kafka-1:
    container_name: kafka-1
    image: confluent/kafka
    ports:
    - "127.0.0.1:9092:9092"
    volumes:
    - type: bind
      source: ./server1.properties
      target: /etc/kafka/server.properties
    depends_on:
      - zookeeper-1
      - zookeeper-2
      - zookeeper-3
    environment:
      KAFKA_ADVERTISED_LISTENERS: "PLAINTEXT://kafka-1:9092"
  kafka-2:
    container_name: kafka-2
    image: confluent/kafka
    ports:
      - "127.0.0.2:9092:9092"
    volumes:
      - type: bind
        source: ./server2.properties
        target: /etc/kafka/server.properties
    depends_on:
      - zookeeper-1
      - zookeeper-2
      - zookeeper-3
    environment:
      KAFKA_ADVERTISED_LISTENERS: "PLAINTEXT://kafka-2:9092"
  kafka-3:
    container_name: kafka-3
    image: confluent/kafka
    ports:
      - "127.0.0.3:9092:9092"
    volumes:
      - type: bind
        source: ./server3.properties
        target: /etc/kafka/server.properties
    depends_on:
      - zookeeper-1
      - zookeeper-2
      - zookeeper-3
    environment:
      KAFKA_ADVERTISED_LISTENERS: "PLAINTEXT://kafka-3:9092"

We see the port binding change as well as the KAFKA_ADVERTISED_LISTENERS entry. Now let’s wrap everything together in our docker-compose file.

version: "3.8"
services:
  zookeeper-1:
    container_name: zookeeper-1
    image: zookeeper
    ports:
      - "2181:2181"
    environment:
      ZOO_MY_ID: "1"
      ZOO_SERVERS: server.1=0.0.0.0:2888:3888;2181 server.2=zookeeper-2:2888:3888;2181 server.3=zookeeper-3:2888:3888;2181
  zookeeper-2:
    container_name: zookeeper-2
    image: zookeeper
    ports:
      - "2182:2181"
    environment:
      ZOO_MY_ID: "2"
      ZOO_SERVERS: server.1=zookeeper-1:2888:3888;2181 server.2=0.0.0.0:2888:3888;2181 server.3=zookeeper-3:2888:3888;2181
  zookeeper-3:
    container_name: zookeeper-3
    image: zookeeper
    ports:
      - "2183:2181"
    environment:
      ZOO_MY_ID: "3"
      ZOO_SERVERS: server.1=zookeeper-1:2888:3888;2181 server.2=0.0.0.0:2888:3888;2181 server.3=zookeeper-3:2888:3888;2181
  kafka-1:
    container_name: kafka-1
    image: confluent/kafka
    ports:
    - "127.0.0.1:9092:9092"
    volumes:
    - type: bind
      source: ./server1.properties
      target: /etc/kafka/server.properties
    depends_on:
      - zookeeper-1
      - zookeeper-2
      - zookeeper-3
    environment:
      KAFKA_ADVERTISED_LISTENERS: "PLAINTEXT://kafka-1:9092"
  kafka-2:
    container_name: kafka-2
    image: confluent/kafka
    ports:
      - "127.0.0.2:9092:9092"
    volumes:
      - type: bind
        source: ./server2.properties
        target: /etc/kafka/server.properties
    depends_on:
      - zookeeper-1
      - zookeeper-2
      - zookeeper-3
    environment:
      KAFKA_ADVERTISED_LISTENERS: "PLAINTEXT://kafka-2:9092"
  kafka-3:
    container_name: kafka-3
    image: confluent/kafka
    ports:
      - "127.0.0.3:9092:9092"
    volumes:
      - type: bind
        source: ./server3.properties
        target: /etc/kafka/server.properties
    depends_on:
      - zookeeper-1
      - zookeeper-2
      - zookeeper-3
    environment:
      KAFKA_ADVERTISED_LISTENERS: "PLAINTEXT://kafka-3:9092"

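With everything up and running, a minimal Java producer like the sketch below should be able to reach the brokers from the host machine. The class name is just for illustration, and it assumes kafka-clients on the classpath, the /etc/hosts entries above, and the tutorial-topic from the previous post (or auto topic creation enabled).

package com.gkatzioura.kafka;

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ConnectivityCheck {

	public static void main(String[] args) throws Exception {
		var properties = new Properties();
		// these hostnames resolve through the /etc/hosts entries added earlier
		properties.put("bootstrap.servers", "kafka-1:9092,kafka-2:9092,kafka-3:9092");
		properties.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
		properties.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

		try (var producer = new KafkaProducer<String,String>(properties)) {
			var metadata = producer.send(new ProducerRecord<>("tutorial-topic", "hello from the host")).get();
			System.out.println("Connected, wrote to offset " + metadata.offset());
		}
	}

}
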
Last but not least you can find the code on github.

Testing with Hoverfly and Java Part 4: Exact, Glob and Regex Matchers

Previously we used Hoverfly and its state feature.
So far our examples have been close to an absolute request match, so in this blog we will focus on utilising the matchers.
Having a good range of matchers is very important, because most API interactions are dynamic and you cannot always predict the exact values. Imagine a JWT signature: you can match the body, but the signature might change per environment.


There are several types of matchers available; in this post we will focus on the first three (exact, glob and regex).

  • The exact matcher: the field (a header, for example) should be an exact match
  • The glob matcher: a matcher that gives you the ability to use the `*` wildcard
  • The regex matcher: a matcher that requires you to search the internet once again on how to write a regex
  • XML matcher: matches the body as XML, node by node, value by value
  • Xpath matcher: matches based on a value retrieved through XPath
  • JSON matcher: matches the JSON body exactly
  • JSON partial matcher: matches if the JSON submitted contains the JSON values specified
  • JSONPath matcher: just like the XPath matcher, matches based on the JSON path submitted

Let’s start with the exact matcher.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

import io.specto.hoverfly.junit.core.Hoverfly;
import io.specto.hoverfly.junit.core.HoverflyConfig;
import io.specto.hoverfly.junit.core.SimulationSource;
import io.specto.hoverfly.junit.core.model.RequestFieldMatcher;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;

import static io.specto.hoverfly.junit.core.HoverflyMode.SIMULATE;
import static io.specto.hoverfly.junit.dsl.HoverflyDsl.service;
import static io.specto.hoverfly.junit.dsl.ResponseCreators.success;

public class ExactMatcherTests {

	private Hoverfly hoverfly;

	@BeforeEach
	void setUp() {
		var simulation = SimulationSource.dsl(service("http://localhost:8085")
				.get("/exact")
				.header("Origin", RequestFieldMatcher.newExactMatcher("internal-server"))
				.willReturn(success("{\"exact\":true}", "application/json")));

		var localConfig = HoverflyConfig.localConfigs().disableTlsVerification().asWebServer().proxyPort(8085);
		hoverfly = new Hoverfly(localConfig, SIMULATE);
		hoverfly.start();
		hoverfly.simulate(simulation);
	}

	@AfterEach
	void tearDown() {
		hoverfly.close();
	}

	@Test
	void testExactMatcherSuccess() {
		var client = HttpClient.newHttpClient();
		var exactRequest = HttpRequest.newBuilder()
				.uri(URI.create("http://localhost:8085/exact"))
				.header("Origin","internal-server")
				.build();

		var exactResponse = client.sendAsync(exactRequest, HttpResponse.BodyHandlers.ofString())
				.thenApply(HttpResponse::body)
				.join();

		Assertions.assertEquals("{\"exact\":true}", exactResponse);
	}

	@Test
	void testExactMatcherFailure() {
		var client = HttpClient.newHttpClient();
		var exactRequest = HttpRequest.newBuilder()
				.uri(URI.create("http://localhost:8085/exact"))
				.build();

		var exactResponse = client.sendAsync(exactRequest, HttpResponse.BodyHandlers.ofString())
				.join();

		Assertions.assertEquals(502, exactResponse.statusCode());
	}

}

Failure or success depends on whether the header matched exactly or not.

We shall use the glob matcher for a request parameter.


import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

import io.specto.hoverfly.junit.core.Hoverfly;
import io.specto.hoverfly.junit.core.HoverflyConfig;
import io.specto.hoverfly.junit.core.SimulationSource;
import io.specto.hoverfly.junit.core.model.RequestFieldMatcher;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;

import static io.specto.hoverfly.junit.core.HoverflyMode.SIMULATE;
import static io.specto.hoverfly.junit.dsl.HoverflyDsl.service;
import static io.specto.hoverfly.junit.dsl.ResponseCreators.success;

public class GlobMatcher {

	private Hoverfly hoverfly;

	@BeforeEach
	void setUp() {
		var simulation = SimulationSource.dsl(service("http://localhost:8085")
				.get("/glob")
				.queryParam("userName", RequestFieldMatcher.newGlobMatcher("john*"))
				.willReturn(success("{\"glob\":true}", "application/json")));

		var localConfig = HoverflyConfig.localConfigs().disableTlsVerification().asWebServer().proxyPort(8085);
		hoverfly = new Hoverfly(localConfig, SIMULATE);
		hoverfly.start();
		hoverfly.simulate(simulation);
	}

	@AfterEach
	void tearDown() {
		hoverfly.close();
	}

	@Test
	void testGlobMatcherSuccess() {
		var client = HttpClient.newHttpClient();
		var exactRequest = HttpRequest.newBuilder()
				.uri(URI.create("http://localhost:8085/glob?userName=johnDoe"))
				.build();

		var exactResponse = client.sendAsync(exactRequest, HttpResponse.BodyHandlers.ofString())
				.thenApply(HttpResponse::body)
				.join();

		Assertions.assertEquals("{\"glob\":true}", exactResponse);
	}

	@Test
	void testGlobMatcherFailure() {
		var client = HttpClient.newHttpClient();
		var exactRequest = HttpRequest.newBuilder()
				.uri(URI.create("http://localhost:8085/glob?userName=nojohnDoe"))
				.build();

		var exactResponse = client.sendAsync(exactRequest, HttpResponse.BodyHandlers.ofString())
				.join();

		Assertions.assertEquals(502, exactResponse.statusCode());
	}

}

Lastly, let’s head to the regex matcher. The regex matcher will just check for a word starting with a capital letter: ([A-Z])\w+

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

import io.specto.hoverfly.junit.core.Hoverfly;
import io.specto.hoverfly.junit.core.HoverflyConfig;
import io.specto.hoverfly.junit.core.SimulationSource;
import io.specto.hoverfly.junit.core.model.RequestFieldMatcher;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;

import static io.specto.hoverfly.junit.core.HoverflyMode.SIMULATE;
import static io.specto.hoverfly.junit.dsl.HoverflyDsl.service;
import static io.specto.hoverfly.junit.dsl.ResponseCreators.success;

public class RegexMatcherTests {

	private Hoverfly hoverfly;

	@BeforeEach
	void setUp() {
		var simulation = SimulationSource.dsl(service("http://localhost:8085")
				.post("/regex")
				.body(RequestFieldMatcher.newRegexMatcher("([A-Z])\\w+"))
				.willReturn(success("{\"regex\":true}", "application/json")));

		var localConfig = HoverflyConfig.localConfigs().disableTlsVerification().asWebServer().proxyPort(8085);
		hoverfly = new Hoverfly(localConfig, SIMULATE);
		hoverfly.start();
		hoverfly.simulate(simulation);
	}

	@AfterEach
	void tearDown() {
		hoverfly.close();
	}

	@Test
	void testRegexMatcherSuccess() {
		var client = HttpClient.newHttpClient();
		var exactRequest = HttpRequest.newBuilder()
				.uri(URI.create("http://localhost:8085/regex"))
				.POST(HttpRequest.BodyPublishers.ofString("Contains capital letter"))
				.build();

		var exactResponse = client.sendAsync(exactRequest, HttpResponse.BodyHandlers.ofString())
				.thenApply(HttpResponse::body)
				.join();

		Assertions.assertEquals("{\"regex\":true}", exactResponse);
	}

	@Test
	void testRegexMatcherFailure() {
		var client = HttpClient.newHttpClient();
		var exactRequest = HttpRequest.newBuilder()
				.uri(URI.create("http://localhost:8085/regex"))
				.POST(HttpRequest.BodyPublishers.ofString("won't match due to capital letter missing"))
				.build();

		var exactResponse = client.sendAsync(exactRequest, HttpResponse.BodyHandlers.ofString())
				.join();

		Assertions.assertEquals(502, exactResponse.statusCode());
	}

}

That’s it: we used the basic matchers, exact, glob and regex based. The next blog shall focus on the XML based matchers.

Kafka & Zookeeper for Development: Connecting Brokers to the Ensemble

Previously we successfully created a ZooKeeper ensemble; now it’s time to add some Kafka brokers that will connect to the ensemble, and we shall execute some commands.

We will pick up from the same docker-compose file we put together previously. First let’s look at the configuration that a Kafka broker needs.

offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
group.initial.rebalance.delay.ms=0
socket.send.buffer.bytes=102400
delete.topic.enable=true
socket.request.max.bytes=104857600
log.cleaner.enable=true
log.retention.check.interval.ms=300000
log.retention.hours=168
num.io.threads=8
broker.id=0
log4j.opts=-Dlog4j.configuration=file:/etc/kafka/log4j.properties
log.dirs=/var/lib/kafka
auto.create.topics.enable=true
num.network.threads=3
socket.receive.buffer.bytes=102400
log.segment.bytes=1073741824
num.recovery.threads.per.data.dir=1
num.partitions=1
zookeeper.connection.timeout.ms=6000
zookeeper.connect=zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181

We will go through the ones that are essential to know.

  • offsets.topic.replication.factor: the replication factor of the internal offsets topic
  • transaction.state.log.replication.factor: the replication factor of the internal transaction topic
  • transaction.state.log.min.isr: the minimum number of in-sync replicas for the internal transaction topic
  • delete.topic.enable: if not true, Kafka will ignore the delete topic command
  • socket.request.max.bytes: the maximum size of requests
  • log.retention.check.interval.ms: the interval at which to evaluate whether a log should be deleted
  • log.retention.hours: how many hours a log is retained before getting deleted
  • broker.id: the broker id of that installation
  • log.dirs: the directories where Kafka will store the log data; can be a comma separated list
  • auto.create.topics.enable: create topics automatically if they don’t exist when sending/consuming messages or asking for topic metadata
  • num.network.threads: threads used for receiving requests and sending responses over the network
  • socket.receive.buffer.bytes: the receive buffer of the server socket
  • log.segment.bytes: the maximum size of a single log segment file
  • num.recovery.threads.per.data.dir: threads used for log recovery at startup and flushing at shutdown
  • num.partitions: the default number of partitions a topic gets on creation if a partition count is not specified
  • zookeeper.connection.timeout.ms: the maximum time for a client to establish a connection to ZooKeeper
  • zookeeper.connect: the list of the ZooKeeper servers

Now it’s time to create the properties file for each broker. Due to the broker.id property we need a different file per broker, with the corresponding broker.id.

So our first broker’s file would look like this (broker.id 1). Keep in mind that those brokers will run through the same docker-compose file, therefore the zookeeper.connect property contains the internal docker-compose DNS names. The file would be named server1.properties.

socket.send.buffer.bytes=102400
delete.topic.enable=true
socket.request.max.bytes=104857600
log.cleaner.enable=true
log.retention.check.interval.ms=300000
log.retention.hours=168
num.io.threads=8
broker.id=1
transaction.state.log.replication.factor=1
log4j.opts=-Dlog4j.configuration\=file\:/etc/kafka/log4j.properties
group.initial.rebalance.delay.ms=0
log.dirs=/var/lib/kafka
auto.create.topics.enable=true
offsets.topic.replication.factor=1
num.network.threads=3
socket.receive.buffer.bytes=102400
log.segment.bytes=1073741824
num.recovery.threads.per.data.dir=1
num.partitions=1
transaction.state.log.min.isr=1
zookeeper.connection.timeout.ms=6000
zookeeper.connect=zookeeper-1\:2181,zookeeper-2\:2181,zookeeper-3\:2181

The same recipe applies for broker.id=2 as well as broker.id=3, producing server2.properties and server3.properties.

After creating those three broker configuration files it is time to change our docker-compose configuration.

version: "3.8"
services:
  zookeeper-1:
    container_name: zookeeper-1
    image: zookeeper
    ports:
      - "2181:2181"
    environment:
      ZOO_MY_ID: "1"
      ZOO_SERVERS: server.1=0.0.0.0:2888:3888;2181 server.2=zookeeper-2:2888:3888;2181 server.3=zookeeper-3:2888:3888;2181
  zookeeper-2:
    container_name: zookeeper-2
    image: zookeeper
    ports:
      - "2182:2181"
    environment:
      ZOO_MY_ID: "2"
      ZOO_SERVERS: server.1=zookeeper-1:2888:3888;2181 server.2=0.0.0.0:2888:3888;2181 server.3=zookeeper-3:2888:3888;2181
  zookeeper-3:
    container_name: zookeeper-3
    image: zookeeper
    ports:
      - "2183:2181"
    environment:
      ZOO_MY_ID: "3"
      ZOO_SERVERS: server.1=zookeeper-1:2888:3888;2181 server.2=0.0.0.0:2888:3888;2181 server.3=zookeeper-3:2888:3888;2181
  kafka-1:
    container_name: kafka-1
    image: confluent/kafka
    ports:
    - "9092:9092"
    volumes:
    - type: bind
      source: ./server1.properties
      target: /etc/kafka/server.properties
  kafka-2:
    container_name: kafka-2
    image: confluent/kafka
    ports:
      - "9093:9092"
    volumes:
      - type: bind
        source: ./server2.properties
        target: /etc/kafka/server.properties
  kafka-3:
    container_name: kafka-3
    image: confluent/kafka
    ports:
      - "9094:9092"
    volumes:
      - type: bind
        source: ./server3.properties
        target: /etc/kafka/server.properties

Let’s spin up the docker-compose file.

> docker-compose -f docker-compose.yaml up

Just like the previous examples we shall run some commands through the containers.

Now that we have a proper cluster with ZooKeeper and multiple Kafka brokers, it is time to test them working together.
The first action is to create a topic with a replication factor of 3. The expected outcome is for this topic to be replicated across the 3 Kafka brokers.

> docker exec -it kafka-1 /bin/bash
confluent@92a6d381d0db:/$ kafka-topics --zookeeper zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181 --create --topic tutorial-topic --replication-factor 3 --partitions 1

Our topic has been created; let’s check the description of the topic.

confluent@92a6d381d0db:/$ kafka-topics --describe --topic tutorial-topic --zookeeper zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181
Topic:tutorial-topic	PartitionCount:1	ReplicationFactor:3	Configs:
	Topic: tutorial-topic	Partition: 0	Leader: 2	Replicas: 2,1,3	Isr: 2,1,3

As we can see, the leader for the partition is broker 2.

The next step is to put some data into the recently created topic. Before doing so, I will add a consumer listening for messages on that topic. While we post messages to the topic, they will be printed by this consumer.

> docker exec -it kafka-3 /bin/bash
confluent@4042774f8802:/$ kafka-console-consumer --topic tutorial-topic --from-beginning --zookeeper zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181

Let’s add some topic data.

> docker exec -it kafka-1 /bin/bash
confluent@92a6d381d0db:/$ kafka-console-producer --topic tutorial-topic --broker-list kafka-1:9092,kafka-2:9092
test1
test2
test3

As expected, the consumer on the other terminal prints the messages.

test1
test2
test3

Since we have a cluster, it would be nice to stop the leader broker and see another broker take over the leadership. While doing this, the expected result is that all the messages remain replicated and there is no disruption in consuming and publishing messages.

Stop the leader, which is kafka-2 (broker 2).

> docker stop kafka-2

Check the leadership from another broker

confluent@92a6d381d0db:/$ kafka-topics --describe --topic tutorial-topic --zookeeper zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181
Topic:tutorial-topic	PartitionCount:1	ReplicationFactor:3	Configs:
	Topic: tutorial-topic	Partition: 0	Leader: 1	Replicas: 2,1,3	Isr: 1,3

The leader now is kafka-1

Read the messages to see that they were indeed replicated.

> docker exec -it kafka-3 /bin/bash
confluent@4042774f8802:/$ kafka-console-consumer --topic tutorial-topic --from-beginning --zookeeper zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181
test1
test2
test3

As expected, apart from the leadership being re-established, our data has also been replicated!

If we try to post new messages, that will also succeed.

So, to summarise, we ran a Kafka cluster connected to a ZooKeeper ensemble. We created a topic replicated across 3 brokers and, last but not least, we tested what happens when one broker goes down.

In the next blog we are going to wrap it up, so that clients on our local machine can connect to the docker-compose cluster.

Kafka & Zookeeper for Development: Zookeeper Ensemble

Previously we spun up ZooKeeper and Kafka locally and also through Docker. What comes next is spinning up more than one Kafka and ZooKeeper node and creating a 3-node cluster. The easy way to achieve this locally is docker-compose. Instead of spinning up various instances on the cloud or running various Java processes and altering configs, docker-compose will greatly help us bootstrap a ZooKeeper ensemble and the Kafka brokers, with everything needed preconfigured.

The first step is to create the Zookeeper ensemble but before going there let’s check the ports needed.
Zookeeper needs three ports.

  • 2181 is the client port. On the previous example it was the port our clients used to communicate with the server.
  • 2888 is the peer port. This is the port that zookeeper nodes use to talk to each other.
  • 3888 is the leader port. The port that nodes use to talk to each other when it comes to the leader election.

By using docker-compose our nodes will use the same network, and each container name will also be an internal DNS entry.
The names of the ZooKeeper nodes will be zookeeper-1, zookeeper-2 and zookeeper-3.

Our next goal is to give each ZooKeeper node a configuration that will enable the nodes to discover each other.

This is the typical zookeeper configuration expected.

  • tickTime is the unit of time in milliseconds zookeeper uses for heartbeat and minimum session timeout.
  • dataDir is the location where ZooKeeper will store the in-memory database snapshots
  • initLimit and syncLimit are used for the ZooKeeper synchronization: how long followers may take to connect and sync with the leader, and how far out of sync they may fall
  • the server.* entries are the list of the nodes that will have to communicate with each other

zookeeper.properties

clientPort=2181
dataDir=/var/lib/zookeeper
syncLimit=2
dataLogDir=/var/log/zookeeper
initLimit=5
tickTime=2000
server.1=zookeeper-1:2888:3888
server.2=zookeeper-2:2888:3888
server.3=zookeeper-3:2888:3888

When it comes to a specific node, the server entry of that node should be bound to `0.0.0.0`. Thus we need three different zookeeper properties files, one per node.

For example for the node with id 1 the file should be the following
zookeeper1.properties

clientPort=2181
dataDir=/var/lib/zookeeper
syncLimit=2
dataLogDir=/var/log/zookeeper
initLimit=5
tickTime=2000
server.1=0.0.0.0:2888:3888
server.2=zookeeper-2:2888:3888
server.3=zookeeper-3:2888:3888

The next question that arises is the ZooKeeper id file: how can a ZooKeeper instance identify which id is its own?

Based on the documentation we need to specify the server ids using the myid file.

The myid file is a plain text file located in a node’s dataDir, containing only a number: the server id.

So three myid files will be created, each containing the id of the corresponding node.

myid_1.txt

1

myid_2.txt

2

myid_3.txt

3

It’s time to spin up the Zookeeper ensemble. We will use the files specified above. Different client ports are mapped to the host to avoid collision.

version: "3.8"
services:
  zookeeper-1:
    container_name: zookeeper-1
    image: zookeeper
    ports:
      - "2181:2181"
    volumes:
      - type: bind
        source: ./zookeeper1.properties
        target: /conf/zoo.cfg
      - type: bind
        source: ./myid_1.txt
        target: /data/myid
  zookeeper-2:
    container_name: zookeeper-2
    image: zookeeper
    ports:
      - "2182:2181"
    volumes:
    - type: bind
      source: ./zookeeper2.properties
      target: /conf/zoo.cfg
    - type: bind
      source: ./myid_2.txt
      target: /data/myid
  zookeeper-3:
    container_name: zookeeper-3
    image: zookeeper
    ports:
      - "2183:2181"
    volumes:
      - type: bind
        source: ./zookeeper3.properties
        target: /conf/zoo.cfg
      - type: bind
        source: ./myid_3.txt
        target: /data/myid

Instead of having all those files mounted, it would be better if we had a simpler option. Luckily the image that we use gives us the option of specifying ids and servers using environment variables.

version: "3.8"
services:
  zookeeper-1:
    container_name: zookeeper-1
    image: zookeeper
    ports:
      - "2181:2181"
    environment:
      ZOO_MY_ID: "1"
      ZOO_SERVERS: server.1=0.0.0.0:2888:3888;2181 server.2=zookeeper-2:2888:3888;2181 server.3=zookeeper-3:2888:3888;2181
  zookeeper-2:
    container_name: zookeeper-2
    image: zookeeper
    ports:
      - "2182:2181"
    environment:
      ZOO_MY_ID: "2"
      ZOO_SERVERS: server.1=zookeeper-1:2888:3888;2181 server.2=0.0.0.0:2888:3888;2181 server.3=zookeeper-3:2888:3888;2181
  zookeeper-3:
    container_name: zookeeper-3
    image: zookeeper
    ports:
      - "2183:2181"
    environment:
      ZOO_MY_ID: "3"
      ZOO_SERVERS: server.1=zookeeper-1:2888:3888;2181 server.2=0.0.0.0:2888:3888;2181 server.3=zookeeper-3:2888:3888;2181

Now let’s check the status of our zookeeper nodes.

Let’s use the zkServer.sh script to see the leader and the followers.

docker exec -it zookeeper-1 /bin/bash
>./bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: follower

And

> docker exec -it zookeeper-3 /bin/bash
root@81c2dc476127:/apache-zookeeper-3.6.2-bin# ./bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: leader

So in this tutorial we created a ZooKeeper ensemble using docker-compose and we also witnessed a leader election. This recipe can be adapted to apply to a set of VMs or a container orchestration engine.

In the next tutorial we shall add some Kafka brokers that connect to the ensemble.

Kafka & Zookeeper for Development: Local and Docker

Kafka’s popularity increases more and more every day as it takes over the streaming world. It is already provided out of the box on cloud providers like AWS, Azure and IBM Cloud.
However, for local development it can be a bit cumbersome to set up, due to the various moving parts required.

This blog will focus on making it easy for a developer to spin up some Kafka instances on a local machine without having to spin up VMs on the cloud.

We shall start with the usual ZooKeeper and Kafka setup. The example below fetches a specific version, so after some time it is good to check the Apache website for a newer one.

> wget https://www.mirrorservice.org/sites/ftp.apache.org/kafka/2.6.0/kafka_2.13-2.6.0.tgz
> tar xvf kafka_2.13-2.6.0.tgz
> cd kafka_2.13-2.6.0

We just downloaded Kafka locally and now it is time to spin it up.

First we should spin up ZooKeeper

> ./bin/zookeeper-server-start.sh config/zookeeper.properties

Then spin up the Kafka instance

> ./bin/kafka-server-start.sh config/server.properties

As you see, we only spun up one instance of Kafka and one of ZooKeeper. This is very different from what we do in production, where ZooKeeper servers should be deployed on multiple nodes. More specifically, 2n + 1 ZooKeeper servers, where n > 0, need to be deployed. This number allows the ZooKeeper ensemble to perform majority elections for leadership: for example, a 3-node ensemble (n = 1) tolerates one failed node, while a 5-node ensemble (n = 2) tolerates two.

In our case, for local development, one Kafka broker and one ZooKeeper instance are enough in order to create and consume a topic.

Let’s push some messages to a topic. There is no need to create the topic; pushing a message will create it.

bin/kafka-console-producer.sh --topic tutorial-topic --bootstrap-server localhost:9092
>a
>b
>c

Then let’s read them. Pay attention to the --from-beginning flag: all the messages submitted from the beginning shall be read.

bin/kafka-console-consumer.sh --topic tutorial-topic --from-beginning --bootstrap-server localhost:9092
>a
>b
>c

Now let’s try to do this using Docker. The advantage of Docker is that we can run Kafka on a local Docker network, add as many nodes as needed and establish a ZooKeeper ensemble the easy way.

Start zookeeper first

docker run --rm --name zookeeper -p 2181:2181 confluent/zookeeper

And then start your Kafka container, linked to the ZooKeeper container.

docker run --rm --name kafka -p 9092:9092 --link zookeeper:zookeeper confluent/kafka

Let’s create the messages through Docker. As with most Docker images, the tools needed are already bundled inside the image.
So the publish command would be very close to the command we executed previously.

> docker exec -it kafka /bin/bash
kafka-console-producer --topic tutorial-topic --broker-list localhost:9092
a
b
c

The same applies for the consume command.

> docker exec -it kafka /bin/bash
kafka-console-consumer --topic tutorial-topic --from-beginning --zookeeper zookeeper:2181
a
b
c

That’s it! We just ran Kafka locally for development, seamlessly!

Java Based Akka application Part 2: Adding tests

In the previous blog we focused on spinning up our first Akka project.
Now it’s time to add a test for our codebase.

The first thing to get started is to add the right dependencies to the existing project.

	<dependencies>
		<dependency>
			<groupId>com.typesafe.akka</groupId>
			<artifactId>akka-actor-typed_2.13</artifactId>
			<version>${akka.version}</version>
		</dependency>
		<dependency>
			<groupId>ch.qos.logback</groupId>
			<artifactId>logback-classic</artifactId>
			<version>1.2.3</version>
		</dependency>
		<dependency>
			<groupId>org.projectlombok</groupId>
			<artifactId>lombok</artifactId>
			<version>1.18.16</version>
			<scope>provided</scope>
		</dependency>

		<!-- Test -->
		<dependency>
			<groupId>com.typesafe.akka</groupId>
			<artifactId>akka-actor-testkit-typed_2.13</artifactId>
			<version>${akka.version}</version>
			<scope>test</scope>
		</dependency>
		<dependency>
			<groupId>junit</groupId>
			<artifactId>junit</artifactId>
			<version>4.13.1</version>
			<scope>test</scope>
		</dependency>
	</dependencies>

What you will notice is the usage of JUnit 4 instead of JUnit 5. Some of the testing utils, like TestKitJunitResource, need annotations like @ClassRule and are bound to JUnit 4. Obviously this is not a blocker for using JUnit 5; with some tweaks it is feasible to use the tools your project needs. However, in this example JUnit 4 shall be used.

Before we write the test we need to think about our code.
It is obvious that we send a message to our actor in a fire-and-forget fashion.

	private Behavior<GuardianMessage> receiveMessage(MessageToGuardian messageToGuardian) {
		getContext().getLog().info("Message received: {}",messageToGuardian.getMessage());
		return this;
	}

If you don’t have a way to intercept what happens inside the method, your options are limited. In such cases you can utilise the log messages and assert that the expected log events happen.

Before we add the unit test we need to make some Logback adjustments; these apply only to the logback.xml used by our tests. More specifically, we need a Logback appender that captures the logging data. This is the CapturingAppender.

<?xml version="1.0" encoding="UTF-8"?>
<configuration>

    <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
        <filter class="ch.qos.logback.classic.filter.ThresholdFilter">
            <level>INFO</level>
        </filter>
        <encoder>
            <pattern>[%date{ISO8601}] [%level] [%logger] [%marker] [%thread] - %msg MDC: {%mdc}%n</pattern>
        </encoder>
    </appender>

    <!-- Logging from tests are silenced by this appender. When there is a test failure the captured logging events are flushed to the appenders defined for the akka.actor.testkit.typed.internal.CapturingAppenderDelegate logger. -->
    <appender name="CapturingAppender" class="akka.actor.testkit.typed.internal.CapturingAppender" />

    <!-- The appenders defined for this CapturingAppenderDelegate logger are used when there is a test failure and all logging events from the test are flushed to these appenders. -->
    <logger name="akka.actor.testkit.typed.internal.CapturingAppenderDelegate" >
      <appender-ref ref="STDOUT"/>
    </logger>

    <root level="DEBUG">
        <appender-ref ref="CapturingAppender"/>
    </root>
</configuration>

Now it’s time to add the unit test.

package com.gkatzioura;

import akka.actor.testkit.typed.javadsl.LogCapturing;
import akka.actor.testkit.typed.javadsl.LoggingTestKit;
import akka.actor.testkit.typed.javadsl.TestKitJunitResource;
import akka.actor.testkit.typed.javadsl.TestProbe;
import akka.actor.typed.ActorRef;
import org.junit.ClassRule;
import org.junit.Rule;
import org.junit.Test;

public class AppGuardianTests {

	@ClassRule
	public static final TestKitJunitResource testKit = new TestKitJunitResource();

	@Rule
	public final LogCapturing logCapturing = new LogCapturing();

	@Test
	public void testReceiveMessage() {
		ActorRef<AppGuardian.GuardianMessage> underTest = testKit.spawn(AppGuardian.create(), "app-guardian");

		LoggingTestKit.info("Message received: hello")
				.expect(
						testKit.system(),
						() -> {
							underTest.tell(new AppGuardian.MessageToGuardian("hello"));
							return null;
						});
	}

}

Once we run the test, the expected outcome is for it to pass. The actor received the message and executed a logging action, which was captured by the CapturingAppender. Then the logging event was validated against the expected one. In case of an exception, you probably need to check whether the logback.xml took effect.

As always you can find the source code on github.

Java Based Akka application Part 1: Your base Project

Akka is a free, open-source toolkit and runtime for building highly concurrent, distributed, and resilient message-driven applications on the JVM. Along with Akka you have akka-streams, a module that makes the ingestion and processing of streams easy, and Alpakka, a reactive enterprise integration library for Java and Scala, based on Reactive Streams and Akka.

In this blog I shall focus on creating an Akka project using Java, as well as packaging it.


You already know that Akka is built with Scala, so why Java and not Scala? There are various reasons to go for Java.

  • Akka is a toolkit running on the JVM so you don’t have to be proficient with Scala to use it.
  • You might have a team already proficient in Java but not in Scala.
  • It’s much easier to evaluate if you already have a Java codebase and the usual build tools (Maven etc.).

We shall go for the simple route and download the application from the Lightbend quickstart. The project received will be backed by typed actors.

After some adaptation the Maven file would look like this; take note that we shall use Lombok.

<project>
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.gkatzioura</groupId>
    <artifactId>akka-java-app</artifactId>
    <version>1.0</version>

    <properties>
      <akka.version>2.6.10</akka.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>com.typesafe.akka</groupId>
            <artifactId>akka-actor-typed_2.13</artifactId>
            <version>${akka.version}</version>
        </dependency>
        <dependency>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-classic</artifactId>
            <version>1.2.3</version>
        </dependency>

        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <version>1.18.16</version>
            <scope>provided</scope>
        </dependency>

    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.8.0</version>
                <configuration>
                    <source>11</source>
                    <target>11</target>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.codehaus.mojo</groupId>
                <artifactId>exec-maven-plugin</artifactId>
                <version>1.6.0</version>
                <configuration>
                    <executable>java</executable>
                    <arguments>
                        <argument>-classpath</argument>
                        <classpath />
                        <argument>com.gkatzioura.Application</argument>
                    </arguments>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>

There is one actor that is responsible for managing your other actors. This is the top level actor, called the guardian actor. It is created along with the ActorSystem, and when it stops the ActorSystem will stop too.

In order to create an actor you define the messages the actor will receive and you specify how it will behave when it receives those messages.

package com.gkatzioura;

import akka.actor.typed.Behavior;
import akka.actor.typed.javadsl.AbstractBehavior;
import akka.actor.typed.javadsl.ActorContext;
import akka.actor.typed.javadsl.Behaviors;
import akka.actor.typed.javadsl.Receive;
import lombok.AllArgsConstructor;
import lombok.Getter;

public class AppGuardian extends AbstractBehavior<AppGuardian.GuardianMessage> {

	public interface GuardianMessage {}

	static Behavior<GuardianMessage> create() {
		return Behaviors.setup(AppGuardian::new);
	}

	@Getter
	@AllArgsConstructor
	public static class MessageToGuardian implements GuardianMessage {
		private String message;
	}

	private AppGuardian(ActorContext<GuardianMessage> context) {
		super(context);
	}

	@Override
	public Receive<GuardianMessage> createReceive() {
		return newReceiveBuilder().onMessage(MessageToGuardian.class, this::receiveMessage).build();
	}

	private Behavior<GuardianMessage> receiveMessage(MessageToGuardian messageToGuardian) {
		getContext().getLog().info("Message received: {}",messageToGuardian.getMessage());
		return this;
	}

}

Akka is message driven, so the guardian actor should be able to consume messages sent to it. Therefore messages that implement the GuardianMessage interface are going to be processed.

When creating the actor, the createReceive method is used in order to register the handling of the messages that the actor should handle.

Be aware that when it comes to logging, instead of spinning up a logger in the class, use
getContext().getLog()

Behind the scenes the log messages will have the path of the actor automatically added as akkaSource Mapped Diagnostic Context (MDC) value.

The last step would be to add the Main class.

package com.gkatzioura;

import java.io.IOException;

import akka.actor.typed.ActorSystem;
import lombok.extern.slf4j.Slf4j;

@Slf4j
public class Application {

	public static final String APP_NAME = "akka-java-app";

	public static void main(String[] args) {
		final ActorSystem<AppGuardian.GuardianMessage> appGuardian = ActorSystem.create(AppGuardian.create(), APP_NAME);
		appGuardian.tell(new AppGuardian.MessageToGuardian("First Akka Java App"));

		try {
			System.out.println(">>> Press ENTER to exit <<<");
			System.in.read();
		}
		catch (IOException ignored) {
		}
		finally {
			appGuardian.terminate();
		}
	}

}

The expected outcome is to have our guardian actor print the message submitted. By pressing enter, the Akka application will terminate through the guardian actor.
In the next blog we will go one step further and add a unit test that validates the message received.
As always you can find the source code on github.

Run a docker PostgreSQL instance locally for Testing

Running a PostgreSQL instance ad hoc for testing is not always as straightforward as it should be. In this blog we will run a PostgreSQL instance that is reachable from your workstation’s network, and instead of using one of the popular tools like DBeaver we shall use the client that comes with the instance. We shall also run a bootstrap script to have some data pre-inserted.

Let’s get started by running the instance. On purpose I will use a non-default port: in scenarios with multiple instances running on your workstation, port collisions are likely. The workaround is to choose port 5433.

docker run --rm --name test-instance -e POSTGRES_PASSWORD=password -p 5433:5432 postgres

This will run PostgreSQL and you will be able to connect to it on port 5433. On CTRL-C the instance will be stopped and destroyed, thanks to the --rm flag.
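
If you prefer to verify the connection from code rather than a client tool, a plain JDBC check is enough. This is only a sketch; it assumes the org.postgresql JDBC driver is on the classpath and the class name is made up for the example.

import java.sql.DriverManager;

public class PostgresConnectionCheck {

	public static void main(String[] args) throws Exception {
		// 5433 matches the -p 5433:5432 port mapping above
		var url = "jdbc:postgresql://localhost:5433/postgres";

		try (var connection = DriverManager.getConnection(url, "postgres", "password");
			 var statement = connection.createStatement();
			 var resultSet = statement.executeQuery("SELECT 1")) {
			resultSet.next();
			System.out.println("Connected, SELECT 1 returned " + resultSet.getInt(1));
		}
	}

}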

Now, instead of using an external tool to connect, let’s use the instance itself; it comes with psql pre-installed.

docker exec -it test-instance /bin/bash
> psql postgres postgres
postgres=# \h
Available help:
  ABORT                            ALTER TRIGGER                    CREATE RULE                      DROP GROUP                       LISTEN
  ALTER AGGREGATE                  ALTER TYPE                       CREATE SCHEMA                    DROP INDEX                       LOAD
  ALTER COLLATION                  ALTER USER                       CREATE SEQUENCE                  DROP LANGUAGE                    LOCK
.....
postgres=# \q

The instance works and connections from the outside are possible.

The next step would be to bootstrap the database with an initialization script.

#!/bin/bash
set -e
 
psql -v ON_ERROR_STOP=1 --username postgres --dbname postgres <<-EOSQL
    create schema test_schema;
 
    create table test_schema.employee(
        id  SERIAL PRIMARY KEY,
        firstname   TEXT    NOT NULL,
        lastname    TEXT    NOT NULL,
        email       TEXT    not null,
        age         INT     NOT NULL,
        salary         real,
        unique(email)
    );
 
    insert into test_schema.employee (firstname,lastname,email,age,salary)
    values ('John','Doe 1','john1@doe.com',18,1234.23);

EOSQL

Supposing the file with the script is called init_db.sh.

Let’s run the command with the initialization script mounted.

docker run --rm --name test-instance -v /path/to/init_db.sh:/docker-entrypoint-initdb.d/init-db-script.sh -e POSTGRES_PASSWORD=password -p 5433:5432 postgres

And let’s check the results.

docker exec -it test-instance /bin/bash
>psql postgres postgres
postgres=# SELECT*FROM test_schema.employee;
 id | firstname | lastname |     email     | age | salary
----+-----------+----------+---------------+-----+---------
  1 | John      | Doe 1    | john1@doe.com |  18 | 1234.23
(1 row)

That’s it! You created a PostgreSQL database through Docker, you connected to it, and you also added a bootstrap script with data.

Locking for multiple nodes the easy way: GCS

It happens to all of us. We develop stateless applications that can scale horizontally without any effort.
However sometimes cases arise where you need to achieve some type of coordination.

You can go really advanced on this one. For example, you can use a framework like Akka and its cluster capabilities. Or you can go really simple and roll a mechanism on your own, as long as it gives you the results needed. Alternatively, you can just have different node groups based on the work you need them to do. The options and the solutions change based on the problem.

If your problem can be solved with a simple option, one way to do so, provided you use Google Cloud Storage, is to use its locking capabilities.
Imagine, for example, a scenario with 4 nodes. They scale dynamically, but each time a new node registers you want it to acquire a unique configuration which does not collide with a configuration another node might have received.

The strategy can be to use a file on Google Cloud Storage for locking and a file that acts as a centralised configuration registry.

The lock file is nothing more than a file on Cloud Storage which shall be created and deleted. What gives us locking abilities is the option on GCS to create a file only if it does not exist.
Thus a process from one node will try to create the `lock` file; this action is equivalent to obtaining the lock.
Once the process is done, it will delete the file; this action is equivalent to releasing the lock.
Other processes in the meantime will try to create the file (acquire the lock) and fail (file already exists) because another process has created the file.
Meanwhile the process that has successfully created the file (acquired the lock) will change the centralised configuration registry and, once done, will delete the file (release the lock).

So let’s start with the lock object.

package com.gkatzioura.gcs.lock;

import java.util.Optional;

import com.google.cloud.storage.Blob;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageException;

public class GCSLock {

	public static final String LOCK_STRING = "_lock";
	private final Storage storage;
	private final String bucket;
	private final String keyName;

	private Optional<Blob> acquired = Optional.empty();

	GCSLock(Storage storage, String bucket, String keyName) {
		this.storage = storage;
		this.bucket = bucket;
		this.keyName = keyName;
	}

	public boolean acquire() {
		try {
			var blobInfo = BlobInfo.newBuilder(bucket, keyName).build();
			var blob = storage.create(blobInfo, LOCK_STRING.getBytes(), Storage.BlobTargetOption.doesNotExist());
			acquired = Optional.of(blob);
			return true;
		} catch (StorageException storageException) {
			return false;
		}
	}

	public void release() {
		if(!acquired.isPresent()) {
			throw new IllegalStateException("Lock was never acquired");
		}
		storage.delete(acquired.get().getBlobId());
	}

}

As you can see, the write specifies that the object should be created only if it does not exist. Behind the scenes this operation uses the x-goog-if-generation-match header, which is used for concurrency control.
Thus only one node will be able to acquire the lock and change the configuration file.
Afterwards it can delete the lock. If an exception is raised, the operation has most likely failed because the lock is already acquired.

To make the example more complete, let’s implement the configuration registry. The configuration would be a simple JSON file holding key/value pairs.

package com.gkatzioura.gcs.lock;

import java.util.HashMap;
import java.util.Map;

import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;
import org.json.JSONObject;

public class GCSConfiguration {

	private final Storage storage;
	private final String bucket;
	private final String keyName;

	GCSConfiguration(Storage storage, String bucket, String keyName) {
		this.storage = storage;
		this.bucket = bucket;
		this.keyName = keyName;
	}

	public void addProperty(String key, String value) {
		var blobId = BlobId.of(bucket, keyName);
		var blob = storage.get(blobId);

		final JSONObject configJson;

		if(blob==null) {
			configJson = new JSONObject();
		} else {
			configJson = new JSONObject(new String(blob.getContent()));
		}

		configJson.put(key, value);

		var blobInfo = BlobInfo.newBuilder(blobId).build();
		storage.create(blobInfo, configJson.toString().getBytes());
	}

	public Map<String,String> properties() {

		var blobId = BlobId.of(bucket, keyName);
		var blob = storage.get(blobId);

		var map = new HashMap<String,String>();

		if(blob!=null) {
			var jsonObject = new JSONObject(new String(blob.getContent()));
			for(var key: jsonObject.keySet()) {
				map.put(key, jsonObject.getString(key));
			}
		}

		return map;
	}

}

It is a simple config util backed by GCS. It could be changed to move the lock handling inside the addProperty operation; that is up to the user and the code. For the purpose of this blog we shall just acquire the lock, change the configuration and release the lock.
Our main class will look like this.

package com.gkatzioura.gcs.lock;

import com.google.cloud.storage.StorageOptions;

public class Application {

	public static void main(String[] args) {
		var storage = StorageOptions.getDefaultInstance().getService();

		final String bucketName = "bucketName";
		final String lockFileName = "lockFileName";
		final String configFileName = "configFileName";

		var lock = new GCSLock(storage, bucketName, lockFileName);
		var gcsConfig = new GCSConfiguration(storage, bucketName, configFileName);

		var lockAcquired = lock.acquire();
		if(lockAcquired) {
			gcsConfig.addProperty("testProperty", "testValue");
			lock.release();
		}

		var config = gcsConfig.properties();

		for(var key: config.keySet()) {
			System.out.println("Key "+key+" value "+config.get(key));
		}

	}

}

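One thing to be careful with in the snippet above: if addProperty throws, the lock file would never be deleted and the other nodes would be blocked. A small variation, sketched below with a made-up class name, wraps the critical section in try/finally using the same classes.

package com.gkatzioura.gcs.lock;

import com.google.cloud.storage.StorageOptions;

public class SafeUpdateApplication {

	public static void main(String[] args) {
		var storage = StorageOptions.getDefaultInstance().getService();

		final String bucketName = "bucketName";
		final String lockFileName = "lockFileName";
		final String configFileName = "configFileName";

		var lock = new GCSLock(storage, bucketName, lockFileName);
		var gcsConfig = new GCSConfiguration(storage, bucketName, configFileName);

		if (lock.acquire()) {
			try {
				// only one node at a time reaches this critical section
				gcsConfig.addProperty("testProperty", "testValue");
			} finally {
				// always delete the lock file, even if the update throws
				lock.release();
			}
		}
	}

}
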
Now let’s go for some multithreading. Ten threads will try to put values; it is expected that some of them will fail to acquire the lock at first and will have to retry.

package com.gkatzioura.gcs.lock;

import java.util.ArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

public class ApplicationConcurrent {

	private static final String bucketName = "bucketName";
	private static final String lockFileName = "lockFileName";
	private static final String configFileName = "configFileName";

	public static void main(String[] args) throws ExecutionException, InterruptedException {
		var storage = StorageOptions.getDefaultInstance().getService();

		final int threads = 10;
		var service = Executors.newFixedThreadPool(threads);
		var futures = new ArrayList<Future>(threads);

		for (var i = 0; i < threads; i++) {
			futures.add(service.submit(update(storage, "property-"+i, "value-"+i)));
		}

		for (var f : futures) {
			f.get();
		}

		service.shutdown();

		var gcsConfig = new GCSConfiguration(storage, bucketName, configFileName);
		var properties = gcsConfig.properties();

		for(var i=0; i < threads; i++) {
			System.out.println(properties.get("property-"+i));
		}
	}

	private static Runnable update(final Storage storage, String property, String value) {
		return () -> {
			var lock = new GCSLock(storage, bucketName, lockFileName);
			var gcsConfig = new GCSConfiguration(storage, bucketName, configFileName);

			boolean lockAcquired = false;

			while (!lockAcquired) {
				lockAcquired = lock.acquire();
				if (!lockAcquired) {
					System.out.println("Could not acquire lock");
				}
			}

			gcsConfig.addProperty(property, value);
			lock.release();
		};
	}
}

Obviously 10 threads are enough to display the capabilities. During the thread initialization and execution some threads will try to acquire the lock simultaneously and all but one will fail, while other threads will arrive later, fail and wait until the lock is available.

In the end, what is expected is that all of them have their values added to the configuration.
That’s it. If your problem has a simple nature, this approach might do the trick. Obviously you can use the HTTP API instead of the SDK. You can find the code on github.