PostgreSQL BiDirectional Replication

As you can understand from my previous blogs I am really into PostgreSQL.

Previously we run Debezium in Embedded mode. Behind the scenes Debezium consumes the changes that were committed to the transaction log. This happens by utilising the logical decoding feature of PostgreSQL.

In this blog we shall focus on replication and more specific bidirectional replication. To achieve bidirectional replication in PostgreSQL we need the module pglogical. You might wonder the deference between logical decoding and pglogical. Essentially logical decoding has its origins from PgLocigal. View PgLocial as a more featureful module while logical decoding is embedded to a PostgreSQL distribution.

We will create a custom PostgreSQL Docker image and install PgLogical.

# Use the official PostgreSQL image as base
FROM postgres:15
USER root
RUN apt-get update; apt-get install postgresql-15-pglogical -y
USER postgres

Also we need to have a PostgreSQL configuration that will enable PgLogical replication and the conflict resolution.

listen_addresses = '*'
port = 5432
max_connections = 20
shared_buffers = 128MB
temp_buffers = 8MB
work_mem = 4MB
wal_level = logical
max_wal_senders = 3
track_commit_timestamp = on
shared_preload_libraries = 'pglogical'
pglogical.conflict_resolution = 'first_update_wins'

Let’s break this down. We added pglogical and we enabled track_commit_timestamp. By enabling this parameter PostgreSQL tracks the commit time of transactions. This will be crucial for the conflict resolution strategy.
Now let’s see the conflict resolution. We selected ‘first_update_wins’, therefore in case of two transactions operating on the same row, the transaction that finished first will be the one to be considered.

Bidirectional replication is setup upon a table. Since we use Docker we shall provide an initialization script to PostgreSQL.

The script will:

  • Enable pglogical
  • Create the table
  • Add a target node
  • Insert the row we shall run tests upon
#!/bin/bash
set -e

psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" --dbname "$POSTGRES_DB" <<-EOSQL
  ALTER SYSTEM RESET shared_preload_libraries;
  CREATE EXTENSION pglogical;

  create schema test_schema;
  create table test_schema.employee(
          id  SERIAL PRIMARY KEY,
          firstname   TEXT    NOT NULL,
          lastname    TEXT    NOT NULL,
          email       TEXT    not null,
          age         INT     NOT NULL,
          salary         real,
          unique(email)
      );

  SELECT pglogical.create_node(
      node_name := '$TARGET',
      dsn := 'host=$TARGET port=5432 dbname=$POSTGRES_DB user=$POSTGRES_USER password=$POSTGRES_PASSWORD');

  SELECT pglogical.replication_set_add_table('default', 'test_schema.employee', true);

  insert into test_schema.employee (id,firstname,lastname,email,age,salary) values (1,'John','Doe 1','john1@doe.com',18,1234.23);


EOSQL

Let’s create the instances now using docker compose.

version: '3.1'

services:
  postgres-a:
    build: ./pglogicalimage
    restart: always
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      TARGET: postgres-b
    volumes:
      - ./config/postgresql.conf:/etc/postgresql/postgresql.conf
      - ./init:/docker-entrypoint-initdb.d
    command:
      - "-c"
      - "config_file=/etc/postgresql/postgresql.conf"
    ports:
      - 5431:5432
  postgres-b:
    build: ./pglogicalimage
    restart: always
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      TARGET: postgres-a
    volumes:
      - ./config/postgresql.conf:/etc/postgresql/postgresql.conf
      - ./init:/docker-entrypoint-initdb.d
    command:
      - "-c"
      - "config_file=/etc/postgresql/postgresql.conf"
    ports:
      - 5432:5432

We can get our instances up and running by issuing

docker compose up

Docker Compose V2 is out there with many good features, you can find more about it on the book I authored:
A Developer’s Essential Guide to Docker Compose
.

Since both instances are up and running we need to enable the replication. Therefore we shall subscribe the nodes to each other.

Execute on the first node

SELECT pglogical.create_subscription(
  subscription_name := 'postgres_b',
  provider_dsn := 'host=postgres-b port=5432 dbname=postgres user=postgres password=postgres',
  synchronize_data := false,
  forward_origins := '{}' );

Execute at the second node

SELECT pglogical.create_subscription(
  subscription_name := 'postgres_a',
  provider_dsn := 'host=postgres-a port=5432 dbname=postgres user=postgres password=postgres',
  synchronize_data := false,
  forward_origins := '{}' );

You can use any PostgreSQL client that suits you. Alternatively you can just use the psql client that comes packaged with the Docker Images.
For example:

Login to the first node

docker compose exec postgres-a psql  --username postgres --dbname postgres

Login to the second node

docker compose exec postgres-b psql  --username postgres --dbname postgres

Let’s see how conflict resolution will work now.

On the first node we shall run the following snippet

BEGIN;
UPDATE test_schema.employee SET lastname='first wins';

#before committing start transaction on postgres-b

COMMIT;

Don’t press commit immediately, instead take the time and before you commit the transaction start the following transaction on the second node.

BEGIN;
UPDATE test_schema.employee SET lastname='second looses';

#make sure transaction on node postgres-a is committed first.

COMMIT;

This transaction will be committed after the transaction that takes places in postgres-a.

Let’s check the logs on postgres-a-1

postgres-a-1  | 2024-05-01 07:10:45.128 GMT [70] LOG:  CONFLICT: remote UPDATE on relation test_schema.employee (local index employee_pkey). Resolution: keep_local.
postgres-a-1  | 2024-05-01 07:10:45.128 GMT [70] DETAIL:  existing local tuple {id[int4]:1 firstname[text]:John lastname[text]:first wins email[text]:john1@doe.com age[int4]:18 salary[float4]:1234.23} xid=748,origin=0,timestamp=2024-05-01 07:10:42.269227+00; remote tuple {id[int4]:1 firstname[text]:John lastname[text]:second looses email[text]:john1@doe.com age[int4]:18 salary[float4]:1234.23} in xact origin=1,timestamp=2024-05-01 07:10:45.125791+00,commit_lsn=0/16181C0
postgres-a-1  | 2024-05-01 07:10:45.128 GMT [70] CONTEXT:  apply UPDATE from remote relation test_schema.employee in commit before 0/16181C0, xid 747 committed at 2024-05-01 07:10:45.125791+00 (action #2) from node replorigin 1

The transaction that took place on postgres-a finished first. Postgres-a received the replication data from the transaction of node postgres-b. A comparison was issued on the commit timestamp, because the commit timestamp of the transaction on postgres-a was earlier the resolution was to keep the local changes.

We can see the reverse on postgres-b


postgres-b-1 | 2024-05-01 07:10:45.127 GMT [81] LOG: CONFLICT: remote UPDATE on relation test_schema.employee (local index employee_pkey). Resolution: apply_remote.
postgres-b-1 | 2024-05-01 07:10:45.127 GMT [81] DETAIL: existing local tuple {id[int4]:1 firstname[text]:John lastname[text]:second looses email[text]:john1@doe.com age[int4]:18 salary[float4]:1234.23} xid=747,origin=0,timestamp=2024-05-01 07:10:45.125791+00; remote tuple {id[int4]:1 firstname[text]:John lastname[text]:first wins email[text]:john1@doe.com age[int4]:18 salary[float4]:1234.23} in xact origin=1,timestamp=2024-05-01 07:10:42.269227+00,commit_lsn=0/1618488
postgres-b-1 | 2024-05-01 07:10:45.127 GMT [81] CONTEXT: apply UPDATE from remote relation test_schema.employee in commit before 0/1618488, xid 748 committed at 2024-05-01 07:10:42.269227+00 (action #2) from node replorigin 1

Let’s check the result in the database.

postgres=# SELECT*FROM test_schema.employee;
 id | firstname |  lastname  |     email     | age | salary  
----+-----------+------------+---------------+-----+---------
  1 | John      | first wins | john1@doe.com |  18 | 1234.23

As expected the first transaction is the one that stayed.
To wrap it up:

  • We started two transactions in parallel
  • We changed the same row
  • We accepted the changes of the transaction that finished first

That’s it. Hope you had some fun and now you have another tool for y0ur needs.

Debezium in Embedded mode

In a previous blog we setup a Debezium server reading events from a from a PostgresQL database. Then we streamed those changes to a Redis instance through a Redis stream.

We might get the impression that in order to run Debezium we need to have two extra components running in our infrastructure:

  • A standalone Debezium server instance
  • A software component with streaming capabilities and various integrations, such as Redis or Kafka

This is not always the case since Debezium can run in embedded mode. By running in embedded mode you use Debezium in order to read directly from a database’s transaction log. It is up to you how you are gonna handle the entries retrieved. The process reading the entries from the transaction log can reside on any Java application thus there is no need for a standalone deployment.

Apart from the number of components reduced, the other benefit is that we can alter the entries as we read them from the database and take action in our application. Sometimes we might just need a subset of the capabilities offered.

Let’s use the same PotsgreSQL configurations we used previously

listen_addresses = '*'
port = 5432
max_connections = 20
shared_buffers = 128MB
temp_buffers = 8MB
work_mem = 4MB
wal_level = logical
max_wal_senders = 3

Also we shall create an initialization script for the table we want to focus

#!/bin/bash
set -e
 
psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" --dbname "$POSTGRES_DB" <<-EOSQL
  create schema test_schema;
  create table test_schema.employee(
          id  SERIAL PRIMARY KEY,
          firstname   TEXT    NOT NULL,
          lastname    TEXT    NOT NULL,
          email       TEXT    not null,
          age         INT     NOT NULL,
          salary         real,
          unique(email)
      );
EOSQL

Our Docker Compose file will look like this

version: '3.1'
 
services:

  postgres:
    image: postgres
    restart: always
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
    volumes:
      - ./postgresql.conf:/etc/postgresql/postgresql.conf
      - ./init:/docker-entrypoint-initdb.d
    command:
      - "-c"
      - "config_file=/etc/postgresql/postgresql.conf"
    ports:
      - 5432:5432

The configuration files we created are mounted to the PostgreSQL Docker container. Docker Compose V2 is out there with many good features, you can find more about it on the book I authored:
A Developer’s Essential Guide to Docker Compose
.

Provided we run docker compose up, a postgresql server with a schema and a table will be up and running. Also that server will have logical decoding enabled and Debezium shall be able to track changes on that table through the transaction log.
We have everything needed to proceed on building our application.

First let’s add the dependencies needed:

 
    <properties>
        <maven.compiler.source>17</maven.compiler.source>
        <maven.compiler.target>17</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <version.debezium>2.3.1.Final</version.debezium>
        <logback-core.version>1.4.12</logback-core.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>io.debezium</groupId>
            <artifactId>debezium-api</artifactId>
            <version>${version.debezium}</version>
        </dependency>
        <dependency>
            <groupId>io.debezium</groupId>
            <artifactId>debezium-embedded</artifactId>
            <version>${version.debezium}</version>
        </dependency>
        <dependency>
            <groupId>io.debezium</groupId>
            <artifactId>debezium-connector-postgres</artifactId>
            <version>${version.debezium}</version>
        </dependency>
        <dependency>
            <groupId>io.debezium</groupId>
            <artifactId>debezium-storage-jdbc</artifactId>
            <version>${version.debezium}</version>
        </dependency>
        <dependency>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-classic</artifactId>
            <version>${logback-core.version}</version>
        </dependency>
        <dependency>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-core</artifactId>
            <version>${logback-core.version}</version>
        </dependency>
    </dependencies>

We also need to create the Debezium embedded properties:

name=embedded-debezium-connector
connector.class=io.debezium.connector.postgresql.PostgresConnector
offset.storage=org.apache.kafka.connect.storage.FileOffsetBackingStore
offset.flush.interval.ms=60000
database.hostname=127.0.0.1
database.port=5432
database.user=postgres
database.password=postgres
database.dbname=postgres
database.server.name==embedded-debezium
debezium.source.plugin.name=pgoutput
plugin.name=pgoutput
database.server.id=1234
topic.prefix=embedded-debezium
schema.include.list=test_schema
table.include.list=test_schema.employee

Apart from establishing the connection towards the PostgresQL Database we also decided to store the offset in a file. By using the offset in Debezium we keep track of the progress we do on processing the events.

On each change that happens on the table test_schema.employee we shall receive an event. Once we receive that event our codebase should handle it.
To handle the events we need to create a DebeziumEngine.ChangeConsumer. The ChangeConsumer will consume the events emitted.

package com.egkatzioura;

import io.debezium.engine.DebeziumEngine;
import io.debezium.engine.RecordChangeEvent;
import org.apache.kafka.connect.source.SourceRecord;

import java.util.List;

public class CustomChangeConsumer implements DebeziumEngine.ChangeConsumer<RecordChangeEvent<SourceRecord>> {

    @Override
    public void handleBatch(List<RecordChangeEvent<SourceRecord>> records, DebeziumEngine.RecordCommitter<RecordChangeEvent<SourceRecord>> committer) throws InterruptedException {
        for(RecordChangeEvent<SourceRecord> record: records) {
            System.out.println(record.record().toString());
        }
    }

}

Every incoming event will be printed on the console.

Now we can add our main class where we setup the engine.

package com.egkatzioura;

import io.debezium.embedded.Connect;
import io.debezium.engine.DebeziumEngine;
import io.debezium.engine.format.ChangeEventFormat;

import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public class Application {

    public static void main(String[] args) throws IOException {
        Properties properties = new Properties();

        try(final InputStream stream = Application.class.getClassLoader().getResourceAsStream("embedded_debezium.properties")) {
            properties.load(stream);
        }
        properties.put("offset.storage.file.filename",new File("offset.dat").getAbsolutePath());

        var engine = DebeziumEngine.create(ChangeEventFormat.of(Connect.class))
                .using(properties)
                .notifying(new CustomChangeConsumer())
                .build();
        engine.run();

    }

}

Provided our application is running as well as the PostgresQL database we configured previously, we can start inserting data

docker exec -it debezium-embedded-postgres-1 psql postgres postgres
psql (15.3 (Debian 15.3-1.pgdg120+1))
Type "help" for help.

postgres=# insert into test_schema.employee (firstname,lastname,email,age,salary) values ('John','Doe 1','john1@doe.com',18,1234.23);

Also we can see the change on the console

SourceRecord{sourcePartition={server=embedded-debezium}, sourceOffset={last_snapshot_record=true, lsn=22518160, txId=743, ts_usec=1705916606794160, snapshot=true}} ConnectRecord{topic='embedded-debezium.test_schema.employee', kafkaPartition=null, key=Struct{id=1}, keySchema=Schema{embedded-debezium.test_schema.employee.Key:STRUCT}, value=Struct{after=Struct{id=1,firstname=John,lastname=Doe 1,email=john1@doe.com,age=18,salary=1234.23},source=Struct{version=2.3.1.Final,connector=postgresql,name=embedded-debezium,ts_ms=1705916606794,snapshot=last,db=postgres,sequence=[null,"22518160"],schema=test_schema,table=employee,txId=743,lsn=22518160},op=r,ts_ms=1705916606890}, valueSchema=Schema{embedded-debezium.test_schema.employee.Envelope:STRUCT}, timestamp=null, headers=ConnectHeaders(headers=)}

We did it. We managed to run Debezium through a Java application without the need of a standalone Debezium server running or a streaming component. You can find the code on GitHub.

Add ZipKin to your Spring application

If your application contains multiple services interacting with each other the need for distributed tracing is increasing. You have a call towards one application that also calls another application, in certain cases the application to be accessed next might be a different one. You need to trace the request end to end and identify what happened to the call.
Zipkin is a Distributed Tracing system. Essentially by using Zipkin on our system we can track how a call spans across various Microservices.

ZipKin comes with Various database options. In our case we shall use Elasticsearch.

We will setup out ZipKin server using Compose

Let’s start with our Compose file:

services:
  redis:
    image: redis
    ports:
      - 6379:6379
  elasticsearch:
    image: elasticsearch:7.17.7
    ports:
      - 9200:9200
      - 9300:9300
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9200/_cat/health"]
      interval: 20s
      timeout: 10s
      retries: 5
      start_period: 5s
    environment:
      - discovery.type=single-node
    restart: always
  zipkin:
    image: openzipkin/zipkin-slim
    ports:
      - 9411:9411
    environment:
      - STORAGE_TYPE=elasticsearch
      - ES_HOSTS=http://elasticsearch:9200
      - JAVA_OPTS=-Xms1G -Xmx1G -XX:+ExitOnOutOfMemoryError
    depends_on:
      - elasticsearch
    restart: always

We can run the above using

docker compose up

You can find more on Compose on the Developers Essential Guide to Docker Compose.

Let’s build our applications, our applications will be servlet based

We shall use a service for locations, this service essentially will persist locations on a Redis database using the GeoHash data structure.
You can find the service implementation on a previous blog.

We would like to add some extra dependencies so that Zipkin integration is possible.

...
    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>org.springframework.cloud</groupId>
                <artifactId>spring-cloud-dependencies</artifactId>
                <version>2021.0.5</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
        </dependencies>
    </dependencyManagement>
...
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-starter-sleuth</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-sleuth-zipkin</artifactId>
        </dependency>
...

Spring Sleuth provides distributed tracing to our Spring application. By using spring Sleuth tracing data are generated. In case of a servlet filter or a rest template tracing data will also be generated. Provided the Zipkin binary is included the data generated will be dispatched to the Zipkin collector specified using
spring.zipkin.baseUrl.

Let’s also make our entry point application. This application will execute requests towards the location service we implemented previously.
The dependencies will be the following.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <groupId>org.example</groupId>
    <version>1.0-SNAPSHOT</version>
    <modelVersion>4.0.0</modelVersion>

    <artifactId>european-venue</artifactId>

    <properties>
        <maven.compiler.source>11</maven.compiler.source>
        <maven.compiler.target>11</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>org.springframework.cloud</groupId>
                <artifactId>spring-cloud-dependencies</artifactId>
                <version>2021.0.5</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
        </dependencies>
    </dependencyManagement>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
            <version>2.7.5</version>
        </dependency>
        <dependency>
            <groupId>io.projectreactor</groupId>
            <artifactId>reactor-core</artifactId>
            <version>3.5.0</version>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <version>1.18.24</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-starter-sleuth</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-sleuth-zipkin</artifactId>
        </dependency>
    </dependencies>

</project>

We shall create a service interacting with the location service.

The location model shall be the same:

package org.landing;

import lombok.Data;

@Data
public class Location {

    private String name;
    private Double lat;
    private Double lng;

}

And the service:

package org.landing;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;


@Service
public class LocationService {

    @Autowired
    private RestTemplate restTemplate;

    @Value("${location.endpoint}")
    private String locationEndpoint;

    public void checkIn(Location location) {
        restTemplate.postForLocation(locationEndpoint, location);
    }

}

Following we will add the controller:

package org.landing;

import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

import lombok.AllArgsConstructor;

@RestController
@AllArgsConstructor
public class CheckinController {

    private final LocationService locationService;

    @PostMapping("/checkIn")
    public ResponseEntity<String> checkIn(@RequestBody Location location) {
        locationService.checkIn(location);
        return ResponseEntity.ok("Success");
    }

}

We need also RestTemplate to be configured:

package org.landing;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.client.RestTemplate;

@Configuration
public class RestTemplateConfiguration {

    @Bean
    public RestTemplate restTemplate() {
        return new RestTemplate();
    }

}

Last but not least the main method:

package org.location;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class LocationApplication {

    public static void main(String[] args) {
        SpringApplication.run(LocationApplication.class);
    }

}

Now that everything is up and running let’s put this into action

curl --location --request POST 'localhost:8081/checkIn/' \
--header 'Content-Type: application/json' \
--data-raw '{
	"name":"Liverpool Street",
	"lat": 51.517336,
	"lng": -0.082966
}'
> Success

Let’s navigate now to the Zipkin Dashboard and search traces for the european-venue.
By expanding the calls we shall end up to a call like this.

Essentially we have an end to end tracing for our applications. We can see the entry point which is the european-venue checkin endpoint.
Also because the location service is called we have data points for the call received. However we also see data points for the call towards the redis database.
Essentially by adding sleuth to our application, beans that are ingress and egress points are being wrapped so that trace data can be reported.

You can find the code on GitHub.

Use Redis GeoHash with Spring boot

One very handy Data Structure when it comes to Redis is the GeoHash Data structure. Essentially it is a sorted set that generates a score based on the longitude and latitude.

We will spin up a Redis database using Compose

services:
  redis:
    image: redis
    ports:
      - 6379:6379

Can be run like this

docker compose up

You can find more on Compose on the Developers Essential Guide to Docker Compose.

Let’s add our dependencies

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <groupId>org.example</groupId>
    <version>1.0-SNAPSHOT</version>
    <modelVersion>4.0.0</modelVersion>

    <artifactId>location-service</artifactId>

    <properties>
        <maven.compiler.source>11</maven.compiler.source>
        <maven.compiler.target>11</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
            <version>2.7.5</version>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <version>1.18.24</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-redis</artifactId>
            <version>2.7.5</version>
        </dependency>
    </dependencies>
</project>

We shall start with our Configuration. For convenience on injecting we shall create a GeoOperations<String,String> bean.

package org.location;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.core.GeoOperations;
import org.springframework.data.redis.core.RedisTemplate;

@Configuration
public class RedisConfiguration {

    @Bean
    public GeoOperations<String,String> geoOperations(RedisTemplate<String,String> template) {
        return template.opsForGeo();
    }

}

Our model would be this one

package org.location;

import lombok.Data;

@Data
public class Location {

    private String name;
    private Double lat;
    private Double lng;

}

This simple service will persist venue locations and also fetch venues nearby of a location.

package org.location;

import java.util.List;
import java.util.stream.Collectors;

import org.springframework.data.geo.Circle;
import org.springframework.data.geo.Distance;
import org.springframework.data.geo.GeoResult;
import org.springframework.data.geo.GeoResults;
import org.springframework.data.geo.Metrics;
import org.springframework.data.geo.Point;
import org.springframework.data.redis.connection.RedisGeoCommands;
import org.springframework.data.redis.core.GeoOperations;
import org.springframework.data.redis.domain.geo.GeoLocation;
import org.springframework.stereotype.Service;

@Service
public class GeoService {

    public static final String VENUS_VISITED = "venues_visited";
    private final GeoOperations<String, String> geoOperations;

    public GeoService(GeoOperations<String, String> geoOperations) {
        this.geoOperations = geoOperations;
    }

    public void add(Location location) {
        Point point = new Point(location.getLng(), location.getLat());
        geoOperations.add(VENUS_VISITED, point, location.getName());

    }

    public List<String> nearByVenues(Double lng, Double lat, Double kmDistance) {
        Circle circle = new Circle(new Point(lng, lat), new Distance(kmDistance, Metrics.KILOMETERS));
        GeoResults<RedisGeoCommands.GeoLocation<String>> res = geoOperations.radius(VENUS_VISITED, circle);
        return res.getContent().stream()
                  .map(GeoResult::getContent)
                  .map(GeoLocation::getName)
                  .collect(Collectors.toList());
    }

}

We shall also add a controller

package org.location;

import java.util.List;

import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class LocationController {

    private final GeoService geoService;

    public LocationController(GeoService geoService) {
        this.geoService = geoService;
    }

    @PostMapping("/location")
    public ResponseEntity<String> addLocation(@RequestBody Location location) {
        geoService.add(location);
        return ResponseEntity.ok("Success");
    }

    @GetMapping("/location/nearby")
    public ResponseEntity<List<String>> locations(Double lng, Double lat, Double km) {
        List<String> locations = geoService.nearByVenues(lng, lat, km);
        return ResponseEntity.ok(locations);
    }

}

Then let’s add an element.

curl --location --request POST 'localhost:8080/location' \
--header 'Content-Type: application/json' \
--data-raw '{
	"lng": 51.5187516,
	"lat":-0.0814374,
	"name": "liverpool-street"
}'

Let’s retrieve the element of the api

curl --location --request GET 'localhost:8080/location/nearby?lng=51.4595573&lat=0.24949&km=100'
> [
    "liverpool-street"
]

And also let’s check redis

ZRANGE venues_visited 0 -1 WITHSCORES
1) "liverpool-street"
2) "2770072452773375"

We did it, pretty convenient for our day to day distance use cases.

Gradle: Push to Maven Repository

If you are a developer sharing your artifacts is a common task, that needs to be in place from the start.

In most teams and companies a Maven repository is already setup, this repository would be used mostly through CI/CD tasks enabling developers to distribute the generated artifacts.

In order to make the example possible we shall spin up a Nexus repository using Docker Compose.

First let’s create a default password and the directory containing the plugins Nexus will download. As of now the password is clear text, this will serve us for doing our first action. By resetting the admin password on nexus the file shall be removed, thus there won’t be a file with a clear text password.

mkdir nexus-data
cat a-test-password > admin.password

Then onwards to the Compose file:

services:
  nexus:
    image: sonatype/nexus3
    ports:
      - 8081:8081
    environment:
      INSTALL4J_ADD_VM_PARAMS: "-Xms2703m -Xmx2703m -XX:MaxDirectMemorySize=2703m"
    volumes:
      - nexus-data:/nexus-data
      - ./admin.password:/nexus-data/admin.password
volumes:
  nexus-data:

Let’s examine the file.
By using INSTALL4J_ADD_VM_PARAMS we override the Java command that will run the Nexus server, this way we can instruct to use more memory.
Because nexus takes too long on initialization we shall create a Docker volume. Everytime we run the compose file, the initialization will not happen from start, instead will use the results of the previous initialization. By mounting the admin password created previously we have a predefined password for the service.

By issuing the following command, the server will be up and running:

 
docker compose up

You can find more on Compose on the Developers Essential Guide to Docker Compose.

After some time nexus will have been initialized and running, therefore we shall proceed to our Gradle configuration.

We will keep track of the version on the gradle.properties file

version 1.0-SNAPSHOT

The build.gradle file:

plugins {
    id 'java'
    id 'maven-publish'
}

group 'org.example'
version '1.0-SNAPSHOT'

repositories {
    mavenCentral()
}

dependencies {
    testImplementation 'org.junit.jupiter:junit-jupiter-api:5.8.1'
    testRuntimeOnly 'org.junit.jupiter:junit-jupiter-engine:5.8.1'
}

test {
    useJUnitPlatform()
}

publishing {
    publications {
        mavenJava(MavenPublication) {
            groupId 'org.example'
            artifactId 'gradle-push'
            version version
            from components.java
            versionMapping {
                usage('java-api') {
                    fromResolutionOf('runtimeClasspath')
                }
                usage('java-runtime') {
                    fromResolutionResult()
                }
            }
            pom {
                name = 'gradle-push'
                description = 'Gradle Push to Nexus'
                url = 'https://github.com/gkatzioura/egkatzioura.wordpress.com.git'
                licenses {
                    license {
                        name = 'The Apache License, Version 2.0'
                        url = 'http://www.apache.org/licenses/LICENSE-2.0.txt'
                    }
                }
                developers {
                    developer {
                        id = 'John'
                        name = 'John Doe'
                        email = 'an-email@gmail.com'
                    }
                }
                scm {
                    connection = 'scm:git:https://github.com/gkatzioura/egkatzioura.wordpress.com.git'
                    developerConnection = 'scm:git:https://github.com/gkatzioura/egkatzioura.wordpress.com.git'
                    url = 'https://github.com/gkatzioura/egkatzioura.wordpress.com.git'
                }
            }
        }
    }

    repositories {
        maven {
            def releasesRepoUrl = "http://localhost:8081/repository/maven-releases/"
            def snapshotsRepoUrl = "http://localhost:8081/repository/maven-snapshots/"
            url = version.endsWith('SNAPSHOT') ? snapshotsRepoUrl : releasesRepoUrl

            credentials {
                username "admin"
                password "a-test-password"
            }
        }
    }
}

The plugin to be used is the maven-publish plugin. If we examine the file, we identify that the plugin generates a pom maven file which shall be used in order to execute the deploy command to Nexus. The repositories are configured on the repositories section including the nexus users we defined previously. Whether the version is a SNAPSHOT or a release the corresponding repository endpoint will be picked. As we see clear text password is used. On the next blog we will examine how we can avoid that.

Now it is sufficient to run

gradle publish

This will deploy the binary to the snapshot repository.

You can find the source code on GitHub.

New Book Day: A Developer’s Essential Guide to Docker Compose

As Developers nowadays we have a wide variety of Software components and Cloud Services to use. This was a scenario that we could not even imagine in the past.
I still remember when we had to setup our Application Servers and Databases on top of Bare metal servers.
This burst of computing functionality in the form of the Cloud and managed services, allowed us to be able to utilise more tools for our applications and build better products.
Issues like orchestrating workloads, isolating and shipping them went to a whole different level.
As a result containerization came to the rescue.
Docker took over the microservice world and became the dominant solution to deploy microservices and in certain cases even to deploy Databases and Brokers.
This brings us to the development process. Production deployments is a huge chapter on its own. Platform engineers needs to take care of the security, the scaling and robustness of container based deployments.
But the development process is also affected:

  • SQL/NoSQL Databases as well as purpose-built databases like InfluxDB need to be available locally
  • Scenarios of Microservices applications need to be tested
  • Other Components such as brokers need to be available for testing

A step forward to the challenges mentioned above is to utilise Docker and its rich functionality. As convenient as it is to spin up containers locally you still end up managing containers, volumes and networks. Most of the times those have been spinned up add-hoc or with some scripts that a team has to maintain.

Docker Compose is one of the solutions to the problems described. With Compose you can spin up multiple containers locally organised using yaml files.
Here are some of the benefits when used during development:

  • The containers are organised and can spin up or shut down in an organized way
  • Applications can be placed on different networks
  • Volumes can be managed and attached to containers in a managed way
  • Containers resolve each other’s location through a DNS automatically, manual linking is not needed.

I have been using Compose for years. It helped me to make my development process much more efficient. Also in certain cases it helped me on actual production deployments. Writing this book was an opportunity for me to advocate for Docker Compose.

 

 

Thanks to the amazing people at Packt publishing it was possible to write this book and give back to the community.

The book is focused on various aspects of Compose.

In the beginning it will be an extensive look on Docker Compose, how it is implemented, how it interacts with the Docker engine, the available commands as well as the functionality it provides.

Onwards we will dive deep in day to day development using Compose. We will spin up complex infrastructure locally, as well as simulate Microservice environments using Compose. We will take this concept further and incrementally simulate things that we have to deal with a production deployment, as well as issue workarounds. Lastly we will use Compose for CI/CD jobs on popular solutions like Github Actions, Travis, and BitBucket Pipelines.

The last part of the book is all about deploying to production. All the knowledge acquired previously can be used for actual production deployments. A production deployment comes with some standards:

  • Infrastructure as Code
  • Container registry
  • Networks
  • Load Balancing
  • Autoscaling

The above, in combination with the knowledge we accumulated so far using Compose, will be used for a deployment on AWS and Azure, the most popular Cloud providers. This will be done also with the extra help of Infrastructure as Code by using Terraform.

Lastly since many production deployments nowadays reside on Kubernetes we shall build a bridge between Compose and Kubernetes by migrating the existing Compose deployment using Kompose.

You can find the book on Amazon  as well as on the Packt portal.
Happy Learning!

Debezium Server with PostgreSQL and Redis Stream

Debezium is a great tool for capturing the row level changes that happen on a Database and stream those changes to a broker of our choice.

Since this functionality stays in the boundaries of a Database, it helps on keeping our applications simple. For example there in no need for an application to emit events on any database interactions. Debezium will monitor the row changes and will emit the events. Based on the broker solution used with Debezium a consumer can subscribe to the broker thus receive the changes.

PostgreSQL being a popular SQL database, it is supported by Debezium.

Our goal would be to listen to PostgreSQL changes and stream them to a Redis stream through a Debezium Server. It is common to use Debizum with Kafka, in case where Kafka is not present in a team’s Tech stack we can use other brokers.

In our case we would keep things lightweight by using Redis Streams.

Redis will be setup without any extra configurations.

In order to use PostgreSQL with Debezium it is essentials to alter the configuration on postgreSQL.

The configuration we shall use on postgreSQL will be the following

listen_addresses = '*'
port = 5432
max_connections = 20
shared_buffers = 128MB
temp_buffers = 8MB
work_mem = 4MB
wal_level = logical
max_wal_senders = 3

As we can see we use the logical_decoding from PostgreSQL.
From the documentation:

Logical decoding is the process of extracting all persistent changes to a database’s tables into a coherent, easy to understand format which can be interpreted without detailed knowledge of the database’s internal state.

In PostgreSQL, logical decoding is implemented by decoding the contents of the write-ahead log, which describe changes on a storage level, into an application-specific form such as a stream of tuples or SQL statements.

We will also create a namespace and a table for PostgreSQL. The namespace and the table will be the ones to listen for changes.

#!/bin/bash
set -e

psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" --dbname "$POSTGRES_DB" <<-EOSQL
  create schema test_schema;
  create table test_schema.employee(
          id  SERIAL PRIMARY KEY,
          firstname   TEXT    NOT NULL,
          lastname    TEXT    NOT NULL,
          email       TEXT    not null,
          age         INT     NOT NULL,
          salary         real,
          unique(email)
      );
EOSQL

This is the table we used in a previous PostgreSQL example.

Debezium will have to be able to interact with the PostgreSQL server as well as the the redis server.
The configuration should be the following.

debezium.sink.type=redis
debezium.sink.redis.address=redis:6379
debezium.source.connector.class=io.debezium.connector.postgresql.PostgresConnector
debezium.source.offset.storage.file.filename=data/offsets.dat
debezium.source.offset.flush.interval.ms=0
debezium.source.database.hostname=postgres
debezium.source.database.port=5432
debezium.source.database.user=postgres
debezium.source.database.password=postgres
debezium.source.database.dbname=postgres
debezium.source.database.server.name=tutorial
debezium.source.schema.whitelist=test_schema
debezium.source.plugin.name=pgoutput

By examining the configuration we can see that we have the necessary information for Debezium to communicate to the PostgreSQL database, also we specify the schema that we created previously. Therefore only changes from that schema will be streamed. We can also make things more restrictive for example whitelisting tables.

Since this demo will involve three different software Components docker compose will come in handy.

version: '3.1'

services:
  redis:
    image: redis
    ports:
      - 6379:6379
    depends_on:
      - postgres
  postgres:
    image: postgres
    restart: always
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
    volumes:
      - ./postgresql.conf:/etc/postgresql/postgresql.conf
      - ./init:/docker-entrypoint-initdb.d
    command:
      - "-c"
      - "config_file=/etc/postgresql/postgresql.conf"
    ports:
      - 5432:5432
  debezium:
    image: debezium/server
    volumes:
      - ./conf:/debezium/conf
      - ./data:/debezium/data
    depends_on:
      - redis

By using Compose we were able to spin up three different software components on the same network. This helps the components to interact with each other by using the dns names of the services as specified on Compose. Also the configuration files we created previously are mounted to the Docker containers. Docker Compose V2 is out there with many good features, you can find more about it on the book I authored
A Developer’s Essential Guide to Docker Compose
.

In order to get the stack running we shall execute the following command

$ docker compose up

Since it is up and running, we can now start listening for events.

We shall login to Redis and start listen for any possible database updates.

$ docker exec -it debezium-example-redis-1 redis-cli
> xread block 1000000 streams tutorial.test_schema.employee $

This will make it possible to block until we receive one event from the stream.
If we examine the stream name we should see the pattern of {server-name}.{schema}.{table}. This would allow consumers to subscribe only to the changes of interest.

Onwards we will make an entry.

$ docker exec -it debezium-example-postgres-1 psql postgres postgres
> insert into test_schema.employee (firstname,lastname,email,age,salary) values ('John','Doe 1','john1@doe.com',18,1234.23);
> \q

If we check the redis session we should see that we received an event from the Redis stream

127.0.0.1:6379> xread block 1000000 streams tutorial.test_schema.employee $
1) 1) "tutorial.test_schema.employee"
   2) 1) 1) "1663796657336-0"
         2) 1) "{\"schema\":{\"type\":\"struct\",\"fields\":[{\"type\":\"int32\",\"optional\":false,\"default\":0,\"field\":\"id\"}],\"optional\":false,\"name\":\"tutorial.test_schema.employee.Key\"},\"payload\":{\"id\":1}}"
            2) "{\"schema\":{\"type\":\"struct\",\"fields\":[{\"type\":\"struct\",\"fields\":[{\"type\":\"int32\",\"optional\":false,\"default\":0,\"field\":\"id\"},{\"type\":\"string\",\"optional\":false,\"field\":\"firstname\"},{\"type\":\"string\",\"optional\":false,\"field\":\"lastname\"},{\"type\":\"string\",\"optional\":false,\"field\":\"email\"},{\"type\":\"int32\",\"optional\":false,\"field\":\"age\"},{\"type\":\"float\",\"optional\":true,\"field\":\"salary\"}],\"optional\":true,\"name\":\"tutorial.test_schema.employee.Value\",\"field\":\"before\"},{\"type\":\"struct\",\"fields\":[{\"type\":\"int32\",\"optional\":false,\"default\":0,\"field\":\"id\"},{\"type\":\"string\",\"optional\":false,\"field\":\"firstname\"},{\"type\":\"string\",\"optional\":false,\"field\":\"lastname\"},{\"type\":\"string\",\"optional\":false,\"field\":\"email\"},{\"type\":\"int32\",\"optional\":false,\"field\":\"age\"},{\"type\":\"float\",\"optional\":true,\"field\":\"salary\"}],\"optional\":true,\"name\":\"tutorial.test_schema.employee.Value\",\"field\":\"after\"},{\"type\":\"struct\",\"fields\":[{\"type\":\"string\",\"optional\":false,\"field\":\"version\"},{\"type\":\"string\",\"optional\":false,\"field\":\"connector\"},{\"type\":\"string\",\"optional\":false,\"field\":\"name\"},{\"type\":\"int64\",\"optional\":false,\"field\":\"ts_ms\"},{\"type\":\"string\",\"optional\":true,\"name\":\"io.debezium.data.Enum\",\"version\":1,\"parameters\":{\"allowed\":\"true,last,false,incremental\"},\"default\":\"false\",\"field\":\"snapshot\"},{\"type\":\"string\",\"optional\":false,\"field\":\"db\"},{\"type\":\"string\",\"optional\":true,\"field\":\"sequence\"},{\"type\":\"string\",\"optional\":false,\"field\":\"schema\"},{\"type\":\"string\",\"optional\":false,\"field\":\"table\"},{\"type\":\"int64\",\"optional\":true,\"field\":\"txId\"},{\"type\":\"int64\",\"optional\":true,\"field\":\"lsn\"},{\"type\":\"int64\",\"optional\":true,\"field\":\"xmin\"}],\"optional\":false,\"name\":\"io.debezium.connector.postgresql.Source\",\"field\":\"source\"},{\"type\":\"string\",\"optional\":false,\"field\":\"op\"},{\"type\":\"int64\",\"optional\":true,\"field\":\"ts_ms\"},{\"type\":\"struct\",\"fields\":[{\"type\":\"string\",\"optional\":false,\"field\":\"id\"},{\"type\":\"int64\",\"optional\":false,\"field\":\"total_order\"},{\"type\":\"int64\",\"optional\":false,\"field\":\"data_collection_order\"}],\"optional\":true,\"field\":\"transaction\"}],\"optional\":false,\"name\":\"tutorial.test_schema.employee.Envelope\"},\"payload\":{\"before\":null,\"after\":{\"id\":1,\"firstname\":\"John\",\"lastname\":\"Doe 1\",\"email\":\"john1@doe.com\",\"age\":18,\"salary\":1234.23},\"source\":{\"version\":\"1.9.5.Final\",\"connector\":\"postgresql\",\"name\":\"tutorial\",\"ts_ms\":1663796656393,\"snapshot\":\"false\",\"db\":\"postgres\",\"sequence\":\"[null,\\\"24289128\\\"]\",\"schema\":\"test_schema\",\"table\":\"employee\",\"txId\":738,\"lsn\":24289128,\"xmin\":null},\"op\":\"c\",\"ts_ms\":1663796657106,\"transaction\":null}}"
(10.17s)
127.0.0.1:6379> 

How cool is that? We can now start streaming our databases changes to the broker of our choice.

You can find the source code on GitHub.

Kafka & Zookeeper for Development: Connecting Clients to the Cluster

Previously we achieved to have our Kafka brokers connect to a ZooKeeper ensemble. Also we brought down some brokers checked the election leadership and produced/consumed some messages.

For now we want to make sure that we will be able to connect to those nodes. The problem with connecting to the ensemble we created previously is that it is located inside the container network. When a client interacts with one of the brokers and receives the full list of the brokers he will receive a list of IPs not accessible to it.

So the initial handshake of a client will be successful but then the client will try to interact withs some unreachable hosts.

In order to tackle this we will have a combination of workarounds.

The first one would be to bind the port of each Kafka broker to a different local ip.

kafka-1 will be mapped to 127.0.0.1:9092
kafka-2 will be mapped to 127.0.0.2:9092
kafka-3 will be mapped to 127.0.0.3:9092

So let’s create the aliases of those addresses

sudo ifconfig lo0 alias 127.0.0.2
sudo ifconfig lo0 alias 127.0.0.3

Now it’s possible to do the ip binding. Let’s also put those entries to our /etc/hosts. By doing this, we achieve our local network and our docker network to be in agreement on which broker they should access.

127.0.0.1	kafka-1
127.0.0.2	kafka-2
127.0.0.3	kafka-3

The next step is also to change the KAFKA_ADVERTISED_LISTENERS on each broker. We will adapt this to the DNS entry of each broker. By setting KAFKA_ADVERTISED_LISTENERS the clients from the outside can correctly connect to it, to an address reachable to them and not an address through the internal network. Further explanations can be found on this blog.

  kafka-1:
    container_name: kafka-1
    image: confluent/kafka
    ports:
    - "127.0.0.1:9092:9092"
    volumes:
    - type: bind
      source: ./server1.properties
      target: /etc/kafka/server.properties
    depends_on:
      - zookeeper-1
      - zookeeper-2
      - zookeeper-3
    environment:
      KAFKA_ADVERTISED_LISTENERS: "PLAINTEXT://kafka-1:9092"
  kafka-2:
    container_name: kafka-2
    image: confluent/kafka
    ports:
      - "127.0.0.2:9092:9092"
    volumes:
      - type: bind
        source: ./server2.properties
        target: /etc/kafka/server.properties
    depends_on:
      - zookeeper-1
      - zookeeper-2
      - zookeeper-3
    environment:
      KAFKA_ADVERTISED_LISTENERS: "PLAINTEXT://kafka-2:9092"
  kafka-3:
    container_name: kafka-3
    image: confluent/kafka
    ports:
      - "127.0.0.3:9092:9092"
    volumes:
      - type: bind
        source: ./server3.properties
        target: /etc/kafka/server.properties
    depends_on:
      - zookeeper-1
      - zookeeper-2
      - zookeeper-3
    environment:
      KAFKA_ADVERTISED_LISTENERS: "PLAINTEXT://kafka-3:9092"

We see the port binding change as well as the KAFKA_ADVERTISED_LISTENERS. Now let’s wrap everything together in our docker-compose

version: "3.8"
services:
  zookeeper-1:
    container_name: zookeeper-1
    image: zookeeper
    ports:
      - "2181:2181"
    environment:
      ZOO_MY_ID: "1"
      ZOO_SERVERS: server.1=0.0.0.0:2888:3888;2181 server.2=zookeeper-2:2888:3888;2181 server.3=zookeeper-3:2888:3888;2181
  zookeeper-2:
    container_name: zookeeper-2
    image: zookeeper
    ports:
      - "2182:2181"
    environment:
      ZOO_MY_ID: "2"
      ZOO_SERVERS: server.1=zookeeper-1:2888:3888;2181 server.2=0.0.0.0:2888:3888;2181 server.3=zookeeper-3:2888:3888;2181
  zookeeper-3:
    container_name: zookeeper-3
    image: zookeeper
    ports:
      - "2183:2181"
    environment:
      ZOO_MY_ID: "3"
      ZOO_SERVERS: server.1=zookeeper-1:2888:3888;2181 server.2=0.0.0.0:2888:3888;2181 server.3=zookeeper-3:2888:3888;2181
  kafka-1:
    container_name: kafka-1
    image: confluent/kafka
    ports:
    - "127.0.0.1:9092:9092"
    volumes:
    - type: bind
      source: ./server1.properties
      target: /etc/kafka/server.properties
    depends_on:
      - zookeeper-1
      - zookeeper-2
      - zookeeper-3
    environment:
      KAFKA_ADVERTISED_LISTENERS: "PLAINTEXT://kafka-1:9092"
  kafka-2:
    container_name: kafka-2
    image: confluent/kafka
    ports:
      - "127.0.0.2:9092:9092"
    volumes:
      - type: bind
        source: ./server2.properties
        target: /etc/kafka/server.properties
    depends_on:
      - zookeeper-1
      - zookeeper-2
      - zookeeper-3
    environment:
      KAFKA_ADVERTISED_LISTENERS: "PLAINTEXT://kafka-2:9092"
  kafka-3:
    container_name: kafka-3
    image: confluent/kafka
    ports:
      - "127.0.0.3:9092:9092"
    volumes:
      - type: bind
        source: ./server3.properties
        target: /etc/kafka/server.properties
    depends_on:
      - zookeeper-1
      - zookeeper-2
      - zookeeper-3
    environment:
      KAFKA_ADVERTISED_LISTENERS: "PLAINTEXT://kafka-3:9092"

You can find more on Compose on the Developers Essential Guide to Docker Compose.

Last but not least you can find the code on github.

Kafka & Zookeeper for Development: Connecting Brokers to the Ensemble

Previously we created successfully a Zookeeper ensemble, now it’s time to add some Kafka brokers that will connect to the ensemble and we shall execute some commands.

We will pick up from the same docker compose file we compiled previously. First let’s jump on the configuration that a Kafka broker needs.

offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
group.initial.rebalance.delay.ms=0
socket.send.buffer.bytes=102400
delete.topic.enable=true
socket.request.max.bytes=104857600
log.cleaner.enable=true
log.retention.check.interval.ms=300000
log.retention.hours=168
num.io.threads=8
broker.id=0
log4j.opts=-Dlog4j.configuration=file:/etc/kafka/log4j.properties
log.dirs=/var/lib/kafka
auto.create.topics.enable=true
num.network.threads=3
socket.receive.buffer.bytes=102400
log.segment.bytes=1073741824
num.recovery.threads.per.data.dir=1
num.partitions=1
zookeeper.connection.timeout.ms=6000
zookeeper.connect=zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181

Will go through the ones that is essential to know.

  • offsets.topic.replication.factor: how the internal offset topic gets replicated – replication factor
  • transaction.state.log.replication.factor: how the internal transaction topic gets replicated – replication factor
  • transaction.state.log.min.isr: the minimum in sync replicas for the internal transaction topic
  • delete.topic.enable: if not true Kafka will ignore the delete topic command
  • socket.request.max.bytes: the maximum size of requests
  • log.retention.check.interval.ms: the interval to evaluate if a log should be deleted
  • log.retention.hours: how many hours a log is retained before getting deleted
  • broker.id: what is the broker id of that installation
  • log.dirs: the directories where Kafka will store the log data, can be a comma separated
  • auto.create.topics.enable: create topics if they don’t exist on sending/consuming messages or asking for topic metadata
  • num.network.threads: threads on receiving requests and sending responses from the network
  • socket.receive.buffer.bytes: buffer of the server socket
  • log.segment.bytes: the size of a log file
  • num.recovery.threads.per.data.dir: threads used for log recovery at startup and flushing at shutdown
  • num.partitions: has to do with the default number of partition a topic will have once created if partition number is not specified.
  • zookeeper.connection.timeout.ms: time needed for a client to establish connection to ZooKeeper
  • zookeeper.connect: is the list of the ZooKeeper servers

Now it’s time to create the properties for each broker. Due to the broker.id property we need to have different files with the corresponding broker.id

So our first’s brokers file would look like this (broker.id 1). Keep in mind that those brokers will run on the same docker-compose file. Therefore the zookeeper.connect property contains the internal docker compose dns names. The name of the file would be named server1.properties.

socket.send.buffer.bytes=102400
delete.topic.enable=true
socket.request.max.bytes=104857600
log.cleaner.enable=true
log.retention.check.interval.ms=300000
log.retention.hours=168
num.io.threads=8
broker.id=1
transaction.state.log.replication.factor=1
log4j.opts=-Dlog4j.configuration\=file\:/etc/kafka/log4j.properties
group.initial.rebalance.delay.ms=0
log.dirs=/var/lib/kafka
auto.create.topics.enable=true
offsets.topic.replication.factor=1
num.network.threads=3
socket.receive.buffer.bytes=102400
log.segment.bytes=1073741824
num.recovery.threads.per.data.dir=1
num.partitions=1
transaction.state.log.min.isr=1
zookeeper.connection.timeout.ms=6000
zookeeper.connect=zookeeper-1\:2181,zookeeper-2\:2181,zookeeper-3\:2181

The same recipe applies for the broker.id=2 as well as broker.id=3

After creating those three broker configuration files it is time to change our docker-compose configuration.

version: "3.8"
services:
  zookeeper-1:
    container_name: zookeeper-1
    image: zookeeper
    ports:
      - "2181:2181"
    environment:
      ZOO_MY_ID: "1"
      ZOO_SERVERS: server.1=0.0.0.0:2888:3888;2181 server.2=zookeeper-2:2888:3888;2181 server.3=zookeeper-3:2888:3888;2181
  zookeeper-2:
    container_name: zookeeper-2
    image: zookeeper
    ports:
      - "2182:2181"
    environment:
      ZOO_MY_ID: "2"
      ZOO_SERVERS: server.1=zookeeper-1:2888:3888;2181 server.2=0.0.0.0:2888:3888;2181 server.3=zookeeper-3:2888:3888;2181
  zookeeper-3:
    container_name: zookeeper-3
    image: zookeeper
    ports:
      - "2183:2181"
    environment:
      ZOO_MY_ID: "3"
      ZOO_SERVERS: server.1=zookeeper-1:2888:3888;2181 server.2=0.0.0.0:2888:3888;2181 server.3=zookeeper-3:2888:3888;2181
  kafka-1:
    container_name: kafka-1
    image: confluent/kafka
    ports:
    - "9092:9092"
    volumes:
    - type: bind
      source: ./server1.properties
      target: /etc/kafka/server.properties
  kafka-2:
    container_name: kafka-2
    image: confluent/kafka
    ports:
      - "9093:9092"
    volumes:
      - type: bind
        source: ./server2.properties
        target: /etc/kafka/server.properties
  kafka-3:
    container_name: kafka-3
    image: confluent/kafka
    ports:
      - "9094:9092"
    volumes:
      - type: bind
        source: ./server3.properties
        target: /etc/kafka/server.properties

You can find more on Compose on the Developers Essential Guide to Docker Compose.

Let’s spin up the docker-compose file.

> docker-compose -f docker-compose.yaml up

Just like the previous examples we shall run some commands through the containers.

Now that we have a proper cluster with Zookeeper and multiple Kafka brokers it is time to test them working together.
The first action is to create a topic with a replication factor of 3. The expected outcome would be for this topic to be replicated 3 kafka brokers

> docker exec -it kafka-1 /bin/bash
confluent@92a6d381d0db:/$ kafka-topics --zookeeper zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181 --create --topic tutorial-topic --replication-factor 3 --partitions 1

Our topic has been created let’s check the description of the topic.

confluent@92a6d381d0db:/$ kafka-topics --describe --topic tutorial-topic --zookeeper zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181
Topic:tutorial-topic	PartitionCount:1	ReplicationFactor:3	Configs:
	Topic: tutorial-topic	Partition: 0	Leader: 2	Replicas: 2,1,3	Isr: 2,1,3

As we see the Leader for the partition is broker 2

Next step is putting some data to the topic recently created. Before doing so I will add a consumer listening for messages to that topic. While we post messages to the topic those will be printed by this consumer.

> docker exec -it kafka-3 /bin/bash
confluent@4042774f8802:/$ kafka-console-consumer --topic tutorial-topic --from-beginning --zookeeper zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181

Let’s add some topic data.

> docker exec -it kafka-1 /bin/bash
confluent@92a6d381d0db:/$ kafka-console-producer --topic tutorial-topic --broker-list kafka-1:9092,kafka-2:9092
test1
test2
test3

As expected the consumer on the other terminal will print the messages expected.

test1
test2
test3

Due to having a cluster it would be nice to stop the leader broker and see another broker to take the leadership. While doing this the expected results will be to have all the messages replicated and no disruption on consuming and publishing the messages.

Stop the leader which is broker-2

> docker stop kafka-2

Check the leadership from another broker

confluent@92a6d381d0db:/$ kafka-topics --describe --topic tutorial-topic --zookeeper zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181
Topic:tutorial-topic	PartitionCount:1	ReplicationFactor:3	Configs:
	Topic: tutorial-topic	Partition: 0	Leader: 1	Replicas: 2,1,3	Isr: 1,3

The leader now is kafka-1

Read the messages to see that they did got replicated.

> docker exec -it kafka-3 /bin/bash
confluent@4042774f8802:/$ kafka-console-consumer --topic tutorial-topic --from-beginning --zookeeper zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181
test1
test2
test3

As expected apart from the Leadership being in place our data have also been replicated!

If we try to post new messages, it will also be a successful action.

So to summarise we did run a Kafka cluster connected to a zookeeper ensemble. We did create a topic with replication enabled to 3 brokers and last but not least we did test what happens if one broker goes down.

On the next blog we are going to wrap it up so our local machine clients can connect to the docker compose ensemble.

Kafka & Zookeeper for Development: Zookeeper Ensemble

Previously we spun up Zookeeper and Kafka locally but also through Docker. What comes next is spinning up more than just one Kafka and Zookeeper node and create a 3 node cluster. To achieve this the easy way locally docker-compose will be used. Instead of spinning up various instances on the cloud or running various Java processes and altering configs, docker-compose will greatly help us to bootstrap a Zookeeper ensemble and the Kafka brokers, with everything needed preconfigured.

The first step is to create the Zookeeper ensemble but before going there let’s check the ports needed.
Zookeeper needs three ports.

  • 2181 is the client port. On the previous example it was the port our clients used to communicate with the server.
  • 2888 is the peer port. This is the port that zookeeper nodes use to talk to each other.
  • 3888 is the leader port. The port that nodes use to talk to each other when it comes to the leader election.

By using docker compose our nodes will use the same network and the container name will also be an internal dns entry.
The names on the zookeeper nodes would be zookeeper-1, zookeeper-2, zookeeper-3.

Our next goal is to give to each zookeeper node a configuration that will enable the nodes to discover each other.

This is the typical zookeeper configuration expected.

  • tickTime is the unit of time in milliseconds zookeeper uses for heartbeat and minimum session timeout.
  • dataDir is the location where ZooKeeper will store the in-memory database snapshots
  • initlimit and SyncLimit are used for the zookeeper synchronization.
  • server* are a list of the nodes that will have to communicate with each other

zookeeper.properties

clientPort=2181
dataDir=/var/lib/zookeeper
syncLimit=2
DATA.DIR=/var/log/zookeeper
initLimit=5
tickTime=2000
server.1=zookeeper-1:2888:3888
server.2=zookeeper-2:2888:3888
server.3=zookeeper-3:2888:3888

When it comes to a node the server that the node is located, should be bound to `0.0.0.0`. Thus we need three different zookeeper properties per node.

For example for the node with id 1 the file should be the following
zookeeper1.properties

clientPort=2181
dataDir=/var/lib/zookeeper
syncLimit=2
DATA.DIR=/var/log/zookeeper
initLimit=5
tickTime=2000
server.1=0.0.0.0:2888:3888
server.2=zookeeper-2:2888:3888
server.3=zookeeper-3:2888:3888

The next question that arises is the id file of zookeeper. How a zookeeper instance can identify which is its id.

Based on the documentation we need to specify the server ids using the myid file

The myid file is a plain text file located at a nodes dataDir containing only a number the server name.

So three myids files will be created each containing the number of the broker

myid_1.txt

1

myid_2.txt

2

myid_3.txt

3

It’s time to spin up the Zookeeper ensemble. We will use the files specified above. Different client ports are mapped to the host to avoid collision.

version: "3.8"
services:
  zookeeper-1:
    container_name: zookeeper-1
    image: zookeeper
    ports:
      - "2181:2181"
    volumes:
      - type: bind
        source: ./zookeeper1.properties
        target: /conf/zoo.cfg
      - type: bind
        source: ./myid_1.txt
        target: /data/myid
  zookeeper-2:
    container_name: zookeeper-2
    image: zookeeper
    ports:
      - "2182:2181"
    volumes:
    - type: bind
      source: ./zookeeper2.properties
      target: /conf/zoo.cfg
    - type: bind
      source: ./myid_2.txt
      target: /data/myid
  zookeeper-3:
    container_name: zookeeper-3
    image: zookeeper
    ports:
      - "2183:2181"
    volumes:
      - type: bind
        source: ./zookeeper3.properties
        target: /conf/zoo.cfg
      - type: bind
        source: ./myid_3.txt
        target: /data/myid

Eventually instead of having all those files mounted it would be better if we had a more simple option. Hopefully the image that we use gives us the choice of specifying ids and brokers using environment variables.

version: "3.8"
services:
  zookeeper-1:
    container_name: zookeeper-1
    image: zookeeper
    ports:
      - "2181:2181"
    environment:
      ZOO_MY_ID: "1"
      ZOO_SERVERS: server.1=0.0.0.0:2888:3888;2181 server.2=zookeeper-2:2888:3888;2181 server.3=zookeeper-3:2888:3888;2181
  zookeeper-2:
    container_name: zookeeper-2
    image: zookeeper
    ports:
      - "2182:2181"
    environment:
      ZOO_MY_ID: "2"
      ZOO_SERVERS: server.1=zookeeper-1:2888:3888;2181 server.2=0.0.0.0:2888:3888;2181 server.3=zookeeper-3:2888:3888;2181
  zookeeper-3:
    container_name: zookeeper-3
    image: zookeeper
    ports:
      - "2183:2181"
    environment:
      ZOO_MY_ID: "3"
      ZOO_SERVERS: server.1=zookeeper-1:2888:3888;2181 server.2=0.0.0.0:2888:3888;2181 server.3=zookeeper-3:2888:3888;2181

You can find more on Compose on the Developers Essential Guide to Docker Compose.

Now let’s check the status of our zookeeper nodes.

Let’s use the zookeeper shell to see the leaders and followers

docker exec -it zookeeper-1 /bin/bash
>./bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: follower

And

> docker exec -it zookeeper-3 /bin/bash
root@81c2dc476127:/apache-zookeeper-3.6.2-bin# ./bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: leader

So in this tutorial we created a Zookeeper ensemble using docker-compose and we also had a leader election. This recipe can be adapted to apply on the a set of VMs or a container orchestration engine.

On the next tutorial we shall add some Kafka brokers to connect to the ensemble.