WordCount on Hadoop with Scala

Hadoop is a great technology built with java.

Today we will use Scala to implement a simple map reduce job and then run it using HDInsight.

We shall add the sbt-assembly plugin to our assembly.sbt file.

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")


Then we will add the Hadoop core dependency to our build.sbt file. We will also configure a merge strategy in order to avoid deduplication errors.



assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}

libraryDependencies += "org.apache.hadoop" % "hadoop-core" % "1.2.1"

We will use WordCount as an example.
The original Java class shall be transformed to a Scala class.

package com.gkatzioura.scala

import java.lang.Iterable
import java.util.StringTokenizer

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
import scala.collection.JavaConverters._

/**
  * Created by gkatzioura on 2/14/17.
  */
package object WordCount {

  class TokenizerMapper extends Mapper[Object, Text, Text, IntWritable] {

    val one = new IntWritable(1)
    val word = new Text()

    override def map(key: Object, value: Text, context: Mapper[Object, Text, Text, IntWritable]#Context): Unit = {
      val itr = new StringTokenizer(value.toString)
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken())
        context.write(word, one)
      }
    }
  }

  class IntSumReader extends Reducer[Text,IntWritable,Text,IntWritable] {
    override def reduce(key: Text, values: Iterable[IntWritable], context: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
      val sum = values.asScala.foldLeft(0)(_ + _.get)
      context.write(key, new IntWritable(sum))
    }
  }


  def main(args: Array[String]): Unit = {
    val configuration = new Configuration
    val job = Job.getInstance(configuration,"word count")
    job.setJarByClass(this.getClass)
    job.setMapperClass(classOf[TokenizerMapper])
    job.setCombinerClass(classOf[IntSumReader])
    job.setReducerClass(classOf[IntSumReader])
    job.setOutputKeyClass(classOf[Text])
    job.setOutputValueClass(classOf[IntWritable])
    FileInputFormat.addInputPath(job, new Path(args(0)))
    FileOutputFormat.setOutputPath(job, new Path(args(1)))
    System.exit(if(job.waitForCompletion(true))  0 else 1)
  }

}

Then we will build our example

sbt clean compile assembly

Our new jar will reside at target/scala-2.12/ScalaHadoop-assembly-1.0.jar

On the next post we shall run our code using Azure’s HDInsight.

You can find the code on github.

Hibernate Caching with HazelCast: Basic configuration

Previously we went through an introduction to JPA caching, its mechanisms and what Hibernate offers.

What comes next is a hibernate project using Hazelcast as a second level cache.

We will use a basic Spring Boot project with JPA for this purpose. Spring Boot uses Hibernate as its default JPA provider.
Our setup will be pretty close to the one of a previous post.
We will use PostgreSQL running on Docker for our SQL database.
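For example, a disposable PostgreSQL container can be started as shown below (a sketch; the container name is arbitrary, and the 172.17.0.2 address used in the configuration further down is simply the default docker bridge IP that the first container gets).

docker run --name spring-data-jpa-postgres -e POSTGRES_PASSWORD=postgres -d postgres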

group 'com.gkatzioura'
version '1.0-SNAPSHOT'

buildscript {
    repositories {
        mavenCentral()
    }
    dependencies {
        classpath("org.springframework.boot:spring-boot-gradle-plugin:1.5.1.RELEASE")
    }
}

apply plugin: 'java'
apply plugin: 'idea'
apply plugin: 'org.springframework.boot'


repositories {
    mavenCentral()
}

dependencies {
    compile("org.springframework.boot:spring-boot-starter-web")
    compile group: 'org.springframework.boot', name: 'spring-boot-starter-data-jpa'
    compile group: 'org.postgresql', name:'postgresql', version:'9.4-1206-jdbc42'
    compile group: 'org.springframework', name: 'spring-jdbc'
    compile group: 'com.zaxxer', name: 'HikariCP', version: '2.6.0'
    compile group: 'com.hazelcast', name: 'hazelcast-hibernate5', version: '1.2'
    compile group: 'com.hazelcast', name: 'hazelcast', version: '3.7.5'
    testCompile group: 'junit', name: 'junit', version: '4.11'
}

By examining the dependencies carefully we can see the Hikari connection pool, the PostgreSQL driver, Spring Data JPA and of course Hazelcast, together with the hazelcast-hibernate5 integration.

Instead of creating the database manually we will automate it by utilizing the database initialization feature of Spring boot.

We shall create a file called schema.sql under the resources folder.

create schema spring_data_jpa_example;
 
create table spring_data_jpa_example.employee(
    id  SERIAL PRIMARY KEY,
    firstname   TEXT    NOT NULL,
    lastname    TEXT    NOT NULL,   
    email       TEXT    not null,
    age         INT     NOT NULL,
    salary         real,
    unique(email)
);
 
insert into spring_data_jpa_example.employee (firstname,lastname,email,age,salary) 
values ('Test','Me','test@me.com',18,3000.23);

To keep it simple and avoid any further configuration we shall put the settings for the datasource, JPA and caching inside the application.yml file.

spring:
  datasource:
    continue-on-error: true
    type: com.zaxxer.hikari.HikariDataSource
    url: jdbc:postgresql://172.17.0.2:5432/postgres
    driver-class-name: org.postgresql.Driver
    username: postgres
    password: postgres
    hikari:
      idle-timeout: 10000
  jpa:
    properties:
      hibernate:
        cache:
          use_second_level_cache: true
          use_query_cache: true
          region:
            factory_class: com.hazelcast.hibernate.HazelcastCacheRegionFactory
    show-sql: true

The spring.datasource.continue-on-error property is crucial: once the application is relaunched, schema.sql will be executed again, the schema will already exist, and without this setting the resulting error would crash the application.

Any Hibernate specific properties reside under the spring.jpa.properties path. We enabled the second level cache and the query cache.

We also set show-sql to true. This means that every query sent to the database will be logged to the console.

Then we create our employee entity.

package com.gkatzioura.hibernate.enitites;

import javax.persistence.*;

/**
 * Created by gkatzioura on 2/6/17.
 */
@Entity
@Table(name = "employee", schema="spring_data_jpa_example")
public class Employee {

    @Id
    @Column(name = "id")
    @GeneratedValue(strategy = GenerationType.SEQUENCE)
    private Long id;

    @Column(name = "firstname")
    private String firstName;

    @Column(name = "lastname")
    private String lastname;

    @Column(name = "email")
    private String email;

    @Column(name = "age")
    private Integer age;

    @Column(name = "salary")
    private Integer salary;

    public Long getId() {
        return id;
    }

    public void setId(Long id) {
        this.id = id;
    }

    public String getFirstName() {
        return firstName;
    }

    public void setFirstName(String firstName) {
        this.firstName = firstName;
    }

    public String getLastname() {
        return lastname;
    }

    public void setLastname(String lastname) {
        this.lastname = lastname;
    }

    public String getEmail() {
        return email;
    }

    public void setEmail(String email) {
        this.email = email;
    }

    public Integer getAge() {
        return age;
    }

    public void setAge(Integer age) {
        this.age = age;
    }

    public Integer getSalary() {
        return salary;
    }

    public void setSalary(Integer salary) {
        this.salary = salary;
    }
}

Everything is set up. Spring Boot will detect the entity and create an EntityManagerFactory on its own.
What comes next is the repository class for the employee.

package com.gkatzioura.hibernate.repository;

import com.gkatzioura.hibernate.enitites.Employee;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.repository.CrudRepository;

/**
 * Created by gkatzioura on 2/11/17.
 */
public interface EmployeeRepository extends JpaRepository<Employee,Long> {
}

And the last one is the controller

package com.gkatzioura.hibernate.controller;

import com.gkatzioura.hibernate.enitites.Employee;
import com.gkatzioura.hibernate.repository.EmployeeRepository;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

import java.util.List;

/**
 * Created by gkatzioura on 2/6/17.
 */
@RestController
public class EmployeeController {

    @Autowired
    private EmployeeRepository employeeRepository;

    @RequestMapping("/employee")
    public List<Employee> testIt() {

        return employeeRepository.findAll();
    }

    @RequestMapping("/employee/{employeeId}")
    public Employee getEmployee(@PathVariable Long employeeId) {

        return employeeRepository.findOne(employeeId);
    }

}

Once we issue a request at
http://localhost:8080/employee/1

the console will display the query issued against the database

Hibernate: select employee0_.id as id1_0_0_, employee0_.age as age2_0_0_, employee0_.email as email3_0_0_, employee0_.firstname as firstnam4_0_0_, employee0_.lastname as lastname5_0_0_, employee0_.salary as salary6_0_0_ from spring_data_jpa_example.employee employee0_ where employee0_.id=?

The second time we issue the request, since we have the second level cache enabled, there won't be a query issued against the database. Instead the entity shall be fetched from the second level cache.

You can download the project from github.

Hibernate Caching With HazelCast: JPA caching basics

One of the greatest capabilities of Hazelcast is its support for Hibernate's second level cache.

JPA has two levels of cache.
The first level cache caches an object's state for the duration of a transaction. By querying the same object twice you get back the very object you retrieved the first time.
However in the case of complex queries which involve the object you retrieved and access the database, chances are that the results will be out of sync, since they will not reflect the changes you applied to the object in memory during the transaction. You can tackle this with flush().
Once a JPA session is initiated its first level cache is restricted to that session; it will not affect other sessions.
The first level cache is required as part of JPA.

The second level cache, in contrast to the first level cache, is associated with the SessionFactory and is therefore shared across sessions. Commonly used data can be stored in memory and retrieved faster.

Once you have the second level cache enabled Hibernate will cache the retrieved entities in a Hibernate cache region. To do so you have to mark your entities as cacheable. Under the hood the information that resides in an entity is cached in a dehydrated format.
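As a minimal sketch of what that looks like (the entity is illustrative; the Hibernate specific @Cache annotation is one common way to mark it, and a concurrency strategy supported by the cache provider has to be chosen):

import javax.persistence.Cacheable;
import javax.persistence.Entity;
import javax.persistence.Id;

import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;

@Entity
@Cacheable
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
public class User {

    @Id
    private Long id;

    // remaining fields, getters and setters omitted
}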

Hazelcast can be used with the second level cache in two forms of architecture:
client-server or cluster-only.
To start with, we will investigate a cluster-only architecture.
Hazelcast creates a separate distributed map for each Hibernate cache region, and therefore for each entity. You can easily configure these regions via the Hazelcast map configuration. Each region name has a corresponding Hazelcast map: for example, if one of our entities is called User and its fully qualified name is 'com.gkatzioura.User', the corresponding Hazelcast map will also be named 'com.gkatzioura.User'. Provided that this map is distributed across the Hazelcast nodes, once the entity is retrieved on one node the cached information will be shared with the other Hazelcast nodes. Once an entity gets updated on a node the cached information will be invalidated on the other nodes.
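For illustration, a map configuration for such a region could look roughly like the sketch below (TTL and eviction values are arbitrary):

import com.hazelcast.config.Config;
import com.hazelcast.config.EvictionPolicy;
import com.hazelcast.config.MapConfig;

public class UserRegionConfig {

    public static Config userRegionConfig() {

        Config config = new Config();

        // The map name matches the Hibernate cache region, i.e. the entity's fully qualified name
        MapConfig userRegion = new MapConfig();
        userRegion.setName("com.gkatzioura.User");
        userRegion.setTimeToLiveSeconds(300);
        userRegion.setEvictionPolicy(EvictionPolicy.LRU);

        config.addMapConfig(userRegion);

        return config;
    }
}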

Hibernate also provides us with the query cache. The query cache caches query results. For example, in the case of a JPQL query

SELECT usr.username,usr.firstname FROM User usr

the cached result would be a map entry whose key is composed of the query and its parameters and whose value is the results retrieved. In this case the data retrieved are primitive values and they are stored as they are.
However there are cases in which a query retrieves entities.
For example

SELECT c FROM Customer c

In such cases, instead of storing all the information retrieved, the entities are cached in the second level cache whilst the query cache gets an entry that uses the query and its parameters as the key and the entity ids as the value.
Once the same query is issued again the query cache will fetch the ids and look up the corresponding entities in the second level cache. If an entity does not exist in the second level cache, a query is issued in order to fetch the missing entity.
When it comes to second level cache and query cache configuration we need to pay attention to the eviction mechanisms of both. You might stumble upon cases where ids are cached in the query cache while their corresponding entities have been evicted from the second level cache. In such cases there is a performance hit, since Hibernate will issue a query for each missing entity.
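As a side note, enabling the query cache does not cache every query; each query has to be explicitly marked as cacheable. With Spring Data JPA this is typically done with a query hint, roughly as in the sketch below (repository and entity names are illustrative):

import java.util.List;

import javax.persistence.QueryHint;

import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.QueryHints;

public interface CustomerRepository extends JpaRepository<Customer, Long> {

    // The hint marks the generated query as cacheable so that its results go through the query cache
    @QueryHints(@QueryHint(name = "org.hibernate.cacheable", value = "true"))
    List<Customer> findByLastName(String lastName);
}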

Hazelcast has support for the query cache, however it is local to the node and never distributed across the Hazelcast cluster.
Although the cached query results remain on the specific node, the entities referenced by the cached query will be retrieved from the distributed map which is used as the second level cache.

This is the theory we need so far. In the next blog post we will write some Spring Data JPA code and do some Hazelcast configuration.

Push Spring Boot Docker images on ECR

On a previous blog we integrated a spring boot application with EC2.
It is one of the most raw forms of deployment that you can have on Amazon Web Services.

In this tutorial we will create a Docker image of our application, which will be stored in the Amazon EC2 Container Registry (ECR).

You need to have the aws cli tool installed.

We will keep our Spring application as simple as we can, therefore we will use an example from the official Spring guide. The only changes applied will be to the packaging and the application name.

Our application shall be named ecs-deployment

rootProject.name = 'ecs-deployment'

Then we build and run our application

gradle build
gradle bootRun

Now let’s dockerize our application.
First we shall create a Dockerfile that will reside in src/main/docker.

FROM frolvlad/alpine-oraclejdk8
VOLUME /tmp
ADD ecs-deployment-1.0-SNAPSHOT.jar app.jar
RUN sh -c 'touch /app.jar'
ENV JAVA_OPTS=""
ENTRYPOINT [ "sh", "-c", "java $JAVA_OPTS -Djava.security.egd=file:/dev/./urandom -jar /app.jar" ]

Then we should edit our gradle file in order to add the docker dependency, the docker plugin and an extra gradle task in order to create our docker image.

buildscript {
    ...
    dependencies {
        ...
        classpath('se.transmode.gradle:gradle-docker:1.2')
    }
}

...
apply plugin: 'docker'


task buildDocker(type: Docker, dependsOn: build) {
    push = false
    applicationName = jar.baseName
    dockerfile = file('src/main/docker/Dockerfile')
}

And we are ready to build our docker image.

./gradlew build buildDocker

You can also run your docker application from the newly created image.

docker run -p 8080:8080 -t com.gkatzioura.deployment/ecs-deployment:1.0-SNAPSHOT

The first step is to create our ECR repository

aws ecr create-repository  --repository-name ecs-deployment

Then let us proceed with our docker registry authentication.

aws ecr get-login

Then run the command given in the output. The login attempt will succeed and you are ready to push your image.

First tag the image in order to specify the repository that we previously created and then do a docker push.

docker tag {imageid} {aws account id}.dkr.ecr.{aws region}.amazonaws.com/ecs-deployment:1.0-SNAPSHOT
docker push {aws account id}.dkr.ecr.{aws region}.amazonaws.com/ecs-deployment:1.0-SNAPSHOT

And we are done! Our Spring Boot Docker image is deployed on the Amazon EC2 Container Registry.
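Optionally, we can verify that the image made it to the repository by listing its images.

aws ecr list-images --repository-name ecs-deployment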

You can find the source code on github.

Spring-Boot and Cache Abstraction with HazelCast

Previously we got started with Spring Cache abstraction using the default Cache Manager that spring provides.

Although this approach might suit our needs for simple applications, in the case of complex problems we need tools with more capabilities. Hazelcast is one of them: it is hands down a great caching tool when it comes to JVM based applications. By using Hazelcast as a cache, data is evenly distributed among the nodes of a cluster, allowing for horizontal scaling of the available storage.

We will run our codebase using spring profiles thus ‘hazelcast-cache’ will be our profile name.

group 'com.gkatzioura'
version '1.0-SNAPSHOT'


buildscript {
    repositories {
        mavenCentral()
    }
    dependencies {
        classpath("org.springframework.boot:spring-boot-gradle-plugin:1.4.2.RELEASE")
    }
}

apply plugin: 'java'
apply plugin: 'idea'
apply plugin: 'org.springframework.boot'

repositories {
    mavenCentral()
}


sourceCompatibility = 1.8
targetCompatibility = 1.8

dependencies {
    compile("org.springframework.boot:spring-boot-starter-web")
    compile("org.springframework.boot:spring-boot-starter-cache")
    compile("org.springframework.boot:spring-boot-starter")
    compile("com.hazelcast:hazelcast:3.7.4")
    compile("com.hazelcast:hazelcast-spring:3.7.4")

    testCompile("junit:junit")
}

bootRun {
    systemProperty "spring.profiles.active", "hazelcast-cache"
}

As you can see we updated the gradle file from the previous example and added two extra dependencies: hazelcast and hazelcast-spring. We also changed the profile that our application runs with by default.

Our next step is to configure the hazelcast cache manager.

package com.gkatzioura.caching.config;

import com.hazelcast.config.Config;
import com.hazelcast.config.EvictionPolicy;
import com.hazelcast.config.MapConfig;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Profile;

/**
 * Created by gkatzioura on 1/10/17.
 */
@Configuration
@Profile("hazelcast-cache")
public class HazelcastCacheConfig {

    @Bean
    public Config hazelCastConfig() {

        Config config = new Config();
        config.setInstanceName("hazelcast-cache");

        MapConfig allUsersCache = new MapConfig();
        allUsersCache.setTimeToLiveSeconds(20);
        allUsersCache.setEvictionPolicy(EvictionPolicy.LFU);
        config.getMapConfigs().put("alluserscache",allUsersCache);

        MapConfig usercache = new MapConfig();
        usercache.setTimeToLiveSeconds(20);
        usercache.setEvictionPolicy(EvictionPolicy.LFU);
        config.getMapConfigs().put("usercache",usercache);

        return config;
    }

}

We just created two map configurations with a TTL policy of 20 seconds. Therefore, 20 seconds after an entry is put into one of these maps it will be evicted. For more Hazelcast configuration options please refer to the official Hazelcast documentation.

Another change that we have to implement is to change UserPayload into a serializable Java object, since objects stored in hazelcast must be Serializable.

package com.gkatzioura.caching.model;

import java.io.Serializable;

/**
 * Created by gkatzioura on 1/5/17.
 */
public class UserPayload implements Serializable {

    private String userName;
    private String firstName;
    private String lastName;

    public String getUserName() {
        return userName;
    }

    public void setUserName(String userName) {
        this.userName = userName;
    }

    public String getFirstName() {
        return firstName;
    }

    public void setFirstName(String firstName) {
        this.firstName = firstName;
    }

    public String getLastName() {
        return lastName;
    }

    public void setLastName(String lastName) {
        this.lastName = lastName;
    }
}

Last but not least we add another repository bound to the hazelcast-cache profile.
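A minimal sketch of what that repository could look like follows, assuming it mirrors the simple-cache repository of the plain cache abstraction example and that a List<UserPayload> bean is also available under the hazelcast-cache profile (class name and details are illustrative).

package com.gkatzioura.caching.repository;

import com.gkatzioura.caching.model.UserPayload;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.context.annotation.Profile;
import org.springframework.stereotype.Repository;

import java.util.List;

@Repository
@Profile("hazelcast-cache")
public class UserRepositoryHazelcast implements UserRepository {

    @Autowired
    private List<UserPayload> payloadUsers;

    @Override
    @Cacheable("alluserscache")
    public List<UserPayload> fetchAllUsers() {
        return payloadUsers;
    }

    @Override
    @Cacheable(cacheNames = "usercache", key = "#root.methodName")
    public UserPayload firstUser() {
        return payloadUsers.get(0);
    }

    @Override
    @Cacheable(cacheNames = "usercache", key = "{#firstName,#lastName}")
    public UserPayload userByFirstNameAndLastName(String firstName, String lastName) {
        return payloadUsers.stream()
                .filter(p -> p.getFirstName().equals(firstName) && p.getLastName().equals(lastName))
                .findFirst()
                .orElse(null);
    }
}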

The result is our previous spring-boot application integrated with hazelcast instead of the default cache, configured with a ttl policy.

You can find the sourcecode on github.

Spring boot and Cache Abstraction

Caching is a major ingredient of most applications, and as long as we try to avoid disk access it will stay strong.
Spring has great support for caching with a wide range of configurations. You can start as simple as you want and progress to something much more customizable.

This will be an example of the simplest form of caching that Spring provides.
Spring comes by default with an in-memory cache which is pretty easy to set up.

Let us start with our gradle file.

group 'com.gkatzioura'
version '1.0-SNAPSHOT'


buildscript {
    repositories {
        mavenCentral()
    }
    dependencies {
        classpath("org.springframework.boot:spring-boot-gradle-plugin:1.4.2.RELEASE")
    }
}

apply plugin: 'java'
apply plugin: 'idea'
apply plugin: 'org.springframework.boot'

repositories {
    mavenCentral()
}


sourceCompatibility = 1.8
targetCompatibility = 1.8

dependencies {
    compile("org.springframework.boot:spring-boot-starter-web")
    compile("org.springframework.boot:spring-boot-starter-cache")
    compile("org.springframework.boot:spring-boot-starter")
    testCompile("junit:junit")
}

bootRun {
    systemProperty "spring.profiles.active", "simple-cache"
}

Since the same project will be used for different cache providers there are going to be multiple Spring profiles. The Spring profile for this tutorial will be simple-cache, since we are going to use the ConcurrentMap-based cache which happens to be the default.

We will implement an application which will fetch user information from our local file system.
The information shall reside in the users.json file

[
  {"userName":"user1","firstName":"User1","lastName":"First"},
  {"userName":"user2","firstName":"User2","lastName":"Second"},
  {"userName":"user3","firstName":"User3","lastName":"Third"},
  {"userName":"user4","firstName":"User4","lastName":"Fourth"}
]

Also we will specify a simple model for the data to be retrieved.

package com.gkatzioura.caching.model;

/**
 * Created by gkatzioura on 1/5/17.
 */
public class UserPayload {

    private String userName;
    private String firstName;
    private String lastName;

    public String getUserName() {
        return userName;
    }

    public void setUserName(String userName) {
        this.userName = userName;
    }

    public String getFirstName() {
        return firstName;
    }

    public void setFirstName(String firstName) {
        this.firstName = firstName;
    }

    public String getLastName() {
        return lastName;
    }

    public void setLastName(String lastName) {
        this.lastName = lastName;
    }
}

Then we will add a bean that will read the information.

package com.gkatzioura.caching.config;

import com.fasterxml.jackson.databind.ObjectMapper;
import com.gkatzioura.caching.model.UserPayload;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Profile;
import org.springframework.core.io.Resource;

import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/**
 * Created by gkatzioura on 1/5/17.
 */
@Configuration
@Profile("simple-cache")
public class SimpleDataConfig {

    @Autowired
    private ObjectMapper objectMapper;

    @Value("classpath:/users.json")
    private Resource usersJsonResource;

    @Bean
    public List<UserPayload> payloadUsers() throws IOException {

        try(InputStream inputStream = usersJsonResource.getInputStream()) {

            UserPayload[] payloadUsers = objectMapper.readValue(inputStream,UserPayload[].class);
            return Collections.unmodifiableList(Arrays.asList(payloadUsers));
        }
    }
}

In order to access the information we will use the instantiated bean containing all the user information.

Next step will be to create a repository interface to specify the methods that will be used.

package com.gkatzioura.caching.repository;

import com.gkatzioura.caching.model.UserPayload;

import java.util.List;

/**
 * Created by gkatzioura on 1/6/17.
 */
public interface UserRepository {

    List<UserPayload> fetchAllUsers();

    UserPayload firstUser();

    UserPayload userByFirstNameAndLastName(String firstName,String lastName);

}

Now let’s dive into the implementation which will contain the cache annotations needed.

package com.gkatzioura.caching.repository;

import com.gkatzioura.caching.model.UserPayload;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.cache.annotation.CacheEvict;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.context.annotation.Profile;
import org.springframework.stereotype.Repository;

import java.util.List;
import java.util.Optional;

/**
 * Created by gkatzioura on 12/30/16.
 */
@Repository
@Profile("simple-cache")
public class UserRepositoryLocal implements UserRepository {

    @Autowired
    private List<UserPayload> payloadUsers;

    private static final Logger LOGGER = LoggerFactory.getLogger(UserRepositoryLocal.class);

    @Override
    @Cacheable("alluserscache")
    public List<UserPayload> fetchAllUsers() {

        LOGGER.info("Fetching all users");

        return payloadUsers;
    }

    @Override
    @Cacheable(cacheNames = "usercache",key = "#root.methodName")
    public UserPayload firstUser() {

        LOGGER.info("fetching firstUser");

        return payloadUsers.get(0);
    }

    @Override
    @Cacheable(cacheNames = "usercache",key = "{#firstName,#lastName}")
    public UserPayload userByFirstNameAndLastName(String firstName,String lastName) {

        LOGGER.info("fetching user by firstname and lastname");

        Optional<UserPayload> user = payloadUsers.stream().filter(
                p-> p.getFirstName().equals(firstName)
                &&p.getLastName().equals(lastName))
                .findFirst();

        if(user.isPresent()) {
            return user.get();
        } else {
            return null;
        }
    }

}

Methods annotated with @Cacheable trigger cache population, in contrast to methods annotated with @CacheEvict, which trigger cache eviction.
With @Cacheable, besides specifying the cache where our values will be stored, we can also specify keys based on the method name or the method arguments. Thus we achieve method caching.
For example the method firstUser uses the method name as its key, whilst the method userByFirstNameAndLastName uses the method arguments in order to create a key.

Two methods with the @CacheEvict annotation will empty the caches specified.

LocalCacheEvict will be the component that will handle the eviction.

package com.gkatzioura.caching.repository;

import org.springframework.cache.annotation.CacheEvict;
import org.springframework.context.annotation.Profile;
import org.springframework.stereotype.Component;

/**
 * Created by gkatzioura on 1/7/17.
 */
@Component
@Profile("simple-cache")
public class LocalCacheEvict {

    @CacheEvict(cacheNames = "alluserscache",allEntries = true)
    public void evictAllUsersCache() {

    }

    @CacheEvict(cacheNames = "usercache",allEntries = true)
    public void evictUserCache() {

    }

}

Since we use a very simple form of cache, TTL eviction is not supported. Therefore we will add a scheduler, only for this particular case, which will evict the caches after a certain period of time.

package com.gkatzioura.caching.scheduler;

import com.gkatzioura.caching.repository.LocalCacheEvict;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Profile;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

/**
 * Created by gkatzioura on 1/7/17.
 */
@Component
@Profile("simple-cache")
public class EvictScheduler {

    @Autowired
    private LocalCacheEvict localCacheEvict;

    private static final Logger LOGGER = LoggerFactory.getLogger(EvictScheduler.class);

    @Scheduled(fixedDelay=10000)
    public void clearCaches() {

        LOGGER.info("Invalidating caches");

        localCacheEvict.evictUserCache();
        localCacheEvict.evictAllUsersCache();
    }


}

To wrap up we will use a controller to call the methods specified

package com.gkatzioura.caching.controller;

import com.gkatzioura.caching.model.UserPayload;
import com.gkatzioura.caching.repository.UserRepository;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RestController;

import java.util.List;

/**
 * Created by gkatzioura on 12/30/16.
 */
@RestController
public class UsersController {

    @Autowired
    private UserRepository userRepository;

    @RequestMapping(path = "/users/all",method = RequestMethod.GET)
    public List<UserPayload> fetchUsers() {

        return userRepository.fetchAllUsers();
    }

    @RequestMapping(path = "/users/first",method = RequestMethod.GET)
    public UserPayload fetchFirst() {
        return userRepository.firstUser();
    }

    @RequestMapping(path = "/users/",method = RequestMethod.GET)
    public UserPayload findByFirstNameLastName(String firstName,String lastName ) {

        return userRepository.userByFirstNameAndLastName(firstName,lastName);
    }

}
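To see the cache in action we can hit the endpoints below a couple of times in a row; within the 10 second eviction window the repository log message should show up only once per cache key.

curl http://localhost:8080/users/all
curl http://localhost:8080/users/all
curl "http://localhost:8080/users/?firstName=User1&lastName=First"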

Last but not least our Application class should contain two extra annotations. @EnableScheduling is needed in order to enable schedulers and @EnableCaching in order to enable caching

package com.gkatzioura.caching;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.scheduling.annotation.EnableScheduling;

/**
 * Created by gkatzioura on 12/30/16.
 */
@SpringBootApplication
@EnableScheduling
@EnableCaching
public class Application {

    public static void main(String[] args) {
        SpringApplication.run(Application.class,args);
    }

}

You can find the sourcecode on github.

Integrate Spring Boot and EC2 using Cloudformation

On a previous blog we integrated a spring boot application with elastic beanstalk.
The application was a servlet based application responding to requests.

In this tutorial we are going to deploy a spring boot application which executes some scheduled tasks on an EC2 instance.
The application will be pretty much the same application taken from the official spring guide with some minor differences on packages.

The name of our application will be ec2-deployment

rootProject.name = 'ec2-deployment'

Then we will schedule a task to our spring boot application.

package com.gkatzioura.deployment.task;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

/**
 * Created by gkatzioura on 12/16/16.
 */
@Component
public class SimpleTask {

    private static final Logger LOGGER = LoggerFactory.getLogger(SimpleTask.class);

    @Scheduled(fixedRate = 5000)
    public void reportCurrentTime() {
        LOGGER.info("This is a simple task on ec2");
    }

}


Next step is to build the application and deploy it to our s3 bucket.

gradle build
aws s3 cp build/libs/ec2-deployment-1.0-SNAPSHOT.jar s3://{your bucket name}/ec2-deployment-1.0-SNAPSHOT.jar 

What comes next is a bootstrapping script in order to run our application once the server is up and running.

#!/usr/bin/env bash
aws s3 cp s3://{bucket with code}/ec2-deployment-1.0-SNAPSHOT.jar /home/ec2-user/ec2-deployment-1.0-SNAPSHOT.jar
sudo yum -y install java-1.8.0
sudo yum -y remove java-1.7.0-openjdk
cd /home/ec2-user/
sudo nohup java -jar ec2-deployment-1.0-SNAPSHOT.jar > ec2dep.log

This script is pretty much self explanatory. We download the application from the bucket we previously uploaded it to, we install the Java version needed and then we run the application (this script serves example purposes; there are certainly many ways to set up your Java application on Linux).

The next step is to proceed to our CloudFormation script. Since we will download our application from S3, it is essential to have an IAM policy that allows us to download items from the S3 bucket we used previously. Therefore we will create a role with the needed policy

"RootRole": {
      "Type": "AWS::IAM::Role",
      "Properties": {
        "AssumeRolePolicyDocument": {
          "Version" : "2012-10-17",
          "Statement": [ {
            "Effect": "Allow",
            "Principal": {
              "Service": [ "ec2.amazonaws.com" ]
            },
            "Action": [ "sts:AssumeRole" ]
          } ]
        },
        "Path": "/",
        "Policies": [ {
          "PolicyName": "root",
          "PolicyDocument": {
            "Version" : "2012-10-17",
            "Statement": [ {
              "Effect": "Allow",
              "Action": [
                "s3:Get*",
                "s3:List*"
              ],
              "Resource": {"Fn::Join" : [ "", [ "arn:aws:s3:::", {"Ref":"SourceCodeBucket"},"/*"] ] }
            } ]
          }
        } ]
      }
    }

The next step is to encode our bootstrapping script to Base64 in order to be able to pass it as user data.
Once the EC2 instance is up and running it will execute the shell commands previously specified.
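For example, assuming the bootstrapping script above has been saved as bootstrap.sh, it can be encoded on Linux with the following command.

base64 -w 0 bootstrap.sh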

Last step is to create our instance profile and specify the ec2 instance to be launched

    "RootInstanceProfile": {
      "Type": "AWS::IAM::InstanceProfile",
      "Properties": {
        "Path": "/",
        "Roles": [ {
          "Ref": "RootRole"
        } ]
      }
    },
    "Ec2Instance":{
      "Type":"AWS::EC2::Instance",
      "Properties":{
        "ImageId":"ami-9398d3e0",
        "InstanceType":"t2.nano",
        "KeyName":"TestKey",
        "IamInstanceProfile": {"Ref":"RootInstanceProfile"},
"UserData":"IyEvdXNyL2Jpbi9lbnYgYmFzaA0KYXdzIHMzIGNwIHMzOi8ve2J1Y2tldCB3aXRoIGNvZGV9L2VjMi1kZXBsb3ltZW50LTEuMC1TTkFQU0hPVC5qYXIgL2hvbWUvZWMyLXVzZXIvZWMyLWRlcGxveW1lbnQtMS4wLVNOQVBTSE9ULmphcg0Kc3VkbyB5dW0gLXkgaW5zdGFsbCBqYXZhLTEuOC4wDQpzdWRvIHl1bSAteSByZW1vdmUgamF2YS0xLjcuMC1vcGVuamRrDQpjZCAvaG9tZS9lYzItdXNlci8NCnN1ZG8gbm9odXAgamF2YSAtamFyIGVjMi1kZXBsb3ltZW50LTEuMC1TTkFQU0hPVC5qYXIgPiBlYzJkZXAubG9n"
      }
    }

KeyName stands for the SSH key name, in case you want to log in to the EC2 instance.

So we are good to go and create our CloudFormation stack. Since the template creates IAM resources, you have to add the CAPABILITY_IAM flag.

aws s3 cp ec2spring.template s3://{bucket with templates}/ec2spring.template
aws cloudformation create-stack --stack-name SpringEc2 --parameters ParameterKey=SourceCodeBucket,ParameterValue={bucket with code} --template-url https://s3.amazonaws.com/{bucket with templates}/ec2spring.template --capabilities CAPABILITY_IAM

That’s it. Now you have your spring application up and running on top of an ec2 instance.
You can download the source code from GitHub.

Integrate Spring boot and Elastic Beanstalk using Cloudformation

AWS Elastic Beanstalk is an Amazon web service that does most of the configuration for you and creates an infrastructure suitable for a horizontally scalable application. The alternative to Beanstalk would be to configure load balancers and auto scaling groups yourself, which requires a bit of AWS expertise and time.

In this tutorial we are going to deploy a Spring Boot jar application using Amazon Elastic Beanstalk and a CloudFormation template.

Less is more therefore we are going to use pretty much the same spring boot application taken from the official Spring guide as a template.

The only changes would be to alter the rootProject.name to beanstalk-deployment and to make some changes to the package structure. Downloading the project from github is sufficient.

Then we can build and run the project

gradlew build
java -jar build/libs/beanstalk-deployment-1.0-SNAPSHOT.jar 

Next step is to upload the application to s3.

aws s3 cp build/libs/beanstalk-deployment-1.0-SNAPSHOT.jar s3://{your bucket name}/beanstalk-deployment-1.0-SNAPSHOT.jar

You need to install the Elastic Beanstalk CLI since it helps a lot with most Beanstalk operations.
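The EB CLI can typically be installed through pip.

pip install awsebcli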

Since we will use Java 8 we will get a list of the available Elastic Beanstalk solution stacks in order to retrieve the correct SolutionStackName.

aws elasticbeanstalk list-available-solution-stacks |grep Java 

Based on the results I will use the “64bit Amazon Linux 2016.09 v2.3.0 running Java 8” stackname.

Now we are ready to proceed to our cloudformation script.

We will specify a parameter and this will be the bucket containing the application code

  "Parameters" : {
    "SourceCodeBucket" : {
      "Type" : "String"
    }
  }

Then we will specify the name of the application

    "SpringBootApplication": {
      "Type": "AWS::ElasticBeanstalk::Application",
      "Properties": {
        "Description":"Spring boot and elastic beanstalk"
      }
    }

Next step will be to specify the application version

    "SpringBootApplicationVersion": {
      "Type": "AWS::ElasticBeanstalk::ApplicationVersion",
      "Properties": {
        "ApplicationName":{"Ref":"SpringBootApplication"},
        "SourceBundle": {
                  "S3Bucket": {"Ref":"SourceCodeBucket"},
                  "S3Key": "beanstalk-deployment-1.0-SNAPSHOT.jar"
        }
      }
    }

And then we specify our configuration template.

    "SpringBootBeanStalkConfigurationTemplate": {
      "Type": "AWS::ElasticBeanstalk::ConfigurationTemplate",
      "Properties": {
        "ApplicationName": {"Ref":"SpringBootApplication"},
        "Description":"A display of speed boot application",
        "OptionSettings": [
          {
            "Namespace": "aws:autoscaling:asg",
            "OptionName": "MinSize",
            "Value": "2"
          },
          {
            "Namespace": "aws:autoscaling:asg",
            "OptionName": "MaxSize",
            "Value": "2"
          },
          {
            "Namespace": "aws:elasticbeanstalk:environment",
            "OptionName": "EnvironmentType",
            "Value": "LoadBalanced"
          }
        ],
        "SolutionStackName": "64bit Amazon Linux 2016.09 v2.3.0 running Java 8"
      }
    }

The last step would be to glue the above resources together by defining an environment

    "SpringBootBeanstalkEnvironment": {
      "Type": "AWS::ElasticBeanstalk::Environment",
      "Properties": {
        "ApplicationName": {"Ref":"SpringBootApplication"},
        "EnvironmentName":"JavaBeanstalkEnvironment",
        "TemplateName": {"Ref":"SpringBootBeanStalkConfigurationTemplate"},
        "VersionLabel": {"Ref": "SpringBootApplicationVersion"}
      }
    }

Now you are ready to upload your cloudformation template and deploy your beanstalk application

aws s3 cp beanstalkspring.template s3://{bucket with templates}/beanstalkspring.template
aws cloudformation create-stack --stack-name SpringBeanStalk --parameters ParameterKey=SourceCodeBucket,ParameterValue={bucket with code} --template-url https://s3.amazonaws.com/{bucket with templates}/beanstalkspring.template

You can download the full sourcecode and the cloudformation template from Github.

Embed Jython in your Java codebase

Jython is a great tool for writing some quick Java scripts using a pretty solid syntax. Actually it works wonderfully when it comes to implementing maintenance or monitoring scripts with JMX for your Java apps.

In case you work with other teams with a Python background, it makes absolute sense to integrate Python into your Java applications.

First let’s import the jython interpeter using the standalone version.

group 'com.gkatzioura'
version '1.0-SNAPSHOT'

apply plugin: 'java'

sourceCompatibility = 1.5

repositories {
    mavenCentral()
}

dependencies {
    testCompile group: 'junit', name: 'junit', version: '4.11'
    compile group: 'org.python', name: 'jython-standalone', version: '2.7.0'
}

So the easiest thing to do is just to execute a Python file from our classpath. The file would be hello_world.py

print "Hello World"

And then pass the file as an InputStream to the interpreter

package com.gkatzioura;

import org.python.core.PyClass;
import org.python.core.PyInteger;
import org.python.core.PyObject;
import org.python.core.PyObjectDerived;
import org.python.util.PythonInterpreter;

import java.io.InputStream;

/**
 * Created by gkatzioura on 19/10/2016.
 */
public class JythonCaller {

    private PythonInterpreter pythonInterpreter;

    public JythonCaller() {
        pythonInterpreter = new PythonInterpreter();
    }

    public void invokeScript(InputStream inputStream) {

        pythonInterpreter.execfile(inputStream);
    }

}
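    // Snippet from the accompanying JUnit test class, which holds a JythonCaller field named jythonCaller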
    @Test
    public void testInvokeScript() {

        InputStream inputStream = this.getClass().getClassLoader().getResourceAsStream("hello_world.py");
        jythonCaller.invokeScript(inputStream);
    }

The next step is to create a Python class file and another Python file that will import the class file and instantiate the class.

The class file would be divider.py.

class Divider:

    def divide(self,numerator,denominator):

        return numerator/denominator;

And the file importing the Divider class would be classcaller.py

from divider import Divider

divider = Divider()

print divider.divide(10,5);

So let us test it

    @Test
    public void testInvokeClassCaller() {

        InputStream inputStream = this.getClass().getClassLoader().getResourceAsStream("classcaller.py");
        jythonCaller.invokeScript(inputStream);
    }

What we can understand from this example is that the interpreter successfully imports the files from the classpath.

Running files using the interpreter is OK, however we need to fully utilize classes and functions implemented in Python.
Therefore the next step is to instantiate a Python class and invoke its functions from Java.

package com.gkatzioura;

import org.python.core.PyClass;
import org.python.core.PyInteger;
import org.python.core.PyObject;
import org.python.core.PyObjectDerived;
import org.python.util.PythonInterpreter;

import java.io.InputStream;

/**
 * Created by gkatzioura on 19/10/2016.
 */
public class JythonCaller {

    private PythonInterpreter pythonInterpreter;

    public JythonCaller() {
        pythonInterpreter = new PythonInterpreter();
    }

    public void invokeClass() {

        pythonInterpreter.exec("from divider import Divider");
        PyClass dividerDef = (PyClass) pythonInterpreter.get("Divider");
        PyObject divider = dividerDef.__call__();
        PyObject pyObject = divider.invoke("divide",new PyInteger(20),new PyInteger(4));

        System.out.println(pyObject.toString());
    }

}
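A quick usage sketch for this version could be the following test snippet (hypothetical, not part of the original code); it should print 5, since divide(20,4) is evaluated by the embedded interpreter.

    @Test
    public void testInvokeClass() {
        new JythonCaller().invokeClass();
    }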

You can find the sourcecode on github.

Java on the AWS cloud using Lambda, Api Gateway and CloudFormation

On a previous post we implemented a java based aws lambda function and deployed it using CloudFormation.

Since we have our Lambda function set up, we will integrate it with an HTTP endpoint using AWS API Gateway.

Amazon API Gateway is a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale. With a few clicks in the AWS Management Console, you can create an API that acts as a “front door” for applications to access data, business logic, or functionality from your back-end services, such as workloads running on Amazon Elastic Compute Cloud (Amazon EC2), code running on AWS Lambda, or any Web application

For this example imagine API gateway as if it is an HTTP Connector.

We will change our original function in order to implement a division.

package com.gkatzioura.deployment.lambda;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

import java.math.BigDecimal;
import java.util.Map;
import java.util.logging.Logger;

/**
 * Created by gkatzioura on 9/10/2016.
 */
public class RequestFunctionHandler implements RequestHandler<Map<String,String>,String> {

    private static final String NUMERATOR_KEY = "numerator";
    private static final String DENOMINATOR_KEY = "denominator";

    private static final Logger LOGGER = Logger.getLogger(RequestFunctionHandler.class.getName());

    public String handleRequest(Map <String,String> values, Context context) {

        LOGGER.info("Handling request");

        if(!values.containsKey(NUMERATOR_KEY)||!values.containsKey(DENOMINATOR_KEY)) {
            return "You need both numberator and denominator";
        }

        try {
            BigDecimal numerator = new BigDecimal(values.get(NUMERATOR_KEY));
            BigDecimal denominator= new BigDecimal(values.get(DENOMINATOR_KEY));
            return  numerator.divide(denominator).toString();
        } catch (Exception e) {
            return "Please provide valid values";
        }
    }

}

Then we will rebuild our lambda code and update the artifact on S3.

aws s3 cp build/distributions/JavaLambdaDeployment.zip s3://lambda-functions/JavaLambdaDeployment.zip

The next step is to update our CloudFormation template and add the API Gateway resources that forward requests to our Lambda function.

First we have to declare our Rest api

    "AGRA16PAA": {
      "Type": "AWS::ApiGateway::RestApi",
      "Properties": {"Name": "CalculationApi"}
    }

Then we need to add a REST resource. Inside the DependsOn element we can see the id of our REST API. Therefore CloudFormation will create this resource after the REST API has been created.

"AGR2JDQ8": {
      "Type": "AWS::ApiGateway::Resource",
      "Properties": {
        "RestApiId": {"Ref": "AGRA16PAA"},
        "ParentId": {
          "Fn::GetAtt": ["AGRA16PAA","RootResourceId"]
        },
        "PathPart": "divide"
      },
      "DependsOn": [
        "AGRA16PAA"
      ]
    }

Another crucial part is to add a permission so that API Gateway is allowed to invoke our lambda function.

    "LPI6K5": {
      "Type": "AWS::Lambda::Permission",
      "Properties": {
        "Action": "lambda:invokeFunction",
        "FunctionName": {"Fn::GetAtt": ["LF9MBL", "Arn"]},
        "Principal": "apigateway.amazonaws.com",
        "SourceArn": {"Fn::Join": ["",
          ["arn:aws:execute-api:", {"Ref": "AWS::Region"}, ":", {"Ref": "AWS::AccountId"}, ":", {"Ref": "AGRA16PAA"}, "/*"]
        ]}
      }
    }

The last step would be to add the API Gateway method in order to be able to invoke our lambda function through the API Gateway. Furthermore we will add an API Gateway deployment resource.

"Deployment": {
      "Type": "AWS::ApiGateway::Deployment",
      "Properties": {
        "RestApiId": { "Ref": "AGRA16PAA" },
        "Description": "First Deployment",
        "StageName": "StagingStage"
      },
      "DependsOn" : ["AGM25KFD"]
    },
    "AGM25KFD": {
      "Type": "AWS::ApiGateway::Method",
      "Properties": {
        "AuthorizationType": "NONE",
        "HttpMethod": "POST",
        "ResourceId": {"Ref": "AGR2JDQ8"},
        "RestApiId": {"Ref": "AGRA16PAA"},
        "Integration": {
          "Type": "AWS",
          "IntegrationHttpMethod": "POST",
          "IntegrationResponses": [{"StatusCode": 200}],
          "Uri": {
            "Fn::Join": [
              "",
              [
                "arn:aws:apigateway:",
                {"Ref": "AWS::Region"},
                ":lambda:path/2015-03-31/functions/",
                {"Fn::GetAtt": ["LF9MBL", "Arn"]},
                "/invocations"
              ]
            ]
          }
        },
        "MethodResponses": [{
          "StatusCode": 200
        }]
      }
    }

So we ended up with our new CloudFormation template.

{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Resources": {
    "LF9MBL": {
      "Type": "AWS::Lambda::Function",
      "Properties": {
        "Code": {
          "S3Bucket": "lambda-functions",
          "S3Key": "JavaLambdaDeployment.zip"
        },
        "FunctionName": "SimpleRequest",
        "Handler": "com.gkatzioura.deployment.lambda.RequestFunctionHandler",
        "MemorySize": 128,
        "Role": "arn:aws:iam::274402012893:role/lambda_basic_execution",
        "Runtime": "java8"
      }
    },
    "Deployment": {
      "Type": "AWS::ApiGateway::Deployment",
      "Properties": {
        "RestApiId": { "Ref": "AGRA16PAA" },
        "Description": "First Deployment",
        "StageName": "StagingStage"
      },
      "DependsOn" : ["AGM25KFD"]
    },
    "AGM25KFD": {
      "Type": "AWS::ApiGateway::Method",
      "Properties": {
        "AuthorizationType": "NONE",
        "HttpMethod": "POST",
        "ResourceId": {"Ref": "AGR2JDQ8"},
        "RestApiId": {"Ref": "AGRA16PAA"},
        "Integration": {
          "Type": "AWS",
          "IntegrationHttpMethod": "POST",
          "IntegrationResponses": [{"StatusCode": 200}],
          "Uri": {
            "Fn::Join": [
              "",
              [
                "arn:aws:apigateway:",
                {"Ref": "AWS::Region"},
                ":lambda:path/2015-03-31/functions/",
                {"Fn::GetAtt": ["LF9MBL","Arn"]},
                "/invocations"
              ]
            ]
          }
        },
        "MethodResponses": [{"StatusCode": 200}]
      },
      "DependsOn": ["LF9MBL","AGR2JDQ8","LPI6K5"]
    },
    "AGR2JDQ8": {
      "Type": "AWS::ApiGateway::Resource",
      "Properties": {
        "RestApiId": {"Ref": "AGRA16PAA"},
        "ParentId": {
          "Fn::GetAtt": ["AGRA16PAA","RootResourceId"]
        },
        "PathPart": "divide"
      },
      "DependsOn": ["AGRA16PAA"]
    },
    "AGRA16PAA": {
      "Type": "AWS::ApiGateway::RestApi",
      "Properties": {
        "Name": "CalculationApi"
      }
    },
    "LPI6K5": {
      "Type": "AWS::Lambda::Permission",
      "Properties": {
        "Action": "lambda:invokeFunction",
        "FunctionName": {"Fn::GetAtt": ["LF9MBL", "Arn"]},
        "Principal": "apigateway.amazonaws.com",
        "SourceArn": {"Fn::Join": ["",
          ["arn:aws:execute-api:", {"Ref": "AWS::Region"}, ":", {"Ref": "AWS::AccountId"}, ":", {"Ref": "AGRA16PAA"}, "/*"]
        ]}
      }
    }
 }
}

Last but not least, we have to update our previous cloudformation stack.

So we uploaded our latest template

aws s3 cp cloudformationjavalambda2.template s3://cloudformation-templates/cloudformationjavalambda2.template

And all we have to do is to update our stack.

aws cloudformation update-stack --stack-name JavaLambdaStack --template-url https://s3.amazonaws.com/cloudformation-templates/cloudformationjavalambda2.template

Our stack has just been updated.
We can go to our API Gateway endpoint and try to issue a POST request.

curl -H "Content-Type: application/json" -X POST -d '{"numerator":1,"denominator":"2"}' https://{you api gateway endpoint}/StagingStage/divide
"0.5"

You can find the sourcecode on github.