A guide to the InfluxDBMapper and QueryBuilder for Java Part: 1

With the release of latest influxdb-java driver version came along the InfluxbMapper.

To get started we need to spin up an influxdb instance, and docker is the easiest way to do so. We just follow the steps as described here.

Now we have a database with some data and we are ready to execute our queries.

We have the measure h2o_feet

> SELECT * FROM "h2o_feet"

name: h2o_feet
time                   level description      location       water_level
2015-08-18T00:00:00Z   below 3 feet           santa_monica   2.064
2015-08-18T00:00:00Z   between 6 and 9 feet   coyote_creek   8.12
2015-09-18T21:36:00Z   between 3 and 6 feet   santa_monica   5.066
2015-09-18T21:42:00Z   between 3 and 6 feet   santa_monica   4.938

So we shall create a model for that.

package com.gkatzioura.mapper.showcase;

import java.time.Instant;
import java.util.concurrent.TimeUnit;

import org.influxdb.annotation.Column;
import org.influxdb.annotation.Measurement;

@Measurement(name = "h2o_feet", database = "NOAA_water_database", timeUnit = TimeUnit.SECONDS)
public class H2OFeetMeasurement {

    @Column(name = "time")
    private Instant time;

    @Column(name = "level description")
    private String levelDescription;

    @Column(name = "location")
    private String location;

    @Column(name = "water_level")
    private Double waterLevel;

    public Instant getTime() {
        return time;

    public void setTime(Instant time) {
        this.time = time;

    public String getLevelDescription() {
        return levelDescription;

    public void setLevelDescription(String levelDescription) {
        this.levelDescription = levelDescription;

    public String getLocation() {
        return location;

    public void setLocation(String location) {
        this.location = location;

    public Double getWaterLevel() {
        return waterLevel;

    public void setWaterLevel(Double waterLevel) {
        this.waterLevel = waterLevel;

And the we shall fetch all the entries of the h2o_feet measurement.

package com.gkatzioura.mapper.showcase;

import java.util.List;
import java.util.logging.Logger;

import org.influxdb.InfluxDB;
import org.influxdb.InfluxDBFactory;
import org.influxdb.impl.InfluxDBImpl;
import org.influxdb.impl.InfluxDBMapper;

public class InfluxDBMapperShowcase {

    private static final Logger LOGGER = Logger.getLogger(InfluxDBMapperShowcase.class.getName());

    public static void main(String[] args) {

        InfluxDB influxDB = InfluxDBFactory.connect("http://localhost:8086", "root", "root");

        InfluxDBMapper influxDBMapper = new InfluxDBMapper(influxDB);
        List h2OFeetMeasurements = influxDBMapper.query(H2OFeetMeasurement.class);


After being successful on fetching the data we will continue with persisting data.

        H2OFeetMeasurement h2OFeetMeasurement = new H2OFeetMeasurement();
        h2OFeetMeasurement.setLevelDescription("Just a test");


        List measurements = influxDBMapper.query(H2OFeetMeasurement.class);

        H2OFeetMeasurement h2OFeetMeasurement1 = measurements.get(measurements.size()-1);
        assert h2OFeetMeasurement1.getLevelDescription().equals("Just a test");

Apparently fetching all the measurements to get the last entry is not the most efficient thing to do. In the upcoming tutorials we are going to see how we use the InfluxDBMapper with advanced InfluxDB queries.

Run WordCount with Scala and Spark on HDInsight

Previously we tried to solve the word count problem with a Scala and Spark approach.
The next step is to deploy our solution to HDInsight using spark, hdfs, and scala

We shall provision a Sprak cluster.


Since we are going to use HDInsight we can utilize hdfs and therefore use the azure storage.


Then we choose our instance types.


And we are ready to create the Spark cluster.


Our data shall be uploaded to the hdfs file system
To do so we will upload our text files to the azure storage account which is integrated with hdfs.

For more information on managing a storage account with azure cli check the official guide. Any text file will work.

azure storage blob upload mytextfile.txt sparkclusterscala  example/data/mytextfile.txt

Since we use hdfs we shall make some changes to the original script

val text = sc.textFile("wasb:///example/data/mytextfile.txt")
val counts = text.flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey(_+_)

Then we can upload our scala class to the head node using ssh

scp WordCountscala.scala demon@{your cluster}-ssh.azurehdinsight.net:/home/demo/WordCountscala.scala

Again in order to run the script, things are pretty straightforward.

spark-shell -i WordCountscala.scala

And once the task is done we are presented with the spark prompt. Plus we can now save our results to the hdfs file system.

scala> counts.saveAsTextFile("/wordcount_results")

And do a quick check.

hdfs dfs -ls /wordcount_results/
hdfs dfs -text /wordcount_results/part-00000

WordCount with Sprak and Scala

Apache Spark has taken over the big data world. Spark is implemented with Scala and is well know for its performance.

In the previous blogs we approached the word count problem by using Scala with hadoop and Scala with storm.
On this blog we will utilize Spark for the word count problem.

Submitting spark jobs implemented with Scala is pretty easy and convenient. All we need is to submit our file as our input to the spark command.

First we have to download and setup a spark version locally.

Then will shall download a text file for testing. In my case the script from MGS2 did the work.

Now on to the WordCount script. For local testing we will use a file from our file system.

val text = sc.textFile("mytextfile.txt")
val counts = text.flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey(_+_)

Next step is to run the script

spark-shell -i WordCountscala.scala

Once finished a Spark command prompt will appear and we are free to do some experiments with the word count results

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.0
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_111)
Type in expressions to have them evaluated.
Type :help for more information.

scala> res0.length
res1: Int = 20159

Thus we detected 20159 different words.

Our next step is to run our job to a spark cluster on HDInsight.

Run Scala implemented Storm topologies on HDInsight

Previously we set up a Scala implemented storm topology in order to count words.

What comes next is uploading our topology to HDInsight.

So we shall proceed in creating a Storm topology on HDInsight.


Then we choose the instance types.


Next step is to upload our jar file to the head node in order to deploy it. We can use scp for this purpose.

scp target/scala-2.12/ScalaStorm-assembly-1.0.jar  {your user}@{your azure endpoint}:/home/demo

Now we can ssh to our storm cluster’s head node and issue the storm command.

storm jar ScalaStorm-assembly-1.0.jar com.gkatzioura.scala.storm.WordCountTopology word-count-stream-scala

Then we can check our topology by navigating to https://{your cluster}.azurehdinsight.net/stormui

WordCount with Storm and Scala

Apache Storm is a free and open source distributed realtime computation system running on the JVM.
To get started we will implement a very simple example. Previously we implemented a word count hadoop job using scala and we uploaded it to hdinsight.
We will focus on the same word count concept but for real time cases and implement a word count topology utilizing apache storm. Our source code will be based on the official storm examples.

Storm works with spouts and bolts.

First We shall implement a spout which will emit fake data events. In our case sentences.

package com.gkatzioura.scala.storm

import org.apache.storm.spout.SpoutOutputCollector
import org.apache.storm.task.TopologyContext
import org.apache.storm.topology.OutputFieldsDeclarer
import org.apache.storm.topology.base.BaseRichSpout
import org.apache.storm.tuple.{Fields, Values}
import org.apache.storm.utils.Utils

import scala.util.Random

  * Created by gkatzioura on 2/17/17.
class RandomSentenceSpout extends BaseRichSpout {

  var _collector:SpoutOutputCollector = _
  var _rand:Random = _

  override def nextTuple(): Unit = {


    val sentences = Array("the cow jumped over the moon","an apple a day keeps the doctor away",
      "four score and seven years ago","snow white and the seven dwarfs","i am at two with nature")
    val sentence = sentences(_rand.nextInt(sentences.length))
    _collector.emit(new Values(sentence))

  override def open(conf: java.util.Map[_, _], context: TopologyContext, collector: SpoutOutputCollector): Unit = {
    _collector = collector
    _rand = Random

  override def declareOutputFields(declarer: OutputFieldsDeclarer): Unit = {
    declarer.declare(new Fields("word"))


Next step is to implement a bolt which splits the sentences and emits them.

package com.gkatzioura.scala.storm

import java.text.BreakIterator

import org.apache.storm.topology.{BasicOutputCollector, OutputFieldsDeclarer}
import org.apache.storm.topology.base.BaseBasicBolt
import org.apache.storm.tuple.{Fields, Tuple, Values}

  * Created by gkatzioura on 2/18/17.
class SplitSentenceBolt extends BaseBasicBolt {

  override def execute(input: Tuple, collector: BasicOutputCollector): Unit = {

    val sentence = input.getString(0)
    val boundary = BreakIterator.getWordInstance

    var start = boundary.first
    var end:Int = start

    while(end!=BreakIterator.DONE) {

      end = boundary.next
      val word = sentence.substring(start,end).replaceAll("\\s+","")
      start = end
      if(!word.equals("")) {
        collector.emit(new Values(word))

  override def declareOutputFields(declarer: OutputFieldsDeclarer): Unit = {
    declarer.declare(new Fields("word"))

And the last step is the word count bolt.

package com.gkatzioura.scala.storm

import org.apache.storm.topology.{BasicOutputCollector, OutputFieldsDeclarer}
import org.apache.storm.topology.base.BaseBasicBolt
import org.apache.storm.tuple.{Fields, Tuple, Values}

  * Created by gkatzioura on 2/18/17.
class WordCountBolt extends BaseBasicBolt{

  val counts = scala.collection.mutable.Map[String,Int]()

  override def execute(input: Tuple, collector: BasicOutputCollector): Unit = {

    val word = input.getString(0)

    val optCount = counts.get(word)
    if(optCount.isEmpty) {
    } else {

    collector.emit(new Values(word,counts))

  override def declareOutputFields(declarer: OutputFieldsDeclarer): Unit = {

    declarer.declare(new Fields("word","count"));


The final step is to create our topology which takes care whether we run locally or in a cluster environment.

package com.gkatzioura.scala.storm

import org.apache.storm.{Config, LocalCluster, StormSubmitter}
import org.apache.storm.topology.TopologyBuilder
import org.apache.storm.tuple.Fields

  * Created by gkatzioura on 2/18/17.
object WordCountTopology  {

  def main(args: Array[String]): Unit = {
    println("Hello, world!")
    val builder = new TopologyBuilder
    builder.setSpout("spout", new RandomSentenceSpout, 5)
    builder.setBolt("split", new SplitSentenceBolt, 8).shuffleGrouping("spout")
    builder.setBolt("count", new WordCountBolt, 12).fieldsGrouping("split", new Fields("word"))

    val conf = new Config()

    if (args != null && args.length > 0) {
      StormSubmitter.submitTopology(args(0), conf, builder.createTopology())
    else {
      val cluster = new LocalCluster
      cluster.submitTopology("word-count", conf, builder.createTopology())

Now we shall build our app. To do so we need to include the assembly plugin in our plugins.sbt file.

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")

Our sbt file is as follows

name := "ScalaStorm"

version := "1.0"

scalaVersion := "2.12.1"

scalacOptions += "-Yresolve-term-conflict:package"

libraryDependencies += "org.apache.storm" % "storm-core" % "1.0.2" % "provided"

And then we issue a build

sbt clean compile assembly

You can find the sourcecode on github.

On the next post we shall deploy our Storm app to HDInsight.

Run Scala implemented Hadoop Jobs on HDInsight

Previously we set up a Scala application in order to execute a simple word count on hadoop.

What comes next is uploading our application to HDInsight.

So we shall proceed in creating a Hadoop cluster on HDInsight.


Then we will create the hadoop cluster.


As you can see we specify the admin console credentials and the ssh user to login to the head node.

Our hadoop cluster will be backed by an azure storage account.


Then it is time to upload our text files to the azure storage account.

For more information on managing a storage account with azure cli check the official guide. Any text file will work.

azure storage blob upload mytext.txt scalahadoopexample  example/data/input.txt

Now we can ssh to our Hadoop node.

First let’s run the examples that come packaged with the HInsight hadoop cluster.

hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar wordcount /example/data/input.txt /example/data/results

Check the results

hdfs dfs -text /example/data/results/part-r-00000

And then we are ready to scp the scala code to our hadoop node and issue as wordcount.

hadoop jar ScalaHadoop-assembly-1.0.jar /example/data/input.txt /example/data/results2

And again check the results

hdfs dfs -text /example/data/results2/part-r-00000

That’s it! HDinsight makes it pretty straight forward!

WordCount on Hadoop with Scala

Hadoop is a great technology built with java.

Today we will use Scala to implement a simple map reduce job and then run it using HDInsight.

We shall add the assembly plugin on our assembly.sbt

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")

Then we will add the Hadoop core dependency on our build.sbt file. Also will we apply some configuration in the merge strategy to avoid deduplicate errors.

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first

libraryDependencies += "org.apache.hadoop" % "hadoop-core" % "1.2.1"

We will use WordCount as an example.
The original Java class shall be transformed to a Scala class.

package com.gkatzioura.scala

import java.lang.Iterable
import java.util.StringTokenizer

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
import scala.collection.JavaConverters._

  * Created by gkatzioura on 2/14/17.
package object WordCount {

  class TokenizerMapper extends Mapper[Object, Text, Text, IntWritable] {

    val one = new IntWritable(1)
    val word = new Text()

    override def map(key: Object, value: Text, context: Mapper[Object, Text, Text, IntWritable]#Context): Unit = {
      val itr = new StringTokenizer(value.toString)
      while (itr.hasMoreTokens()) {
        context.write(word, one)

  class IntSumReader extends Reducer[Text,IntWritable,Text,IntWritable] {
    override def reduce(key: Text, values: Iterable[IntWritable], context: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
      var sum = values.asScala.foldLeft(0)(_ + _.get)
      context.write(key, new IntWritable(sum))

  def main(args: Array[String]): Unit = {
    val configuration = new Configuration
    val job = Job.getInstance(configuration,"word count")
    FileInputFormat.addInputPath(job, new Path(args(0)))
    FileOutputFormat.setOutputPath(job, new Path(args(1)))
    System.exit(if(job.waitForCompletion(true))  0 else 1)


Then we will build our example

sbt clean compile assembly

Our new jar will reside on target/scala-2.12/ScalaHadoop-assembly-1.0.jar

On the next post we shall run our code using Azure’s HDInsight.

You can find the code on github.