Sunday, August 2, 2015

Presto DB Oracle Connector

Hi guys! Have you ever tried Presto DB? It is awesome.

I used it to integrate Hive, ElasticSearch and Oracle. It is very fast. Not real time, I mean, but fast.

When I tried to use it with Oracle I got a surprise: there is no Oracle plugin.

So I screamed, cried, and created my own Presto Oracle plugin. XD

If you have the same issue, please take a look. Just click on the link below:

Presto Oracle Connector


Regards.

Sunday, March 29, 2015

How to do Aggregations in ElasticSearch

Hi guys, today I will show you how to run aggregations in ElasticSearch. Get ready.

First of all, we will create our index, my_tutorial. To create it we will use curl. Take a look:
 curl -XPOST http://localhost:9200/my_tutorial -d '{}'  

I didn't put anything in the index settings, but you can put any configuration you want. This is just an example.

Now we need to fill our index with data.
 curl -XPOST http://localhost:9200/my_tutorial/books -d '{"title": "Harry Potter and The Deathly Hallows","tags": ["fantasy","adventure"]}'  
 curl -XPOST http://localhost:9200/my_tutorial/books -d '{"title": "The Hobbit","tags": ["fantasy","adventure"]}'  
 curl -XPOST http://localhost:9200/my_tutorial/books -d '{"title": "I, Robot","tags": ["sci-fi"]}'  
 curl -XPOST http://localhost:9200/my_tutorial/books -d '{"title": "Mastering ElasticSearch","tags": ["tech"]}'  
 curl -XPOST http://localhost:9200/my_tutorial/books -d '{"title": "A Clockwork Orange","tags": ["sci-fi"]}'  
 curl -XPOST http://localhost:9200/my_tutorial/books -d '{"title": "1984","tags": ["sci-fi"]}'  
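
If you would rather load the books in one request instead of one curl per book, the bulk API is an option. Here is a minimal sketch in Python that only builds the request body. The helper name build_bulk_body is my own invention, and I only put two of the books in it to keep it short.

```python
import json


def build_bulk_body(index, doc_type, docs):
    """Return a newline-delimited _bulk request body for the given documents."""
    lines = []
    for doc in docs:
        # Action line: index the document on the next line into index/doc_type.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # Every line of a bulk body, including the last one, must end with a newline.
    return "\n".join(lines) + "\n"


# Two of the books from the curl commands above.
books = [
    {"title": "The Hobbit", "tags": ["fantasy", "adventure"]},
    {"title": "I, Robot", "tags": ["sci-fi"]},
]

body = build_bulk_body("my_tutorial", "books", books)
print(body, end="")
```

Save the output to a file (books.json is just an example name) and send it with curl -XPOST http://localhost:9200/_bulk --data-binary @books.json. Use --data-binary and not -d, because -d strips the newlines that the bulk format depends on.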

So, now we just need to execute the aggregation query. I will show one version with curl and another one with just the JSON.
 curl -XPOST "http://localhost:9200/my_tutorial/_search?search_type=count&pretty=true" -d '{"query": {"match_all":{}},"aggs": {"my_aggregation_name": {"terms": {"field": "tags"}}}}'  

Just the JSON:
 {  
      "query": {  
           "match_all":{}  
      },  
      "aggs": {  
           "my_aggregation_name": {  
                "terms": {"field": "tags"}  
           }  
      }  
 }  
Actually, you don't need to include the query clause, because ES falls back to match_all when it is missing, but it is the custom.
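
To make the optional query clause concrete, here is a small Python sketch that builds the same request body with and without it. The function terms_agg_body and its include_query flag are hypothetical names I made up for this example. It only builds the JSON, it does not talk to ES.

```python
import json


def terms_agg_body(agg_name, field, include_query=True):
    """Build a search body with one terms aggregation on `field`."""
    body = {"aggs": {agg_name: {"terms": {"field": field}}}}
    if include_query:
        # Explicit match_all, the same as in the curl version above.
        body["query"] = {"match_all": {}}
    return body


with_query = terms_agg_body("my_aggregation_name", "tags")
without_query = terms_agg_body("my_aggregation_name", "tags", include_query=False)

# Prints: {"aggs": {"my_aggregation_name": {"terms": {"field": "tags"}}}}
print(json.dumps(without_query))
```

Either body can go after -d in the search request. The aggs part is identical in both.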

And then your results would be:
 {  
  "took" : 5,  
  "timed_out" : false,  
  "_shards" : {  
   "total" : 5,  
   "successful" : 5,  
   "failed" : 0  
  },  
  "hits" : {  
   "total" : 6,  
   "max_score" : 0.0,  
   "hits" : [ ]  
  },  
  "aggregations" : {  
   "my_aggregation_name" : {  
    "buckets" : [ {  
     "key" : "fi",  
     "doc_count" : 3  
    }, {  
     "key" : "sci",  
     "doc_count" : 3  
    }, {  
     "key" : "adventure",  
     "doc_count" : 2  
    }, {  
     "key" : "fantasy",  
     "doc_count" : 2  
    }, {  
     "key" : "tech",  
     "doc_count" : 1  
    } ]  
   }  
  }  
 }  
As you can see, our sci-fi tag was split into sci and fi. This happens because string fields in ES are analyzed by default, and the default analyzer breaks the text at the hyphen. To keep sci-fi as a single term we will need to change our document mapping. But that is a subject for another -XPOST.
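
If you want to see why the buckets come out this way without touching ES, here is a rough simulation in Python. It is only a sketch: the regular expression is my stand-in for the default analyzer (lowercase, then break on anything that is not a letter or a digit), not the real thing, and the function names are made up for this example.

```python
import re
from collections import Counter

# The tags of the six books indexed above, one list per document.
TAGS_PER_BOOK = [
    ["fantasy", "adventure"],
    ["fantasy", "adventure"],
    ["sci-fi"],
    ["tech"],
    ["sci-fi"],
    ["sci-fi"],
]


def analyzed_terms(tags):
    """Approximate an analyzed field: lowercase and split on non-alphanumerics."""
    terms = set()
    for tag in tags:
        terms.update(re.findall(r"[a-z0-9]+", tag.lower()))
    return terms


def exact_terms(tags):
    """Approximate a field that is not analyzed: each tag is kept whole."""
    return set(tags)


def terms_buckets(docs, extract):
    """Count how many documents contain each term, like doc_count in a terms aggregation."""
    counts = Counter()
    for tags in docs:
        # extract() returns a set, so each document counts once per term.
        counts.update(extract(tags))
    # Highest count first, ties in alphabetical order to keep the output stable.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


analyzed = terms_buckets(TAGS_PER_BOOK, analyzed_terms)
exact = terms_buckets(TAGS_PER_BOOK, exact_terms)

print(analyzed)  # [('fi', 3), ('sci', 3), ('adventure', 2), ('fantasy', 2), ('tech', 1)]
print(exact)     # [('sci-fi', 3), ('adventure', 2), ('fantasy', 2), ('tech', 1)]
```

The first list matches the buckets in the response above. The second one is what we would expect once the tags field is mapped so that it is not analyzed.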

See you.