Query Components

Datasources

data source는 table, union, query 세종류로 사용할 수 있음

Table Data Source

{
    "type": "table",
    "name": "<datasource_name>"
}

Union Data Source

Union의 같은 경우 사용하려면 BrokerRouter에게 요청해야함

데이터 소스들은 동일한 스키마를 가져야함

{
    "type": "union",
    "dataSources": ["<datasource_name1>", "<datasource_name2>"]
}

Query Data Source

groupBy Query만 사용가능

nested groupBy 를 위한

{
    "type": "query",
    "query": {
        "type": "groupBy",
        ...
    }
}

DimensionSpec

Query에서 dimension을 바꿀 수 있음

Default DimensionSpec

{
  "type" : "default",
  "dimension" : <dimension>,
  "outputName": <output_name>,
  "outputType": <"STRING"|"LONG"|"FLOAT"> // STRING default
}

Extraction DimensionSpec

extraction_function을 사용해서 Dimension을 바꿀 수 있음

{
  "type" : "extraction",
  "dimension" : <dimension>,
  "outputName" :  <output_name>,
  "outputType": <"STRING"|"LONG"|"FLOAT">,
  "extractionFn" : <extraction_function>
}

Filtered DimensionSpecs

Lookup DimensionSpecs

Output Types

Regular Expression

  • Regular Expression

  • Partial

  • Search Query

  • Substring

  • Strlen

  • Time Format

  • Time Parsing

  • Javascript

  • Lookup

  • Registered Lookup

  • Cascade

  • String Format

  • Upper and Lower

  • Bucket

Filtered DimensionSpec

multi-value dimensions에 대해서 filter를 걸 수 있음

직접 value를 넣는 listFiltered와 정규표현식을 사용하는 regexFiltered 가 있음

{
    "type" : "listFiltered",
    "delegate" : <dimensionSpec>,
    "values": <array of strings>,
    "isWhitelist": <optional attribute for true/false, default is true> }
{
    "type" : "regexFiltered",
    "delegate" : <dimensionSpec>,
    "pattern": <java regex pattern>
}

Lookup DimensionSpec

replaceMissingValueWith""로 설정하면 null로 설정하는 것과 같음

default는 replaceMissingValueWithnull, retainMissingValuefalse

{
  "type":"lookup",
  "dimension":"dimensionName",
  "outputName":"dimensionOutputName",
  "replaceMissingValueWith":"missing_value",
  "retainMissingValue":false,
  "lookup":{"type": "map", "map":{"key":"value"}, "isOneToOne":false}
}

Limit Spec

Default LimitSpec

{
    "type"    : "default",
    "limit"   : <integer_value>,
    "columns" : [list of OrderByColumnSpec],
}

OrderByColumnSpec

{
    "dimension" : "<Any dimension or metric name>",
    "direction" : <"ascending"|"descending">,
    "dimensionOrder" : <"lexicographic"(default)|"alphanumeric"|"strlen"|"numeric">
}

Having

Query의 결과에 대해서 Filtering

{
    "type" : "filter",
    "filter" : <any Druid query filter>
}

Query Filter

  • Numeric Filters

    • Equal To

      {
          "type": "equalTo",
          "aggregation": "<aggregate_metric>",
          "value": <numeric_value>
      }
      
    • Greater Than

      {
          "type": "greaterThan",
          "aggregation": "<aggregate_metric>",
          "value": <numeric_value>
      }
      
    • Less Than

      {
          "type": "lessThan",
          "aggregation": "<aggregate_metric>",
          "value": <numeric_value>
      }
      
  • Dimension Selector Filter

    • dimSelector

      {
          "type": "dimSelector",
          "dimension": "<dimension>",
          "value": <dimension_value>
      }
      
  • Logic expression Filters

    • AND

      {
          "type": "and",
          "havingSpecs": [<having clause>, <having clause>, ...]
      }
      
    • OR

      {
          "type": "or",
          "havingSpecs": [<having clause>, <having clause>, ...]
      }
      
    • NOT

      {
          "type": "not",
          "havingSpec": <having clause>
      }
      

Aggregation Granularity

Simple Granularities

all, none, second, minute, fifteen_minute, thirty_minute, hour, day, week, month, quarter, year 의 string 값을 지원함

  • all은 모든 bucket의 데이터를 하나의 bucket으로 담음.

  • none은 bucket에 담지 않고 index의 granularity를 사용함. TimeSeries Query에서는 none은 비추!

Query Granularity의 값이 Ingestion Granularity의 값보다 작으면 의미가 없음.(인덱싱이 되지 않아서)

Duration Granularities

milliseconds 단위로 기간을 지정해서 사용할 수 있음

{
    "type": "duration",
    "duration": 3600000,
    "origin": "2018-01-01T00:30:00Z"
}

origin의 기본 값은 1970-01-01T00:00:00Z

Period Granularities

years, months, weeks, day, hours, minutes, seconds 단위로 기간을 지정

ISO 8601 포멧을 따른다함(P1Y, P2M, P3W, P4H, PT1H30M 등)

{
    "type": "period",
    "period": "P3M",
    "timeZone": "America/Los_Angeles",
    "origin": "2012-02-01T00:00:00-08:00"
}

timeZoneJoda Time Library에서 지원하는 것을 사용할 수 있음

Query Filter

Selector Filter

"filter": {
    "type": "selector",
    "dimension": <dimension_string>,
    "value": <dimension_value_string>
}

Column Comparison Filter

"filter": { 
    "type": "columnComparison",
    "dimensions": [<dimension_a>, <dimension_b>] 
}

Regular Expression Filter

"filter": { 
    "type": "regex",
    "dimension": <dimension_string>,
    "pattern": <pattern_string>
}

Logical Expression Filter

  • AND

    "filter": { "type": "and", "fields": [<filter>, <filter>, ...] }
    
  • OR

    "filter": { "type": "or", "fields": [<filter>, <filter>, ...] }
    
  • NOT

    "filter": { "type": "not", "field": <filter> }
    

Javascript Filter

return valuetrue인 것들만 남김

{
    "type" : "javascript",
    "dimension" : <dimension_string>,
    "function" : "function(value) { <...> }"
}

Extraction Filter

extraction function을 사용해서 필터링

Search Filter

{
    "filter": {
        "type": "search",
        "dimension": <dimension_string>,
        "query": {
          "type": <search_query_type>,
          "value": "value_1" 
        }        
    }
}
  • Contains

property description required?
type This String should always be "contains". yes
value A String value to run the search over. yes
caseSensitive Whether two string should be compared as case sensitive or not no (default == false)
  • Insensitive Contains

property description required?
type This String should always be "insensitive_contains". yes
value A String value to run the search over. yes
  • Fragment

property description required?
type This String should always be "fragment". yes
values A JSON array of String values to run the search over. yes
caseSensitive Whether strings should be compared as case sensitive or not. Default: false(insensitive) no

In Filter

{
    "type": "in",
    "dimension": <dimension_string>,
    "values": ["value_1", "value_2", "value_3"]
}

Like Filter

{
    "type": "like",
    "dimension": "last_name",
    "pattern": "D%"
}
property type description required?
type String This should always be "like". yes
dimension String The dimension to filter on yes
pattern String LIKE pattern, such as "foo%" or "___bar". yes
escape String An escape character that can be used to escape special characters. no
extractionFn Extraction function Extraction function to apply to the dimension no

Bound Filter

Interval Filter

Aggregations

Count Aggregator

row count를 기반으로 counting함

{ "type" : "count", "name" : <output_name> }

Sum Aggregator

  • longSum, doubleSum, floatSum

  • doubleMin, doubleMax

  • floatMin, floatMax

First/Last Aggregator

  • doubleFirst, floaFirst, longFirst

  • doubleLast, floatLast, longLast

Javascript Aggregator

{ "type": "javascript",
  "name": "<output_name>",
  "fieldNames"  : [ <column1>, <column2>, ... ],
  "fnAggregate" : "function(current, column1, column2, ...) {
                     <updates partial aggregate (current) based on the current row values>
                     return <updated partial aggregate>
                   }",
  "fnCombine"   : "function(partialA, partialB) { return <combined partial results>; }",
  "fnReset"     : "function()                   { return <initial value>; }"
}

Approximate Aggregations

Cardinality Aggregator

dimension의 cardinality를 HyperLogLog Algorithm을 사용하여 계산함.

{
  "type": "cardinality",
  "name": "<output_name>",
  "fields": [ <dimension1>, <dimension2>, ... ],
  "byRow": <false | true> # (optional, defaults to false),
  "round": <false | true> # (optional, defaults to false)
}

Cardinality by value

byRowfalse를 해놓으면(default false) 값에 대해서 cadinality를 계산함

SELECT COUNT(DISTINCT(dimension)) FROM <datasource>;

SELECT COUNT(DISTINCT(value)) FROM (
  SELECT dim_1 as value FROM <datasource>
  UNION
  SELECT dim_2 as value FROM <datasource>
  UNION
  SELECT dim_3 as value FROM <datasource>
);

Cardinality by row

byRowtrue 이면 dimension의 조합 cardinality를 계산함

SELECT COUNT(*) FROM ( SELECT DIM1, DIM2, DIM3 FROM <datasource> GROUP BY DIM1, DIM2, DIM3 )

HyperUnique Aggregator

{ 
  "type" : "hyperUnique",
  "name" : <output_name>,
  "fieldName" : <metric_name>,
  "isInputHyperUnique" : false,
  "round" : false
}

Miscellaneous Aggregations

Filtered Aggregator

필터링 된 값만 집계함

필터 + 집계를 한번에 할 수 있음

{
  "type" : "filtered",
  "filter" : {
    "type" : "selector",
    "dimension" : <dimension>,
    "value" : <dimension value>
  }
  "aggregator" : <aggregation>
}

Post Aggregations

aggregation된 결과에 다른 처리를 해야하는 경우 사용

Arithmetic post-aggregator

postAggregation : {
  "type"  : "arithmetic",
  "name"  : <output_name>,
  "fn"    : <arithmetic_function>,
  "fields": [<post_aggregator>, <post_aggregator>, ...],
  "ordering" : <null (default), or "numericFirst">
}

arithmetic_function = [+, -, *, /, quotient]

Field accessor post-aggregators

Constant post-aggregator

Greatest / Least post-aggregators

JavaScript post-aggregator

HyperUnique Cardinality post-aggregator

union dataSource를 사용하려면 쿼리를 Broker에게 보내야함