SELECT
SELECT
allows you to specify list of columns and expressions to be selected and evaluated from a table.
All examples below are given for the file ratings.csv from the movielens project. See section to Get sample data
Syntax
Note that the
TABLE
you query from can either be a table in your database (in which case you would pass the table's name), or the result of a sub query.
Simple Select
All columns
QuestDB supports SELECT * FROM tablename
. When selecting all, you can also omit most of the statement and simply pass
the table name.
The two examples below are equivalent
SELECT * FROM ratings;
// is equivalent to //
ratings;
Specific columns
To select specific columns, replace * by the names of the columns you are interested in.
Example:
SELECT movieId, rating FROM ratings;
Arithmetic Expressions
SELECT
is capable of evaluating multiple expressions and functions. You can mix comma separated lists
of expressions with the column names you are selecting.
SELECT movieId, (100  rating)*2, rating > 3.5 good
FROM ratings;
The result of rating > 3.5
is a boolean. The column will be named good and take values true or false.
Aliases
Using aliases allow you to give expressions or column names of your choice. You can assign an alias to a column or an expression by writing the alias name you want after that expression
Alias names and column names must be unique. Example:
SELECT movieId alias1, rating alias2
FROM ratings
Aggregation
Aggregation functions can be used in arithmetic expressions.
Aggregate functions
Currently implemented aggregate functions:
Function  Remarks 

sum(expr) 
Sums values of numeric expr . Return type is double for expr of types
double ,float ; int for expr of types byte ,
short ,int and long for expr of type long

lsum(expr) 
Sums values of expr and returns long . This function can sum int values without
overflowing the result.

avg(expr) 
Calculates average value of expr .

max(expr) 
Calculates maximum value of expr . Return type is the same as type of expr

min(expr) 
Calculates minimum value of expr . Return type is the same as type of expr

first(expr) 
First value of expr in natural order of records.

last(expr) 
Last value of expr in natural order of records.

count()  Calculates count of records in query. 
count(expr) 
Calculates count of records with nonnull values of expr

var(expr) 
Calculates variance of expr .

stddev(expr) 
Calculates standard deviation of values of expr

Aggregation by group
QuestDB evaluates aggregation functions without need for traditional GROUP BY
. Simply use a mix of column
names and aggregation functions in a SELECT
clause. You can have any number of discrete value columns and
any number of aggregation functions.
SELECT movieId, avg(rating), count() FROM ratings;
// is equivalent to //
SELECT movieId, avg(rating), count() FROM ratings GROUP BY movieId;
Aggregation arithmetic
Aggregation functions can be used in arithmetic expressions. The following computes mid
of rating values for
every movie.
SELECT movieId, (min(rating) + max(rating))/2 mid, count() count FROM ratings;
WHERE clause
Syntax
QuestDB supports standard WHERE
clause for filtering data.
Filter expressions are required to return boolean result.
SELECT COLUMN1, COLUMN2...
FROM TABLENAME
WHERE CONDITIONS;
CONDITIONS
are expressions that return boolean results.
Operators
Advanced conditions can be built using logical operators.
List of supported BOOLEAN
operators:
Operator  Remarks 

and  Logical AND operator. 
or  Logical OR operator. 
not 
Logical NOT operator. Example where not(x > 10)

<  Numerical LESS than. 
>  Numerical GREATER than. 
<=  Numerical LESS than OR EQUAL. 
>=  Numerical GREATER than OR EQUAL. 
= 
Equality operator applicable to all types. where x = 'ABC' and y = 10

eq 
Equality operator applicable to double type. It compares two double values with a delta.
where eq(x, 0.5, 0.000001) . This is logically equivalent to abs(x0.5) < 0.000001

!=  Nonequality operator applicable to all types. 
in 
Returns true when value of left argument is one of list of values of right argument.
where x in ('ABC','CDE','HJK') . This operator is performs better than logically equivalent
where x = 'ABC' or x = 'CDE' or x = 'HJK'

~ 
Returns true when value of left argument matches regular expression. where x ~ 'THE'
Here THE is the pattern. In this case it matches all strings that contain THE . For more
details on the pattern please refer to
Java Pattern documentation

Exact timestamp
Use =
operator and UTC date string for exact timestamp matches:
SELECT ratings WHERE timestamp = '20100112T00:02:26.000Z'
QuestDB SQL optimiser will create more efficient plan when data is time series naturally ordered by timestamp.
Interval timestamp
Using >
,>=
,<
,<=
operators:
SELECT ratings WHERE timestamp > '20100112T00:00:00.000Z' and timestamp < '20100112T00:59:59.999Z'
Using in
operator, for example:
SELECT ratings WHERE timestamp in ('20100112T00:00:00.000Z', '20100112T00:59:59.999Z')
in
is inclusive of edges and supports exactly two UTC timestamps.
Using =
operator and partial UTC timestamp string. Example below selects data between 14:00 and 14:59 on 12 January 2010:
SELECT ratings WHERE timestamp = '20100112T14'
To obtain intervals UTC timestamps can be truncated at seconds, minutes, days, months and years. This example selects data for the whole of 2011:
SELECT ratings WHERE timestamp = '2011'
Using =
operator and interval length modifier to specify longer than single time unit intervals. This
example selects 2pm, 3pm and 4pm data:
SELECT ratings WHERE timestamp = '20100112T14;2h'
QuestDB uses the following algorithm to create the interval: 20100112T14
is translated into natural interval
[20100112T14:00:00.000Z, 20100112T14:59:59.999Z]
, then 2 hours added to the upper bound resulting in
[20100112T14:00:00.000Z, 20100112T16:59:59.999Z]
interval.
Interval modifier format is:
where letters stand for:
Modifier  Remarks 

s  Seconds 
m  Minutes 
h  Hours 
d  Days 
M  Months 
y  Years 
Floating Point
You can use eq
operator to compare double
and float
values with tolerance to avoid rounding problems.
For example:
SELECT prices WHERE eq(bid, 1.56, 0.000001)
You can also use =
:
SELECT prices WHERE bid = 1.56
but =
would not match 1.56
and 1.559999999999
.
Best practice for floating point values would be to store as LONG integer using scaling factors to avoid roundingrelated issues.
Search using aggregation results
Subqueries can be used to filter on aggregation results in QuestDB SQL. It is fairly simple because of
the optional nature of select .. from
.
The following example selects all movies that received over 50,000 ratings.
(select movieId x, (min(rating) + max(rating))/2 mid, count() count from ratings) where count > 50000
Standard SQL equivalent would be:
select movieId, (min(rating) + max(rating))/2 mid, count() count from ratings
group by movieId
having count() > 50000
ORDER BY
Usage
ORDER BY
is used to sort the results of a query in ascending or descending order.
Syntax
SELECT COLUMN1, COLUMN2...
FROM TABLENAME
ORDER BY COLUMN1 [ASC]DESC, COLUMN2 [ASC]DESC ...;
Parameter  Remarks 

COLUMN  Columns by which you wish to order 
ASC  To sort in ascending order 
DESC  To sort in descending order 
Note that ASC is optional and can be omitted.
Examples
Order by one column in ascending order:
ratings ORDER BY userId;
// is equivalent to //
SELECT * FROM ratings ORDER BY userId ASC;
Order by one column in descending order:
ratings ORDER BY userId DESC;
Order by several columns:
ratings ORDER BY userId, rating DESC;
// is equivalent to //
SELECT * FROM ratings ORDER BY userId ASC , rating DESC;
SELECT DISTINCT
Usage
SELECT DISTINCT
is used to return only distinct (i.e different) values.
Syntax
SELECT DISTINCT COLUMN1, COLUMN2, ...
FROM TABLE;
Example
The following query will return a list of all unique ratings in the table.
SELECT DISTINCT rating
FROM ratings;
SELECT DISTINCT can be used in conjunction with more advanced queries and filters.
The following query will return a list of all unique ratings in the table, and the number of times they occur.
SELECT DISTINCT rating, count()
FROM ratings;
The following query will return a list of all ratingsuserId couples in the table and hoy many times each user has assigned each rating. It is also filtered for ratings superior to 3
SELECT DISTINCT rating, userId, count()
FROM ratings
WHERE rating > 3;
LIMIT
Overview
LIMIT
is used to specify the number of records to return. Furthermore, you can specify whether the position of the rows
(first n rows, last n rows, n rows after skipping m rows etc) .
In other implementations of SQL, this is sometimes replaced by statements such as
OFFSET
orROWNUM
Our implementation ofLIMIT
encompasses both in one statement.
Syntax
Statements with LIMIT
follow this syntax:
SELECT column1, column2, ...
FROM table
LIMIT NUM_ROWS;
Where
NUM_ROWS
is the number of rows you want to return with the query.
Limit will by default start from the TOP of the table. If you would like to get results from the BOTTOM of the table, then l should be a negative number.
Examples
The following will return the TOP 5 results.
SELECT * FROM ratings LIMIT 5;
For a results table with rows from 1 to n, it will return rows [1, 2, 3, 4, 5]
The following will return the BOTTOM 5 results:
SELECT * FROM ratings LIMIT 5;
For a results table with rows from 1 to n, it will return rows [n5, n4, n3, n2, n1, n]
Range
You can use two parameters to return a range. To do so, you should use the syntax
SELECT . . FROM . . LIMIT n, m;
Where n
is the lower bound of your range (exclusive), and m
is the upper bound of your range (inclusive)
For example, the following return records {3, 4, 5}
ratings LIMIT 2,5;
When used with negative
numbers, you can return a range starting from the bottom.
For example, the following will return records between n7 (exclusive) and n3 (inclusive), i.e {n6, n5, n4, n3}
ratings LIMIT 7, 3;
SAMPLE BY
Overview
SAMPLE BY
is used on timeseries data to summarise large datasets into aggregates of homogeneous time chunks.
To use
SAMPLE BY
, one column needs to be designated astimestamp
. Find out more in the CREATE TABLE section.
Syntax
SAMPLE BY
syntax is as follows:
SELECT columns
FROM table
SAMPLE BY nSAMPLE_SIZE
WHere SAMPLE_SIZE
is the unit of time by which you wish to aggregate your results, and n
is the number of
timechunks that will be summarised together. You can sample by any multiple of the following:
Modifier  Remarks 

s  Seconds 
m  Minutes 
h  Hours 
d  Days 
M  Months 
y  Years 
Examples
Assume the following table
TRADES
===============================================
timestamp, buysell, quantity, price

ts1 B q1 p1
ts2 S q2 p2
ts3 S q3 p3
... ... ... ...
tsn B qn pn
The following will return the number of trades per hour:
SELECT timestamp, count()
FROM TRADES
SAMPLE BY 1h;
The following will return the trade volume in 30 minute intervals
SELECT timestamp, sum(quantity*price)
FROM TRADES
SAMPLE BY 30m;
The following will return the average trade notional (where notional is = q * p) by day:
SELECT timestamp, avg(quantity*price)
FROM TRADES
SAMPLE BY 1d;
FILL
Overview
FILL
is an option for SAMPLE BY
to determine how the results of a query are displayed when one or more of your timeslices have no data.
This is useful when you would like your query results to be homogeneous in time: rather than skipping the time chunk, the query will
return a value determine by the type of FILL you choose.
Syntax
SELECT timestamp, aggr1, aggr2, ...
FROM table
SAMPLE BY YOUR_SAMPLE_SIZE
FILL(FILL_OPTION_1, FILL_OPTION_2...);
FILL_OPTION
can be any of the following:
Option  Remarks 

NONE  Will not fill. In case there is no data, the timechunk will be skipped in the results. This means your table could potentially be missing intervals. 
NULL  Will fill with NULL 
PREV  Will fill using the previous data point. 
LINEAR  Will fill with the result of the linear interpolation of the surrounding 2 points. 
0  Will fill with 0. Note you can replace 0 with any number of your choice (e.g fill 100.00) 
Fill must be used on aggregations such as sum(), count() etc.
Examples
Consider the following table
PRICES
======================
timestamp, price

ts1 p1
ts2 p2
ts3 p3
... ...
tsn pn
We could run the following to get the minimum, maximum and average price per hour using the following query:
SELECT timestamp, min(price) min, max(price) max, avg(price) avg
FROM PRICES
SAMPLE BY 1h;
It would generally return result like this:
RESULTS
======================================
timestamp, min, max, average

ts1 min1 max1 avg1
... ... ... ...
tsn minn maxn avgn
However, in case there was no PRICES
data for a given hour, your table would have timechunks missing. For example
RESULTS
======================================
timestamp, min, max, average

ts1 min1 max1 avg1
ts2 min2 max2 avg2
ts4 min4 max4 avg4
... ... ... ...
tsn minn maxn avgn
Here you can see that the third time chunk is missing. This is because there was no price update in the third hour. Let's see what different fill values would return:
SELECT timestamp, min(price) min, max(price) max, avg(price) avg
FROM PRICES
SAMPLE BY 1h
FILL(null, 0, prev);
would return:
RESULTS
======================================
timestamp, min, max, average

ts1 min1 max1 avg1
ts2 min2 max2 avg2
ts3 NULL 0 avg2 <<< FILL VALUES
ts4 min4 max4 avg4
... ... ... ...
tsn minn maxn avgn
And the following:
SELECT timestamp, min(price) min, avg(price) avg
FROM PRICES
SAMPLE BY 1h
FILL(25.5, linear);
Would return:
RESULTS
======================================
timestamp, min, average

ts1 min1 avg1
ts2 min2 avg2
ts3 25.5 (avg2+avg4)/2 <<< FILL VALUES
ts4 min4 avg4
... ... ...
tsn minn avgn