visit
The latest is
“Malloy is an experimental language for describing data relationships and transformations. It is both a semantic modeling language and a querying language that runs queries against a relational database. Malloy currently connects to BigQuery and Postgres, and natively supports DuckDB. We've built a Visual Studio Code extension to facilitate building Malloy data models, querying and transforming data, and creating simple visualizations and dashboards.”
Malloy was developed by
"feels like JSON made a baby with SQL, in the worse possible way"
To be clear, I know about Looker, but I haven’t done any real work with it. So that brings us to Malloy.
Lloyd describes Malloy on the
“an experimental language for describing data relationships and transformations. It is both a semantic modeling language and a querying language that runs queries against a relational database.” also stating: “SQL is complete but ugly…Everything is expressible, but nothing is reusable; simple ideas are complex to express; the language is verbose and lacks smart defaults. Malloy is immediately understandable by SQL users, and far easier to use and learn.”
Malloy natively supports
query: table('malloy-data.faa.airports') -> {
group_by: fac_type
aggregate: airport_count is count()
where: state = 'CA'
order_by: airport_count desc
}
SELECT
base.fac_type as fac_type,
COUNT( 1) as airport_count
FROM `malloy-data.faa.airports` as base
WHERE base.state='CA'
GROUP BY 1
ORDER BY 2 desc
Note that you can display this as HTML, JSON, or SQL. The basic structure of a Malloy Query takes the form of:
query: <source> {
join_one: <source> with …
join_many: <source> on …
} -> {
group_by:
<field/dimension>
<field/dimension>
aggregate:
<aggregation/measure>
<aggregation/measure>
nest:
<named_query OR query_def>
<named_query OR query_def>
where: <filter_expression>, <filter_expression>, …
having: <aggregate_filter_expression>, <aggregate_filter_expression>
order_by: <field/dimension>, <aggregation/measure>, …
limit: <limit>
}
SELECT
<group_by>, <group_by>, …
<aggregate>, <aggregate>, …
<nest>, <nest>, … -- very much a simplification; read more in Nesting Queries doc.
FROM <source>
LEFT JOIN <source> ON …
LEFT JOIN <source> ON …
WHERE (<filter_expression>) AND (<filter_expression>) AND …
GROUP BY <group_by>, <group_by>, …
HAVING <aggregate_filter_expression> AND <aggregate_filter_expression> AND …
ORDER BY <group_by> | <aggregate>
LIMIT <limit>
If you don’t know Mimoune Djouallah, you should definitely follow him on
So, I’m going to summarize what Mimoune has going on here on his
In the left-hand pane, we have the schema information, which is pretty obvious. Then to the right of that on top is our Malloy Query, in which you can see the nested aggregations easily described in just a few lines.
That is the generated SQL from the Malloy Query, which was only 6 actual lines of code; it replaced 40 lines of SQL. You can see the embedded DuckDB connection that is accessing the various Parquet files that have the data for the query in the Semantic Model.
It’s totally worth clicking into the Fiddle and playing around a bit. There is another really fun Fiddle from Lloyd that connects to the IMDB dataset that is absolutely worth checking out