As a side note here, you would be perfectly fine using Kafka instead of Redpanda. I just like the simplicity that Redpanda brings to the table, as you can run a single Redpanda instance instead of all of the Kafka components.
To get started, clone the tutorials repository:
git clone https://github.com/bobbyiliev/materialize-tutorials.git
After that you can access the directory:
cd materialize-tutorials/mz-join-mysql-and-postgresql
Let's start by first running the Redpanda container:
docker-compose up -d redpanda
Build the images:
docker-compose build
Finally, start all of the services:
docker-compose up -d
To launch the Materialize CLI, run the following command:
docker-compose run mzcli
This is just a shortcut to a Docker container with postgres-client pre-installed. If you already have psql installed, you could run psql -U materialize -h localhost -p 6875 materialize instead.
Now that you're in the Materialize CLI, let's define the orders table from the mysql.shop database as a Redpanda source:
CREATE SOURCE orders
FROM KAFKA BROKER 'redpanda:9092' TOPIC 'mysql.shop.orders'
FORMAT AVRO USING CONFLUENT SCHEMA REGISTRY 'http://redpanda:8081'
ENVELOPE DEBEZIUM;
You can check the available columns of the orders source by running the following statement:
SHOW COLUMNS FROM orders;
 name         | nullable | type
--------------+----------+-----------
 id           | f        | bigint
 user_id      | t        | bigint
 order_status | t        | integer
 price        | t        | numeric
 created_at   | f        | text
 updated_at   | t        | timestamp
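As a quick aside, you can also list the sources that have been created so far, which is handy for double-checking the source name:
SHOW SOURCES;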
Next, we will create our first materialized view to get all of the data from the orders Redpanda source:
CREATE MATERIALIZED VIEW orders_view AS
SELECT * FROM orders;
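Before moving on, you can run a quick sanity check to confirm that records are flowing in from the Redpanda topic; the count will keep growing as new orders come in:
SELECT COUNT(*) FROM orders_view;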
After that, let's create another materialized view, abandoned_orders, which keeps track of the orders with an order_status of 0 (the abandoned orders) for each user:
CREATE MATERIALIZED VIEW abandoned_orders AS
SELECT
user_id,
order_status,
SUM(price) AS revenue,
COUNT(id) AS total
FROM orders_view
WHERE order_status = 0
GROUP BY 1, 2;
You can now query the abandoned_orders view to see the results:
SELECT * FROM abandoned_orders;
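If you would rather watch the results change in real time instead of re-running the query, you could also tail the view; this assumes the Materialize version bundled with the demo supports TAIL, which is exposed through COPY in psql-compatible clients:
COPY (TAIL abandoned_orders) TO STDOUT;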
CREATE MATERIALIZED SOURCE "mz_source" FROM POSTGRES
CONNECTION 'user=postgres port=5432 host=postgres dbname=postgres password=postgres'
PUBLICATION 'mz_source';
A quick rundown of the statement above:
MATERIALIZED: Materializes the PostgreSQL source's data. All of the data is retained in memory and makes sources directly selectable.
mz_source: The name for the PostgreSQL source.
CONNECTION: The PostgreSQL connection parameters.
PUBLICATION: The PostgreSQL publication, containing the tables to be streamed to Materialize.
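For completeness: the publication itself lives on the PostgreSQL side. In this demo it is assumed to already be set up by the bundled Postgres configuration, but if you were pointing Materialize at your own PostgreSQL instance, you would need logical replication enabled (wal_level=logical) and statements along these lines:
-- Allow the replication stream to carry full rows for updates and deletes
ALTER TABLE users REPLICA IDENTITY FULL;
-- Create the publication that the Materialize source subscribes to
CREATE PUBLICATION mz_source FOR TABLE users;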
Once we've created the PostgreSQL source, in order to query the PostgreSQL tables we need to create views that represent the upstream publication's original tables. In our case, we only have one table called users, so the statement that we need to run is:
CREATE VIEWS FROM SOURCE mz_source (users);
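As a side note, if the publication contained more than one table, you should be able to omit the table list and let Materialize create a view for every table in the publication (assuming the Materialize version bundled with the demo supports it):
CREATE VIEWS FROM SOURCE mz_source;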
Once the view has been created, you can list the available views and then query the users data directly:
SHOW FULL VIEWS;
SELECT * FROM users;
Next, let's go ahead and create a few more views.
CREATE MATERIALIZED VIEW high_value_orders AS
SELECT
users.id,
users.email,
abandoned_orders.revenue,
abandoned_orders.total
FROM users
JOIN abandoned_orders ON abandoned_orders.user_id = users.id
GROUP BY 1,2,3,4
HAVING revenue > 2000;
As you can see, here we are joining the users view, which ingests the data directly from our Postgres source, with the abandoned_orders view, which ingests the data from the Redpanda topic.
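To sanity-check the join before sending the data anywhere, you could preview a few rows; the exact values will depend on the demo data generated so far:
SELECT * FROM high_value_orders LIMIT 5;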
Next, let's create a sink to send the data from this joined view back out to Redpanda:
CREATE SINK high_value_orders_sink
FROM high_value_orders
INTO KAFKA BROKER 'redpanda:9092' TOPIC 'high-value-orders-sink'
FORMAT AVRO USING
CONFLUENT SCHEMA REGISTRY 'http://redpanda:8081';
Now, if you were to connect to the Redpanda container and use the rpk topic consume command, you would be able to read the records from the topic. However, for the time being, we won't be able to preview the results with rpk because the records are AVRO formatted. Redpanda will most likely implement this in the future, but for the moment we can stream the topic back into Materialize to confirm the format.
First, find the name of the topic that was automatically generated for the sink:
SELECT topic FROM mz_kafka_sinks;
Output:
topic
-----------------------------------------------------------------
high-value-orders-sink-u12-426
Then, create a new materialized source that reads from that topic:
CREATE MATERIALIZED SOURCE high_value_orders_test
FROM KAFKA BROKER 'redpanda:9092' TOPIC 'high-value-orders-sink-u12-426'
FORMAT AVRO USING CONFLUENT SCHEMA REGISTRY 'http://redpanda:8081';
Make sure to change the topic name accordingly!
Finally, query the new source to confirm the data:
SELECT * FROM high_value_orders_test LIMIT 2;
The demo also includes a Metabase instance so you can visualize the data. To access it, visit http://localhost:3030 if you are running the demo locally, or http://your_server_ip:3030 if you are running the demo on a server, and then follow the steps to complete the Metabase setup.
Once you are done, you can stop all of the services by running:
docker-compose down
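If you also want to remove the named volumes created by the demo so that the next run starts from a clean slate, add the -v flag:
docker-compose down -v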
Helpful resources:
CREATE SOURCE: PostgreSQL
CREATE SOURCE
CREATE VIEWS
SELECT