visit
DuckDB is an embedded SQL database engine. It's very similar to well-known SQLite but is designed for OLAP-style workloads. DuckDB supports a variety of data types and SQL features. With its ability to handle high-speed analytics in a memory-centric environment, it has quickly gained popularity among data scientists and analysts. In this blog post, we will explore using DuckDB with Go.
Online Analytical Processing (OLAP) databases are specially designed to support complex analytical and ad-hoc queries, data manipulation, and multidimensional analysis. These databases are optimized for read-heavy workloads, unlike Online Transaction Processing (OLTP) databases which are more suited for write-heavy, transactional operations. OLAP databases are the backbone of many Business Intelligence (BI) tools, allowing decision-makers to analyze data from multiple dimensions and perspectives.
In-Memory Execution: DuckDB primarily operates in-memory, but also supports out-of-memory execution. This allows it to perform computations very quickly and efficiently.
Full SQL Support: DuckDB supports a wide range of SQL features, which makes it very flexible for various types of data operations.
Transaction Support: DuckDB supports transactions, which is a critical feature for maintaining data integrity and consistency in many applications.
Vectorized Execution: DuckDB uses vectorized query execution, which leads to better CPU utilization and improved performance.
Easy Integration: DuckDB provides APIs for multiple programming languages, including Python, R, C++, Rust, Java, and Go. This makes it easier to integrate DuckDB into existing workflows and systems.
Open Source: DuckDB is open source, which means its source code is freely available for modification or enhancement. This allows for community-driven improvements and adaptability to specific use cases.
go get github.com/marcboeker/go-duckdb
package main
import (
"database/sql"
"log"
_ "github.com/marcboeker/go-duckdb"
)
func main() {
// Empty datasource means, that DB will be solely in-memory, otherwise you could specify a filename here
db, err := sql.Open("duckdb", "")
if err != nil {
log.Fatal("Failed to connect to database:", err)
}
defer db.Close()
}
The sql.Open() function is used to connect to DuckDB, and the empty data source name denotes that we are using an in-memory database.
// Create table
_, err = db.Exec(`
CREATE TABLE employee (
id INTEGER,
name VARCHAR(20),
start_dt TIMESTAMP,
is_remote BOOLEAN
)`)
if err != nil {
log.Fatal(err)
}
Next, to insert data into the employee table, you could use the Exec function with an INSERT INTO SQL command:
// Insert some data in table
_, err = db.Exec(`
INSERT INTO employee (id, name, start_dt, is_remote)
VALUES
(1, 'John Doe', '2022-01-01 09:00:00', true),
(2, 'Jane Smith', '2023-03-15 10:00:00', false)`)
if err != nil {
log.Fatal(err)
}
To fetch data, you can use the QueryRow() function to select a single row:
// Variables to store query result
var id int
var name string
var startDt time.Time
var isRemote bool
// Query single row
if err = db.QueryRow("SELECT id, name, start_dt, is_remote FROM employee WHERE id = ?", 1).Scan(&id, &name, &startDt, &isRemote); err != nil {
if err == sql.ErrNoRows {
log.Println("No rows found.")
} else {
log.Fatalf("unable to execute query: %v", err)
}
} else {
fmt.Println("Select 1 row result:\nID:", id, "Name:", name, "Start Datetime:", startDt, "Is Remote:", isRemote)
}
To select multiple rows, you can use Query() function:
// Query multiple rows
rows, err := db.Query("SELECT id, name, start_dt, is_remote FROM employee")
if err != nil {
log.Fatal(err)
}
defer rows.Close()
// Print the results
fmt.Println("Results:")
for rows.Next() {
err = rows.Scan(&id, &name, &startDt, &isRemote)
if err != nil {
log.Fatal(err)
}
fmt.Println("ID:", id, "Name:", name, "Start Datetime:", startDt, "Is Remote:", isRemote)
}
err = rows.Err()
if err != nil {
log.Fatal(err)
}
In this code:
We're calling the Query() function with an SQL command to select all records from the employee table.
We then enter a loop using rows.Next(), which iterates over each row returned by the query.
Within the loop, we're using the Scan() function to copy the columns from the current row into the id, name, workFrom, and isRemote variables.
We then print these variables using the fmt.Println() function.
After the loop, we check for errors during the iteration with rows.Err(). If there's an error, we print it using log.Fatal(err).
In the real world, your Go code must be ready to handle errors and work with transactions. The SQL package provides all the necessary tools:
// Error handling and transactions
tx, err := db.Begin()
if err != nil {
log.Fatal(err)
}
defer tx.Rollback()
_, err = tx.Exec(`
INSERT INTO employee (id, name, start_dt, is_remote)
VALUES
(3000000000, 'id int64 instead of int32', '2022-06-17 11:00:00', true)`)
if err != nil {
log.Printf("ERROR: %s\n", err.Error()) // Do not fail, just print the error in output
}
err = tx.Commit()
if err != nil {
log.Fatal(err)
}
The full source code for this example could be found on my GitHub repo: