Getting Started with the Weaviate Vector Search Engine

by SeMI Technologies, May 19th, 2020
Everybody who works with data in any way, shape, or form knows that one of the most important challenges is finding the correct answers to your questions. There is a whole set of excellent (open source) search engines available, but there is one thing they can't do: search and relate data based on context.
Weaviate is an open-source, GraphQL-based search graph built on a built-in embedding mechanism.
Before we get started, some further reading while exploring Weaviate.

Getting Started with Weaviate

Let's look at the following data object that one might store in a search engine:
{
    "title": "African bush elephant",
    "photoUrl": "//en.wikipedia.org/wiki/African_bush_elephant"
}
You can retrieve the data object from any search engine by searching for “elephant” or “african”. But what if you want to search for “animal”, “savanna” or “trunk”?
This is the problem the Weaviate search graph solves. Because of its built-in natural language model, it indexes your data based on context rather than keywords alone.
In this article, you will learn within 10 minutes how to use Weaviate to build your own semantic search engine and how this GraphQL query:
{
  Get{
    Things{
      Photo(
        explore: {
          concepts: ["animal with a trunk"]
        }
        limit:1
      ){
        title
        photoUrl
      }
    }
  }
}
will result in the following response:
{
  "data": {
    "Get": {
      "Things": {
        "Photo": [
          {
            "photoUrl": "//upload.wikimedia.org/wikipedia/commons/b/bf/African_Elephant_%28Loxodonta_africana%29_male_%28%29.jpg",
            "title": "African bush elephant"
          }
        ]
      }
    }
  },
  "errors": null
}
If you want to learn more (outside this article), you can watch one of the interviews about Weaviate or read the Weaviate documentation. You can also sign up for updates and follow the development.

Running Weaviate

The easiest way to get started with Weaviate is by running the Docker Compose setup. In this demo, we will be using the English version of Weaviate, which you can run with the following commands:
# Download the Weaviate configuration file
$ curl -O https://raw.githubusercontent.com/semi-technologies/weaviate/0.22.7/docker-compose/runtime/en/config.yaml
# Download the Weaviate docker-compose file
$ curl -O https://raw.githubusercontent.com/semi-technologies/weaviate/0.22.7/docker-compose/runtime/en/docker-compose.yml
# Run Docker Compose
$ docker-compose up
When Weaviate is running, you can check that it is up with the following command:
$ curl http://localhost:8080/v1/meta
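Since the containers can take a moment to start, it can help to poll the meta endpoint instead of checking once. The following is a minimal sketch, not part of Weaviate itself: the function name `wait_for_weaviate`, the `WEAVIATE_URL` variable, and the default attempt count and delay are all assumptions made for illustration.

```shell
# Base URL of the local Weaviate instance (assumed default from this guide).
WEAVIATE_URL="${WEAVIATE_URL:-http://localhost:8080}"

# Poll /v1/meta until it answers successfully or we run out of attempts.
# Arguments: number of attempts (default 30), seconds between attempts (default 2).
wait_for_weaviate() {
  attempts="${1:-30}"
  delay="${2:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # --fail makes curl exit non-zero on HTTP error responses
    if curl --silent --fail --output /dev/null "$WEAVIATE_URL/v1/meta"; then
      echo "Weaviate is up after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "Weaviate did not respond after $attempts attempts" >&2
  return 1
}
```

Calling `wait_for_weaviate 30 2` before the next steps would wait up to roughly a minute for the instance to come up.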
We will be creating a mini search engine for photos by taking the following three steps:
  1. Create a Weaviate schema.
  2. Add data to Weaviate.
  3. Query the data with Weaviate's GraphQL interface.

Create a Weaviate Schema

The first thing you need to do when working with Weaviate is create a schema. Weaviate makes a distinction between "things" and "actions", which roughly maps to nouns (things) and verbs (actions). In this getting started guide, we will only work with things. The schema will later be used when querying and exploring your dataset.
As a rule of thumb, Weaviate uses the RESTful API to add data and the GraphQL API to fetch data. The schema is in graph format, meaning that you can create (huge) networks (i.e., knowledge graphs) of your data if you so desire, but if you are building a simple search engine, one class with a few properties can already be enough.
You can learn more about creating a schema in the Weaviate documentation. For now, we will dive in and create a super simple schema for a photo dataset. In the example below, we use the command line to add a schema, but you can also use Postman or any other tool that can send HTTP requests.
$ curl \
  --header "Content-Type: application/json" \
  --request POST \
  --data '{
    "class": "Photo",
    "description": "A photo",
    "vectorizeClassName": false,
    "keywords": [],
    "properties": [
        {
            "dataType": [
                "string"
            ],
            "name": "title",
            "description": "Title of the Photo",
            "vectorizePropertyName": false,
            "index": true
        }, {
            "dataType": [
                "string"
            ],
            "name": "photoUrl",
            "description": "URL of the Photo",
            "vectorizePropertyName": false,
            "index": false
        }
    ]
  }' \
  http://localhost:8080/v1/schema/things
You can now examine the class like this:
$ curl http://localhost:8080/v1/schema
# or with jq
$ curl http://localhost:8080/v1/schema | jq .
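If you script these steps, you may want to confirm that a class made it into the schema before adding data. The sketch below does a plain-text match on the schema dump rather than parsing the JSON; the function name `schema_has_class` and the `WEAVIATE_URL` variable are hypothetical, and the check only assumes that each class appears as a `"class": "<name>"` pair, as in the request we sent.

```shell
# Base URL of the local Weaviate instance (assumed default from this guide).
WEAVIATE_URL="${WEAVIATE_URL:-http://localhost:8080}"

# Exit status 0 if the schema dump contains a class with the given name.
# This is a text match on "class": "<name>", not a real JSON parser.
schema_has_class() {
  curl --silent "$WEAVIATE_URL/v1/schema" \
    | grep -Eq "\"class\"[[:space:]]*:[[:space:]]*\"$1\""
}
```

For example, `schema_has_class Photo && echo "Photo class exists"` prints the message only when the class is found.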
The JSON object we just posted defines a class named Photo with two string properties, title and photoUrl; only title is marked with "index": true.
Let’s add another class that represents a user and the photos this user owns.
$ curl \
  --header "Content-Type: application/json" \
  --request POST \
  --data '{
    "class": "User",
    "description": "A user",
    "keywords": [],
    "properties": [
        {
            "dataType": [
                "string"
            ],
            "name": "name",
            "description": "Name of the user"
        }, {
            "dataType": [
                "Photo"
            ],
            "name": "ownsPhotos",
            "description": "Photos this user owns",
            "cardinality": "many"
        }
    ]
  }' \
  http://localhost:8080/v1/schema/things
We now have a super simple graph in which a User points to the Photos it owns through the ownsPhotos property.
Now let's populate Weaviate with some data!

Adding Data

Like creating classes, adding data is done through the RESTful API. Advanced users have other options, but for this example, we are going to add one user and two photos manually.
# Add the elephant
$ curl \
    --header "Content-Type: application/json" \
    --request POST \
    --data '{
        "class": "Photo",
        "schema": {
            "title": "African bush elephant",
            "photoUrl": "//upload.wikimedia.org/wikipedia/commons/b/bf/African_Elephant_%28Loxodonta_africana%29_male_%28%29.jpg"
        }
    }' \
    http://localhost:8080/v1/things
Make sure to save the UUID that is returned as a result(!)
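Rather than copying the UUID by hand, you could capture it in a shell variable. This is a minimal sketch under one assumption: that the JSON returned by the request contains an "id" field holding the UUID. The helper name `extract_id` is hypothetical, and it uses a text match rather than a JSON parser.

```shell
# Print the value of the first "id" field found in the JSON read from stdin.
# Assumption: the response carries the UUID in an "id" field.
extract_id() {
  grep -o '"id"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | head -n 1 \
    | sed 's/.*"\([^"]*\)"$/\1/'
}

# Usage sketch (ELEPHANT_ID is a hypothetical variable name):
#   ELEPHANT_ID="$(curl --silent --header "Content-Type: application/json" \
#     --request POST --data @elephant.json \
#     http://localhost:8080/v1/things | extract_id)"
```

Here `elephant.json` stands for a file that holds the same JSON body as the request above; curl's `--data @file` form reads the body from that file.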
# Add Brad Pitt
$ curl \
    --header "Content-Type: application/json" \
    --request POST \
    --data '{
        "class": "Photo",
        "schema": {
            "title": "Brad Pitt at the 2019 premiere of Once Upon a Time in Hollywood",
            "photoUrl": "//upload.wikimedia.org/wikipedia/commons/4/4c/Brad_Pitt_2019_by_Glenn_Francis.jpg"
        }
    }' \
    http://localhost:8080/v1/things
Make sure to save the UUID that is returned in this result as well. We will use both UUIDs to add the two photos to the user.
# First, add the user
$ curl \
    --header "Content-Type: application/json" \
    --request POST \
    --data '{
        "class": "User",
        "schema": {
            "name": "John Doe"
        }
    }' \
    http://localhost:8080/v1/things
We can now add the photos to the user by setting references (Weaviate uses the term "beacon"; you can learn more about setting graph references in the documentation). Make sure to use the UUIDs that belong to your own photos and user.
$ curl \
    --header "Content-Type: application/json" \
    --request PUT \
    --data '[{
        "beacon": "weaviate://localhost/things/b81e530f-f8db-41b6-910f-0469c8b7884e"
    }, {
        "beacon": "weaviate://localhost/things/127c8bcb-99bf-4c8d-94d4-f67cd2323548"
    }]' \
    http://localhost:8080/v1/things/0b70b628-377b-4b4d-85c8-89b0dacd4209/references/ownsPhotos
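Because the UUIDs will differ on every installation, it can be handy to build the beacon list from variables instead of editing the JSON by hand. The sketch below follows the beacon format and the endpoint used in the request above; the function names `beacon_for`, `beacons_json`, and `set_owned_photos`, and the `WEAVIATE_URL` variable, are assumptions made for illustration.

```shell
# Base URL of the local Weaviate instance (assumed default from this guide).
WEAVIATE_URL="${WEAVIATE_URL:-http://localhost:8080}"

# Turn a thing's UUID into a beacon, following the format used above.
beacon_for() {
  printf 'weaviate://localhost/things/%s' "$1"
}

# Build a JSON array with one {"beacon": ...} object per UUID argument.
beacons_json() {
  sep=""
  printf '['
  for uuid in "$@"; do
    printf '%s{"beacon": "%s"}' "$sep" "$(beacon_for "$uuid")"
    sep=", "
  done
  printf ']'
}

# PUT the photo beacons onto a user's ownsPhotos property.
# Arguments: the user's UUID, followed by one or more photo UUIDs.
set_owned_photos() {
  user_id="$1"
  shift
  curl --silent \
    --header "Content-Type: application/json" \
    --request PUT \
    --data "$(beacons_json "$@")" \
    "$WEAVIATE_URL/v1/things/$user_id/references/ownsPhotos"
}
```

With the UUIDs stored in variables, the call would look like `set_owned_photos "$USER_ID" "$ELEPHANT_ID" "$BRAD_ID"`, where the three variable names are placeholders for the values you saved earlier.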
You can now validate the added data via:
$ curl http://localhost:8080/v1/things
# or with jq
$ curl http://localhost:8080/v1/things | jq .

Query Data

Now that we have all the data in, we are getting to the juicy part of Weaviate: search. Searching is done with GraphQL. You can learn more about all the possible functions in the documentation, or you can get into the nitty-gritty details of the Weaviate GraphQL API by reading this Hackernoon article.

But for now, we are going to keep it simple.

You can use any GraphQL client you like, but to play around with the available queries, you can use the Playground. If you go to the Playground, fill in http://localhost:8080/v1/graphql as the location and click “GraphQL Querying” in the right-hand corner.

To find the photo of the elephant you can do the following:
{
  Get{
    Things{
      Photo(
        explore: {
          concepts: ["animal"]
        }
        limit:1
      ){
        title
        photoUrl
      }
    }
  }
}
And to find the photo of Brad Pitt, you can search for:
{
  Get{
    Things{
      Photo(
        explore: {
          concepts: ["actor"]
        }
        limit:1
      ){
        title
        photoUrl
      }
    }
  }
}
The model can even relate concepts to each other. This query also finds the photo of Brad Pitt:
{
  Get{
    Things{
      Photo(
        explore: {
          concepts: ["angelina", "jolie"]
        }
        limit:1
      ){
        title
        photoUrl
      }
    }
  }
}
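The queries above can also be sent from the command line, in the same style as the earlier curl requests. The sketch below assumes the GraphQL endpoint accepts the common GraphQL-over-HTTP form, a POST with a JSON body of the shape {"query": "..."}. The function names `photo_query`, `graphql_body`, and `search_photos`, and the `WEAVIATE_URL` variable, are hypothetical.

```shell
# Base URL of the local Weaviate instance (assumed default from this guide).
WEAVIATE_URL="${WEAVIATE_URL:-http://localhost:8080}"

# Build a single-line version of the Photo query for one concept.
# The concept is inserted verbatim, so it must not contain quotes or backslashes.
photo_query() {
  printf '{ Get { Things { Photo(explore: {concepts: ["%s"]}, limit: 1) { title photoUrl } } } }' "$1"
}

# Wrap a single-line query in a {"query": "..."} JSON body,
# escaping backslashes and double quotes.
graphql_body() {
  printf '{"query": "%s"}' "$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')"
}

# POST the query for one concept to the GraphQL endpoint.
search_photos() {
  curl --silent \
    --header "Content-Type: application/json" \
    --request POST \
    --data "$(graphql_body "$(photo_query "$1")")" \
    "$WEAVIATE_URL/v1/graphql"
}
```

Running `search_photos "animal"` would then send the first query from this section. The helper handles a single concept only; a query with several concepts, such as the last one above, would need its own query string.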
There are many more semantic filters that you can play around with! Check out the documentation for filters and keep exploring!

More information about Weaviate

If this article piqued your interest, you can get started with Weaviate today!