260 reads

Jina, a Deep Learning-Powered Search Framework, Can Help You Build Your Neural Search

by Alex C-GMay 2nd, 2021

Too Long; Didn't Read

Jina is an open-source deep learning-powered search framework for building cross-/multi-modal search systems on the cloud. It lets you build a search engine for any data with any kind of data. Jina's Streamlit component is a front-end for end-users, so it doesn't worry about the indexing part. The component parses out the useful information (e.g. text or image matches) and displays them to the user. It offers flexibility and, being written in Python, it can be easier for data scientists to get up to speed.

Companies Mentioned

Coin Mentioned

featured image - Jina, a Deep Learning-Powered Search Framework, Can Help You Build Your Neural Search

Do you ever think, “Darn this stupid cloud. Why can’t there be an easier way to build a neural search on it?”

Well, if you have, this article is for you. I’m going to walk through how to use new Streamlit component to search text or images to build a neural search front end. Want to jump right in? Check out our text search app or image search app, and here's the .

Why use Jina to build a neural search?

Jina is an open-source deep learning-powered search framework for building cross-/multi-modal search systems (e.g. text, images, video, audio) on the cloud. Essentially, it lets you build a search engine for any kind of data with any kind of data.

So you could build ala Google, a ala Google Images, a , and so on. Companies like Facebook, Google, and Spotify build these searches powered by state-of-the-art AI-powered models like FAISS, DistilBERT, and Annoy.

Why use Streamlit with Jina?

I was a big fan of Streamlit before I even joined Jina. I used it on a project to create terrible Star Trek scripts that later turned into a front-end for text generation with Transformers. So I'm over the moon to be using this cool framework to build something for our users.

Building a Streamlit component helps the data scientists, machine learning enthusiasts, and all the other developers in the Streamlit community build cool stuff powered by neural search. It offers flexibility and, being written in Python, it can be easier for data scientists to get up to speed.

Out of the box, the streamlit-jina component has text-to-text and image-to-image search, but Jina offers a rich search experience for any kind of data with any kind of data so there's plenty more to add to the component!

How does it work?

Every Jina project includes two :

Indexing: for breaking down and extracting rich meaning from your dataset using neural network models

Querying: for taking a user input and finding matching results

Our Streamlit component is a front end for end-users, so it doesn't worry about the indexing part.

Admin spins up a Jina Docker image:

docker run -p 45678:45678 jinahub/app.example.wikipedia-sentences-30k:0.2.9-1.0.1

User enters a query into the Streamlit component (currently either a text input or an image upload) and hits 'search'
The input query is wrapped in JSON and sent to Jina's query API
The query Flow does its thing and returns results in JSON format (along with lots of metadata)
The component parses out the useful information (e.g. text or image matches) and displays them to the user

Example code

Let's look at our since it's easier to see what's going on there:

import streamlit as st
from streamlit_jina import jina
st.set_page_config(page_title="Jina Text Search",)

endpoint = "//0.0.0.0:45678/api/search"

st.title("Jina Text Search")
st.markdown("You can run our [Wikipedia search example](//github.com/jina-ai/examples/tree/master/wikipedia-sentences) to test out this search")

jina.text_search(endpoint=endpoint)

As you can see, the above code:

Imports streamlit and streamlit_jina
Sets the REST endpoint for the search
Sets the page titleDisplays some explanatory text
Displays the Jina text search widget with endpoint defined

For the Jina Streamlit widgets, you can also pass in other parameters to define the number of results you want back or if you want to hide certain widgets.

Behind the scenes

The source code for our module is just one file,

__init__.py

. Let's just look at the high-level functionality for our text search example for now:

Set configuration variables

headers = {
    "Content-Type": "application/json",
}

# Set default endpoint in case user doesn't specify and endpoint
DEFAULT_ENDPOINT = "//0.0.0.0:45678/api/search"

Render component

class jina:
    def text_search(endpoint=DEFAULT_ENDPOINT, top_k=10, hidden=[]):
        container = st.beta_container()
        with container:
            if "endpoint" not in hidden:
                endpoint = st.text_input("Endpoint", endpoint)

            query = st.text_input("Enter query")

            if "top_k" not in hidden:
                top_k = st.slider("Results", 1, top_k, int(top_k / 2))

            button = st.button("Search")

            if button:
                matches = text.process.json(query, top_k, endpoint)
                st.write(matches)

        return container

In short, the

jina.text_search()

method:

Creates a Streamlit container to hold everything, with sane defaults if not specified
If widgets aren't set to hidden, present them to user
[User types query]
[User clicks button]
Sends query to Jina API and returns results
Displays results in the component

Our method's parameters are:

jina.text_search()

calls upon several other methods, all of which can find in

__init__.py

. For image search there are some additional ones:

```
image.encode.img_base64()
```
encodes a query image to base64 and wraps it in JSON before passing to Jina API
Jina's API returns matches in base64 format. The
```
image.render.html()
```
method wraps these in
```
<IMG>
```
tags so they'll display nicely

Use it in your project

In your terminal:

Create a new folder with a virtual environment and activate it. This will prevent conflicts between your system libraries and your individual project libraries:

mkdir my_project
virtualenv env
source env/bin/activate

Install the and packages:

pip install streamlit streamlit-jina

. Alternatively, use a pre-indexed Docker image:

docker run -p 45678:45678 jinahub/app.example.wikipedia-sentences-30k:0.2.9-1.0.1

Create your app.py:

import streamlit as st
from streamlit_jina import jina
st.set_page_config(page_title="Jina Text Search",)

endpoint = "//0.0.0.0:45678/api/search" # This is Jina's default endpoint. If your Flow uses something different, switch it out

st.title("Jina Text Search")

jina.text_search(endpoint=endpoint)

Run Streamlit:

streamlit run app.py

And there you have it – your very own text search!

For image search, simply swap out the text code above for our image example code and run a Jina image (like our .)

What to do next

Thanks for reading the article and looking forward to hearing what you think about the component! If you want to learn more about Jina and Streamlit here are some helpful resources:

A big thank you!

Major thanks to Randy Zwitch, TC Ricks and Amanda Kelly for their help getting our component live. And thanks to all my colleagues at Jina for building the backend that makes this happen!

Previously published at

L O A D I N G
. . . comments & more!