Dewy is an OSS knowledge base designed to streamline the way developers store, organize, and retrieve information. Its flexibility and ease of use make it an excellent choice for developers aiming to build knowledge-driven applications.
LangChain.js, on the other hand, is a powerful framework that enables developers to integrate LLMs into their applications seamlessly. By combining Dewy's structured knowledge management with LangChain.js's LLM capabilities, developers can create sophisticated question-answering systems that understand and process complex queries, offering precise and contextually relevant answers.
To get started, create a directory for the project:
mkdir dewy_qa
cd dewy_qa
With the directory set up, you can install TypeScript and initialize the project:
npm init -y
npm i typescript --save-dev
npx tsc --init
Depending on your environment, you may need to make some changes to your TypeScript config. Make sure that your tsconfig.json looks something like the following:
{
  "compilerOptions": {
    "target": "ES6",
    "module": "CommonJS",
    "moduleResolution": "node",
    "declaration": true,
    "outDir": "./dist",
    "esModuleInterop": true,
    "strict": true
  }
}
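The CLI will be run with ts-node (the scripts below use a ts-node shebang), so if it isn't already available in your environment you may also want to add it as a dev dependency - a suggested extra step, since your setup may already have a global install:
npm i ts-node --save-dev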
Now you're ready to create the CLI application. To keep the code from getting too messy, organize it into several directories, with the following layout:
dewy_qa/
├── commands/
│   └── ...
├── utils/
│   └── ...
├── index.ts
├── package.json
└── tsconfig.json
Each command will be implemented in the commands directory, and shared code will go in the utils directory. The entrypoint to the CLI application is the file index.ts.
Start with a simple "hello world" version of index.ts - you'll start filling it out in the next section:
#!/usr/bin/env ts-node-script
console.log("hello world");
To verify the environment is set up correctly, try running the following command - you should see "hello world" printed in the console:
npx ts-node index.ts
Rather than typing out this very long command every time, let's create an entry in package.json for the command. This will help us remember how to invoke the CLI, and make it easier to install as a command:
{
  ...
  "bin": {
    "dewy_qa": "./index.ts"
  }
  ...
}
Now you can run your script with npm exec dewy_qa, or npm link the package and run it as just dewy_qa.
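For example, assuming the bin entry above is in place (on some platforms you may also need to make the entrypoint executable with chmod +x index.ts), either of the following should print "hello world":
# Run through npm without installing anything globally
npm exec dewy_qa

# Or link the package once, then call it directly
npm link
dewy_qa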
Load documents by setting up the Dewy client. The first step is to add some dependencies to the project: dewy-ts, the client library for Dewy; commander, which will help us build a CLI application with argument parsing, subcommands, and more; and chalk, to make the prompts more colorful.
npm install dewy-ts commander chalk
Next, implement the load command's logic. You'll do this in a separate file named commands/load.ts. This file implements a function named load, which expects a URL and some additional options - this will be wired up with the CLI in a later section.
Dewy makes document loading super simple - just set up the client and call addDocument with the URL of the file you'd like to load. Dewy takes care of extracting the PDF's contents, splitting them into chunks just the right size for sending to an LLM, and indexing them for semantic search.
import { Dewy } from 'dewy-ts';
import { success, error } from '../utils/colors';

export async function load(url: string, options: { collection: string, dewyEndpoint: string }): Promise<void> {
  console.log(success(`Loading ${url} into collection: ${options.collection}`));

  try {
    const dewy = new Dewy({
      BASE: options.dewyEndpoint
    })

    const result = await dewy.kb.addDocument({ collection: options.collection, url });

    console.log(success(`File loaded successfully`));
    console.log(JSON.stringify(result, null, 2));
  } catch (err: any) {
    console.error(error(`Failed to load file: ${err.message}`));
  }
}
You may have noticed that some functions were imported from ../utils/colors. This file just sets up some helpers for coloring console output - put it in utils so it can be used elsewhere:
import chalk from 'chalk';
export const success = (message: string) => chalk.green(message);
export const info = (message: string) => chalk.blue(message);
export const error = (message: string) => chalk.red(message);
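With both files in place, you can already exercise the load function from a temporary scratch script before the CLI is wired up - the URL and endpoint below are placeholders for an example PDF and a local Dewy instance:
// scratch.ts - a quick manual test of the load command (placeholder values)
import { load } from './commands/load';

load("https://arxiv.org/pdf/2009.08553.pdf", {
  collection: "main",
  dewyEndpoint: "http://localhost:8000",
});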
To start, install some additional packages - langchain and openai, to use the OpenAI API as the LLM:
npm install dewy-langchain langchain @langchain/openai openai
The first thing to set up is Dewy (as before) and an LLM. One difference from before is that dewy is used to build a DewyRetriever: this is a special type used by LangChain for retrieving information as part of a chain. You'll see how the retriever is used in just a minute.
const model = new ChatOpenAI({
  openAIApiKey: options.openaiApiKey,
});
const dewy = new Dewy({
  BASE: options.dewyEndpoint
})
const retriever = new DewyRetriever({ dewy, collection });
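If you'd like to sanity-check the retriever on its own before building the chain, you can call it directly from inside the function you're assembling - this assumes DewyRetriever implements LangChain's standard retriever interface, which exposes getRelevantDocuments:
// Optional sanity check: fetch the chunks Dewy would supply for a sample question
const docs = await retriever.getRelevantDocuments("What is RAG?");
console.log(`Retrieved ${docs.length} chunks from Dewy`);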
Next, create the prompt template. This is a string template that instructs the LLM how it should behave, with placeholders for additional context which will be provided when the "chain" is created. In this case, the LLM is instructed to answer the question, but only using the information it's provided. This reduces the model's tendency to "hallucinate", or make up an answer that's plausible but wrong. The values of context and question are provided in the next step:
const prompt =
  PromptTemplate.fromTemplate(`Answer the question based only on the following context:
{context}
Question: {question}`);
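To build an intuition for what the model actually receives, you can render the template with sample values - PromptTemplate's format method fills in the placeholders and returns the final string (the values here are made up for illustration):
// Illustration only: render the prompt with placeholder values
const example = await prompt.format({
  context: "Dewy is an open-source knowledge base for RAG applications.",
  question: "What is Dewy?",
});
console.log(example);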
Use a RunnableSequence to create an LCEL chain. This chain describes how to generate the context and question values: the context is generated using the retriever created earlier, and the question is generated by passing through the step's input. The results Dewy retrieves are formatted as a string by piping them to the formatDocumentsAsString function.
To recap how the chain fits together: the map in the first step retrieves documents with the DewyRetriever and assigns them to context, and assigns the chain's input value to question; the prompt is then formatted using the context and question variables and passed to the model.
const chain = RunnableSequence.from([
  {
    context: retriever.pipe(formatDocumentsAsString),
    question: new RunnablePassthrough(),
  },
  prompt,
  model,
  new StringOutputParser(),
]);
Now that the chain has been constructed, execute it and output the results to the console. As you'll see, question is an input argument provided by the caller of the function.
Executing the chain using chain.streamLog() allows you to see each response chunk as it's returned from the LLM. The stream handler loop is sort of ugly, but it's just filtering for the appropriate stream results and writing them to STDOUT (using console.log would have added newlines after each chunk).
const stream = await chain.streamLog(question);

// Write chunks of the response to STDOUT as they're received
console.log("Answer:");
for await (const chunk of stream) {
  if (chunk.ops?.length > 0 && chunk.ops[0].op === "add") {
    const addOp = chunk.ops[0];
    if (
      addOp.path.startsWith("/logs/ChatOpenAI") &&
      typeof addOp.value === "string" &&
      addOp.value.length
    ) {
      process.stdout.write(addOp.value);
    }
  }
}
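If you don't need streaming output, the same chain can also be invoked directly; because it ends with a StringOutputParser, the result is the complete answer as a single string (a minimal sketch, assuming question holds the user's question):
// Non-streaming alternative: wait for the complete answer
const answer = await chain.invoke(question);
console.log(answer);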
Now that you've seen all the pieces, you're ready to create the query command. This should look similar to the load command from before, with some additional imports.
import { StringOutputParser } from "@langchain/core/output_parsers";
import { PromptTemplate } from "@langchain/core/prompts";
import { formatDocumentsAsString } from "langchain/util/document";
import { RunnablePassthrough, RunnableSequence } from "@langchain/core/runnables";
import { ChatOpenAI } from "@langchain/openai";

import { Dewy } from 'dewy-ts';
import { DewyRetriever } from 'dewy-langchain';

import { success, error } from '../utils/colors';

export async function query(question: string, options: { collection: string, dewyEndpoint: string, openaiApiKey: string }): Promise<void> {
  console.log(success(`Querying ${options.collection} collection for: "${question}"`));

  try {
    const model = new ChatOpenAI({
      openAIApiKey: options.openaiApiKey,
    });
    const dewy = new Dewy({
      BASE: options.dewyEndpoint
    })
    const retriever = new DewyRetriever({ dewy, collection: options.collection });

    const prompt =
      PromptTemplate.fromTemplate(`Answer the question based only on the following context:
{context}
Question: {question}`);

    const chain = RunnableSequence.from([
      {
        context: retriever.pipe(formatDocumentsAsString),
        question: new RunnablePassthrough(),
      },
      prompt,
      model,
      new StringOutputParser(),
    ]);

    const stream = await chain.streamLog(question);

    // Write chunks of the response to STDOUT as they're received
    console.log("Answer:");
    for await (const chunk of stream) {
      if (chunk.ops?.length > 0 && chunk.ops[0].op === "add") {
        const addOp = chunk.ops[0];
        if (
          addOp.path.startsWith("/logs/ChatOpenAI") &&
          typeof addOp.value === "string" &&
          addOp.value.length
        ) {
          process.stdout.write(addOp.value);
        }
      }
    }
  } catch (err: any) {
    console.error(error(`Failed to query: ${err.message}`));
  }
}
With Dewy and LangChain.js integrated, the next step is to build the CLI interface. Use a library like commander to create a user-friendly command-line interface that supports commands for loading documents into Dewy and querying the knowledge base using LangChain.js.
First, rewrite index.ts to create the subcommands load and query. The --collection argument determines which Dewy collection the document should be loaded into (Dewy lets you organize documents into different collections, similar to file folders). The --dewy-endpoint argument lets you specify how to connect to Dewy - by default an instance running locally on port 8000 is assumed. Finally, the --openai-api-key argument (which defaults to an environment variable) configures the OpenAI API:
#!/usr/bin/env ts-node-script

import { Command } from 'commander';
import { load } from './commands/load';
import { query } from './commands/query';

const program = new Command();

program
  .name('dewy-qa')
  .description('CLI tool for interacting with a knowledge base API')
  .version('1.0.0');

const defaultOpenAIKey = process.env.OPENAI_API_KEY;

program
  .command('load')
  .description("Load documents into Dewy from a URL")
  .option('--collection <collection>', 'Specify the collection name', 'main')
  .option('--dewy-endpoint <endpoint>', 'Specify the Dewy API endpoint', 'http://localhost:8000')
  .argument('<url>', 'URL to load into the knowledge base')
  .action(load);

program
  .command('query')
  .description('Ask questions using an LLM and the loaded documents for answers')
  .option('--collection <collection>', 'Specify the collection name', 'main')
  .option('--dewy-endpoint <endpoint>', 'Specify the Dewy API endpoint', 'http://localhost:8000')
  .option('--openai-api-key <key>', 'Specify the OpenAI API key', defaultOpenAIKey)
  .argument('<question>', 'Question to ask the knowledge base')
  .action(query);

program.parse(process.argv);
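Since the query command reads its OpenAI key from the OPENAI_API_KEY environment variable by default, export it (or pass --openai-api-key explicitly) before asking questions:
export OPENAI_API_KEY="<your key>"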
OK, all done - wasn't that easy? You can try it out by running the command:
dewy_qa load https://arxiv.org/pdf/2009.08553.pdf
You should see something like:
Loading https://arxiv.org/pdf/2009.08553.pdf into collection: main
File loaded successfully
{
  "id": 18,
  "collection": "main",
  "extracted_text": null,
  "url": "https://arxiv.org/pdf/2009.08553.pdf",
  "ingest_state": "pending",
  "ingest_error": null
}
Extracting the content of a large PDF can take a minute or two, so you'll often see "ingest_state": "pending" when you first load a new document.
Once the document has been ingested, try asking a question about it:
dewy_qa query "tell me about RAG"
You should see something like:
Querying main collection for: "tell me about RAG"
Answer:
Based on the given context, RAG refers to the RAG proteins,
which are involved in DNA binding and V(D)J recombination.
The RAG1 and RAG2 proteins work together to bind specific
DNA sequences known as RSS (recombination signal sequences)
and facilitate the cutting and rearrangement of DNA segments
during the process of V(D)J recombination...