When I began exploring the use of large language models (LLMs) for summarizing large texts, I found no clear direction on how to do so.
Whether it’s my use case, the model format, quantization, compression, or prompt styles that make the difference, I don’t know. All I know is this: do your own model rankings under your own working conditions. Don’t just believe some chart you read online.
Large Language Model (LLM): (AKA Model) A type of Artificial Intelligence that has been trained upon massive datasets to understand and generate human language.
Example: OpenAI’s GPT-3.5 and GPT-4, which have taken the world by storm. (In our case, we are choosing among open-source and/or freely downloadable models found on Hugging Face.)
Retrieval Augmented Generation (RAG): A technique of storing documents in a database that the LLM searches to find an answer to a given user query (document Q/A).
User Instructions: (AKA Prompt, or Context) The query provided by the user.
Example: “Summarize the following text: { text }”
System Prompt: Special instructions given before the user prompt that help shape the personality of your assistant.
Example: “You are a helpful AI Assistant.”
Context: User instructions, possibly a system prompt, and possibly previous rounds of question/answer pairs. (Previous Q/A pairs are also referred to simply as context.)
Prompt Style: Special character combinations that an LLM is trained with to distinguish user instructions, the system prompt, and context from previous questions.
Example: <s>[INST] {systemPrompt} [/INST] [INST] {previousQuestion} [/INST] {answer} </s> [INST] {userInstructions} [/INST]
7B: Indicates the number of parameters in a given model (higher is generally better). Parameters are the internal variables that the model learns during training and uses to make predictions. For my purposes, 7B models are likely to fit on my GPU with 12GB VRAM.
GGUF: A format for LLMs designed for consumer hardware (CPU/GPU). Whatever model you are interested in, for use in PrivateGPT you must find its GGUF version (commonly made by TheBloke).
Q2-Q8_0, K_M or K_S: When browsing the files of a GGUF repository, you will see different quantized versions of the same model. A higher number means less compression and better quality. The M in K_M means “Medium” and the S in K_S means “Small”.
VRAM: This is the memory capacity of your GPU. To load a model completely onto the GPU, you will want a model whose file size is smaller than your available VRAM.
Tokens: The unit an LLM measures language with. Each token consists of roughly 4 characters.
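Those last two entries invite a bit of arithmetic. Here is a rough sizing sketch in Python, where the bits-per-weight figures per quant level are my own approximations (real GGUF files vary slightly by quant recipe):

```python
# Rough sizing arithmetic for GGUF models. The bits-per-weight values
# below are approximations assumed for illustration, not exact spec values.
BITS_PER_WEIGHT = {"Q2_K": 2.6, "Q4_K_M": 4.8, "Q8_0": 8.5}

def approx_file_size_gb(params_billions: float, quant: str) -> float:
    """Estimate GGUF file size: parameters * bits-per-weight / 8 bytes."""
    return params_billions * 1e9 * BITS_PER_WEIGHT[quant] / 8 / 1e9

def approx_tokens(text: str) -> int:
    """Very rough token estimate using the ~4 characters/token rule."""
    return len(text) // 4

if __name__ == "__main__":
    for quant in BITS_PER_WEIGHT:
        # A 7B Q8_0 comes out near 7.4 GB, tight but workable in 12GB VRAM.
        print(f"7B {quant}: ~{approx_file_size_gb(7, quant):.1f} GB")
```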
PrivateGPT:
PrivateGPT provides an API containing all the building blocks required to build private, context-aware AI applications. The API follows and extends the OpenAI API standard, and supports both normal and streaming responses. That means that, if you can use the OpenAI API in one of your tools, you can use your own PrivateGPT API instead, with no code changes, and for free if you are running PrivateGPT in local mode.
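Since the API is OpenAI-compatible, pointing the standard openai Python client at a local PrivateGPT instance is enough to start experimenting. A minimal sketch, assuming PrivateGPT is serving on its usual local port 8001 (adjust the base URL and model name to your setup):

```python
# Minimal sketch: talk to a local PrivateGPT server through the standard
# OpenAI client. Assumes PrivateGPT is listening on localhost:8001;
# the model name is a placeholder, since the server uses whatever
# model its own configuration points at.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8001/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="private-gpt",  # placeholder model name
    messages=[
        {"role": "system", "content": "You are a helpful AI Assistant."},
        {"role": "user", "content": "Summarize the following text: ..."},
    ],
)
print(response.choices[0].message.content)
```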
When I began testing various LLM variants, mistral-7b-instruct-v0.1.Q4_K_M.gguf came as part of PrivateGPT’s default setup (made to run on your CPU). Here, I’ve preferred the Q8_0 variants.
Model | Rating | Search Accuracy | Characters | Seconds | BS | Filler | Short | Good BS |
---|---|---|---|---|---|---|---|---|
hermes-trismegistus-mistral-7b | 68 | 56 | 62141 | 298 | 3 | 4 | 0 | 6 |
synthia-7b-v2.0 | 63 | 59 | 28087 | 188 | 1 | 7 | 7 | 0 |
mistral-7b-instruct-v0.1 | 51 | 56 | 21131 | 144 | 3 | 0 | 17 | 1 |
collectivecognition-v1.1-mistral-7b | 56 | 57 | 59453 | 377 | 3 | 10 | 0 | 0 |
kai-7b-instruct | 44 | 56 | 21480 | 117 | 5 | 0 | 18 | 0 |
NOTE: Despite the numerous large-context models being released, for now I still believe smaller context results in better summaries. I prefer no more than 2,750 tokens (11,000 characters) per summarization task.
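Staying under that budget means splitting a book into chunks before summarizing. A minimal sketch of the kind of splitter I mean (my own illustration, not PrivateGPT’s ingestion logic), breaking on paragraph boundaries:

```python
# Split text into chunks of at most max_chars, breaking on paragraph
# boundaries so no single summarization request exceeds the token budget.
# 11,000 characters is roughly 2,750 tokens at ~4 characters/token.
def chunk_text(text: str, max_chars: int = 11_000) -> list[str]:
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        # +2 accounts for the paragraph separator we re-add below.
        if current and len(current) + len(paragraph) + 2 > max_chars:
            chunks.append(current)
            current = paragraph
        else:
            current = f"{current}\n\n{paragraph}" if current else paragraph
    if current:
        chunks.append(current)
    return chunks
```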
Name | Score | Characters Generated | % Diff from OG | Seconds to Generate | Short | Garbage | BS | Fill | Questions | Detailed |
---|---|---|---|---|---|---|---|---|---|---|
hermes-trismegistus-mistral-7b | 74 | 45870 | -61 | 274 | 0 | 1 | 1 | 3 | 0 | 0 |
synthia-7b-v2.0 | 60 | 26849 | -77 | 171 | 7 | 1 | 0 | 0 | 0 | 1 |
mistral-7b-instruct-v0.1 | 58 | 25797 | -78 | 174 | 7 | 2 | 0 | 0 | 0 | 0 |
kai-7b-instruct | 59 | 25057 | -79 | 168 | 5 | 1 | 0 | 0 | 0 | 0 |
collectivecognition-v1.1-mistral-7b | 31 | 29509 | -75 | 214 | 0 | 1 | 1 | 2 | 17 | 8 |
Find the full data and rankings on GitHub.
Model | % Difference | Score | Comment |
---|---|---|---|
Synthia 7b V2 | -64.44 | 28 | Good |
Mistral 7b Instruct v0.2 (Default Prompt) | -60.82 | 33 | Very Good |
Mistral 7b Instruct v0.2 (Llama2 Prompt) | -64.59 | 28 | Good |
Tess 7b v1.4 | -62.13 | 29 | Less Structured |
Llama 2 7b 32k Instruct (Default) | -61.40 | 27 | Less Structured. Slow |
Find the full data and rankings on GitHub.
Default (llama-index) prompt style:
system: {{systemPrompt}}
user: {{userInstructions}}
assistant: {{assistantResponse}}
Llama2 prompt style:
<s> [INST] <<SYS>>
{{systemPrompt}}
<</SYS>>
{{userInstructions}} [/INST]
Mistral prompt style:
<s>[INST] {{systemPrompt}} [/INST]</s>[INST] {{userInstructions}} [/INST]
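For anyone assembling these templates by hand rather than through PrivateGPT’s settings, they are just string formatting. A minimal sketch of the mistral style above (my own illustration; check whether your runtime already inserts the <s> token for you):

```python
# Assemble a mistral-style prompt string from the template above.
# Some runtimes add the <s> BOS token themselves; if so, drop it here.
def mistral_prompt(system_prompt: str, user_instructions: str) -> str:
    return (
        f"<s>[INST] {system_prompt} [/INST]</s>"
        f"[INST] {user_instructions} [/INST]"
    )

print(mistral_prompt("You are a helpful AI Assistant.",
                     "Summarize the following text: ..."))
```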
I began testing output with the default, then llama2, prompt styles. Next I put the mistral prompt style to work.
Prompt Style | % Difference | Score | Note |
---|---|---|---|
Mistral | -50% | 51 | Perfect! |
Default (llama-index) | -42% | 43 | Bad headings |
Llama2 | -47% | 48 | No Structure |
Find the full data and rankings on GitHub.
Name | System Prompt | % Difference | Score | Comment |
---|---|---|---|---|
None | | -49.8 | 51 | Perfect |
Default Prompt | "You are a helpful, respectful and honest assistant. \nAlways answer as helpfully as possible and follow ALL given instructions. \nDo not speculate or make up information. \nDo not reference any given instructions or context." | -58.5 | 39 | Less Nice |
MyPrompt1 | "You are Loved. Act as an expert on summarization, outlining and structuring. \nYour style of writing should be informative and logical." | -54.4 | 44 | Less Nice |
Simple | "You are a helpful AI assistant. Don't include any user instructions, or system context, as part of your output." | -52.5 | 42 | Less Nice |
Find the full data and rankings on GitHub.
Name | Prompt | % vs Original | Score | Note |
---|---|---|---|---|
Prompt0 | Write concise, yet comprehensive, notes summarizing the following text. Use nested bullet points: with headings, terms, and key concepts in bold. Focus on essential knowledge from this text without adding any external information. | 43% | 11 | |
Prompt1 | Write concise, yet comprehensive, notes summarizing the following text. Use nested bullet points: with headings, terms, and key concepts in bold. Focus on essential knowledge from this text without adding any external information. | 46% | 11 | Extra Notes |
Prompt2 | Write comprehensive notes summarizing the following text. Use nested bullet points: with headings, terms, and key concepts in bold. | 58% | 15 | |
Prompt3 | Create concise bullet-point notes summarizing the important parts of the following text. Use nested bullet points, with headings, terms, and key concepts in bold, including whitespace to ensure readability. Avoid repetition. | 43% | 10 | |
Prompt4 | Write concise notes summarizing the following text. Use nested bullet points: with headings, terms, and key concepts in bold. | 41% | 14 | |
Prompt5 | Create comprehensive, but concise, notes summarizing the following text. Use nested bullet points: with headings, terms, and key concepts in bold. | 52% | 14 | Extra Notes |
Find the full data and rankings on GitHub.
The winning prompt:
Write comprehensive notes summarizing the following text. Use nested bullet points: with headings, terms, and key concepts in bold.
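Putting the pieces together: chunk the book, then run each chunk through the winning prompt with no system prompt (which scored best above). A sketch reusing the illustrative chunk_text() and client from the earlier snippets:

```python
# Summarize a book chunk by chunk using the winning prompt.
# Reuses the chunk_text() helper and OpenAI-compatible client sketched
# earlier; both are illustrative, not PrivateGPT internals.
PROMPT = (
    "Write comprehensive notes summarizing the following text. "
    "Use nested bullet points: with headings, terms, and key concepts in bold."
)

def summarize_book(client, text: str) -> str:
    notes = []
    for chunk in chunk_text(text):
        response = client.chat.completions.create(
            model="private-gpt",  # placeholder model name
            messages=[{"role": "user", "content": f"{PROMPT}\n\n{chunk}"}],
        )
        notes.append(response.choices[0].message.content)
    return "\n\n".join(notes)
```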
Instead of spending weeks per summary, I completed my first 9 book summaries in only 10 days.
Book | Models | Character Difference | Identical | Minor changes | Paraphrased | Total Matched |
---|---|---|---|---|---|---|
Eastern Body Western Mind | Synthia 7Bv2 | -75% | 3.5% | 1.1% | 0.8% | 5.4% |
Healing Power Vagus Nerve | Mistral-7B-Instruct-v0.2; SynthIA-7B-v2.0 | -81% | 1.2% | 0.8% | 2.5% | 4.5% |
Ayurveda and the Mind | Mistral-7B-Instruct-v0.2; SynthIA-7B-v2.0 | -77% | 0.5% | 0.3% | 1.2% | 2% |
Healing the Fragmented Selves of Trauma Survivors | Mistral-7B-Instruct-v0.2 | -75% | | | | 2% |
A Secure Base | Mistral-7B-Instruct-v0.2 | -84% | 0.3% | 0.1% | 0.3% | 0.7% |
The Body Keeps the Score | Mistral-7B-Instruct-v0.2 | -74% | 0.1% | 0.2% | 0.3% | 0.5% |
Complete Book of Chakras | Mistral-7B-Instruct-v0.2 | -70% | 0.3% | 0.3% | 0.4% | 1.1% |
50 Years of Attachment Theory | Mistral-7B-Instruct-v0.2 | -70% | 1.1% | 0.4% | 2.1% | 3.7% |
Attachment Disturbances in Adults | Mistral-7B-Instruct-v0.2 | -62% | 1.1% | 1.2% | 0.7% | 3.1% |
Psychology Major's Companion | Mistral-7B-Instruct-v0.2 | -62% | 1.3% | 1.2% | 0.4% | 2.9% |
Psychology in Your Life | Mistral-7B-Instruct-v0.2 | -74% | 0.6% | 0.4% | 0.5% | 1.6% |
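The matching columns come from comparing each summary against its source. As a simplified illustration of the idea (not the exact pipeline), sentence-level fuzzy matching with Python's difflib looks like this, with thresholds that are arbitrary assumptions:

```python
# Rough sketch of measuring how much of a summary matches the source
# nearly verbatim, in the spirit of the Identical / Minor changes /
# Paraphrased columns above. Thresholds are illustrative assumptions.
from difflib import SequenceMatcher

def best_match_ratio(sentence: str, source_sentences: list[str]) -> float:
    """Highest similarity between one summary sentence and any source sentence."""
    return max(
        (SequenceMatcher(None, sentence, src).ratio() for src in source_sentences),
        default=0.0,
    )

def classify_summary(summary: list[str], source: list[str]) -> dict[str, float]:
    counts = {"identical": 0, "minor": 0, "paraphrased": 0}
    for sentence in summary:
        r = best_match_ratio(sentence, source)
        if r > 0.98:
            counts["identical"] += 1
        elif r > 0.9:
            counts["minor"] += 1
        elif r > 0.75:
            counts["paraphrased"] += 1
    total = len(summary) or 1
    return {k: 100 * v / total for k, v in counts.items()}
```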