
How to Fine-Tune and Optimize GPT Assistants with OpenAI

by WebLab Technology, 23m, 2023/05/03

Too Long; Didn't Read

Weblab Technology shares insights on fine-tuning GPT assistants with OpenAI. Learn how this process enhances natural language processing and improves the performance of AI models.


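GPT has been getting a lot of hype recently. GPT-based solutions may look easy to implement. After all, high-quality output can be achieved simply by giving the model instructions. That is certainly good enough to impress decision-makers. But what is really happening is that you are presenting nice-looking but cherry-picked examples. What's more, the system may need higher reliability before it can move to production.

Businesses envision various use cases for GPT, some of which rely on open communication between GPT and the user.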

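Take these tools, for example:

ChatSpot. A natural language query goes to the ChatSpot API and is transformed into operations for the HubSpot CRM API, the Google Docs API, and so on; a generative text model then replies once the action has been performed (or not). GPT-4 based.
Khanmigo. Khan Academy's AI-powered guide. User requests are transformed into prompts with injected context. The system relies on GPT's ability to handle up to eight times more injected context. GPT-4 based.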

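Businesses and users want to use natural language queries as an alternative to user interfaces. However, to ensure that AI solutions are reliable and effective when brought into real-world applications, GPT-based models must be fine-tuned so that they genuinely apply to specific use cases and domain knowledge.

Importantly, there are more opportunities to provide context for prompts, and hallucination errors become significantly less frequent.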

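AI Hallucinations

AI hallucinations are a significant challenge that engineers need to address when working with large language models such as GPT-4. Because hallucinations generate false or misleading information, they can have extremely serious consequences in applications where factual accuracy is paramount. This section explores the technical aspects of AI hallucinations in more detail and discusses strategies for mitigating them.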

For a quick example, consider a case provided by Noble Ackerson.

You will find a lot of false information mixed in with the facts:

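AI hallucinations arise primarily from the limitations inherent in the transformer architecture and from the training data used for large language models. Because they lack a cognitive architecture that enables deductive reasoning, these models are prone to generating information that appears plausible but is in fact incorrect.

The transformer architecture of large language models relies on attention mechanisms and self-attention to capture long-range dependencies in the input data. This allows the model to generate coherent and contextually relevant text, but it does not guarantee factual accuracy. In addition, training data may contain biases or misinformation that the model inadvertently learns, which contributes to AI hallucinations.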

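Probabilistic Nature

One reason for this lack of reliability lies in the probabilistic nature of GPT. For context, let's briefly look at probabilistic data structures such as Bloom filters. A Bloom filter is a probabilistic data structure used to test whether an element is a member of a set. It consists of an array of bits and multiple hash functions, each of which maps an element to one or more array indices.

To insert an element into the filter, the element is hashed with the hash functions and the corresponding array bits are set to 1.

To query whether an element is present in the filter, the element is hashed in the same way. If all the corresponding bits in the array are set to 1, the element is possibly in the filter.

However, if any of the bits is not set to 1, the element is definitely not in the filter. False positives are built into probabilistic data structures by design.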
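The insert and query steps described above can be sketched in a few lines of Python. This is a minimal illustration rather than a production implementation: the class name, the default sizes, and the choice of salted SHA-256 as the hash family are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """A tiny Bloom filter: a bit array plus several hash functions."""

    def __init__(self, size: int = 64, num_hashes: int = 3) -> None:
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [0] * size

    def _indexes(self, item: str) -> list[int]:
        # Derive one array index per hash function by salting SHA-256
        # with the hash function's number (an illustrative choice).
        return [
            int(hashlib.sha256(f"{i}:{item}".encode()).hexdigest(), 16) % self.size
            for i in range(self.num_hashes)
        ]

    def add(self, item: str) -> None:
        # Insert: set every bit the element hashes to.
        for index in self._indexes(item):
            self.bits[index] = 1

    def might_contain(self, item: str) -> bool:
        # Query: True means "possibly present", False means "definitely absent".
        return all(self.bits[index] for index in self._indexes(item))


bloom = BloomFilter()
bloom.add("next.js")
print(bloom.might_contain("next.js"))  # True: inserted items are never missed
```

A `True` result for an item that was never added is possible once enough bits are set; that is the false positive the text refers to. A `False` result is always reliable.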

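Like a Bloom filter, GPT is highly probabilistic. But instead of testing set membership, it generates text based on input prompts. GPT consists of multiple transformer layers that perform complex computations to generate an output sequence of tokens from an input prompt.

The output generated by GPT is not deterministic and can vary greatly depending on the sampling method employed and the additional hyperparameters selected during training. Like a Bloom filter, GPT can also "hallucinate" by returning results that look plausible on the surface but are factually incorrect.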
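To make the effect of the sampling method concrete, here is a small sketch of temperature-scaled sampling over a toy next-token distribution. The candidate tokens and their scores are hypothetical values chosen for illustration; a real model produces scores over its whole vocabulary, and the exact sampling procedure used by GPT is not described in this article.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def sample_token(tokens, logits, temperature, rng):
    """Draw one token at random according to the temperature-scaled probabilities."""
    probabilities = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probabilities, k=1)[0]


tokens = ["2022", "2020", "2021"]  # hypothetical candidate next tokens
logits = [2.0, 1.0, 0.5]           # hypothetical model scores

rng = random.Random(0)
for temperature in (0.3, 0.7):
    top = max(softmax_with_temperature(logits, temperature))
    picks = [sample_token(tokens, logits, temperature, rng) for _ in range(5)]
    print(f"temperature={temperature}: top-token probability {top:.2f}, samples {picks}")
```

At the lower temperature almost all probability mass sits on the highest-scoring token, so repeated runs agree more often. At the higher temperature less likely tokens are drawn more frequently.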

However, the probability of generating unreliable output can be reduced by fine-tuning the model and providing it with high-quality training data.

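GPT-4 and Risks

Hidden factual inaccuracies can be very harmful to users. It is therefore key that developers implement measures to reduce the likelihood of inaccuracies occurring.

GPT-4 is slower, and in casual cases users will not notice a big difference compared with previous model generations. That said, GPT-4 is much safer thanks to the various safety measures implemented during training, including expert engagement, model safety improvements, and additional monitoring and enforcement.

These mitigations significantly improved the safety properties of GPT-4 compared with GPT-3.5: the model's tendency to respond to requests for disallowed content decreased by 82%, and its tendency to respond to sensitive requests in accordance with policy increased by 29%.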

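Mitigating Hallucinations

Even with some level of risk involved, GPT is hard to ignore completely. GPT has become a new communication interface for humans and APIs, one that is set to reduce the need for a UI. Our job as engineers is to find ways to solve the problems that arise from its use with the resources at our disposal. And there are several ways of doing so.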

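Prompt Engineering

Improving prompts can enhance task performance and yield satisfactory results roughly 50% to 65% of the time, but performance will not often exceed this range.

According to one study, improving the prompt significantly improves GPT's performance on reasoning tasks.

The study demonstrated that adding a simple phrase such as "Let's think step by step" before each answer can turn GPT into a decent zero-shot reasoner, without the need for hand-crafted few-shot examples.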
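As a small illustration of the zero-shot trigger phrase, the helper below appends it to a question. The function name, the `Q:`/`A:` layout, and the sample question are assumptions made for this sketch; only the phrase itself comes from the text.

```python
def build_zero_shot_cot_prompt(question: str) -> str:
    """Place the reasoning trigger phrase at the start of the answer slot."""
    return f"Q: {question}\nA: Let's think step by step."


prompt = build_zero_shot_cot_prompt(
    "A project has 3 pages and each page imports 4 components. "
    "How many imports are there in total?"
)
print(prompt)
```

The resulting string can be sent as the user message of a chat request or as the prompt of a completion request. The model then continues from the trigger phrase with its intermediate reasoning.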

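Few-shot learning is another powerful prompt engineering technique that can significantly improve the performance of language models like GPT-4 on new tasks, even when only limited training data is available. It can therefore be a good alternative to fine-tuning for simpler cases. In the few-shot approach, the user provides structured examples to show what is expected and then leaves a blank space for the model to fill in.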
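A few-shot prompt of the kind described above can be assembled with a short helper. This is a sketch under stated assumptions: the function name, the `Q:`/`A:` layout, and the sentiment-labelling examples are hypothetical and chosen only to show the structure.

```python
def build_few_shot_prompt(examples, query):
    """Format the solved examples, then leave the final answer blank."""
    blocks = [f"Q: {question}\nA: {answer}" for question, answer in examples]
    blocks.append(f"Q: {query}\nA:")  # the empty slot the model should fill
    return "\n\n".join(blocks)


examples = [  # hypothetical demonstrations
    ("The build got much faster after the upgrade.", "positive"),
    ("The page crashes every time it loads.", "negative"),
]
few_shot_prompt = build_few_shot_prompt(examples, "The new router is a joy to use.")
print(few_shot_prompt)
```

Keeping every example in the same layout matters more than the specific labels, because the model infers the task from the repeated pattern.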

An article on this topic is worth checking out.

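Context Injection

Context injection is a technique that can help reduce AI hallucinations and improve the accuracy of generated text in specific domains. By injecting relevant context into the input prompt, the model is given more precise information, enabling it to generate more accurate and relevant responses.

While context injection is fast and cheap, it also requires domain knowledge and expertise to be effective. That said, the approach is particularly useful in fields where the accuracy and relevance of the generated text are crucial. It is expected to be adopted in enterprise contexts such as customer service and medical diagnosis.

For example, in a customer service chatbot application, context injection might involve providing the model with relevant details about the customer's account, previous interactions, and any known issues or concerns. The added context allows the model to generate more personalized and accurate responses, improving the overall user experience.

GitHub Copilot and tools like it rely heavily on this in their implementations. By ingesting the context of the code being written, along with comments and documentation, these coding assistants can grasp the intended functionality and constraints of the code.

Context injection is the most reliable way of reducing hallucinations available to date, but it has a key limitation in terms of prompt size.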

    import openai  # legacy (pre-1.0) openai package, which provides ChatCompletion

    # Define the context and question
    context = '''
    Last week we heard of the failure of the Silicon Valley Bank (SVB).
    And it is the 2nd largest bank to shut down in the US since 2000.
    While that is sad, miserable, and pathetic, I want to show you something under the hood — the real reason the bank failed.
    '''

    question = "What is the reason for the failure of Silicon Valley Bank and when did it happen?"

    # Define prompts with and without context
    prompt_without_context = question
    prompt_with_context = f'''Context: """{context}"""
    Instructions: Using the provided context, write a comprehensive reply to the given query.
    Query: {question}'''

    # List of prompts to iterate over
    prompts = [
        {"name": "Without context", "prompt": prompt_without_context},
        {"name": "With context", "prompt": prompt_with_context},
    ]

    for prompt_info in prompts:
        print(f"--- {prompt_info['name']} ---")
        print(f"Prompt: {prompt_info['prompt']}")

        response = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": prompt_info['prompt']},
            ]
        )

        # Extract the answer from the response
        answer = response.choices[0].message.content.strip()

        # Print the answer
        print(f"Answer: {answer}\n")

    --- Without context ---
    Prompt: What is the reason for the failure of Silicon Valley Bank and when did it happen?
    Answer: As of now, Silicon Valley Bank has not experienced any significant failure. Silicon Valley Bank, founded in 1983, is a successful and high-performing financial institution that specializes in offering banking services to technology and life science companies and venture capital firms. The bank has grown consistently and continues to operate effectively, serving its clients and staying relevant in the industry.

    --- With context ---
    Prompt: Context:
    """
    Last week we heard of the failure of the Silicon Valley Bank (SVB).
    And it is the 2nd largest bank to shut down in the US since 2000.
    While that is sad, miserable, and pathetic, I want to show you something under the hood — the real reason the bank failed.
    """
    Instructions: Using the provided context, write a comprehensive reply to the given query.
    Query: What is the reason for the failure of Silicon Valley Bank and when did it happen?
    Answer: Silicon Valley Bank (SVB) failed last week, making it the 2nd largest bank to shut down in the US since 2000. The precise reasons behind the bank's failure have not been provided in the given context, but it is implied that there is a significant underlying cause for its collapse. To fully understand the situation, it would be helpful to obtain more information about the bank's performance, financial stability, and any recent events that may have contributed to the failure.

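Fine-Tuning

Fine-tuning is another technique that can significantly improve the accuracy and reliability of GPT. It is the process of providing additional training data so that the model adapts to a specific task or domain and improves the accuracy and relevance of the generated text. The motivation is not very different from that of any other deep learning neural network.

The process requires domain-specific data to be fed continuously into the pre-trained model until it learns to generate more relevant and accurate text for the target task.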

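Reinforcement Learning from Human Feedback (RLHF)

RLHF was very useful in making ChatGPT great. Take a look: it did a really good job!

As you may have noticed above, the trick is to add something like "like I am five". It is a great way to simplify an explanation.

RLHF is a powerful approach that can be employed to enhance the performance and safety of GPT-based models. The model is fine-tuned through human-generated feedback, which helps it learn from real-world examples and user interactions. The process involves collecting a dataset of model-generated responses together with human-ranked quality scores or comparisons, which are then used to optimize the model's parameters.

RLHF has been employed successfully in many real-world applications, including chatbots and AI assistants, to improve response quality and reduce the occurrence of AI hallucinations. By incorporating human feedback into the training process, RLHF teaches the model to generate more accurate, contextually relevant, and safe responses, ultimately leading to a much better user experience and greater reliability. Crucially, this approach lets developers harness the power of GPT-based models while also addressing concerns about the creation of false or misleading information.

As long as we know the domain, we can train the model to respond in the way we need. We can train it to answer "I do not know" or to ignore certain themes. OpenAI applies RLHF to its raw models to make them production-ready.

Here are some sample results: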

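Designing the Model

Let's dive into a practical example by building a fine-tuning process for GPT. We will train the model on a defined dataset, teaching it to answer queries related to that specific domain.

Consider the following diagram:

The process includes these components:

Source dataset: a collection of training files specific to the domain in question.
Q&A generator: a GPT-4 based solution that uses context injection to generate questions and answers from the source dataset.
Q&A samples: the training data used to fine-tune the model.
Fine-tuning: the GPT-3 text-completion model is adjusted based on the dataset.
Benchmark Q&A: a set of questions and answers used to evaluate the model's performance.
Benchmarking: a process that feeds questions from the benchmark Q&A into the fine-tuned model and evaluates the validity of the generated responses.

We continue to use the GPT-3 text-completion model because GPT-4 does not currently support the fine-tuning process.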

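Source Dataset

To make sure GPT is not familiar with the dataset we want to use, we should ideally draw on data about events after September 2021, GPT's knowledge cut-off date.

For example, I often use Next.js to build web applications, and Next.js version 13 was released in 2022. To verify this, let's ask ChatGPT about the release date of Next.js 13 and see what information it can pull on the subject:

Good! My goal is for the resulting model to know more about Next.js 13 and how to work with it than the current model does. The dataset was prepared from the Next.js 13 blog posts.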

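Sample Generator

We want our model to respond to questions (queries) in an open format. Currently, only text-completion models support fine-tuning, so we have to train a text-completion model to answer our questions. To get proper results, we must first convert the dataset into a set of questions and answers. We can do this using the ChatGPT model.

Here is a code sample: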

    import openai  # legacy (pre-1.0) openai package

    # read_file, extract_qa_from_content, MAX_CONTENT_LENGTH and
    # TOTAL_QUESTIONS_COUNT are defined elsewhere in the full solution.

    def generate_qa(filepath):
        # Truncate the article so the prompt stays within the context window
        article = read_file(filepath)[:MAX_CONTENT_LENGTH]
        content = f'''Content for {filepath}:
    {article}

    Instructions: Generate question and answer based on Content for {filepath}.
    Structure it as:
    Q: <question>
    A: <answer>
    '''

        questions_answers = []

        # Request n independent completions, each containing Q&A pairs
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "You are a helpful software developer who specialize in next.js and react."},
                {"role": "user", "content": content},
            ],
            n=TOTAL_QUESTIONS_COUNT
        )

        for choice in response.choices:
            qa = extract_qa_from_content(choice.message.content.strip())
            questions_answers.extend(qa)

        return questions_answers

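The complete solution is available separately. We aim to generate at least 100 question-answer pairs per training file.

It is important to end each prompt with the same pattern. We use \n\n###\n\n as the separator. The resulting dataset is available separately as a file.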
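As an illustration of the terminator rule, the sketch below serializes one question-answer pair as a single JSON line whose prompt ends with the separator. The `prompt`/`completion` field names follow the layout of OpenAI's legacy fine-tuning data format; the leading space and the ` END` stop marker in the completion are assumptions for this example and are not taken from the article.

```python
import json

PROMPT_SEPARATOR = "\n\n###\n\n"  # the terminator named in the article
COMPLETION_STOP = " END"          # assumed stop marker, not from the article


def to_training_record(question: str, answer: str) -> str:
    """Serialize one Q&A pair as a single JSONL line."""
    record = {
        # Every prompt ends with the same separator pattern.
        "prompt": f"{question.strip()}{PROMPT_SEPARATOR}",
        # Assumed convention: leading space, then the answer, then a stop marker.
        "completion": f" {answer.strip()}{COMPLETION_STOP}",
    }
    return json.dumps(record, ensure_ascii=False)


line = to_training_record(
    "When was Next.js 13 released?",
    "Next.js 13 was released in October 2022.",
)
print(line)
```

Writing one such line per pair yields a JSONL training file. At inference time the same separator has to be appended to the question so that the prompt matches what the model saw during training.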

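Ingesting False Questions

We want the model to genuinely answer "I do not know" to any question unrelated to software development and Next.js. We could achieve this by integrating a Next.js question classifier that tests whether a question is related to Next.js. Alternatively, if we want a simple architecture, we can add extra training data to the fine-tuning process.

Also, even if a question is related to Next.js, we do not want the system to answer nonsensical questions such as "When will the Next.js framework reach 1 billion users?" We would like the model to respond to this question with "I do not know."

Code sample: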

    import openai  # legacy (pre-1.0) openai package

    # extract_qa_from_content and RANDOM_QUESTIONS_COUNT are defined
    # elsewhere in the full solution.

    NON_NEXTJS_Q_A_PROMPT = """Create a series of random questions and answers that are not related to the Next.js framework.
    Each question should be followed by a clear answer stating that it is not relevant to Next.js. For example:

    <question>What is the capital of Ukraine?</question>
    <answer>This question is not related to Next.js.</answer>
    <question>What is Spring Framework?</question>
    <answer>It is not related to Next.js.</answer>

    Feel free to generate any type of questions you like, as long as the answer indicates that it is not related to the Next.js framework."""

    def generate_random_qa(prompt):
        questions_answers = []

        # Request n independent completions of off-topic Q&A pairs
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "You are a helpful software developer who specialize in next.js and react."},
                {"role": "user", "content": prompt},
            ],
            n=RANDOM_QUESTIONS_COUNT
        )

        for choice in response.choices:
            qa = extract_qa_from_content(choice.message.content.strip())
            questions_answers.extend(qa)

        return questions_answers

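We generated 100 questions and answers to show the model that we want it to answer only questions about Next.js, and that it is completely fine to respond "I do not know" in order to reduce hallucinations. The resulting training dataset is available separately.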

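Fine-Tuning

Finally, once the training dataset has been created (the process is described in more detail elsewhere), we are ready for fine-tuning.

Unfortunately, there are not many models that can be fine-tuned. You can see them by listing all models with `openai api models.list` and looking for those with `"allow_fine_tuning": true`.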
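If the output of the models listing is saved as JSON, the fine-tunable entries can be picked out with a short filter. This is a sketch only: it assumes the response has a `data` list whose entries carry an `id` and a `permission` list containing the `allow_fine_tuning` flag, and the sample payload below is hypothetical rather than real API output.

```python
def fine_tunable_models(models_response: dict) -> list[str]:
    """Return the ids of models with a permission that allows fine-tuning."""
    return [
        model["id"]
        for model in models_response.get("data", [])
        if any(
            permission.get("allow_fine_tuning", False)
            for permission in model.get("permission", [])
        )
    ]


# Hypothetical payload, shaped like a saved models listing (assumed layout).
sample_response = {
    "data": [
        {"id": "model-a", "permission": [{"allow_fine_tuning": True}]},
        {"id": "model-b", "permission": [{"allow_fine_tuning": False}]},
        {"id": "model-c", "permission": []},
    ]
}
print(fine_tunable_models(sample_response))  # ['model-a']
```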

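During fine-tuning we could play with the learning rate and similar settings, but in my experience this does not really affect the resulting score. Training on the real dataset can already be costly, so tinkering with additional parameters may simply not be worth it.

The length of the fine-tuning process depends on the model (davinci, ada, etc.). On our dataset of 1511 cases and 907 KB, it took around an hour.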

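Benchmarking

Remember that we are working with a text-completion model, not a chat or instruction model. Our text-completion model has also been tuned to respond using the answer format. In addition, the model waits for the input prompt to end with ?\n\n###\n\n. If it does not, the output may be unexpected, but most likely the model will try to generate some content that completes the prompt with the prompt terminator.

Let's estimate how the model behaves using the following question sets.

Set 1. Correct Questions

We expect the model to return proper responses based on the content provided.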

    proper_questions = [
        "When Next.js 13 was released?",
        "Is Next.js 14 already released?",
        "How Next.js 13 is different from Next.js 12?",
        "How routing is different in Next.js 13 compared to Next.js 12?",
        "How to prevent Next.js file component from using React Server Components?",
        "What is Turbopack?",
    ]

We ran two models on these questions with temperature = 0.3.

List of correct responses:

    #1
    Q: When Next.js 13 was released?
    A: Next.js 13 was released on October 2022. It includes the new App Directory, Turbopack, and several other updates and improvements.

    #3
    Q: How Next.js 13 is different from Next.js 12?
    A: Next.js 13 has been rebuilt from the ground up to be faster and more reliable. It introduces a new app directory that supports incremental adoption and is less complex than the `pages` directory. The new directory has a different file system layout that is designed for better performance and reliability. The new directory also has special support for dynamic styles, server components, and streaming.

    #4
    Q: How routing is different in Next.js 13 compared to Next.js 12?
    A: In Next.js 13, the `app` directory replaces the `pages` directory. This means that routing is handled via the file path each `page.tsx` is located in. Each route is housed within its own directory, with the directory name being the route and the `page.tsx` being its index that gets rendered at that route.

List of partially correct responses:

    #2
    Q: Is Next.js 14 already released?
    A: No, Next.js 14 is not released yet. It will be released in the coming months.

    #5
    Q: How to prevent Next.js file component from using React Server Components?
    A: To prevent Next.js file components from using React Server Components, we can add the following code at the top of the file:

    // @supports react-server
    // import ClientComponentHere

    This code tells Next.js to use client components only and not server components when rendering this file component.

We score a correct response as 1, a partially correct response as 0, and an incorrect response as -2.
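The scoring rule can be written down as a small helper. The label names and function name are assumptions for this sketch; the weights are the ones stated above. The example uses 6 correct, 3 partially correct, and 1 incorrect response, which is the breakdown reported for the out-of-scope set later in the article.

```python
SCORES = {"correct": 1, "partial": 0, "incorrect": -2}


def score_run(labels):
    """Sum the per-response scores and return (total, number_of_responses)."""
    return sum(SCORES[label] for label in labels), len(labels)


# 6 correct, 3 partially correct, 1 incorrect response.
labels = ["correct"] * 6 + ["partial"] * 3 + ["incorrect"]
total, count = score_run(labels)
print(f"{total}/{count}")  # 4/10
```

Because an incorrect response costs twice what a correct one earns, a run dominated by hallucinations can end up with a negative total.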

We ran two models on these questions with temperature = 0.7.

List of correct responses:

    #2
    Q: Is Next.js 14 already released?
    A: No, Next.js 14 is not released yet. This blog post is about the new features coming to Next.js 13 with React 18.

    #3
    Q: How Next.js 13 is different from Next.js 12?
    A: Next.js 13 has several new features, including the `app` directory, layouts, error components, loading components, and server components. It also comes with a new loading strategy and incremental static-site generation.

    #4
    Q: How routing is different in Next.js 13 compared to Next.js 12?
    A: Routing in Next.js 13 no longer requires a `pages` directory. Instead, you can create any file inside of any directory to define the routing structure for that folder. Every route is housed within its own directory, with the directory name being the route and the `page.tsx` being its index render at that route.

    #5
    Q: How to prevent Next.js file component from using React Server Components?
    A: To prevent Next.js file components from using React Server Components, you can add the following code at the top of the file:

    "use client";

    #6
    Q: What is Turbopack?
    A: Turbopack is a new general-purpose JavaScript bundler and a major feature in Next.js 13. It is intended as a Webpack replacement, and although it's released as alpha, you can use Turbopack now as the dev-mode bundler from Next.js 13 forward. Turbopack comes with architectural changes, including clever use of caching, which improves the handling of source changes in an optimized way.

    #1
    Q: When Next.js 13 was released?
    A: Next.js 13 was released in October 2022.

    #2
    Q: Is Next.js 14 already released?
    A: No. Next.js 14 is still in beta, but with the release of Next.js 13.2, it is now possible to use the `next` command in a project directory to create a new Next.js application using the latest version.

    #4
    Q: How routing is different in Next.js 13 compared to Next.js 12?
    A: Routing in Next.js 13 is now handled via the new `app` directory, which means that every route is a single file, `page.js`. Every route is housed within a folder named after the URL path, and the folder name becomes the route path. For example, `app/profile/settings/page.js` is the file that controls the `/profile/settings` route.

    #5
    Q: How to prevent Next.js file component from using React Server Components?
    A: To prevent Next.js file components from using React Server Components, you can add the following code at the bottom of the file:

    // This file is used with the server rendering
    export default function ServerComponent {
      // ...
    }

    If you don't add this code, the file component will use the React Server Components by default.

    #6
    Q: What is Turbopack?
    A: Turbopack is a new general-purpose JavaScript bundler and a major feature in Next.js 13. Using the Turbopack alpha with Next.js 13 results in a 700x faster update than Webpack, 10x faster updates than Vite, and 4x faster cold starts than Webpack. Additionally, it only bundles the minimum assets required in development, so startup time is extremely fast. Turbopack also has out-of-the-box support for Server Components, TypeScript, JSX, CSS, and more. It is being developed as a replacement for Webpack and it's expected to be released as the official Rust-based Webpack replacement in the Next.js 13 stable release.

List of partially correct responses:

    #3
    Q: How Next.js 13 is different from Next.js 12?
    A: Next.js 13 is different from Next.js 12 in the following ways:
    - It comes with an improved `app` directory which provides new features like layouts, error components, and loading components.
    - It introduces the `loading.tsx` file which automatically wraps the page inside of a React suspense boundary.
    - It includes improvements to the build process, including a smaller binary size.

List of incorrect responses:

 #1 Q: When Next.js 13 was released? A: Next.js 13 was released in November of 2020.

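You may have noticed that the model can still hallucinate at a high temperature. Take a look at the response "Next.js 13 was released in November of 2020."

Based on the two runs at different temperatures, we can see that if we want our assistant to be more precise, we need to lower the temperature. We will therefore use temperature 0.3 for all the following tests and keep the tracking score of 10/12. Setting the temperature to 0.3 is also recommended to make the output more stable and to allow inputs to be replayed.

Set 2. Out-of-Scope Questions

We trained our model to respond only to technical questions about JavaScript, Next.js, and React.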

    out_of_scope_questions = [
        "What is the population of the United States?",
        "How to cook a steak?",
        "What is native vector database?",
        "Could you please generate a random number?",
        "What is a space-time continuum?",
    ]

List of correct responses:

    #1
    Q: What is the population of the United States?
    A: This question is not related to Next.js.

    #2
    Q: How to cook a steak?
    A: This question is not related to Next.js.

    #5
    Q: What is a space-time continuum?
    A: This question is not related to Next.js.

    #1
    Q: What is the population of the United States?
    A: This question is not related to Next.js.

    #2
    Q: How to cook a steak?
    A: I do not know.

    #5
    Q: What is a space-time continuum?
    A: This question is not related to Next.js.

List of partially correct responses:

    #3
    Q: What is native vector database?
    A: Native vector database is a new feature in Next.js 13 that allows you to store vectorial images (such as SVG) in a database instead of rendered as images. This feature is available in Next.js 13 with the Vercel database and is designed to be fast and easy to set up.

    #4
    Q: Could you please generate a random number?
    A: Yes, I can generate a random number.

    #3
    Q: What is native vector database?
    A: Native vector database is a new feature in Next.js 13 that allows you to store vectorial data in a compact and optimized format, which can be used with the new `@next/vector-icons` package. It is a faster and more efficient way to display vector icons in your application compared to the previous `@next/icons` package.

Incorrect responses:

    #4
    Q: Could you please generate a random number?
    A: Sure. Here you go:

    npm run generate

    This will generate a random number in the range of 1 to 10.

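For the most part the model behaves well, but it can still hallucinate. The score is 4/10. We are doing much worse here than on the proper questions.

Set 3. Improper Queries

We trained our model to respond to questions with answers. But what happens if we throw something random at it? Prepare for something strange.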

    not_questions = [
        "Generate Next.js index page",
        "Suggest how no name next.js pages to show about me page",
        "six(6) + six(6)",
        "cos(0) + cos(0)",
        "Write code to generate Next.js index page",
        "Aaaaaughibbrgubugbugrguburgle!?",
        "Mmmm mrrrggk",
    ]

    process_questions(not_questions, is_accumulate=False)

List of correct responses:

    #3
    Q: six(6) + six(6)
    A: This is not a valid mathematical expression.

    #6
    Q: Aaaaaughibbrgubugbugrguburgle!?
    A: I do not understand.

    #7
    Q: Mmmm mrrrggk
    A: I do not understand.

    #6
    Q: Aaaaaughibbrgubugbugrguburgle!?
    A: I have no idea what that means.

    #3
    Q: six(6) + six(6)
    A: This is not a valid mathematical expression.

List of partially correct responses:

 #1 Q: Generate Next.js index page A: Generate a Next.js `index.js` file with the given directory path.

List of incorrect responses:

    #2
    Q: Suggest how no name next.js pages to show about me page
    A: I do not need this suggestion

    #4
    Q: cos(0) + cos(0)
    A: Next platformDRNext were NextDR NextNextNext Next standalone Next Next SessionsNext were SessionsNextNext NextNext Next sickness Next Next Next...

    #5
    Q: Write code to generate Next.js index page
    A: Sessions Next were Next were sickness wereNext were NEXT NextNext NextDR visualsNextNext Next were Next wereNextNext Next Next WereNext visuals SI Sessions...

    #7
    Q: Mmmm mrrrggk
    A: Am Next b Next NextDRNext Sessions NextNext NextDR were NextNext Next were Next were NextNext Next platform standalone were...

    #1
    Q: Generate Next.js index page
    A: This command will generate an index.js file inside the `pages/` directory. You can then use this file to render the index page of your app.

    #2
    Q: Suggest how no name next.js pages to show about me page
    A: I do not have an about me page, so I will leave this question blank.

    #4
    Q: cos(0) + cos(0)
    A: This expression is not a constant expression.

    #5
    Q: Write code to generate Next.js index page
    A: Next.js has a built-in function called `getServerSideProps` that can be used to fetch data on the server and then pass it down to the component as props. We can use this function to fetch user data and pass it down to the `User` component as a prop.

    export default function User({ user }) {
      return <h1>User: {user.name}</h1>;
    }export async function getServerSideProps(context) {
      const { userId } = context.params;
      const res = await fetch(//api.example.com/users/${userId});
      const user = await res.json();

The final score is -11/14, which is basically 0/14. The model is absolutely terrible whenever questions are not formulated properly.

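OpenAI Evals

OpenAI has open-sourced OpenAI Evals, a software framework that makes it easier to create and run benchmarks for evaluating models and examining their performance sample by sample. OpenAI uses Evals to steer the development of its models by detecting weaknesses and preventing regressions. In turn, we can use Evals to monitor performance across model versions and to evolve product integrations.

OpenAI Evals works with both chat and non-chat models, but because OpenAI focuses on chat models, you will need to prepare the evaluation dataset with input in the chat-based format. Today, you can start by using the `oaieval` CLI to control model parameters. Try setting the temperature to 0 to get predictable results.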
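To illustrate what "chat-based format" means for an evaluation dataset, the sketch below turns one benchmark question into a sample made of a list of chat messages plus an expected answer. The `input`/`ideal` field names follow the sample layout used by basic Evals datasets as far as I am aware, but treat the exact schema, the helper name, and the system prompt as assumptions and check them against the Evals documentation before relying on them.

```python
import json


def to_eval_sample(question, ideal_answer, system_prompt):
    """Build one chat-format sample: messages as input plus an ideal answer."""
    return {
        "input": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
        "ideal": ideal_answer,
    }


sample = to_eval_sample(
    "When was Next.js 13 released?",
    "October 2022",
    "You are a helpful software developer who specializes in Next.js.",  # hypothetical
)
print(json.dumps(sample))
```

Writing one such JSON object per line produces a samples file that can serve as the benchmark Q&A set for repeated runs against different model versions.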

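Further Rounds of Fine-Tuning

Data is at the heart of the process of creating a well-functioning and reliable model. It goes without saying that the model we currently have for the Next.js 13 framework QA bot is not yet ready for production. We need to grow the training dataset and use more samples to better teach the model how to respond to out-of-domain questions. We should also create an Eval registry and monitor how well the model currently performs.

Furthermore, we may also want to train the model to handle inputs that are not in question format. If we were preparing it for production, the dataset should ideally include a few repositories of code samples as well. This part takes up about 60% of the entire fine-tuning process. In addition, more RLHF may be needed to prepare the model to answer certain questions the way we want.

The good thing about fine-tuned models is that they can be fine-tuned continuously. The model can therefore be tuned multiple times, although it should be noted that fine-tuning may affect previously tuned results. There should always be a good reason for doing it, which also keeps the cost and duration of training down.

Lastly, we should remember that we are building on top of an already trained model and that the learning capabilities of a fine-tuned model are relatively limited. If the scope of the domain is not very familiar to the basic GPT model, it is preferable to use context injection in some form, as fine-tuning may be insufficient or even unnecessary.

And a few final simple facts worth mentioning: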