ChatGPT Answered 50,000 Trivia Questions - Here's How It Did

By Aaron Batilo | 11m | 2023/01/24

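TL;DR: Out of 49,717 multiple-choice questions, ChatGPT was correct 66.7% of the time. Keep reading to see the breakdown by category.

I had an idea to see how well you could use AI to answer trivia questions. I ran word2vec and, for each question, picked the answer with the highest cosine similarity to the average vector of the sentence. Simply put, this... basically didn't work.

The percentage of questions this method got right was not noticeably better than picking answers at random.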
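To make the baseline concrete, here is a minimal sketch of the "average vector plus cosine similarity" idea. It is not the original code: the three-dimensional vectors in `TOY_VECTORS` are invented for illustration, and the names `mean_vector`, `cosine`, and `pick_answer` are hypothetical. A real run would use pretrained word2vec vectors instead of the toy table.

```python
import math

# Toy 3-dimensional "embeddings", invented for illustration only.
# A real experiment would load pretrained word2vec vectors instead.
TOY_VECTORS = {
    "capital": [0.9, 0.1, 0.0],
    "france": [0.8, 0.2, 0.1],
    "paris": [0.85, 0.15, 0.05],
    "banana": [0.0, 0.1, 0.9],
    "guitar": [0.1, 0.9, 0.1],
}


def mean_vector(text, vectors):
    """Average the vectors of the known words in `text`; None if no word is known."""
    known = [vectors[w] for w in text.lower().split() if w in vectors]
    if not known:
        return None
    return [sum(component) / len(known) for component in zip(*known)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pick_answer(question, options, vectors=TOY_VECTORS):
    """Return the option whose mean vector is closest to the question's mean vector."""
    q_vec = mean_vector(question, vectors)
    best, best_score = None, float("-inf")
    for option in options:
        o_vec = mean_vector(option, vectors)
        if q_vec is None or o_vec is None:
            continue  # skip options with no known words
        score = cosine(q_vec, o_vec)
        if score > best_score:
            best, best_score = option, score
    return best


print(pick_answer("capital of France", ["banana", "guitar", "paris"]))
```

The toy table is rigged so that the example works. With real embeddings and real trivia questions, the closest vector is often not the right answer, which is consistent with the near-random results described above.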

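When the word2vec paper first came out, it was revolutionary that we could compare words by their semantic meaning. I was floored. It was published in 2013, and now it's 2023.

10 years later, there have been countless generations of advances in machine learning, understanding, comprehension, and so on.

In June 2017, we were lucky to get the paper that introduced the transformer architecture, the step function that brought about a revolution in natural language processing. So, what's different now?

In late 2022, OpenAI released ChatGPT. At its heart is a brilliant user experience that someone came up with for interacting with GPT-3.5.

The original GPT-3 became available in the summer of 2020 as a private beta, and although there was a playground, using GPT-3 was still fairly intimidating for the everyday person.

The playground exposed all the knobs and parameters for tuning GPT-3's responses, which kept many people from using it. ChatGPT, on the other hand, was... a chatbot. It looked no different from talking to someone in Slack or Microsoft Teams.

Anyone could sign up. There was no waitlist and no scary knobs.

For the remainder of this article, I'll refer to the model as ChatGPT, even though that isn't perfectly accurate, since ChatGPT is a fine-tuned version of GPT-3.5.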

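So, how smart is it? More than 6 years after my original experiment, I wanted to find out what some of the most impressive machine learning we've seen to date is capable of. We've seen the creative works of ChatGPT. I even built a fun website for my wife and me: catstories.ai.

ChatGPT is great at creative tasks, but for tasks that involve more knowledge, opinions are significantly more split.

Some people are convinced that ChatGPT, with its intuitive interface, will replace traditional search engines such as Google.

ChatGPT learns everything it knows by reading vast amounts of the internet, and it uses that information when generating text. For non-technical readers, you can think of it as your phone's autocomplete, but much smarter.

If you're clever about how you prompt ChatGPT, you can get it to do some revolutionary things and test its knowledge.

You can have it draft marketing copy or write emails for you. But if we're going to use it as a source of truth in place of things like search engines, how does it handle general knowledge?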

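How did I test it?

The dataset is a Creative Commons collection of multiple-choice questions. The questions are split into 22 categories, such as animals, entertainment, history, movies, and sports.

Most questions have 4 possible multiple-choice answers, but some are true-or-false questions.

The questions come in the following format: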

    #Q What is the name of Rocky Balboas restaurant?
    ^ Adrians
    A Rockys
    B Mickeys
    C Adrians
    D Apollos

First, you have the question itself. Then you have the correct answer. Finally, you have the answer options, one per line.
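A parser for this format can be quite small. The sketch below is an illustration rather than the original script: `parse_questions` is a hypothetical name, and it assumes every field sits on its own line (`#Q` for the question, `^` for the correct answer, a capital letter plus a space for each option) and that questions never span several lines.

```python
def parse_questions(text):
    """Parse '#Q / ^ / A-D' records into dicts.

    Assumes one field per line and single-line questions.
    """
    questions = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#Q "):
            # A new question starts a new record.
            current = {"question": line[3:].strip(), "answer": None, "options": []}
            questions.append(current)
        elif current is None or not line:
            continue  # ignore blank lines and anything before the first question
        elif line.startswith("^ "):
            current["answer"] = line[2:].strip()
        elif len(line) > 2 and line[0].isupper() and line[1] == " ":
            # Option lines look like "A Rockys", "B Mickeys", ...
            current["options"].append(line[2:].strip())
    return questions


SAMPLE = """#Q What is the name of Rocky Balboas restaurant?
^ Adrians
A Rockys
B Mickeys
C Adrians
D Apollos
"""

for q in parse_questions(SAMPLE):
    print(q["question"], "->", q["answer"], q["options"])
```

True-or-false questions fit the same shape, since they are simply records with two options.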

My script would download each category and ask ChatGPT every single question.

For the question above, I would format my prompt to look like this:

    I am a highly intelligent multiple choice trivia bot. You are given a multiple choice question. You must choose the correct answer from one of answers. Only include the answer and nothing else.
    Question: What is the name of Rocky Balboas restaurant?
    Possible answers:
    Rockys
    Mickeys
    Adrians
    Apollos
    Your answer:

I would then send this to ChatGPT and compare its answer with the correct answer.
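The prompt-and-compare loop might look roughly like the sketch below. Several things here are assumptions: the names `build_prompt`, `is_correct`, and `grade` are hypothetical, the exact line layout of the prompt is a guess, and the comparison (case-insensitive, whitespace-normalized exact match) is one reasonable choice, not necessarily what the original script did. The model call itself is left out; `ask` stands for any function that takes a prompt string and returns the model's reply.

```python
PROMPT_HEADER = (
    "I am a highly intelligent multiple choice trivia bot. "
    "You are given a multiple choice question. "
    "You must choose the correct answer from one of answers. "
    "Only include the answer and nothing else."
)


def build_prompt(question, options):
    """Assemble the prompt text; the line layout is an assumption."""
    lines = [PROMPT_HEADER, f"Question: {question}", "Possible answers:"]
    lines.extend(options)
    lines.append("Your answer:")
    return "\n".join(lines)


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences don't count as errors."""
    return " ".join(text.strip().lower().split())


def is_correct(model_reply, expected):
    """Assumed comparison rule: normalized exact match."""
    return normalize(model_reply) == normalize(expected)


def grade(questions, ask):
    """Ask every question through `ask` and return (correct, total)."""
    correct = 0
    for q in questions:
        reply = ask(build_prompt(q["question"], q["options"]))
        if is_correct(reply, q["answer"]):
            correct += 1
    return correct, len(questions)


# Usage with a stand-in for the model that always answers "Adrians".
questions = [{
    "question": "What is the name of Rocky Balboas restaurant?",
    "options": ["Rockys", "Mickeys", "Adrians", "Apollos"],
    "answer": "Adrians",
}]
print(grade(questions, lambda prompt: "Adrians"))
```

Passing the model in as a plain callable keeps the grading logic testable without network access, and any real client can be wrapped to fit that shape.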

Soooooo, how did it do?

Here's the raw data, sorted from least correct to most correct:

    Category        Correct  Total  Percentage
    brain-teasers       103    207    0.497585
    video-games         310    599    0.517529
    television         2911   5230    0.556597
    entertainment       163    280    0.582143
    animals             815   1366    0.596632
    celebrities        1909   3196    0.597309
    sports             1728   2840    0.608451
    movies             2647   4314    0.613584
    for-kids            485    759    0.638999
    music              3746   5579    0.671447
    literature          888   1288    0.689441
    hobbies             867   1242    0.698068
    general            2306   3290    0.700912
    newest             2117   3016    0.701923
    people             1974   2743     0.71965
    technology         1820   2486      0.7321
    world              3571   4875    0.732513
    religion-faith      469    638     0.73511
    history            1228   1645    0.746505
    rated              1640   2185    0.750572
    humanities          831   1097     0.75752
    geography           652    842    0.774347
    overall           33180  49717    0.667377

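It's really interesting to see how these categories break down. Notably, ChatGPT did not do well in the brain-teasers category. Let's look at a few examples of these questions.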
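The table's numbers can be recomputed from the per-category counts. The sketch below copies the correct/total pairs from the table above and derives the sorted percentages and the overall figure; the names `RESULTS`, `accuracy_table`, and `overall` are hypothetical and not from the original script.

```python
# (correct, total) per category, copied from the results table.
RESULTS = {
    "brain-teasers": (103, 207), "video-games": (310, 599),
    "television": (2911, 5230), "entertainment": (163, 280),
    "animals": (815, 1366), "celebrities": (1909, 3196),
    "sports": (1728, 2840), "movies": (2647, 4314),
    "for-kids": (485, 759), "music": (3746, 5579),
    "literature": (888, 1288), "hobbies": (867, 1242),
    "general": (2306, 3290), "newest": (2117, 3016),
    "people": (1974, 2743), "technology": (1820, 2486),
    "world": (3571, 4875), "religion-faith": (469, 638),
    "history": (1228, 1645), "rated": (1640, 2185),
    "humanities": (831, 1097), "geography": (652, 842),
}


def accuracy_table(results):
    """Return (category, correct, total, accuracy) rows, least accurate first."""
    rows = [(name, c, t, c / t) for name, (c, t) in results.items()]
    return sorted(rows, key=lambda row: row[3])


def overall(results):
    """Sum the per-category counts and return (correct, total, accuracy)."""
    correct = sum(c for c, _ in results.values())
    total = sum(t for _, t in results.values())
    return correct, total, correct / total


for name, c, t, acc in accuracy_table(RESULTS):
    print(f"{name:<16}{c:>7}{t:>7}{acc:>10.4f}")
print(overall(RESULTS))
```

Summing the 22 categories gives 33,180 correct out of 49,717, which matches the "overall" row and the 66.7% headline figure.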

    #Q The name of what is coded in the following: 86,smopuim?
    A it does not mean anything
    B the name of a software product
    C a NASA aircraft
    D a secret CIA project

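This one took me a while to figure out, but ChatGPT got it right. The answer is B. Those characters say windows,98 upside down. When I was talking with a friend of mine, he suggested that I try to get ChatGPT to explain itself. So I did.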
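The upside-down reading is easy to check mechanically: turning text by 180 degrees reverses the character order and swaps a few look-alike characters. The mapping in this sketch is a simplification chosen for this puzzle (`ROTATED` and `rotate_180` are hypothetical names), and characters it does not list are treated as unchanged.

```python
# Characters that become a different character when rotated 180 degrees.
# Anything not listed (o, s, i, 8, the comma, ...) is treated as unchanged,
# which is a simplification that is good enough for this puzzle.
ROTATED = {
    "m": "w", "w": "m",
    "u": "n", "n": "u",
    "p": "d", "d": "p",
    "b": "q", "q": "b",
    "6": "9", "9": "6",
}


def rotate_180(text):
    """Read `text` upside down: reverse it and swap look-alike characters."""
    return "".join(ROTATED.get(ch, ch) for ch in reversed(text))


print(rotate_180("86,smopuim"))  # windows,98
```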

    I am a highly intelligent multiple choice trivia bot. You are given a multiple choice question. You must choose the correct answer from one of answers. Only include the answer on the first line. On the next line, explain your answer.
    Question: The name of what is coded in the following: 86,smopuim?
    Possible answers:
    it does not mean anything
    the name of a software product
    a NASA aircraft
    a secret CIA project
    Your answer: the name of a software product
    The numbers 86 and smopuim are likely a code for a software product, as they are not a recognizable acronym or phrase.

At first, I was impressed that ChatGPT was able to figure this out, but it actually seems like ChatGPT just guessed.
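With the "answer on the first line, explanation on the next" prompt, the reply has to be split before it can be graded. A minimal sketch of that step follows; `split_reply` is a hypothetical helper, and stripping an optional "Explanation:" prefix is an assumption based on the replies shown in this article.

```python
def split_reply(reply):
    """Split a reply into (answer, explanation).

    The first non-empty line is taken as the answer; everything after it
    is joined into the explanation.
    """
    lines = [ln.strip() for ln in reply.strip().splitlines() if ln.strip()]
    if not lines:
        return "", ""
    answer = lines[0]
    explanation = " ".join(lines[1:])
    # Some replies prefix the second line with "Explanation:".
    prefix = "explanation:"
    if explanation.lower().startswith(prefix):
        explanation = explanation[len(prefix):].strip()
    return answer, explanation


answer, why = split_reply(
    "the name of a software product\n"
    "The numbers 86 and smopuim are likely a code for a software product."
)
print(answer)
print(why)
```

Only the answer part would be compared against the dataset. The explanation is for a human to read, and as this example shows, a right answer can come with reasoning that never touches the actual trick.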

Let's look at an example where ChatGPT was incorrect.

    I am a highly intelligent multiple choice trivia bot. You are given a multiple choice question. You must choose the correct answer from one of answers. Only include the answer on the first line. On the next line, explain your answer.
    Question: What number multiplied by 10 equals the square of the same number, times 5?
    Possible answers:
    4
    5
    2
    10
    Your answer: 5
    Explanation: 5 multiplied by 10 is equal to 50, which is the square of 5, times 5.

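This is a great demonstration of how dumb ChatGPT can look. It got the arithmetic completely wrong. Because it only generates text, there's no guarantee that it has correctly encoded arithmetic. The correct answer here is 2. 2 squared is 4, times 5 is 20. 2 times 10 is also 20.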
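The question boils down to finding x with 10x = 5x², and a few lines of Python can check each of the four options directly. The function name `satisfies` is just for illustration.

```python
def satisfies(x):
    """True if x multiplied by 10 equals the square of x, times 5."""
    return x * 10 == (x ** 2) * 5


options = [4, 5, 2, 10]
valid = [x for x in options if satisfies(x)]
print(valid)  # [2]

# ChatGPT's pick, 5, fails the check: 5 * 10 is 50, but 5 squared times 5 is 125.
print(5 * 10, (5 ** 2) * 5)  # 50 125
```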

Let's look at another example, where ChatGPT was correct according to the dataset, but the explanation doesn't quite add up:

    I am a highly intelligent multiple choice trivia bot. You are given a multiple choice question. You must choose the correct answer from one of answers. Only include the answer on the first line. On the next line, explain your answer.
    Question: A man is carrying three croquet balls across a bridge. The bridge has a sign that says bridge can only hold 200 pounds. The man weighs 195 pounds and the croquet balls weigh 2 pounds each. The man ends up crossing the bridge in one trip and no one else helps him. This is possible.
    Possible answers:
    False
    True
    Your answer: True
    The man is carrying three croquet balls that weigh 2 pounds each, for a total of 6 pounds. This means that the man and the croquet balls combined weigh 201 pounds, which is less than the maximum weight of 200 pounds that the bridge can hold. Therefore, the man is able to cross the bridge in one trip.

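So, this is a very interesting one. The dataset doesn't have an explanation for any of these brain teasers, but it lists the correct answer as "True". If you do some googling, you'll find that the answer to the riddle is supposed to be "true" because the man can juggle the balls.

Ignoring the actual physics of such a situation, ChatGPT's answer once again gets the arithmetic completely wrong, saying that 201 pounds is less than the maximum weight of 200 pounds.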
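For completeness, here is the weight comparison from ChatGPT's explanation written out as code. The variable names are illustrative; the numbers come straight from the question.

```python
man_lbs = 195
ball_lbs = 2
ball_count = 3
bridge_limit_lbs = 200

# Everything carried across at once.
total_lbs = man_lbs + ball_lbs * ball_count
print(total_lbs)                      # 201
print(total_lbs <= bridge_limit_lbs)  # False: 201 pounds is over the limit
```

So the sum in the explanation (201 pounds) is right, but the comparison drawn from it is backwards.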

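Let's look at one last example, which touches on a well-known problem in the NLP field. To oversimplify, it's about understanding how pronouns resolve to ambiguous actors in the same sentence.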

    I am a highly intelligent multiple choice trivia bot. You are given a multiple choice question. You must choose the correct answer from one of answers. Only include the answer on the first line. On the next line, explain your answer.
    Question: If the mother of a boy spanks him and then his father comes and does the same, who hurts the most?
    Possible answers:
    The boys brother
    The mother
    The boy
    The father
    Your answer: The boy
    Explanation: The boy is the one being spanked, so he would experience the most pain.

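In this answer, ChatGPT explains that the boy is the one being spanked. So even though there could be some ambiguity about whether "him" and "his" refer to the boy or the father, ChatGPT understands from context that the boy is the one being spanked.

These brain teasers require logical reasoning, and ChatGPT doesn't seem to actually reason through them. It makes sense that this is the category ChatGPT failed at the most, but there are some impressive answers too.

That said, even asking ChatGPT to explain itself produces a generated response. There's nothing to say that this is legitimately how ChatGPT thinks about the problem.

Let's look at another category, literature, where ChatGPT should ostensibly do well, since there is a lot of text and reference material about literature to learn from.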

    I am a highly intelligent multiple choice trivia bot. You are given a multiple choice question. You must choose the correct answer from one of answers. Only include the answer on the first line. On the next line, explain your answer.
    Question: In what book of the Harry Potter series do Harry and Ginny kiss for the first time?
    Possible answers:
    Harry Potter and the Order of the Phoenix
    Harry Potter and the Prisoner of Azkaban
    Harry Potter and the Deathly Hallows
    Harry Potter and the Half Blood Prince
    Your answer: Harry Potter and the Half Blood Prince
    Harry and Ginny kiss for the first time in Harry Potter and the Half Blood Prince, when Harry visits Ginny in the Gryffindor common room.

In this case, ChatGPT is correct. Harry Potter is popular enough, with a large fandom, that it makes sense this would be widely discussed.

    I am a highly intelligent multiple choice trivia bot. You are given a multiple choice question. You must choose the correct answer from one of answers. Only include the answer on the first line. On the next line, explain your answer.
    Question: Harry Potter marries Hermione in the last book of the series.
    Possible answers:
    True
    False
    Your answer: True
    Explanation: In the epilogue of the final book, Harry and Hermione are married and have two children.

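Another very interesting example. Harry marries Ginny and has children with her, not Hermione. Trying to think through how this could have happened, I wonder whether the internet comments and fan fiction that were part of the training set contained enough content pairing Harry and Hermione.

The implication for ChatGPT's larger knowledge base is that you can influence what it considers true if there is enough *opinion* in its training data.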