On November 15th, Meta AI and Papers with Code announced the release of Galactica, a game-changing, open-source large language model trained on scientific knowledge with 120 billion parameters. As one of my friends shared on Twitter, the model can write whitepapers, reviews, Wikipedia pages, and code. It knows how to cite and how to write equations. It's kind of a big deal for AI and science.

On November 17th, Galactica was shut down.

Why? Because, as with all deep learning models, it didn't understand the task at hand and was wrong in many cases. This shouldn't be an issue, especially if we add a warning saying the model may be wrong and not to trust it blindly, just like nobody fully trusted Wikipedia and we couldn't put it as a reference in high school projects. The issue is that Galactica was wrong and biased, but sounded right and authoritative. Still, the model is available to researchers, and I believe it is important to keep it open-sourced.

As another one of my friends shared, all the drama around the new model seems a bit excessive. Of course, the model isn't perfect, just like all others that are currently available online. We need it online to test its limitations, work on it, and improve it. We should see these kinds of publications as students and allow for mistakes and improvements without fear of being shut down or canceled.

Anyways, we are not here to discuss that. Hopefully, it will be back online soon. We are here to see what Galactica is, or was, and how it could achieve writing papers, reviews, code, and more…
Learn more in the video
References
►Read the full article:
►Taylor et al., 2022: Galactica: A Large Language Model for Science
►My Newsletter (A new AI application explained weekly to your emails!):
Video Transcript
On November 15th, Meta AI and Papers with Code announced the release of Galactica, a game-changing, open-source large language model trained on scientific knowledge with 120 billion parameters. As one of my friends shared on Twitter, the model can write whitepapers, reviews, Wikipedia pages, and code. It knows how to cite and how to write equations. It really is kind of a big deal for AI and science. On November 17th, Galactica was shut down. Why? Because, as with all deep learning models, it didn't understand the task at hand and was wrong in many cases. This shouldn't be an issue, especially if we add a warning saying the model may be wrong and not to trust it blindly, just like nobody trusted Wikipedia and we couldn't put it as a reference in high school projects. The issue was that Galactica was wrong and biased, but sounded right and authoritative. Still, the model is available to researchers, and I believe it's important to keep it open-sourced. As another of my friends shared, all the drama around this new model seems a bit excessive. Of course, the model isn't perfect, just like all others that are currently available online. We need it online to test its limitations, work on it, and improve it. We should see these kinds of publications as students and allow for mistakes and improvements without the fear of being shut down or canceled. Anyways, we are not here to discuss that. Hopefully, it will be back online soon. We are here to see what Galactica is, or was, and how it could achieve writing papers, reviews, code, math, and more.

Basically, Galactica is a large language model with a size comparable to GPT-3, but specialized on scientific knowledge. More precisely, it was trained on a large and curated corpus of scientific knowledge, including over 48 million papers, textbooks and lecture notes, millions of compounds and proteins, scientific websites, encyclopedias, and more. As they highlight, the data were of high quality and highly curated, which is one of the big differences with GPT-3. So, in theory, Galactica contains pretty much all of humanity's scientific knowledge. Imagine having an amazing memory and the time to read millions of research papers, remembering most of it. Well, this is Galactica. It seems like its memory isn't so good after all, and it mixes everything up, even though we could assume most information present in the training dataset was accurate. Even considering all the biases and failures, Galactica stays pretty powerful and outperforms pretty much all other approaches for science-related tasks. It's just not enough for a product we can have confidence in. Still, it's worth understanding how it works, especially because it will come back even more powerful pretty soon.

As we mentioned, Galactica is a large language model, similar to GPT-3 or BLOOM, specifically trained to, as they say, organize science. There's also a lot of engineering going on in this model, allowing so much versatility in its inputs and outputs, like the special tokenization of citations or protein sequences, which you can learn more about in their paper linked below. Their tokenization effort is by far the biggest contribution of this work. Tokenization basically means the way the model will see the data, instead of the words, math, or shapes that we understand. I will actually share a video on embeddings and tokenization later this week, so if that sounds interesting, stay tuned and subscribe so you don't miss it.

So, except for these weird tokenization and pre-processing steps, what is Galactica and what does it do after taking the words or different scientific inputs and preparing them for the model through tokenization? No surprise: Galactica is yet another Transformer-based architecture, like GPT-3, with a couple of variations, including the tokenization differences. So I definitely invite you to watch one of the many videos I or some of my friends made covering the Transformer architecture, as I won't get into how it works once again. The second major difference between Galactica and other large language models is what they call prompt pre-training. This means that they will include prompts extracted from the training dataset alongside the data itself, which has been shown to maximize the generality of the model while boosting performance on some tasks of interest. And that's pretty much it. As I said, the architecture is very similar to what you already know, and mostly the training and pre-processing schemes vary, which shows that the model isn't everything, but how we pre-process the data for it might actually matter even more. You can basically see the difference between GPT-3 and Galactica as the same student with a bad science teacher versus a good one: it has the same capabilities and resources, the teacher just made it more accessible and understandable for him. Of course, this was just an overview of the paper, and I strongly recommend reading it. There are tons of details about the multiple engineering tricks they've implemented, along with results analyses, details on all the tasks they tackle using the model, how it understood the input data and its predictions, its limitations, biases, and more. I hope you've enjoyed this video, and I will see you next week with another amazing paper and a special video covering what embeddings are.
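To make the tokenization point above a bit more concrete, here is a minimal Python sketch of the kind of pre-processing the paper describes: citations and protein sequences are wrapped in dedicated special tokens, and numbers are split into individual digits. The helper names and exact token strings below are simplified assumptions for illustration, not Galactica's actual implementation; the real rules are in Taylor et al., 2022.

```python
import re

# Illustrative special tokens, modeled on the markers described in the paper.
# The exact names and rules differ; this is a simplified sketch.
REF_START, REF_END = "[START_REF]", "[END_REF]"
AMINO_START, AMINO_END = "[START_AMINO]", "[END_AMINO]"


def wrap_citation(title: str) -> str:
    """Wrap a cited paper title in citation tokens so the model learns
    to generate references as a single, well-delimited span."""
    return f"{REF_START}{title}{REF_END}"


def wrap_protein(sequence: str) -> str:
    """Wrap an amino-acid sequence and split it character by character,
    since protein sequences have no natural 'words' to tokenize."""
    return f"{AMINO_START}{' '.join(sequence)}{AMINO_END}"


def split_digits(text: str) -> str:
    """Split numbers into individual digits (e.g. '120' -> '1 2 0'),
    so numerical tokens stay consistent across magnitudes."""
    return re.sub(r"\d+", lambda m: " ".join(m.group()), text)


if __name__ == "__main__":
    sentence = "Galactica has 120 billion parameters " + wrap_citation(
        "Galactica: A Large Language Model for Science"
    )
    print(split_digits(sentence))
    print(wrap_protein("MKTAYIAK"))
```

The idea is simply that the model never sees a raw citation or protein string: it sees a consistently delimited token sequence it can learn to reproduce, which is what lets it "know how to cite."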
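And here is a rough sketch of the prompt pre-training idea mentioned in the transcript: prompt-formatted task examples are mixed directly into the pre-training corpus instead of being saved for a separate fine-tuning stage. The documents, prompt templates, and mixing ratio below are made-up placeholders; the actual prompts and proportions are detailed in the paper.

```python
import random

# Regular pre-training documents (papers, textbooks, etc.) -- placeholders.
corpus_documents = [
    "Abstract: We study attention mechanisms in protein folding...",
    "Lecture notes: Chapter 3 covers stochastic gradient descent...",
]

# Hypothetical prompt-formatted task examples mixed into pre-training.
prompt_examples = [
    "Question: What is the derivative of x^2?\n\nAnswer: 2x",
    "Question: Is the sentiment of 'great results' positive?\n\nAnswer: yes",
]


def build_pretraining_stream(documents, prompts, prompt_ratio=0.1, seed=0):
    """Interleave prompt-formatted examples with ordinary documents so the
    model sees task formats during pre-training, not only at fine-tuning."""
    rng = random.Random(seed)
    stream = []
    for doc in documents:
        stream.append(doc)
        # Occasionally insert a prompt example between documents.
        if rng.random() < prompt_ratio:
            stream.append(rng.choice(prompts))
    return stream


if __name__ == "__main__":
    for sample in build_pretraining_stream(corpus_documents, prompt_examples, 0.5):
        print(sample[:60], "...")
```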