That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.

There's a lot of stuff in the new release.

DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:

DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.

I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models are something I can easily play with myself. That's where the new distilled models come in.
To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.

This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).

Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?

(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)

Licensing aside, these distilled models are fascinating beasts.
Running DeepSeek-R1-Distill-Llama-8B-GGUF

Quantized versions are already starting to show up. So far I've tried just one of those: unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.

I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.

First I fetched and ran the model using Ollama itself:
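Ollama can pull GGUF models directly from Hugging Face by repo path; a minimal sketch, where the Q8_0 quantization tag is an assumption (the repo publishes several quantizations):

```shell
# Fetch and run the 8B distilled model straight from Hugging Face
# (the :Q8_0 tag is an assumption; pick whichever quantization fits your RAM)
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```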
This fetches around 8.5GB of model data and starts an interactive chat interface.

Once the model has been fetched, LLM can talk to it too. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.

I installed the plugin and ran it like this:
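A sketch of those two steps; the model ID passed to -m is an assumption mirroring the Ollama tag used above (running llm models will list the actual available IDs):

```shell
# Install the plugin that lets LLM talk to local Ollama models
llm install llm-ollama

# Run a prompt against the fetched model
# (model ID is an assumption; check `llm models` for the real one)
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```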
Each response starts with a <think>...</think> pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:

Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.

First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're generally found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.
[...]
It then thought for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the </think> tag:

The Joke:

A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place dishes on the table, causing a few spills.

After a series of comical mishaps, the walrus looks up and says with a grin, "This isn't so fishy anymore." The pelican smirks and replies, "Maybe not, but we do have a lot of krill in our tea!"

Explanation:
- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comical chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.

So yeah, the joke is awful. But the process of getting there was such an interesting insight into how these new models work.
This was a relatively small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine) I have high expectations.

Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this, a 34GB download:
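A sketch of the command, assuming Unsloth publish the 70B distill under a matching repo name; both the repo path and the quantization tag here are assumptions (a ~3-bit quantization is roughly consistent with a 34GB download for a 70B model):

```shell
# ~34GB download; repo name and :Q3_K_M tag are assumptions
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```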
Can it draw a pelican?

I tried my classic Generate an SVG of a pelican riding a bicycle prompt too. It did not do very well:

It looked to me like it got the order of the elements wrong, so I followed up with:

the background ended up covering the rest of the image

It thought some more and gave me this:

As with the earlier joke, the chain of thought in the transcript was far more interesting than the end result.
Other ways to try out DeepSeek-R1

If you want to try the model out without installing anything at all you can do so using chat.deepseek.com: you'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then pick the "DeepThink" option below the prompt input box.

DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that via LLM by dropping this into your extra-openai-models.yaml configuration file:
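A sketch of what that configuration might look like; the api_base is DeepSeek's documented endpoint, and the key name lines up with the llm keys set deepseek step that follows:

```yaml
# extra-openai-models.yaml
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```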
Then run llm keys set deepseek and paste in your API key, then use llm -m deepseek-reasoner 'prompt' to run prompts.

This won't show you the reasoning tokens, sadly. Those are returned by the API (example here) but LLM doesn't yet have a way to display them.
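If you call the API directly, deepseek-reasoner returns the chain of thought in a separate reasoning_content field on the message, alongside the usual content. A minimal sketch of pulling both out of a response payload; the JSON below is a hand-written stand-in shaped like that response, not real API output:

```python
import json

# Hand-written stand-in shaped like a deepseek-reasoner chat completion;
# a real response also carries id, usage, finish_reason, etc.
payload = json.loads("""
{
  "choices": [{
    "message": {
      "role": "assistant",
      "reasoning_content": "Okay, so I need to come up with a joke...",
      "content": "A pelican and a walrus decide to open a tea room..."
    }
  }]
}
""")

message = payload["choices"][0]["message"]
reasoning = message["reasoning_content"]  # the hidden thinking tokens
answer = message["content"]               # the final visible reply

print(reasoning)
print(answer)
```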