That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.

There's a whole lot of stuff in the new release.

DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:

DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.

I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models is something I can easily play with myself. That's where the new distilled models come in.

To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.

This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).

Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?

(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)

Licensing aside, these distilled models are fascinating beasts.
Running DeepSeek-R1-Distill-Llama-8B-GGUF
Quantized versions are already starting to show up. So far I've tried just one of those, unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.

I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.

First I fetched and ran the model using Ollama itself:
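A command along these lines does it, via Ollama's Hugging Face integration (the Q8_0 quantization tag is my assumption; other quantizations are available in the same repository):

```bash
# Fetch the Unsloth GGUF build and start an interactive chat session
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```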
This downloads around 8.5GB of model data and starts an interactive chat interface.

Once the model has been fetched, LLM can talk to it as well. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.
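LLM's logs subcommand can pull those records back out later, for example:

```bash
llm logs path   # print the location of the logs.db SQLite database
llm logs -n 1   # show the most recently logged prompt and response
```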
I installed the plugin and ran it like this:
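Again a sketch, reusing the same Ollama model name (and the assumed Q8_0 tag) from above:

```bash
# Install the plugin, then prompt the Ollama-hosted model through LLM
llm install llm-ollama
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```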
Each response starts with a <think>...</think> pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:

Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.

First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're typically found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.

[...]

It then thought for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the </think> tag:
The Joke:

A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place saucers on the table, causing a few spills.

After a series of comical mishaps, the walrus looks up and says with a smile, "This isn't so fishy anymore." The pelican smirks and says, "Maybe not, but we do have a lot of krill in our tea!"
Explanation:

- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comical mishaps.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.
So yeah, the joke is awful. But the process of getting there was such an interesting insight into how these new models work.

This was a relatively small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B, currently my favourite GPT-4 class model that I've run on my own machine, I have high expectations.

Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this, a 34GB download:
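Again via Ollama's Hugging Face support. The exact quantization tag here is my guess, picked because a roughly 3-bit quantization of a 70B model lands near that 34GB size:

```bash
# The Q3_K_M tag is an assumption; check the repository for the tag
# that matches the ~34GB download
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```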
Can it draw a pelican?

I tried my classic Generate an SVG of a pelican riding a bicycle prompt too.
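Run through LLM against the same Ollama model, the invocation looks something like this (the model ID is assumed to match the 70B pull above):

```bash
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M' \
  'Generate an SVG of a pelican riding a bicycle'
```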
It did not do very well: it looked to me like it got the order of the elements wrong, so I followed up with:
the background ended up covering the rest of the image
It thought some more and gave me this:

As with the earlier joke, the chain of thought in the transcript was far more interesting than the end result.
Other ways to try DeepSeek-R1
If you want to try the model out without installing anything you can do so using chat.deepseek.com: you'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then select the "DeepThink" option below the prompt input box.

DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that via LLM by dropping this into your extra-openai-models.yaml configuration file:
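An entry along these lines should work; the model_id and api_key_name values match the commands in the next step, and api_base points at DeepSeek's OpenAI-compatible endpoint:

```yaml
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```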
Then run llm keys set deepseek and paste in your API key, then use llm -m deepseek-reasoner 'prompt' to run prompts.
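In full:

```bash
llm keys set deepseek
# paste your DeepSeek API key when prompted
llm -m deepseek-reasoner \
  'a joke about a pelican and a walrus who run a tea room together'
```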
This won't show you the reasoning tokens, sadly. Those are served up by the API (example here) but LLM doesn't yet have a way to display them.
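If you want to see them, hitting the API directly works. A sketch with curl, assuming the standard OpenAI-style chat completions path and that deepseek-reasoner responses include a reasoning_content field alongside content, as DeepSeek's API docs describe:

```bash
# The reasoning tokens come back in choices[0].message.reasoning_content;
# the final answer is in choices[0].message.content
curl https://api.deepseek.com/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPSEEK_API_KEY" \
  -d '{
        "model": "deepseek-reasoner",
        "messages": [{"role": "user", "content": "a joke about a pelican and a walrus"}]
      }'
```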