blitzar 3 days ago

It's impressive that Llama and the AI teams in general survived the metaverse push at Facebook. Congrats to the team for keeping their heads down and saving the company from itself.

It's all AI all the time now though; I haven't seen any mention of our reimagined future of floating heads hanging out together in quite some time.

  • divan 3 days ago

    I'm working in a Quest 3 almost every day. I use Immersed, as it implements virtual displays for my MacBook better than others, but I'm impressed with the Meta ecosystem. Granted, social interaction is still awkward without proper facial expressions, but it feels closer each year to the depicted vision.

    I recently travelled and needed to work (coding and video editing in DaVinci) a lot in hotels and random places. I can't bring large screens everywhere (and I hate working with small fonts and screens), and the Quest 3 was a perfect fit here. Sometimes at home or the office (I have a private one), I just don't want to sit on my buttocks all the time, so I put on the VR goggles and can keep working in any position (lying on a sofa or even sunbathing outdoors).

    As soon as new XR/MR glasses become lighter (there are some good ones already - Visor, Bigscreen Beyond 2, etc.), more and more people will discover how usable and optimized for work this tech is.

    • mrcwinn 3 days ago

      In no way is Quest 3 better than Apple Vision Pro for desktop work. I’ve used both. It’s not close.

      (There are other critiques of AVP that might not make it the right choice, but desktop work experience isn’t one of them.)

      • jitl 3 days ago

        Wearing the AVP 3 days in a row for 30 minutes each day leaves me with neck pain that lasts another 3 days. It's just too heavy for me and for a lot of others; even if the visual quality is very high, the overall experience is very poor if it leaves me with a medical issue. I'm still impressed with some aspects and look forward to a lighter-weight HMD from Apple, but right now… can't do it.

        • vlovich123 2 days ago

          Research has repeatedly shown it's less about the weight and more about it not being balanced properly. Yet they continue to insist on imbalanced VR systems so that the headset seems lighter and the weight printed on the packaging is lower (because otherwise you'd have to counterbalance the heavy display + battery).

          • SOLAR_FIELDS 2 days ago

            Wait so if it’s about the balance and you need to add weight to get it to balance properly isn’t it also about the weight?

            • AppleBananaPie 2 days ago

              Yes, but by your definition it's then a matter of weight without balance versus weight with balance.

              The first one can't be fixed without removing weight.

              The second one can be fixed by adding more weight, with the total weight not being the cause of the problem but rather the distribution of that weight.

            • vlovich123 2 days ago

              It's about the imbalanced weight, which is meaningfully different from total weight since you have three axes involved. Reducing the weight won't solve the pain problem from prolonged use, but you frequently hear the complaint from customers that the headset is too heavy when in reality that's not actually it. Counterbalancing helps, but most such techniques only counterbalance front/back and not side to side. There are even extended battery packs that help with this, which seems like a good albeit expensive compromise, but I haven't tried them, and again that's only front/back and not side to side.
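
              To put a rough number on it (illustrative figures, not measurements): the sustained torque on your neck is roughly mass × g × the horizontal offset of the center of mass from the pivot, so 650 g sitting 8 cm in front of the pivot is about 0.65 × 9.8 × 0.08 ≈ 0.5 N·m held continuously. A counterweight behind the head cancels most of that moment without reducing total mass at all, which is why balance matters more than the number on the box.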

              • jitl 2 days ago

                Well, the AVP has a massive battery that contributes to the weight, and strapping it to the back to add balance did not help me. I tried a back-mounted battery with a 3rd-party head-gear thingy which on its own (no battery) makes the AVP more comfortable; with the battery strapped on the back it is more balanced and somewhat comfortable, but just far too heavy for me.

                • vlovich123 2 days ago

                  I'm not saying total weight doesn't matter at all. I don't know the weight of the AVP setup. And as you note, balancing it properly did help. Also remember that, as I noted, you're still laterally imbalanced even when you fix the front/back issue.

                  Obviously total weight does matter at some point (e.g. a 1-ton weight would crush your spine), but below some threshold it's more about weight distribution and balance than total weight.

      • drusepth 3 days ago

        I've only briefly used an AVP (used a friend's before he returned it), but I didn't really notice that much difference in screen quality between it and my Quest 3. That's really what I was hoping it'd excel at given the price, but the real killer for me was just the bulkiness/weight of it compared to other headsets. I can't see myself working in it for longer than maybe 30-45 min compared to the 6+ hours I've put in with virtual desktops in Quest 3/Vive.

      • divan 2 days ago

        I haven't tried the AVP because of the price. But I'd surely love to.

      • danpalmer 3 days ago

        Perhaps on macOS, but I'd be surprised if the AVP has the same experience for Linux/Windows/ChromeOS?

    • bigyabai 3 days ago

      I'm quite a big fan of my Quest 1 as a cheap flight sim headset, too. I don't end up using it more than maybe twice a week, but that's more than worth it for the $400 I paid 5 years ago. It installs (or "sideloads" in present vernacular) Android apps like any other device, browses the web, and streams wireless VR from my desktop via ALVR when I want to play games. It does a lot of stuff you wouldn't expect out of a "deprecated" piece of hardware.

      The trepidation behind VR for professional applications makes sense to me - it's expensive and tough to compare with what it's replacing. As a pure vehicle for fun though, I genuinely have no regrets with my Quest hardware. It was easily a better purchase than my Xbox One.

    • gavinray 2 days ago

      How do you work lying down?

      Do you put a keyboard on your stomach or something?

      • divan 2 days ago

        Yes, on my lap/stomach. In VR mode, almost all these work-oriented apps have a concept of a "desk view" or "portal" - i.e. a window into "reality" to show the keyboard. Meta's Horizon OS detects and shows physical keyboards automatically, I believe. But a lot of the time I just use Mixed Reality mode - so only my screens floating in the corners of the room are rendered on top of "reality". If you forget that you're in VR goggles, it's the same as just lying down and working on huge screens on your ceiling or walls.

    • mwambua 3 days ago

      How do you power it for extended sessions? Is it constantly connected to power - using wifi/airlink between it and your Mac? Or do you use the link cable?

    • tiahura 3 days ago

      Have you thought about donating your brain to science for an examination as to why you are not racked with migraines and nausea the way most of us are when we use these things for more than an hour or so?

mohsen1 3 days ago

I am guessing that because of the Qwen 3 release they pulled back the reasoning model that was likely due to launch today.

  • htrp 3 days ago

    Alibaba stole Facebook's lunch money

bentt 2 days ago

This company is so untrustworthy, I don't see how anyone could dedicate any time to working with their platform or technology. Any acts of benevolence that they are putting forth now are sure to be followed by the most underhanded rug pulls you can imagine. My guess is that they'll let the model out for free, but they'll want to own the "memory" that defines people that use it. Who knows, but they're definitely thinking about the future beyond just playing defense and using FB Ad money to artificially compete in this space by giving things away.

throwaw12 3 days ago

Feels like Meta is getting into the cloud services business, but in the AI domain. They resisted entering the cloud business for so long; with the success of AWS/Azure/GCP, I think they are realizing they can't stay at the top on social networks alone, without owning a platform (hardware, cloud).

  • paxys 3 days ago

    In this case the market basically validated itself. Companies are already using Llama for production workloads. It is offered as a first class LLM option in AWS, Azure, GCP and all other major hosting providers. Meta may have been getting marginal licensing fees out of it but now wants a bigger piece of the pie.

  • throwaw12 3 days ago

    SAM 3 (Segment Anything Model) is coming this summer

    • daemonologist 3 days ago

      SAM's a really cool model, that's something to look forward to. I didn't see that in the LlamaCon notes, is that something they've announced elsewhere or just a rumor atm?

  • Keyframe 3 days ago

    If Lidl can venture into cloud business, I guess so can Meta.

    • llmguy 3 days ago

      Don't forget the earth's only bookstore either.

  • swatcoder 3 days ago

    They seem to see the writing on the wall and have been panicked for a while, yes.

    Gobbling up rising brands kept their finances going for a while, but the grand Metaverse pivot was clearly their (much struggling) attempt to invent their own titanic platform akin to Android or iPhone.

    With that not gaining as much traction as they wanted as quickly as they wanted, they're still on the hunt, as here.

    • retinaros 3 days ago

      The metaverse is a great idea, but they should have partnered with Epic or Valve for this. The implementation was subpar.

      • aprilthird2021 3 days ago

        I think they didn't want a gaming platform; that just ended up being what was most appealing about it.

        • retinaros 2 days ago

          If you simulate a world online, it's a gaming platform. Just think what LLM training would be like if the metaverse were live, and the edge Meta would have.

  • eddythompson80 2 days ago

    I don't know if an API service means they are going up against AWS next. I could totally see it though, and it would make sense for them as a company. It makes more sense for them than it did for Amazon, of all places.

logicchains 3 days ago

No new model? Maybe after the Qwen 3 release today they decided to hold back on Llama 4 Thinking until it benchmarks more competitively.

  • smcnally 3 days ago

    Beyond solid benchmarks, Alibaba's power move was dropping a bunch of models that are available to use and run locally today. That's disruptive already, and the slew of fine-tunes to come will be good for all users and builders.

    https://huggingface.co/collections/Qwen/qwen3-67dd247413f0e2...

    • NitpickLawyer 2 days ago

      > Beyond solid benchmarks, Alibaba's power move was dropping a bunch of models available to use and run locally today.

      I agree, the advantage of the Qwen3 family is the plethora of sizes and architectures to choose from. Another one is ease of fine-tuning for downstream tasks.

      On the other hand, I'd say it's "in spite" of their benchmarks, because there's obviously something wrong with either the published results or the way they measure them. Early impressions do not support those benchmarks at all. At one point they even had a 4B model scoring better than their prev-gen 72B model, which was pretty solid on its own. Take benchmarks with a huge boulder of salt.

      Something is messing with recent benchmarks, and I don't know exactly what, but I have a feeling that distilling + RL + something in their pipelines is making benchmark data creep into the models, either by reward hacking or by other signals getting leaked (i.e. prev-gen models optimised for one benchmark "distilling" those signals into newer, smaller models). No, a 4B model is absolutely not gonna be better than 4o/Sonnet 3.7, whatever the benchmarks say.

    • walterbell 3 days ago

      What's the minimum GPU/NPU hardware and memory to run Qwen3 locally?

      • Havoc 3 days ago

        There is a 0.6B model so basically nothing.

        And the MoE 30B one has a decent shot at running OK without a GPU. I'm on a 5800X3D, so two generations old, and it's still very usable.

      • laweijfmvo 3 days ago

        I'm running 4B on my 8GB AMD 7600 via ollama
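
        If you want to script against it, here's a minimal sketch using the ollama Python client (the qwen3:4b tag is an assumption on my part - use whichever tag you actually pulled):

          # pip install ollama; assumes `ollama pull qwen3:4b` has already been run
          import ollama

          response = ollama.chat(
              model="qwen3:4b",  # assumed tag name; swap in whichever Qwen3 size fits your hardware
              messages=[{"role": "user", "content": "Give me three edge cases for a URL parser."}],
          )
          print(response["message"]["content"])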

      • smcnally 3 days ago

        `model.safetensors` for Qwen3-0.6B is a single 1.5GB file.

        Qwen3-235B-A22B has 118 `.safetensors` files at 4GB each.

        There are a bunch of models and quants between those.
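
        Rough arithmetic for sizing, assuming bf16 weights: 235B params × 2 bytes ≈ 470 GB, which lines up with 118 shards × 4 GB each. A 4-bit quant lands at roughly a quarter of that, before accounting for KV cache.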

        • azinman2 3 days ago

          Does it run in 8x80G? Or does the KV cache and other buffers push it over the edge?

      • dpe82 3 days ago

        Qwen3 is a family of models; the very smallest are only a few GB and will run comfortably on virtually any computer from the last 10 years or a recent-ish smartphone. The largest - well, it depends how fast you want it to run.

      • littlestymaar 3 days ago

        There are models down to 0.6B and you can even run Qwen3 30B-A3B reasonably fast on CPU only.

  • paxys 3 days ago

    They released the Llama 4 suite three weeks ago.

ahmedfromtunis 3 days ago

Does anyone use Llama as their primary model for any use case? Maybe it's my fault for not spending much time with it, but I still couldn't find the applications for which Llama has an advantage over the competition.

  • philipkglass 3 days ago

    I recently needed to classify thousands of documents according to some custom criteria. I wanted to use LLM classification from these thousands of documents to train a faster, smaller BERT (well, ModernBERT) classifier to use across millions of documents.

    For my task, Llama 3.3 was still the best local model I could run. I tried newer ones (Phi4, Gemma3, Mistral Small) but they produced much worse results. Some larger local models are probably better if you have the hardware for them, but I only have a single 4090 GPU and 128 GB of system RAM.

    • galeos 3 days ago

      How did you find ModernBERT's performance vs. prior BERT models?

      • philipkglass 3 days ago

        I didn't try original BERT at all because I didn't get good results from any LLMs on small document excerpts, so I assumed that a substantial context was necessary for good results. Traditional BERT only accepts up to 512 tokens, while ModernBERT goes up to 8192. I ended up using a 2048 token limit.

        • stavros 3 days ago

          Would you happen to know of any resources for how to distill a ModernBERT model out of a larger one? I'm interested in doing exactly what you did, but I don't know how to start.

          • philipkglass 3 days ago

            I was trying to identify "evergreen" and "time-sensitive" kinds of writing -- basically, I wanted to figure out if web pages captured in 2016 would still have content that's interesting to read today or if the passage of time would have rendered them irrelevant.

            Here's the training code that I used to fine-tune ModernBERT from the ~5000 pages I had labeled with Llama 3.3. It should be a good starting point if you have your own fine-tuning task like this. If you can get away with a smaller context than I used here, it will be much faster and the batches can be larger (requires experimentation).

            https://pastebin.com/Saq1EyAB
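
            The rough shape of it, as a simplified sketch (illustrative model id, column names, and hyperparameters - the pastebin above has the actual script):

              # Simplified sketch: fine-tune ModernBERT as a binary classifier on LLM-labeled pages.
              # Model id, columns, and hyperparameters here are illustrative, not the exact ones used.
              from datasets import Dataset
              from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                                        Trainer, TrainingArguments)

              labeled_pages = [
                  {"text": "How to bake sourdough at home ...", "label": 1},         # evergreen
                  {"text": "Live results from the 2016 primaries ...", "label": 0},  # time-sensitive
                  # ... a few thousand of these, labeled by Llama 3.3
              ]

              model_id = "answerdotai/ModernBERT-base"
              tokenizer = AutoTokenizer.from_pretrained(model_id)
              model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

              def tokenize(batch):
                  # 2048-token cap as mentioned above; a smaller cap is faster and allows bigger batches
                  return tokenizer(batch["text"], truncation=True, max_length=2048)

              dataset = Dataset.from_list(labeled_pages).train_test_split(test_size=0.1)
              dataset = dataset.map(tokenize, batched=True)

              args = TrainingArguments(
                  output_dir="modernbert-evergreen",
                  per_device_train_batch_size=8,
                  num_train_epochs=3,
                  learning_rate=5e-5,
                  eval_strategy="epoch",
              )

              trainer = Trainer(
                  model=model,
                  args=args,
                  train_dataset=dataset["train"],
                  eval_dataset=dataset["test"],
                  tokenizer=tokenizer,  # lets Trainer pad batches dynamically
              )
              trainer.train()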

  • mvieira38 3 days ago

    It's pretty popular in the local LLM space

    • nowittyusername 3 days ago

      Is it? I am pretty active in those spaces, and it seems most folks are using any number of the Chinese models like Qwen, QwQ, etc.

      • NitpickLawyer 2 days ago

        If by those spaces you mean Reddit, then yeah, I've also noticed this trend. It has become more egregious with the duality of the L4 vs. Qwen3 reception. L4 was blamed and mocked, and everyone was posting shit about it (well, some of it was relevant, since the launch was rushed and many providers had bad implementations for ~2-3 days), in stark contrast to Qwen3, which also had inferencing problems (related to 3rd-party tools using wrong settings) and by all accounts has overinflated benchmark scores.

        Don't take the "activity" of those places as gospel, try the models on your own stacks, with your own benchmarks for best results.

        • nowittyusername 2 days ago

          Nah, not just Reddit, but also like 20 AI-related Discord servers. But a couple of things: first, I do agree that no one should take anyone's opinion as gospel. I live by that and test all of my models. Second, I wasn't expressing my opinion of which model is good or not good, just reflecting the broader trends I see within the communities (for better or worse).

          On Llama 4's botched launch... on this particular matter, I see this happen almost every launch. A model comes out, shit hyperparameters are used or the model organization does a poor job of communicating best practices for running the model to the community, the community shits on the model, rinse and repeat EVERY goddamned release cycle. So yeah, I am quite aware of this particular phenomenon.

        • logicchains 2 days ago

          L4 launched without a thinking model, making it an inferior choice for coding, one of the main LLM use cases. Even in benchmarks it wasn't competitive at coding with the 3-month-old DeepSeek R1.

          • NitpickLawyer 2 days ago

            Agreed, coding is not a strong point of L4. However, the "hive mind" in some places thinks L4 is "a failure" and "useless". In reality it is an "ok" model, and most 3rd-party benchmarks done after the inference lib updates were in line with what Meta announced.

    • behnamoh 3 days ago

      Nah, most people have moved on to Gemma, Qwen, Mistral Small/Nemo variants.

    • littlestymaar 3 days ago

      It used to be, but Llama 4 is useless for local use for most people.

  • aprilthird2021 3 days ago

    The biggest advantage is that it's free and available on your phone already.

    But if you have an Android phone, Gemini on that phone is far superior. And if you have an iPhone, well, maybe that's where all the people who do use it come from.

wewewedxfgdf 3 days ago

Can someone please explain to me why Meta doesn't create subject-specific versions of their LLMs, such as one that knows only about computer programming, computers, hardware, and software?

I would have imagined such a thing would be smaller and thus run on smaller configurations.

But since I am only a layman maybe someone can tell me why this isn't the case?

  • cube2222 3 days ago

    Generally, all that non-tech content still helps the model “to learn”.

    Also, the software you're working on will generally in some way have a real-world domain - without knowing it, the AI will likely be a less effective assistant. Design conversations with it would likely be pretty non-fun, too.

    Finally, the “bitter lesson” article[0] from a couple years ago is I think somewhat applicable too.

    [0]: http://www.incompleteideas.net/IncIdeas/BitterLesson.html

  • ntonozzi 3 days ago

    One of the weirdest and most interesting parts of LLMs is that they grow more effective the more languages and disciplines they are trained in. It turns out training LLMs on code instead of just prose boosted their intelligence and reasoning capabilities by huge amounts.

  • bbatha 3 days ago

    To add on to the sibling: specialized models, including fine-tuned ones, continually have their lunch eaten by general models within 3-6 months. This time round it's mixture-of-experts that'll do it; next year it'll be something else. Tuned models are expensive to produce and are benchmark kings, but do less well in the real-world qualitative experience. The juice just ain't worth the squeeze most of the time.

    Meta does have some specialized models though; Llama Guard was released for Llama 2 and 3.

    • littlestymaar 3 days ago

      > Tuned models are expensive to produce

      The expensive part is building the dataset; training itself isn't too expensive (you can even fine-tune small models on free Colab instances!), and once you have your dataset, you can just fine-tune the next generalist model as soon as it's released and you're good to go.

  • KTibow 3 days ago

    Other companies have done this (see Qwen Coder). It doesn't scale past a few disciplines like math and code though, and using mixtures of experts gives you most of the same benefits.

    • littlestymaar 3 days ago

      Unlike what their name implies, MoEs don't have a concept of a domain expert at all.

      It's just a fancy name for sparse evaluation of the total network to save compute and memory bandwidth.
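
      If a toy sketch helps: a learned router scores the "experts" (which are just ordinary FFN blocks) and only the top-k of them run for each token, so only a fraction of the parameters is evaluated. Illustrative PyTorch, not any particular model's implementation:

        # Toy MoE layer: the "experts" are ordinary FFN blocks; a learned router picks the
        # top-k of them per token, so only a fraction of the total parameters runs.
        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class ToyMoE(nn.Module):
            def __init__(self, d_model=64, n_experts=8, k=2):
                super().__init__()
                self.router = nn.Linear(d_model, n_experts)
                self.experts = nn.ModuleList(
                    nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                  nn.Linear(4 * d_model, d_model))
                    for _ in range(n_experts)
                )
                self.k = k

            def forward(self, x):                        # x: (tokens, d_model)
                scores = self.router(x)                  # (tokens, n_experts)
                weights, idx = scores.topk(self.k, dim=-1)
                weights = F.softmax(weights, dim=-1)
                out = torch.zeros_like(x)
                for slot in range(self.k):               # only k of n_experts run per token
                    for e, expert in enumerate(self.experts):
                        mask = idx[:, slot] == e
                        if mask.any():
                            out[mask] += weights[mask, slot, None] * expert(x[mask])
                return out

      Nothing in there knows about "math" or "code"; the router just learns whatever split of the tokens happens to minimize the loss.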

hedayet 3 days ago

Facebook did a great job open-sourcing Llama and pushing the market to be competitive, but this list seems super shallow.

0. Introducing Llama API in preview

This one is good but not centre stage worthy. Other [closed] models have been offering this for a long time.

1. Fast inference with Llama API

How fast? And how much faster than others? This section talks about latency, and there are absolutely no numbers in it!

2. New Llama Stack integrations

Speculation with 0 new integrations. Llama Stack with NVIDIA had already been announced, and then this section ends with '...others on new integrations that will be announced soon. Alongside our partners, we envision Llama Stack as the industry standard for enterprises looking to seamlessly deploy production-grade turnkey AI solutions.'

3. New Llama Protections and security for the open source community

This one is not only the best on this page, but is actually good, with the announcement of Llama Guard 4, LlamaFirewall, and Llama Prompt Guard 2.

4. Meet the Llama Impact Grant recipients

Sorry, but neither the gross amount of $1.5 million USD nor the average of $150K per recipient is anything significant at Facebook scale.

amusingimpala75 3 days ago

Meta needs to stop open-washing their product. It simply is not open-source. The license for their precompiled binary blob (ie model) should not be considered open-source, and the source code (ie training process / data) isn’t available.

  • michaelt 3 days ago

    > the source code (ie training process / data) isn’t available

    The training data is all scraped from the internet, ebooks from libgen, papers from Sci-Hub, and suchlike.

    They don't have the right to redistribute it.

  • observationist 3 days ago

    They've painted themselves into a corner - the second people see an announcement that they've enforced the license against someone, they'll switch to actually open-source-licensed models and Meta's reputation will take a hit.

    It's ironic that China is acting as a better good faith participant in open source than Meta. I'm sure their stakeholders don't really care right now, but Meta should switch to Apache or MIT. The longer they wait the more invested people will be and the more intense the outrage when things go wrong.

    • piperswe 3 days ago

      Applying Apache or MIT to a binary blob doesn't make it open source either

      • littlestymaar 2 days ago

        As if binary blobs were subject to copyright laws in the first place.

        The whole “licensing” stuff on language models is a scam, or more precisely, an attempt to create a new kind of IP law from thin air.

        • charcircuit 2 days ago

          Are you implying movies (binary blobs) are not subject to copyright laws?

          • littlestymaar 2 days ago

            The blob itself isn't, exactly: you cannot just reencode a movie and claim copyright protection over the resulting blob.

            What's protected is the content of the movie, and it's protected because it derives from human creativity.

            > The copyright law only protects “the fruits of intellectual labor” that “are founded in the creative powers of the mind.”

            > […]

            > Similarly, the Office will not register works produced by a machine or mere mechanical process that operates randomly or automatically without any creative input or intervention from a human author.

            source: https://www.copyright.gov/comp3/chap300/ch300-copyrightable-...

            • charcircuit 2 days ago

              >you cannot just reencode a movie and claim copyright protection over the resulting blob.

              Because that would be a derivative work.

              >the content of the movie

              Which exists as a binary blob. Copying that binary blob requires a license to do so.

              • littlestymaar a day ago

                > Because that would be a derivative work

                No, derivative works require human creativity themselves. Compiling or re-encoding still doesn't count.

                See: https://www.law.cornell.edu/uscode/text/17/101

                A work consisting of editorial revisions, annotations, elaborations, or other modifications which, as a whole, represent an original work of authorship, is a "derivative work".

                > Which exists as a binary blob.

                Nope; for copyright protection it must exist as at least one binary blob, but having multiple binary blobs (with different resolutions) doesn't make each a different copyrighted piece. It's the underlying creation that is protected, not a particular instance of it. Star Wars: The Empire Strikes Back is what's registered at the Copyright Office, not Star_Wars_The_Empire_Strikes_Back.720p.avi.

                > Copying that binary blob requires a license to do so.

                Fortunately no, otherwise your internet provider would need a license from the copyright holders to copy the blob from Netflix server to your machine.

                One last time: copyright isn't about the blob, it's about the creation stored in it. The process of creating the blob doesn't grant you any copyright protection if you don't own the underlying material.

                • charcircuit a day ago

                  >No, derivative work require human creativity themselves.

                  Then it would just be a copy. Copies need a license.

                  >Fortunately no, otherwise your internet provider would need a license from the copyright holders to copy the blob from Netflix server to your machine.

                  No, I believe this is because internet providers do not save the content, which means that a copy is not considered to be made. If copies of binary blobs were allowed, people could legally make pirate sites sharing copies like that.

                  • littlestymaar 21 hours ago

                    > No, I believe this is because internet providers do not save the content which means that a copy is not considered to be made.

                    Nope, that's not the reason, and that's why you don't need to give Apple a copyright license before storing your personal pictures on iCloud either, nor does Apple need a license to store copyrighted material you got a license for (like software or paid downloaded movies). Copying a blob isn't a copyright infringement in itself, because the blob itself was never protected by copyright.

                    > If copies were allowed of binary blobs people could legally make pirate sites sharing copies like that.

                    No, because sharing is what you'd get prosecuted for.

                    Part of me thinks you should really try to start learning the basis of stuff before arguing on the internet about it, but who am I to judge your life choices. I did my best to help you learn something, but if you refuse to there's nothing more I can do.

                    • charcircuit 20 hours ago

                      >that's why you don't need to give a copyright license to Apple before storing your personal pictures to iCloud either

                      You do, which is why it's part of the terms of service for iCloud.

                      https://www.apple.com/legal/internet-services/icloud/en/terc....

                      >No, because sharing is what you'd get prosecuted for.

                      Copyright controls both reproduction and distribution.

                      >start learning the basis of stuff before arguing on the internet about it

                      You are being unnecessarily smug and condescending.

                      • littlestymaar 20 hours ago

                        > You do which is why it's a part of the terms of service for icloud.

                        404

                        > Copyright controls both reproduction and distribution.

                        Reproduction in the copyright sense isn't about blob copying. RAID 1 isn't a copyright infringement either… And neither is a Windows defragmentation (which is just the OS copying files around).

                        > You are being unnecessarily smug and condescending.

                        You are needlessly obstinate on a topic you don't understand.

  • NitpickLawyer 2 days ago

    > their precompiled binary blob (ie model)

    I agree with you that their license is not open source, but model weights are not binary blobs! Please stop spreading this misconception.

  • bbayer 3 days ago

    This was actually my first impression while reading the post. It mentions "open source" everywhere, but dude, how on earth is it open source without the training data?

    • ronsor 3 days ago

      Almost no company is going to release training data because they don't want to waste time with lawsuits. That's why it doesn't happen. Until governments fix that issue, I don't even think the "it's not really open without training data!!!" argument is worth any time. It's more worth focusing on the various restrictions in the LLaMA license, or even better, questioning whether model weights can be licensed at all.

  • aprilthird2021 3 days ago

    I get the argument completely, but isn't the open-washing a little acceptable if they're the only big company releasing open-weights models?

    • CaptainFever a day ago

      My issue with Meta's open-washing is that it is also not open-weight, given the license restrictions. It's "weight-available", I suppose. Try OLMo instead.

    • lern_too_spel 2 days ago

      What is the point of considering this hypothetical? Google, Microsoft, Nvidia, Apple, and many Chinese big-tech companies also release open weights models, most with fewer restrictions.

andhuman 2 days ago

I had higher expectations for this. I was hoping they would release an omni-type model that could handle voice input and voice output. Oh well.

scosman 3 days ago

Anyone manage to sign up for the waitlist? I just get a redirect loop back to the login when requesting access.

zoobab 2 days ago

"Open source" greenwashing. Still no training data in sight.

Havoc 3 days ago

Unlucky timing for meta...

  • littlestymaar 3 days ago

    It's not about luck, pretty sure that Qwen intentionally bullied them.

yapyap 3 days ago

Was there a ball pit?

  • oofbaroomf 3 days ago

    Yeah, it was for the Llama team because they love playing in ball pits instead of releasing good models.

retinaros 3 days ago

Did I read that right - they have a gated 3.3 8B?

ilrwbwrkhv 3 days ago

Lmao why are they doing LlamaCon, a convention with a subpar product?

  • kepler1 3 days ago

    This is actually a legit question under the surface.

    The problem, in my opinion, is that MZ/CC/AA-D are feeling that they have to be releasing models of some flavor every month to stay competitive.

    And when you have the rest of the company planning to throw you an on-stage party to announce whatever the next model is, and the venue and guests are paid for, you're gonna have the show whether the content is good or not.

    The Llama program right now is "we must go faster," but without a clear product direction or niche that they're trying to build towards. Very little is said no to. Just be the best at everything. And they started from behind - how can you think you're gonna catch up to a 1-2 year head start just with more people? The line they want to believe is "the best LLM, not just the best OSS LLM".

    Because of the constant pressure to release something every month (nearly, but not a huge exaggeration), and the product direction coming from MZ himself, the team is not really great at anything. There is a huge apparatus of people working on it, yet half of it or more, I believe, is baggage required because of what Meta is.

    I guess we'll see how long this can be maintained.

mgdev 3 days ago

There is a potential world where Meta uses AI as a vector to tap into the home.

Like, literally building smart homes.

Locally intelligent in ways that enable truly magical smart home experiences while preserving privacy and building trust.

But connected in ways that facilitate pseudo-social interactions, entertainment, and commerce.

Meta's biggest competitors are Apple and Amazon. This is the first clear opportunity they've had to leapfrog both.

  • OtherShrezzing 3 days ago

    >There is a potential world where Meta [is]... literally building smart homes... while preserving privacy and building trust

    I'm earnestly not sure which Meta are less qualified for: building physical homes, or building privacy & trust.

    • mgdev 3 days ago

      Visit SE Asia sometime and you'll experience a very different sentiment. Hundreds of millions of people rely on Meta to provide valuable services every day, some of them borderline essential. This is undebatable.

      The outsized public hatred toward Meta is almost entirely driven by a bureaucratic, anti-technology Europe (that has finally realized that their overstepping is hurting their future) and a US political institution that needed someone to demonize to keep us all distracted.

      There are very good reasons to dislike Meta and Meta products. But they're likely not the ones you're referring to.

      • Nevermark 3 days ago

        Their business model ties profitability directly to maximal surveillance and psychological manipulation, as the basis for inducing addiction, manufactured demand, and impulse spending, with only theatrical attempts at hiding the lack of inhibitions or safeguards around harnessing material damaging to children, teens, adults, and society at large.

        That is the economic structure of their business model.

        Now juice that model with $ billions of revenue and $ trillions in potential market cap for shareholders, who demand double digit percentage growth per year.

        That defines the scale of available resources to drive the business model forward.

        This is a machine designed to scale up and maximally leverage seemingly small conflicts of interest into a global monster that feeds on mental and social decay.

        ——

        Of course, it benefits Facebook and customers to mix in as many genuine side products and services with real value as possible.

        But that only wedges the destructive core into individual lives and society even more.

        Now add AI algorithms to their core competencies of surveillance integration and psychological manipulation, and to the side-value "honey" features.

        We are getting Stockholm’ed and stewed in a lot of high walled slow cookers these days.

      • pzo 3 days ago

        What kind of services? IM apps? Thailand uses LINE; Vietnam, Zalo; Cambodia and Myanmar, Telegram and Viber; in Indonesia many use several IM apps at the same time.

        • n_ary 3 days ago

          It is not the IM apps; the SE region which I suspect the author is referencing predominantly uses WhatsApp.

          The value in SE is mostly B2C: instead of a marketplace feature, most tiny local businesses (or even big ones willing to evade tax by not having any physical presence) will open a small business or general page and publish their wares as posts. Live streams will be used to demo products or services now and then. People follow these pages and flock over to buy things.

          In a sense, Facebook and WhatsApp are like the Amazon/AliExpress of SE Asia. I was there for 5 months visiting a friend (and recovering from burnout), and the number of people using such pages to sell anything from basic clothing to food to services is HUGE! It is literally a huge business hub for people to discover and make online purchases. In summary, Facebook pages are the e-commerce front (due to the lack of Shopify/Amazon and similar operators who can handle logistics and payments) for individual businesses.

          There were many journalistic reports about this phenomenon several years back, but I am too sleepy and tired to link them.

      • stcroixx 3 days ago

        "Dumb fucks" is what the founder of the company has thought of its users since day 1. They've been caught lying to cover up terrible things they've done so many times that it's just assumed at this point. Anyone relying on their services is being taken advantage of, first by Meta, and then by their own failed economy that won't provide an alternative. I've never once considered what Europe thinks.

kennethologist 2 days ago

Meta announced several key updates to enhance building with Llama and strengthen the open-source ecosystem:

1. Llama API Preview: Launched a limited preview of the Llama API, a developer platform simplifying Llama application development with easy API key creation, playgrounds, SDKs, and tools for fine-tuning and evaluation. It emphasizes model portability and privacy.

2. Fast Inference Collaborations: Announced collaborations with Cerebras and Groq to offer developers access to faster Llama model inference speeds via the Llama API.

3. Expanded Llama Stack Integrations: Revealed new and expanded Llama Stack integrations with partners like NVIDIA, IBM, Red Hat, and Dell Technologies to make deploying Llama applications easier for enterprises.

4. New Llama Protection Tools & Program: Released new open-source security tools including Llama Guard 4, LlamaFirewall, and Llama Prompt Guard 2, updated CyberSecEval 4, and announced the Llama Defenders Program for partners to help evaluate system security.

5. Llama Impact Grant Recipients: Announced the 10 international recipients of the second Llama Impact Grants, awarding over $1.5 million USD to support projects using Llama for transformative change.

Overall, the announcements emphasize making Llama more accessible, easier to build with, faster, more secure, and supporting its diverse open-source community.

  • WhitneyLand 2 days ago

    If you’re going to post AI slop comments, one way to make them more appreciated is to make a human editing pass over them.

    For example, the entire first sentence could be collapsed to “Announcements”. Some sentences are entirely pablum.

    It’s helpful to post summaries here, but they need to be curated.

    • icapybara 2 days ago

      Eh, it's fine. Saves some people from pasting it into ChatGPT themselves.

      • WhitneyLand 2 days ago

        Not sure how we disagree, I’m saying save people even more time by cleaning it up.