Date: October 25, 2024
Meta has taken a significant step towards providing faster and smaller AI models to broader audiences with better mobile device compatibility.
Meta AI is making significant upgrades to its Llama models, offering 2-4X faster task processing while reducing model sizes by up to 56%. The effort responds to rising hardware requirements and power constraints that demand leaner AI models with equivalent performance. High energy costs, lengthy training times, and expensive semiconductors for compute can now be addressed with the release of the latest quantized Llama 3.2 models.
The new models are built on two distinct techniques: Quantization-Aware Training (QAT) with LoRA adapters, which prioritizes accuracy, and SpinQuant, a state-of-the-art post-training quantization method that prioritizes portability. Both versions are available for download.
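The QAT-with-LoRA idea can be illustrated with a toy example: the base weight matrix is frozen in a quantized form, and a small pair of low-rank matrices is trained on top to recover accuracy lost to rounding. The sketch below (dimensions, rank, and initialization are illustrative assumptions, not Meta's actual configuration) shows why the adapters add so few parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
d, r = 16, 2                                    # hidden size, adapter rank (toy values)

W = rng.normal(size=(d, d)).astype(np.float32)  # full-precision layer weight

# Frozen 4-bit quantized copy of the base weight (symmetric scheme)
qmax = 2 ** 3 - 1                               # 7 for signed 4-bit
scale = np.abs(W).max() / qmax
W_q = np.clip(np.round(W / scale), -qmax, qmax) * scale

# Trainable low-rank adapter: only 2*d*r parameters instead of d*d
A = rng.normal(scale=0.01, size=(d, r)).astype(np.float32)
B = np.zeros((r, d), dtype=np.float32)          # zero-init: training starts from W_q

x = rng.normal(size=(d,)).astype(np.float32)
y = x @ (W_q + A @ B)                           # forward pass: quantized base + adapter

print("adapter params:", A.size + B.size, "vs full weight:", W.size)
```

Because only `A` and `B` are updated during training, the memory-heavy base weights stay in their compact quantized form throughout.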
Smaller models that retain strong output quality will dramatically widen access to cutting-edge AI for research and business optimization, without the need for specialized, costlier infrastructure.
Llama 3.2 has surpassed industry benchmarks for quality and safety while achieving 2-4X faster processing. The new models also achieve an average 56% reduction in size and use 41% less memory than the original BF16 format.
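A rough back-of-envelope calculation shows where savings of this magnitude come from: BF16 stores two bytes per weight, while 4-bit quantization stores half a byte. The split below (90% of parameters quantized, 10% kept at higher precision) is a hypothetical assumption for illustration, not Meta's actual layout, which is why realized reductions land below the theoretical 75%.

```python
# Illustrative numbers only; the precision split is an assumption.
params = 1_000_000_000            # a 1B-parameter model, e.g. Llama 3.2 1B
bf16_bytes = params * 2           # BF16: 2 bytes per weight

# Suppose 90% of weights go to 4-bit (0.5 bytes) and 10% stay in BF16
quant_bytes = int(params * 0.9 * 0.5 + params * 0.1 * 2)

reduction = 1 - quant_bytes / bf16_bytes
print(f"BF16: {bf16_bytes / 1e9:.1f} GB, quantized: {quant_bytes / 1e9:.2f} GB "
      f"({reduction:.0%} smaller)")
```

Runtime memory savings are smaller than weight savings because activations, the KV cache, and runtime buffers are not all quantized to the same degree.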
This advancement also improves compatibility with mobile devices, offering better features for mobile users within their hardware constraints. The reduction is powered by quantization, a technique that lowers the precision of the model's weights and activations from 16-bit floating-point (BF16) numbers to lower-bit representations.
Meta AI also utilizes 8-bit and 4-bit quantization strategies, which reduce memory consumption and compute demands while retaining Llama 3.2's key capabilities, such as advanced natural language processing, real-time application integration, and visual inference.
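The core mechanics of 8-bit and 4-bit quantization can be sketched in a few lines. The example below uses a simple symmetric per-tensor scheme (Meta's actual recipe uses finer-grained groupwise scales, so this is a simplified illustration): each weight is mapped to a small integer plus a shared scale factor, and lower bit widths trade accuracy for size.

```python
import numpy as np

def quantize_symmetric(weights: np.ndarray, bits: int):
    """Map float weights to signed integers with `bits` bits of precision."""
    qmax = 2 ** (bits - 1) - 1                    # 127 for 8-bit, 7 for 4-bit
    scale = np.abs(weights).max() / qmax          # one scale for the whole tensor
    q = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the integer representation."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)

q8, s8 = quantize_symmetric(w, bits=8)  # int8: 4x smaller than float32
q4, s4 = quantize_symmetric(w, bits=4)  # int4: coarser grid, larger error

err8 = np.abs(w - dequantize(q8, s8)).max()
err4 = np.abs(w - dequantize(q4, s4)).max()
print(f"max reconstruction error, int8: {err8:.4f}, int4: {err4:.4f}")
```

The 4-bit version uses a quarter of the bits of the 8-bit one but snaps weights to only 15 levels, which is why techniques like QAT and SpinQuant are needed to preserve quality at such low precision.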
Meta AI has also partnered with industry leaders to make the slimmer Llama 3.2 models available on Qualcomm and MediaTek Systems on Chips (SoCs) with Arm CPUs. The partnerships aim to bring advanced performance to consumer-grade hardware, reaching a broader audience across popular platforms. Llama 3.2 underscores the importance of addressing the scalability issues common to businesses and research organizations while maintaining a high level of performance. Early benchmarks indicate that quantized Llama 3.2 retains approximately 95% of the full model's quality while using 60% less memory. This should also strengthen the case for AI chatbots by reducing the environmental impact of training and deploying LLMs.
By Arpit Dubey
Arpit is a dreamer, wanderer, and tech nerd who loves to jot down tech musings and updates. With a knack for crafting compelling narratives, Arpit has a sharp specialization in everything: from Predictive Analytics to Game Development, along with artificial intelligence (AI), Cloud Computing, IoT, and let’s not forget SaaS, healthcare, and more. Arpit crafts content that’s as strategic as it is compelling. With a Logician's mind, he is always chasing sunrises and tech advancements while secretly preparing for the robot uprising.