The Llama series of models from Meta

Malaya Rout


Meta’s most popular LLM series is Llama, which stands for Large Language Model Meta AI. The models are openly released: their weights are freely downloadable under Meta’s community licences, which is why they are widely (if loosely) described as open-source.

Llama 3 was trained on over fifteen trillion tokens and has a context window of 8,192 tokens (8K). There are 8- and 70-billion-parameter models, each available in pre-trained and instruction-tuned variants. The 8-billion-parameter model has a knowledge cut-off of March 2023, and the 70-billion-parameter model has a knowledge cut-off of December 2023. A model’s parameter count is the number of weights in its artificial neural network. The knowledge cut-off (also called the data cut-off) is the last date at which new data was included in the model’s training corpus before training was finalised. The context window is the number of tokens the model can consider in a single interaction, including the prompt, any prior turns still in scope, and sometimes the model’s own earlier outputs. The models are unimodal: they take text as input and produce text as output.
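
To make the context window concrete, here is a minimal sketch that counts how many of Llama 3’s 8,192 tokens a prompt consumes. It assumes the Hugging Face transformers library and access to the gated meta-llama/Meta-Llama-3-8B checkpoint; the prompt string is invented for illustration.

```python
# Minimal sketch: how much of Llama 3's 8K context window does a prompt use?
# Assumes access to the gated "meta-llama/Meta-Llama-3-8B" repo on Hugging Face.
from transformers import AutoTokenizer

CONTEXT_WINDOW = 8192  # Llama 3's context length in tokens

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

prompt = "Summarise the history of the Llama model family in three sentences."
token_ids = tokenizer.encode(prompt)

print(f"Prompt uses {len(token_ids)} of {CONTEXT_WINDOW} tokens")
print(f"Left for prior turns and the reply: {CONTEXT_WINDOW - len(token_ids)} tokens")
```

Whatever the prompt and earlier turns do not consume is what remains for the model’s reply.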

Llama 3.1 comes in 8-, 70-, and 405-billion-parameter models. They support English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai, and are available in pre-trained and instruction-tuned variants. They are unimodal (text in, text out) and use an optimised, auto-regressive transformer architecture. The instruction-tuned versions are aligned with SFT and RLHF using publicly available instruction datasets. SFT stands for Supervised Fine-Tuning: a pre-trained model is further trained on labelled datasets of instructions paired with ideal responses to improve its ability to follow user directives accurately. RLHF stands for Reinforcement Learning from Human Feedback: a reward model is trained on human preferences, and reinforcement learning then optimises the model toward higher-reward behaviours aligned with human values.
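
As a rough illustration of the SFT objective (not Meta’s actual training code), the sketch below computes the next-token cross-entropy loss on a single instruction–response pair, masking the instruction tokens so that only the response is learned. GPT-2 stands in for a Llama checkpoint purely to keep the example small. RLHF would then add a separately trained reward model and reinforcement-learning updates on top of a model tuned this way.

```python
# Minimal sketch of supervised fine-tuning (SFT): cross-entropy on an
# instruction/response pair, with the loss masked so only the response
# tokens are trained on. GPT-2 is used as a small stand-in model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

instruction = "Instruction: Name the company behind the Llama models.\nResponse:"
response = " Meta AI."

prompt_ids = tokenizer(instruction, return_tensors="pt").input_ids
full_ids = tokenizer(instruction + response, return_tensors="pt").input_ids

# Labels: -100 tells the loss function to ignore the instruction tokens,
# so the model is only trained to reproduce the response.
labels = full_ids.clone()
labels[:, : prompt_ids.shape[1]] = -100

loss = model(input_ids=full_ids, labels=labels).loss
loss.backward()  # an optimiser step would follow in a real training loop
print(f"SFT loss on this pair: {loss.item():.3f}")
```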

Each of them has a context length of 128,000 tokens (128K) and a knowledge cut-off of December 2023, and all were released on 23 July 2024. Note the seven-month gap between the cut-off date and the release date: that time goes into model training, fine-tuning, benchmarking, and release-related activities. Their usage is governed by the Llama 3.1 Community License (a custom commercial licence). They were trained on Meta’s custom-built GPU clusters.

Llama 3.2 is Meta’s first open-source multimodal release in the series. Its 11B- and 90B-parameter versions are vision models: they process both text and images, answer questions about images, and perform image captioning and visual reasoning tasks. Its 1B- and 3B-parameter versions are text-only models designed for edge and mobile devices, and they support the same eight languages. All were released in September 2024 and have a context length of 128,000 tokens. Llama 3.3, released in December 2024, is a text-only 70-billion-parameter model.

Llama 4, released on 5 April 2025, has three variants: Scout, Maverick, and Behemoth (the last previewed as still in training at the time). Llama 4 Scout has 17 billion active parameters with 16 experts (109 billion total parameters) and a 10-million-token context window. Llama 4 Maverick has 17 billion active parameters with 128 experts (400 billion total parameters). Llama 4 Behemoth has 288 billion active parameters with 16 experts (nearly 2 trillion total parameters). All of them are multimodal and use a mixture-of-experts architecture. In the context of LLMs, a Mixture of Experts (MoE) is a neural network architecture that uses conditional computation to decouple model capacity (total parameters) from computational cost (active parameters): for each token, a routing network activates only a small subset of expert sub-networks.
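
The sketch below shows the core MoE idea in a few lines of PyTorch: a small router scores the experts for each token, and only the top-k of them actually run, which is why the active parameter count is far smaller than the total. The dimensions and the plain top-k routing are illustrative only; this is not Meta’s Llama 4 implementation.

```python
# Toy mixture-of-experts layer: a router picks the top-k experts per token,
# so only a fraction of the total parameters is used for any one token.
import torch
import torch.nn as nn

d_model, n_experts, top_k = 64, 16, 2   # e.g. Scout has 16 experts

experts = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(n_experts)])
router = nn.Linear(d_model, n_experts)  # gating network

def moe_layer(x):                        # x: (tokens, d_model)
    scores = router(x)                   # (tokens, n_experts)
    weights, idx = scores.softmax(dim=-1).topk(top_k, dim=-1)
    outputs = []
    for t in range(x.shape[0]):          # per-token loop kept simple for clarity
        mix = sum(w * experts[int(e)](x[t]) for w, e in zip(weights[t], idx[t]))
        outputs.append(mix)
    return torch.stack(outputs)

tokens = torch.randn(4, d_model)
print(moe_layer(tokens).shape)           # torch.Size([4, 64]); only 2 of 16 experts ran per token
```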

Another interesting release from Meta is Code Llama 70B, which comes in a foundational variant and a Python-specialised variant; neither of these follows natural-language instructions. Only Code Llama Instruct 70B is fine-tuned for natural-language interpretation and generation. The models are free for research and commercial use and are based on Llama 2. There are also 7B, 13B, and 34B Code Llama models. All of them support context windows of up to 100,000 tokens and perform well on coding benchmarks. HumanEval assesses a model’s ability to complete code from docstrings, and MBPP (Mostly Basic Python Programming) tests its ability to write code from a description.
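
To show what “completing code from docstrings” means in practice, here is a toy HumanEval-style check: the benchmark supplies a function signature and docstring, the model supplies the body, and unit tests decide pass or fail (results are aggregated into the pass@k metric). The task and the “model completion” below are invented for illustration; they are not actual HumanEval or MBPP items.

```python
# Toy HumanEval-style evaluation: assemble the prompt and the model's
# completion into a program, then run the benchmark's unit tests against it.
prompt = '''def add_positive(numbers):
    """Return the sum of the strictly positive numbers in the list."""
'''

model_completion = "    return sum(n for n in numbers if n > 0)\n"

namespace = {}
exec(prompt + model_completion, namespace)  # build the candidate solution

tests_passed = (
    namespace["add_positive"]([1, -2, 3]) == 4
    and namespace["add_positive"]([-1, -5]) == 0
)
print("pass" if tests_passed else "fail")   # feeds into the pass@k metric
```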

TinyLlama, an open-source community project built on the Llama 2 architecture (not a Meta release), was trained on 3 trillion tokens and has 1.1 billion parameters; models of this size are often called Small Language Models (SLMs). LLaMA-Omni, developed at the Chinese Academy of Sciences, is based on Meta’s open-source Llama 3.1 8B Instruct model and supports real-time speech interaction with large language models.

Happy Llama-ing and have a great new year!





Disclaimer

Views expressed above are the author’s own.








