256K Tokens on One GPU? Jamba’s Engineering Magic Explained | HackerNoon

News Room
Last updated: 2025/04/11 at 12:27 AM
News Room Published 11 April 2025

Authors:

(1) Opher Lieber, with Equal contribution; (2) Barak Lenz, with Equal contribution; (3) Hofit Bata; (4) Gal Cohen; (5) Jhonathan Osin; (6) Itay Dalmedigos; (7) Erez Safahi; (8) Shaked Meirom; (9) Yonatan Belinkov; (10) Shai Shalev-Shwartz; (11) Omri Abend; (12) Raz Alon; (13) Tomer Asida; (14) Amir Bergman; (15) Roman Glozman; (16) Michael Gokhman; (17) Avashalom Manevich; (18) Nir Ratner; (19) Noam Rozen; (20) Erez Shwartz; (21) Mor Zusman; (22) Yoav Shoham.

Table of Links

Part 1 · Part 2 · Part 3 · Part 4 · Part 5 · Part 6

5. Evaluation

In general we approach benchmarks cautiously, as they correlate only partially with what matters in real applications, and furthermore invite gaming the system in order to boast vanity numbers. Nevertheless, we present several indicative results.

5.1 Academic Benchmarks

We report results with a wide range of standard academic benchmarks:

Common sense reasoning: HellaSwag (10-shot) [47], WinoGrande (5-shot) [37], ARC-E (0-shot) and ARC-Challenge (25-shot) [9], and PIQA (0-shot) [3].

Reading comprehension: BoolQ (10-shot) [8] and QuAC (0-shot) [5].

Others: GSM8K (3-shot CoT) [10], HumanEval (pass@1) [4], Natural Questions closed-book (NQ; 5-shot) [26], and TruthfulQA (0-shot) [27].

Aggregate benchmarks: MMLU (5-shot) [20] and BBH (3-shot) [43].
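The shot counts above can be wired into a simple evaluation harness. The sketch below is illustrative only (the names `SHOTS` and `build_prompt` are ours, not the authors'): it maps each benchmark to its shot count and prepends that many solved demonstrations before the test question.

```python
# Illustrative few-shot prompt builder; shot counts follow the list above.
SHOTS = {
    "HellaSwag": 10, "WinoGrande": 5, "ARC-E": 0, "ARC-Challenge": 25,
    "PIQA": 0, "BoolQ": 10, "QuAC": 0, "GSM8K": 3, "NQ": 5,
    "TruthfulQA": 0, "MMLU": 5, "BBH": 3,
}

def build_prompt(benchmark, train_pairs, question):
    """Prepend up to SHOTS[benchmark] (question, answer) demos, then the test question."""
    k = SHOTS[benchmark]
    parts = [f"Q: {q}\nA: {a}" for q, a in train_pairs[:k]]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)
```

A zero-shot benchmark such as ARC-E simply yields the bare question, while GSM8K would draw three worked examples from the training split.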

Table 2: Comparison of Jamba with other publicly available models. Jamba obtains similar or better performance with much better throughput.

Table 2 compares Jamba to several publicly available models on common academic benchmarks for evaluating language models. We compare with Llama-2 13B [45], which has about the same number of active parameters as our model; Llama-2 70B, which is larger than our model; Gemma [44], which has 7B parameters; and Mixtral [23], which has about the same number of active and total parameters as our model.

Noticeably, Jamba performs comparably to the leading publicly available models of similar or larger size, including Llama-2 70B and Mixtral. At the same time, our model has a smaller number of total available parameters than Llama-2 (52B compared to 70B). Moreover, as a sparse model, Jamba has only 12B active parameters, similar to Mixtral’s 12.9B active parameters. However, as a fully-attentional model, Mixtral has a large memory footprint with long sequences, requiring 32GB for the KV cache with 256K tokens. In contrast, thanks to its hybrid Attention-Mamba architecture, Jamba’s KV cache takes only 4GB even at such a long context (Section 2). Importantly, our Jamba achieves such strong performance while having much better throughput than Llama-2 70B and Mixtral, up to 3x improvement (Section 3.2).
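The 32GB-vs-4GB gap follows directly from how many layers must cache keys and values. A back-of-the-envelope sizing is sketched below; the head counts and head dimension are assumptions chosen to be consistent with the reported numbers, not official configs.

```python
# KV-cache sizing: each attention layer stores K and V for every token.
# Mamba layers keep only a fixed-size state, so they add nothing per token.
def kv_cache_bytes(attn_layers, seq_len, kv_heads=8, head_dim=128, dtype_bytes=2):
    # 2x for keys and values, per attention layer, per token (fp16 assumed).
    return attn_layers * 2 * seq_len * kv_heads * head_dim * dtype_bytes

ctx = 256 * 1024
mixtral_gib = kv_cache_bytes(32, ctx) / 2**30  # every layer uses attention
jamba_gib = kv_cache_bytes(4, ctx) / 2**30     # only 4 attention layers cache KV
print(f"Mixtral ~{mixtral_gib:.0f} GiB, Jamba ~{jamba_gib:.0f} GiB")
```

With these assumed dimensions the formula reproduces the paper's figures: ~32 GiB for a fully-attentional 32-layer stack and ~4 GiB when only 4 layers attend.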

In summary, Jamba demonstrates the ability of hybrid architectures to reach the performance of state-of-the-art Transformer-based models of the same size class, while having the benefits of an SSM.

5.2 Long-Context Evaluations

We have successfully trained Jamba models with context lengths of up to 1M tokens. The released model handles context lengths of up to 256K tokens. In this section, we evaluate it on synthetic and naturalistic benchmarks that test its long-context capabilities.

5.2.1 Needle-in-a-haystack

As Figure 4 shows, Jamba has excellent performance in the needle-in-a-haystack evaluation, which requires retrieving a simple statement planted in a long context window [24]. This result is noteworthy especially given that our implementation of Jamba uses only 4 attention layers.
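The probe itself is simple to construct: a unique "needle" statement is planted at a chosen depth inside long filler text, and the model is asked to retrieve it. A minimal sketch (filler and needle text are our own illustrations, not the benchmark's actual prompts):

```python
# Build a needle-in-a-haystack probe: insert a unique statement at a
# fractional depth inside repetitive filler, then query for it.
def build_haystack(needle, depth_frac, n_filler=1000):
    filler = ["The sky was clear over the quiet town that morning."] * n_filler
    pos = int(depth_frac * len(filler))
    return " ".join(filler[:pos] + [needle] + filler[pos:])

needle = "The magic number is 7481."
context = build_haystack(needle, depth_frac=0.5)
prompt = context + "\n\nWhat is the magic number?"
```

Sweeping `depth_frac` from 0 to 1 and the context length up to 256K tokens produces the grid of retrieval scores shown in Figure 4.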

5.2.2 Naturalistic long-context evaluation

We evaluate Jamba’s ability to handle long contexts using question-answering benchmarks consisting of long inputs. To this end, we repurpose five of the longest-context datasets from L-Eval [2] by structuring them in a few-shot format (we use 3 shots in all experiments here). Specifically, we evaluated the models on the following datasets: NarrativeQA (QA on narratives; [25]), LongFQA (finance; [2]), Natural Questions (NQ; Wikipedia; [26]), CUAD (law; [21]), and SFiction (science fiction). The average input length in these datasets ranges from 6K to 62K tokens, and the few-shot format expands these context lengths considerably.
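Note why this format stresses context length: each demonstration is a full (document, question, answer) triple, so a 3-shot prompt holds four long documents end to end. A sketch of the assembly (function and field names are illustrative):

```python
# 3-shot long-context QA prompt: three solved (document, question, answer)
# demos followed by the test document and question. The prompt therefore
# contains roughly 4x the average document length.
def few_shot_long_qa(demos, test_doc, test_question, k=3):
    parts = [
        f"Document: {doc}\nQuestion: {q}\nAnswer: {a}"
        for doc, q, a in demos[:k]
    ]
    parts.append(f"Document: {test_doc}\nQuestion: {test_question}\nAnswer:")
    return "\n\n".join(parts)
```

For a dataset averaging 62K tokens per input, the assembled prompt approaches 250K tokens, near the released model's 256K limit.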

Figure 4: A needle-in-a-haystack evaluation showing Jamba’s ability to recall statements placed in the middle of contexts of up to 256K tokens in length.

Table 3 summarizes the evaluation results in terms of F1. Jamba outperforms Mixtral on most of the datasets as well as on average. In addition, as these long-context tasks require substantial computation, here Jamba’s efficiency shines, delivering much better throughput at long contexts (Section 3.2).
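For reference, the F1 metric typically used for extractive QA is a token-overlap score, as popularized by SQuAD; the paper's exact normalization may differ, so treat this as a sketch of the standard formulation rather than the authors' scoring code.

```python
# Token-overlap F1 for QA: harmonic mean of precision and recall over
# whitespace tokens shared between prediction and gold answer.
from collections import Counter

def qa_f1(prediction, gold):
    pred_toks = prediction.lower().split()
    gold_toks = gold.lower().split()
    common = Counter(pred_toks) & Counter(gold_toks)  # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)
```

An exact match scores 1.0; a prediction covering one of three gold tokens scores 0.5 (precision 1.0, recall 1/3).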

Table 3: Results (F1) on long-context QA benchmarks, with a 3-shot format.
