Table of Links
- Abstract and Introduction
- SylloBio-NLI
- Empirical Evaluation
- Related Work
- Conclusions
- Limitations and References
A. Formalization of the SylloBio-NLI Resource Generation Process
B. Formalization of Tasks 1 and 2
C. Dictionary of gene and pathway membership
D. Domain-specific pipeline for creating NL instances and E Accessing LLMs
F. Experimental Details
G. Evaluation Metrics
H. Prompting LLMs – Zero-shot prompts
I. Prompting LLMs – Few-shot prompts
J. Results: Misaligned Instruction-Response
K. Results: Ambiguous Impact of Distractors on Reasoning
L. Results: Models Prioritize Contextual Knowledge Over Background Knowledge
M Supplementary Figures and N Supplementary Tables
D Domain-specific pipeline for creating NL instances
E Accessing LLMs
To access these LLMs, we use the Mistral AI (mistral), and the open-source weights of the remaining models, available at the HuggingFace Hub[5] repositories:
• mistralai/Mistral-7B-v0.1
• mistralai/Mistral-7B-Instruct-v0.2
• mistralai/Mixtral-8x7B-Instruct-v0.1
• google/gemma-7b
• google/gemma-7b-it
• meta-llama/Meta-Llama-3-8B
• meta-llama/Meta-Llama-3-8B-Instruct
• BioMistral/BioMistral-7B
The pretrained LLM weights are used through the transformers[6] python library. All models were loaded with standard configurations and their respective default tokenizers, using the AutoModelForCausalLM and AutoTokenizer classes. Additionally, the models were loaded with the options device_map=“auto”, torch_dtype=“auto” attn_implementation=“flash_attention_2″ and offload_buffers=True to make the best use of GPU resources available.
For the instruction models, the inputs were passed through each model’s chat template (with tokenizer.apply_chat_template), so that they would follow the appropriate prompt format. Responses were cleaned of special symbols for evaluation. Table 3 contains the relevant characteristics of all analyzed models.
Authors:
(1) Magdalena Wysocka, National Biomarker Centre, CRUK-MI, Univ. of Manchester, United Kingdom;
(2) Danilo S. Carvalho, National Biomarker Centre, CRUK-MI, Univ. of Manchester, United Kingdom and Department of Computer Science, Univ. of Manchester, United Kingdom;
(3) Oskar Wysocki, National Biomarker Centre, CRUK-MI, Univ. of Manchester, United Kingdom and ited Kingdom 3 I;
(4) Marco Valentino, Idiap Research Institute, Switzerland;
(5) André Freitas, National Biomarker Centre, CRUK-MI, Univ. of Manchester, United Kingdom, Department of Computer Science, Univ. of Manchester, United Kingdom and Idiap Research Institute, Switzerland.
[5] https://huggingface.co
[6] https://huggingface.co/docs/transformers/