Our Analysis On Think-and-Execute And Pseudocode | HackerNoon

Table of Links

Abstract and 1. Introduction

2 Think-and-Execute

3 Experimental Setup

4 Results

5 Analysis

6 Related Work

7 Limitations and Discussion

8 Conclusion and References

A Experimental Details

B Details of Think-and-Execute

C Prompts Used in Our Experiments

D Human-written Pseudocode Prompts

E Generated Analyses

F Generated Pseudocode Prompts

G Qualitative Analysis

5 Analysis

We conduct experiments to address the following research questions:

• RQ1: Is task-level pseudocode more helpful than instance-specific pseudocode?

• RQ2: Does pre-training on code corpora improve reasoning?

• RQ3: How does the quality of the logic discovered by THINK-AND-EXECUTE compare to that of human-written logic?

5.1 Implementing the Underlying Logic is more Effective than Instance-specific Logic in Pseudocode (RQ1)

We analyze whether the improvement from THINK-AND-EXECUTE is attributable to our chosen format for the task-level instruction, i.e., pseudocode. We compare THINK-AND-EXECUTE with a concurrent work, Chain-of-Code (CoC) (Li et al., 2023). In Table 3, THINK-AND-EXECUTE outperforms CoC, showing about a 2x improvement in the average score. The main difference between the two is that THINK-AND-EXECUTE uses pseudocode generated to express the logic shared among the instances of a task, while CoC incorporates pseudocode as part of the intermediate reasoning steps towards the solution of a given instance. Hence, the results indicate the advantage of using pseudocode to generate a task-level instruction over using it solely as part of the rationale.
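To make the distinction concrete, the following is a minimal sketch of what task-level logic could look like for a Navigate-style task. It is an illustration only, not one of the prompts used in the paper; the function name `ends_at_start` and the instruction phrasing it parses are assumptions. The point is that a single function covers every instance, rather than new code being written per instance.

```python
def ends_at_start(instructions: list[str]) -> bool:
    """Hypothetical task-level logic: does following the instructions
    bring us back to the starting point?"""
    x, y = 0, 0        # current position
    dx, dy = 0, 1      # current heading (assumed to start facing "north")
    for step in instructions:
        words = step.lower().rstrip(".").split()
        if not words:
            continue
        if words[:2] == ["turn", "left"]:
            dx, dy = -dy, dx          # rotate heading 90 degrees counter-clockwise
        elif words[:2] == ["turn", "right"]:
            dx, dy = dy, -dx          # rotate heading 90 degrees clockwise
        elif words[:2] == ["turn", "around"]:
            dx, dy = -dx, -dy         # reverse heading
        elif words[0] == "take":
            n = int(words[1])         # e.g. "Take 3 steps"
            x, y = x + n * dx, y + n * dy
    return (x, y) == (0, 0)


# The same logic is reused for different instances of the task.
print(ends_at_start(["Take 3 steps.", "Turn around.", "Take 3 steps."]))
print(ends_at_start(["Take 2 steps.", "Turn left.", "Take 2 steps."]))
```

An instance-specific approach would instead produce fresh code or reasoning steps for each question, so logic discovered for one instance is not carried over to the next.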

Figure 4: Analysis on the effect of code pre-training on the reasoning capability in applying THINK-AND-EXECUTE. Without pre-training on code corpora the accuracies drop notably.

Table 4: Comparison between THINK-AND-EXECUTE and Human-written P.

5.2 THINK-AND-EXECUTE Requires Knowledge in Code (RQ2)

To understand whether SLMs acquire the ability to understand task-level logic written in pseudocode during pre-training on code corpora, we compare the performance of CodeLlama-13B with Llama-13B using THINK-AND-EXECUTE. In Figure 4, CodeLlama-13B shows better reasoning capabilities than Llama-13B in all tasks. These results suggest that the improvement from using THINK-AND-EXECUTE could depend on knowledge of code, which is usually obtained by pre-training on code corpora. Writing code usually involves understanding the logic behind the given problem and anticipating the execution results of the code, which resembles the reasoning process of THINK-AND-EXECUTE.

5.3 THINK-AND-EXECUTE can Generate a Logic Comparable to Human’s (RQ3)

To gauge LLMs’ capabilities in discerning the underlying logic of a task, we compare THINK-AND-EXECUTE (using GPT-3.5-Turbo as the Instructor) with human-written pseudocode prompts. The results are shown in Table 4. With GPT-3.5-Turbo as the Reasoner, THINK-AND-EXECUTE reaches an accuracy of 60.4%, which is higher than that of the human-written P (55.7%). In particular, on the Navigate and Tracking Shuffled Objectives tasks, pseudocode prompts generated by THINK-AND-EXECUTE elicit better performance. This also holds when adopting CodeLlama-7B and -13B as the Reasoner, further suggesting that our THINK step can be more effective than human writers.

5.4 Impact of LLMs’ Capability on THINK-AND-EXECUTE

Table 5: Analysis of the effect of the capability of Reasoner and Instructor on the performance. We report the average performance on the 7 tasks.

To examine the impact of LLMs’ capabilities within our framework, we investigate the influence of both the Reasoner and the Instructor on performance, as shown in Table 5. Notably, accuracy is higher when GPT-3.5-Turbo is used as the Reasoner than when CodeLlama-13B or CodeLlama-34B is used. The capability of the Instructor also plays a crucial role, with GPT-3.5-Turbo as the Instructor yielding the highest accuracy across all configurations. These results underscore the significance of both the Reasoner and the Instructor in the performance of THINK-AND-EXECUTE.
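The division of labor between the two components can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the prompt wording, the `LM` type alias, and the function names `think`, `execute`, and `think_and_execute` are hypothetical, and any callable that maps a prompt string to a completion string can stand in for a model.

```python
from typing import Callable

# Assumption: a language model is modeled as a function from prompt to completion.
LM = Callable[[str], str]


def think(instructor: LM, task_description: str) -> str:
    """Instructor writes one pseudocode prompt for the whole task."""
    prompt = (
        "Write pseudocode that solves any instance of this task:\n"
        f"{task_description}"
    )
    return instructor(prompt)


def execute(reasoner: LM, pseudocode: str, instance: str) -> str:
    """Reasoner simulates the pseudocode on a single instance."""
    prompt = f"{pseudocode}\n\nSimulate the code on this input:\n{instance}"
    return reasoner(prompt)


def think_and_execute(
    instructor: LM, reasoner: LM, task_description: str, instances: list[str]
) -> list[str]:
    # The pseudocode is generated once per task and reused for every instance,
    # so the Instructor and the Reasoner can be different models.
    pseudocode = think(instructor, task_description)
    return [execute(reasoner, pseudocode, instance) for instance in instances]
```

Because the two roles are separate callables, swapping a stronger or weaker model into either slot, as in the comparison reported in Table 5, only changes which function is passed in.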

Authors:

(1) Hyungjoo Chae, Yonsei University;

(2) Yeonghyeon Kim, Yonsei University;

(3) Seungone Kim, KAIST AI;

(4) Kai Tzu-iunn Ong, Yonsei University;

(5) Beong-woo Kwak, Yonsei University;

(6) Moohyeon Kim, Yonsei University;

(7) Seonghwan Kim, Yonsei University;

(8) Taeyoon Kwon, Yonsei University;

(9) Jiwan Chung, Yonsei University;

(10) Youngjae Yu, Yonsei University;

(11) Jinyoung Yeo, Yonsei University.
