5 EVALUATIONS
We conducted three evaluations to understand the quality of our LLM system: 1) a comparison study, informed by professional animated logo designers, against an industry standard and a baseline; 2) an empirical analysis of program repair under different experimental settings; and 3) an evaluation with novices to understand LogoMotion’s capacity for customization. These evaluations are centered around the following research questions:
RQ1: Across a wide range of designs, to what extent does LogoMotion support content-aware animation?
RQ2: What are the overall strengths and weaknesses of LogoMotion at animation?
RQ3: What sorts of errors does LogoMotion tend to make?
RQ4: How capably can visually-grounded program repair debug such errors, and which program repair settings impact performance?
RQ5: To what extent can user interaction and iteration improve the quality of the automatically generated animated logos?
The first evaluation compares LogoMotion animations against Magic Animate [4], which automatically recommends animations for all elements on a page after a user selects a page-level style (e.g., “Bold”, “Professional”, “Elegant”). We additionally compare LogoMotion with an ablated version of our system (hereafter LogoMotion-Ablated). We ablated the parts of our system that 1) conceptually grouped the HTML and 2) suggested a design concept, to see how these higher-level LLM operators impacted content awareness. Our hypotheses for RQ1 were as follows:
H1a: Compared to the other conditions, LogoMotion would produce animations that were more content-aware.
H1b: Compared to the other conditions, LogoMotion would produce better sequencing.
H1c: Compared to the other conditions, LogoMotion would produce better execution quality.
5.0.1 Methodology. We first gathered a test set of 23 templates that spanned different categories of objects (animate and inanimate), layerings, and use cases. Use cases included holiday greetings, school clubs, advertisements, and branding. All templates were sourced from Adobe Express and Canva and are accessible online.
Each template was exported as a PDF and then converted into an HTML representation using the methods described in subsection 4.2. We ran LogoMotion to get four animations for each template, and then ran LogoMotion-Ablated to get another four animations for each template. Generating a set of four LogoMotion or LogoMotion-Ablated outputs took approximately 12 minutes. To have an industry-standard set to compare against, we also collected four options from Magic Animate, an AI-based tool for automatic animation that is one of Canva Pro’s premium features. Magic Animate’s page-level presets are conceptually similar to templates (e.g., all elements fade in from one side of the canvas, or elements wipe into place). We first took the preset Magic Animate recommended for the layout and then had an external designer pick the next three best presets, ensuring we had a strong baseline to compare against.
To evaluate the animations, we recruited three professional designers to rate 276 animations (23 templates × 12 animations per template, four per condition) spanning the three conditions. Designers were introduced to the task in a remote call, calibrated on the rubric with good and bad examples, and compensated for their time. Animations were presented in randomized order and rated on a scale of 1-5 for each of the following dimensions: 1) Relevance, 2) Sequencing, and 3) Execution Quality. Relevance describes how relevant the animation is to the subject matter of the logo; it measures how content-aware the animation approach is. Sequencing measures how well the animation is coordinated and timed across elements. Execution Quality assesses how well the animation was executed and whether it had any flaws.
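The 276 figure follows from 23 templates × 3 conditions × 4 animations. The minimal Python sketch below enumerates that rating set and shuffles it. The template identifiers are hypothetical placeholders, and giving each rater an independently seeded order is an assumption, since the text only says animations were presented in randomized order.

```python
import itertools
import random

# Hypothetical identifiers; the study's actual template names are not given.
TEMPLATES = [f"template_{i:02d}" for i in range(1, 24)]  # 23 templates
CONDITIONS = ["LogoMotion-Full", "LogoMotion-Ablated", "Magic Animate"]
RUNS_PER_CONDITION = 4  # four animations per template per condition


def build_rating_queue(seed: int) -> list[tuple[str, str, int]]:
    """Return every (template, condition, run) item in a shuffled order."""
    items = list(itertools.product(TEMPLATES, CONDITIONS, range(RUNS_PER_CONDITION)))
    rng = random.Random(seed)  # assumed: one seed per rater gives each a different order
    rng.shuffle(items)
    return items


queue = build_rating_queue(seed=0)
```

Each rater would then score every item in their queue on Relevance, Sequencing, and Execution Quality.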
5.0.2 H1a. Relevance. We averaged the ratings of the three design professionals. LogoMotion was rated as significantly more relevant to the subject matter of the animated logos than both Magic Animate and its ablated version (H1a, LogoMotion-Full: M = 3.05, 𝜎 = 0.64; LogoMotion-Ablated: M = 2.68, 𝜎 = 0.58; Magic Animate: M = 2.33, 𝜎 = 0.33, 𝑝 ≤ 0.001). We therefore confirm H1a: LogoMotion-Full was the top condition for content-aware animation.
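A sketch of the aggregation step described above, assuming ratings are stored per animation as one 1-5 score per rater (the key layout and the `condition_summary` name are hypothetical). Each animation's score is averaged across raters before per-condition means and standard deviations are taken. Whether the reported 𝜎 was computed exactly this way, and which significance test produced 𝑝 ≤ 0.001, is not stated in this section, so the sketch stops at descriptive statistics.

```python
from statistics import mean, stdev

# Hypothetical key: (template id, condition name, run index).
Key = tuple[str, str, int]


def condition_summary(ratings: dict[Key, list[int]]) -> dict[str, tuple[float, float]]:
    """Return {condition: (mean, sample std)} over rater-averaged scores."""
    per_condition: dict[str, list[float]] = {}
    for (_template, condition, _run), scores in ratings.items():
        # Average one animation's scores across the raters first.
        per_condition.setdefault(condition, []).append(mean(scores))
    # stdev needs at least two animations per condition.
    return {c: (mean(v), stdev(v)) for c, v in per_condition.items()}


# Made-up scores for illustration only (three raters per animation).
toy = {
    ("t1", "A", 0): [3, 4, 2],
    ("t1", "A", 1): [4, 4, 4],
    ("t1", "B", 0): [2, 2, 3],
    ("t1", "B", 1): [3, 2, 2],
}
summary = condition_summary(toy)
```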
When sorted by average relevance across raters, the top-rated animations tend to come from LogoMotion (15 of the top 20) or LogoMotion-Ablated (5 of the top 20). Highly rated LogoMotion animations tended to show motion that was archetypal of their subject. The video for this paper shows examples of lanterns swaying as if in a slight wind, crabs crawling zig-zag across the screen, and skiers skiing downhill at a diagonal. Frames from examples are depicted in Figure 5. In the first row, a black knight is translated onto the canvas in an L-shaped motion, like a chess piece being set down, as a bishop piece scales up. In the last row, a hot air balloon slowly rises over the mountains, after which the logo title fades in letter by letter.
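The top-20 breakdown can be computed with a ranking like the following hypothetical helper, which sorts animations by their rater-averaged relevance score and counts which condition each of the top k came from. Tie-breaking is not specified in the text; Python's stable sort simply keeps insertion order among tied scores.

```python
from collections import Counter
from statistics import mean


def top_k_by_condition(ratings: dict[tuple[str, str, int], list[int]], k: int = 20) -> Counter:
    """Count how many of the k highest-rated animations come from each condition."""
    # Rank by the score averaged across raters, highest first.
    ranked = sorted(ratings.items(), key=lambda item: mean(item[1]), reverse=True)
    return Counter(condition for (_template, condition, _run), _scores in ranked[:k])


# Made-up scores for illustration only.
toy = {
    ("t1", "LogoMotion-Full", 0): [5, 5, 4],
    ("t1", "Magic Animate", 0): [4, 4, 4],
    ("t2", "LogoMotion-Full", 0): [2, 3, 2],
    ("t2", "LogoMotion-Ablated", 0): [5, 4, 4],
}
top2 = top_k_by_condition(toy, k=2)
```

With the full set of 276 rated animations and k=20, the counter would hold the per-condition shares reported above.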
5.0.3 H1b. Sequencing. In terms of Sequencing, LogoMotion-Full was not significantly different from the other two conditions (H1b, LogoMotion-Full: M = 3.15, 𝜎 = 0.55; LogoMotion-Ablated: M = 3.18, 𝜎 = 0.41; Magic Animate: M = 3.12, 𝜎 = 0.43), so H1b was not supported.
Qualitatively, both LogoMotion-Full and LogoMotion-Ablated were capable of implementing the logical sequencing of a logo reveal. They could time primary elements before secondary elements or vice versa, but almost always put the text last. LogoMotion could also handle complex text sequencing such as arced text and letter-by-letter typewriter effects (see Figure 5).
LogoMotion generally sequenced layers in from bottom to top, though animations rated lower for Sequencing tended to have errors in layer ordering. Background elements could also be used effectively in animations. For example, the paper’s accompanying video shows an animated logo for a martial arts club in which a silhouette of a kicking karate character pops in from the left. As it “lands” in place, the background shakes, as if from the impact of the kick. This added dimensionality came from the design concept stage, which suggested, “for added impact, you might consider a brief screen shake or a vibration effect when the silhouette ‘lands’ to draw even more attention to the primary element’s hero moment.”
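The ordering tendencies observed above (layers entering from bottom to top, with text last) can be restated as a simple scheduling rule. The sketch below only illustrates that observed behavior, under assumed names (`Layer`, `schedule`) and an arbitrary fixed gap; LogoMotion itself produces its timing through LLM-generated animation code rather than a hard-coded rule like this.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    z_index: int  # higher values are drawn on top
    is_text: bool


def schedule(layers: list[Layer], gap_ms: int = 150) -> dict[str, int]:
    """Start non-text layers bottom to top, then text layers, each gap_ms apart."""
    # False sorts before True, so every text layer lands after every graphic layer.
    ordered = sorted(layers, key=lambda layer: (layer.is_text, layer.z_index))
    return {layer.name: i * gap_ms for i, layer in enumerate(ordered)}


# Hypothetical logo layers, listed in arbitrary order.
layers = [
    Layer("title", z_index=1, is_text=True),
    Layer("stars", z_index=3, is_text=False),
    Layer("background", z_index=0, is_text=False),
    Layer("silhouette", z_index=2, is_text=False),
]
starts = schedule(layers)
```

Note that the title starts last even though it sits below the stars in z-order, mirroring the "text last" tendency.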
LogoMotion was also capable of generating animations that reflected gestalt properties such as symmetry. For example, the symmetrical elements of the Circus example in Fig. 5 animated in at the same time, and the musical notes in the Acapella example above it animated in a contralateral but symmetrical fashion. LogoMotion could also create synchronization by mapping and staggering animation functions over groups of similar objects. This was how it implemented text animation, but it could also do this for elements within a secondary group (e.g., mapping a flickering action to a group of stars).
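Mapping one animation over a group with staggered offsets, and starting symmetric counterparts at the same time, both reduce to computing per-element delays. A minimal Python sketch follows; the element ids (e.g. `tent-left`, `star-1`) and millisecond values are hypothetical, and the same `stagger` helper covers both a letter-by-letter typewriter reveal and a flicker applied to a group of stars.

```python
def stagger(element_ids: list[str], start_ms: int, step_ms: int) -> dict[str, int]:
    """Apply one animation to a group, offsetting each member by step_ms."""
    return {el: start_ms + i * step_ms for i, el in enumerate(element_ids)}


def mirror_pairs(pairs: list[tuple[str, str]], start_ms: int, step_ms: int) -> dict[str, int]:
    """Start both members of each symmetric pair together; stagger successive pairs."""
    delays: dict[str, int] = {}
    for i, (left, right) in enumerate(pairs):
        delays[left] = delays[right] = start_ms + i * step_ms
    return delays


# Typewriter reveal: unique ids keep repeated letters (the two Cs) distinct.
title = "CIRCUS"
typewriter = stagger([f"letter-{i}-{ch}" for i, ch in enumerate(title)], start_ms=1200, step_ms=80)

# Flicker mapped over a group of stars.
stars = stagger(["star-1", "star-2", "star-3"], start_ms=0, step_ms=100)

# Symmetric elements enter at the same moment.
mirrored = mirror_pairs([("tent-left", "tent-right"), ("flag-left", "flag-right")], start_ms=0, step_ms=300)
```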
5.0.4 H1c. Execution Quality. In terms of Execution Quality, the full LogoMotion pipeline did not perform significantly differently from the other conditions (H1c, LogoMotion-Full: M = 3.25, 𝜎 = 0.54; LogoMotion-Ablated: M = 3.38, 𝜎 = 0.46; Magic Animate: M = 3.22, 𝜎 = 0.39), so H1c was not supported.
LogoMotion-Ablated scored the highest on execution quality. Many animations in this condition were conceptually similar (all elements fade or translate into place from a slight displacement) and were thus minimal in animation complexity and easy to execute. One factor that lowered execution quality for LogoMotion was that it could at times produce visual flaws that our bounding box checker was unable to catch. For example, LogoMotion sometimes suggested animation code that targeted attributes like background color or outer glow. These flaws made it past our program repair stage (which did not check those properties) and brought down the execution quality ratings.
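The blind spot described above can be illustrated with a geometry-only check. In this sketch (hypothetical `FinalState` and `bbox_check`, and an assumed 1 px tolerance), an element is flagged only when its final bounding box drifts from the layout, so a changed background color passes even though a rater would see it. It illustrates the failure mode and is not the authors' checker.

```python
from dataclasses import dataclass, field


@dataclass
class FinalState:
    """Hypothetical final-frame snapshot of one element."""
    bbox: tuple[float, float, float, float]  # (x, y, width, height) in px
    styles: dict[str, str] = field(default_factory=dict)


def bbox_check(layout: dict[str, FinalState], rendered: dict[str, FinalState], tol: float = 1.0) -> list[str]:
    """Flag elements whose final bounding box differs from the layout by more than tol px.

    Only geometry is compared; `styles` is never inspected, so a changed
    background color or an added glow passes silently.
    """
    flagged = []
    for element_id, target in layout.items():
        actual = rendered.get(element_id)
        if actual is None or any(abs(a - b) > tol for a, b in zip(actual.bbox, target.bbox)):
            flagged.append(element_id)
    return flagged


layout = {
    "logo": FinalState((10, 10, 100, 50)),
    "title": FinalState((10, 70, 200, 30)),
}
rendered = {
    # Correct position, but the animation left a red background behind.
    "logo": FinalState((10, 10, 100, 50), {"background-color": "red"}),
    # Title ended 30 px to the right of where the layout places it.
    "title": FinalState((40, 70, 200, 30)),
}
flagged = bbox_check(layout, rendered)
```

Catching flaws like the red background would require also comparing final style properties against the original design.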