Table of Links
Abstract and 1 Introduction
2 Background
3 Approach and 3.1 Differential Testing for XML Processors
3.2 XPath Expression Generation
3.3 XML Generation
4 Evaluation
4.1 Effectiveness
4.2 Efficiency
4.3 Comparison to the State of the Art
4.4 Analysis of BaseX Historical Bug Reports
5 Related Work
6 Conclusion, Acknowledgments, and References
4.2 Efficiency
Existing-generator baselines. We considered the only two approaches that, to the best of our knowledge, generate XPath expressions. Neither was specifically designed to be combined with an XPath test oracle. XQGen [42] generates XPath queries for micro-benchmarking; its generated predicates only check for sub-element existence. The XQuery generator designed by Todic and Uzelac [41] generates XPath queries for automatically testing index support in DBMSs. Given that indexes apply only to sargable queries (i.e., simple comparisons), the expressions it generates are simple. Both approaches generate XPath expressions based on an XML schema, while XPress generates XPath expressions based on the actual XML document. Based on this, we expect both of them to have low applicability for our differential-testing approach. Given that neither implementation is publicly available, we re-implemented them based on the descriptions in the papers.
Self-constructed baselines. We also constructed our own baselines to investigate the efficiency of the separate components of XPress. XPress has two main components, namely (1) the targeted predicate generation, which uses the targeted node to refer to existing nodes and attributes, and (2) the predicate rectification, which avoids empty result sets. To evaluate the effect of each component, we enabled them individually and tested whether they improve XPress’s bug detection efficiency.
Configurations. We considered four configurations for our self-constructed baselines. Apart from our proposed approach introduced in Section 3.2 as (1) Targeted, we derived the configurations (2) Targeted without Rectification, (3) Untargeted with Rectification, and (4) Untargeted without Rectification. In (2) Targeted without Rectification, we disable the rectification process, which would otherwise ensure targeted node selection. Since selecting a targeted node for predicate generation guidance always requires at least one node in the result set, we stop generating new sections after an empty result set is produced. In (3) Untargeted with Rectification, we generate predicates without using targeted node information to supply parameters that reference existing context and trigger corner cases for function nodes, while keeping the rectification to ensure that at least one node from the candidate set is included in the result set. In (4) Untargeted without Rectification, we remove both components, generating predicates randomly and omitting rectification.
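The four configurations are the combinations of two independent switches, which can be sketched as follows. This is a minimal illustration, not XPress's actual code; the class, field, and property names are hypothetical.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Configuration:
    """One baseline configuration (hypothetical names, for illustration only)."""

    targeted: bool       # use the targeted node to guide predicate generation
    rectification: bool  # rectify predicates to avoid empty result sets

    @property
    def name(self) -> str:
        # The full approach is simply called "Targeted" in the text.
        if self.targeted and self.rectification:
            return "Targeted"
        first = "Targeted" if self.targeted else "Untargeted"
        second = "with" if self.rectification else "without"
        return f"{first} {second} Rectification"

    @property
    def stops_on_empty_result(self) -> bool:
        # Targeted generation needs at least one node in the result set, so
        # without rectification no new section can follow an empty result.
        return self.targeted and not self.rectification


# Enumerated in the order (1) to (4) used in the text.
CONFIGURATIONS = [Configuration(t, r) for t, r in product((True, False), repeat=2)]

for number, config in enumerate(CONFIGURATIONS, start=1):
    print(f"({number}) {config.name}")
```

Only configuration (2) needs the early stop, because it is the only one that relies on a targeted node without any guarantee that the result set is non-empty.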
Methodology. We ran each baseline for 24 hours [30]. We repeated each experiment 10 times to account for potential performance deviations, and report the arithmetic mean for all metrics. As our testing target, we selected BaseX 10.4, which is the BaseX version that we first started testing. We selected BaseX as a representative because we found most bugs in BaseX and all of them were fixed, allowing us to determine the number of unique bugs found in a testing campaign by deduplicating bug-inducing test cases automatically. Specifically, given two bug-inducing test cases, we could determine whether they trigger the same underlying bug by identifying their fix commits; only if their associated fix commits are different do we consider the bugs unique. This is a best-effort technique, as, for example, one fix commit might address multiple bugs. We disabled the generation of the has-children function as well as the use of relative XPath expressions in predicates, as they consistently lead to crashes, triggering known bugs.
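The deduplication and averaging steps can be sketched as follows. This is a minimal sketch that assumes each bug-inducing test case has already been mapped to the commit that fixed it; the function names, test-case names, and commit identifiers are hypothetical.

```python
from statistics import mean


def count_unique_bugs(fix_commit_by_test_case: dict[str, str]) -> int:
    """Count unique bugs in one run: test cases sharing a fix commit count once."""
    return len(set(fix_commit_by_test_case.values()))


def mean_unique_bugs(runs: list[dict[str, str]]) -> float:
    """Arithmetic mean of the unique-bug count over repeated runs."""
    return mean(count_unique_bugs(run) for run in runs)


# Hypothetical data: two runs, mapping test-case names to fix-commit identifiers.
runs = [
    {"case-1": "commit-a", "case-2": "commit-a", "case-3": "commit-b"},
    {"case-4": "commit-b"},
]
print(mean_unique_bugs(runs))
```

Because one fix commit might address multiple bugs, the count can underestimate the true number of bugs, which is why the technique is best-effort. Averaging over runs also explains why a reported unique-bug count can be fractional.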
Results of existing generators. Neither XQGen nor the Combined XML/XQuery generator found bugs in our experiment. This is expected, as previously proposed approaches were not designed for automated testing. As mentioned above, XQGen generates predicates that only check for element existence. The XQuery generator designed by Todic and Uzelac generates simple predicates that include at most one comparison operator.
Results of different configurations. As Figure 8 shows, our proposed approach, (1) Targeted, outperforms the other configurations. Within 24 hours, it found the largest number of unique bugs (12.5 on average). Both configurations with targeted generation clearly outperformed the untargeted ones, while enabling rectification made little difference to the speed of bug detection. As shown in Table 3, both targeted generation and rectification reduce the testing throughput, as they obtain intermediate results using the XML processor under test. Despite generating only 50% as many test cases as (4) Untargeted without Rectification, (1) Targeted detected 20× more bug-inducing test cases and 2× more unique bugs. The results show that selecting a target node to guide the XPath generation process improves testing efficiency significantly. As observed above when discussing the small-scope hypothesis, most of the bugs that we found can be reproduced using a single section, which explains the limited effectiveness of rectification. However, we still believe that rectification is an important component, since without it, bugs requiring multiple sections with non-empty results could hardly be found.
Code coverage. We collected code coverage for the core modules of three processors while running XPress for 24 hours [30]. The results are shown in Table 4. To put the numbers in relation, we also collected coverage for the projects’ test suites; Saxon has no publicly available test suites and is therefore excluded. For the three XML processors, the line coverage ranged from 15% to 20%, and the branch coverage ranged from 10% to 16%. The coverage percentages are low, which is expected. The main reason is that XML processors typically have components other than XPath processing. Taking BaseX as an example, around 21% of the uncovered code was GUI-related, 10% was due to the lack of full-text functionality support, and 5% was database commands. In Saxon, as another example, the XSLT modules were not covered. A further 18% of the uncovered code in BaseX involved unimplemented functions; it would be straightforward to implement many additional ones, such as math functions, but the large number of available functions would make this a tedious task. In Section 4.4, we detail unsupported XPath features whose implementation might allow us to find more bugs. XPress’s test-case generation process primarily aims at generating semantically valid expressions, which results in low coverage of error-checking branches. Quantifying this effect is difficult, as the relevant code is spread throughout the code base.
Authors:
(1) Shuxin Li, Southern University of Science and Technology, China (work done during an internship at the National University of Singapore);
(2) Manuel Rigger, National University of Singapore, Singapore.