4.4 Analysis of BaseX Historical Bug Reports
Unlike formal verification, automatic testing can miss bugs in the system under test. Because no ground truth exists, we cannot generally determine which bugs our approach overlooks. As a best-effort approximation, we therefore studied historical bug reports to determine whether XPress could have found them.
Bug reports. We analyzed all historical BaseX bug reports in its GitHub bug tracker. We selected BaseX because the majority of its issues are closed (1618 out of 1640). The BaseX issue tracker is used for confirmed bug reports that have been filtered from reports on the mailing list, and the BaseX maintainers carefully label and document them. For these reasons, it was easy to identify and classify the underlying problem of each bug report.
Methodology. We manually analyzed all historical bug issues in BaseX up to April 17, 2023, which amounted to 1597 issues after excluding those we reported ourselves. To confine the study to the scope of XPath, we selected bug reports triggered by XPath expressions only. To determine whether a bug could theoretically be found by XPress, we checked three main aspects of each report. First, for XPress to cover the test case, neither the XML document nor the XPath expression may include any unimplemented functions or language features. Second, it must be possible to construct the sections and the predicate tree structure of XPress for the involved predicates so that they form the pattern of the bug-inducing XPath expression. Third, XML processors should disagree on the result set. Note that this is a best-effort approach: we might incorrectly conclude that XPress could find a bug (e.g., it might be unlikely that the test case would be generated in practice), or incorrectly conclude that a bug cannot be found even though a different test case within the reach of XPress would trigger the same underlying bug.
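The three checks above can be read as a conjunction over manually assigned labels. The following is a minimal sketch of that reading, not the authors' tooling: the `BugReport` type, its field names, and `could_be_found` are hypothetical names introduced here for illustration, and each field stands for a judgment a human reviewer made by hand.

```python
from dataclasses import dataclass


@dataclass
class BugReport:
    # All fields are hypothetical labels that a reviewer fills in manually.
    uses_unimplemented_features: bool  # check 1: document or query needs unsupported features
    pattern_constructible: bool        # check 2: sections and predicate tree can form the query
    processors_disagree: bool          # check 3: XML processors return different result sets


def could_be_found(report: BugReport) -> bool:
    """Return True only if all three manual checks are satisfied."""
    return (
        not report.uses_unimplemented_features
        and report.pattern_constructible
        and report.processors_disagree
    )


if __name__ == "__main__":
    covered = BugReport(False, True, True)
    needs_unsupported_feature = BugReport(True, True, True)
    print(could_be_found(covered))                    # True
    print(could_be_found(needs_unsupported_feature))  # False
```

Because every input is a human judgment, the function inherits the two sources of misclassification named above: a report can pass all three checks while being unlikely to be generated in practice, and it can fail them while a different test case would expose the same bug.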
Results. Of the 78 bugs that we collected, we identified 20 that XPress could have detected. For the other 58 bugs, we identified four reasons why XPress would have failed to find them: (1) unimplemented functionalities (51 cases), (2) invalid inputs for which the expected result would be an error (6 cases), (3) processors intentionally producing different results (2 cases), and (4) miscellaneous other issues (6 cases). Bugs that belong to more than one group are counted in every group involved, which is why the category counts sum to more than 58. The differential testing oracle itself fails only on the bugs for which processors intentionally produce different results; we consider the other categories mostly implementation limitations of test-case generation. Therefore, 76 of the 78 bugs (97%) could be detected through differential testing. This further demonstrates the effectiveness of a differential testing oracle for XPath-related testing.
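The reported counts can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the results; the variable names are illustrative.

```python
TOTAL_BUGS = 78
FOUND_BY_XPRESS = 20

# Category counts as reported; a bug may appear in more than one category.
missed_by_category = {
    "unimplemented functionalities": 51,
    "expected errors": 6,
    "different results": 2,
    "miscellaneous": 6,
}

# Bugs that XPress would have missed.
missed = TOTAL_BUGS - FOUND_BY_XPRESS

# Category memberships beyond one per missed bug (caused by multi-category bugs).
extra_memberships = sum(missed_by_category.values()) - missed

# Only the "different results" bugs are out of reach of the oracle itself.
oracle_detectable = TOTAL_BUGS - missed_by_category["different results"]
oracle_share = round(100 * oracle_detectable / TOTAL_BUGS)

if __name__ == "__main__":
    print(missed)             # 58
    print(extra_memberships)  # 7
    print(oracle_detectable)  # 76
    print(oracle_share)       # 97
```

The category counts add up to 65, so there are 7 more category memberships than missed bugs, which is consistent with some bugs being counted in several groups.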
Unimplemented functionalities. Most uncovered bug reports are due to unimplemented functionalities. Unsupported functions include constructors defined by the XML or XPath language standards, array and map functions, and constructors of derived datatypes [2], such as xs:NMTOKENS. Given enough time, it would be straightforward to implement them in XPress. For/while loops, variable declarations, if-else conditional expressions, and user-defined functions are also unimplemented. These could be supported based on approaches that have been proposed in the context of compiler testing [32, 43]. Neither the XML documents nor the XPath expressions that XPress constructs involve namespaces, which allow distinguishing items with the same tag name. Namespaces could be integrated into the XPress test-case generator. By implementing all these features, an additional 38 bugs (48%) could have been found.
Expected errors. Bug reports in the expected-errors category concern invalid test cases that are executed successfully instead of raising an error. XPress constructs expressions that are both syntactically and semantically valid and therefore could not detect bugs in this category. However, the differential testing oracle could detect these bugs by comparing the errors of the different XML processors.
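As a minimal sketch of how an oracle could compare errors as well as result sets, the code below treats "raised an error" as an outcome of its own. It assumes each processor is wrapped in a Python callable; `lenient_processor`, `strict_processor`, and `INVALID_QUERY` are hypothetical stand-ins, not real processor bindings or real XPath.

```python
from typing import Callable, Dict, Hashable, List, Tuple

Outcome = Tuple[str, Hashable]
Processor = Callable[[str], List[str]]


def run(processor: Processor, query: str) -> Outcome:
    """Normalize a processor run into a comparable outcome."""
    try:
        return ("ok", tuple(sorted(processor(query))))
    except Exception:
        # Error messages differ between processors, so only the fact
        # that the query failed is compared, not the message text.
        return ("error", None)


def disagreement(processors: Dict[str, Processor], query: str) -> Dict[str, Outcome]:
    """Return all outcomes if the processors disagree, else an empty dict."""
    outcomes = {name: run(p, query) for name, p in processors.items()}
    return outcomes if len(set(outcomes.values())) > 1 else {}


INVALID_QUERY = "<placeholder for an invalid query>"  # hypothetical input


def lenient_processor(query: str) -> List[str]:
    # Hypothetical processor that wrongly accepts the invalid query.
    return []


def strict_processor(query: str) -> List[str]:
    # Hypothetical processor that rejects the invalid query.
    if query == INVALID_QUERY:
        raise ValueError("invalid query")
    return []


if __name__ == "__main__":
    procs = {"lenient": lenient_processor, "strict": strict_processor}
    print(disagreement(procs, INVALID_QUERY))
    print(disagreement(procs, "some valid query"))  # {}
```

Such a comparison only helps if the generator also emits invalid inputs, which XPress, as described above, does not.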
Different results. The different-results category contains queries for which different processors intentionally produce different results, which shows a limitation of the differential testing oracle. One example is the function id, which selects nodes with xml:id attributes. BaseX treats attributes named id as xml:id attributes, while Saxon and eXist-DB require an explicit declaration.
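The id example can be illustrated with a small simulation. The sketch below does not invoke BaseX, Saxon, or eXist-DB; it uses Python's standard `xml.etree.ElementTree` and two hypothetical lookup policies (`lookup_lenient` and `lookup_strict`) that mimic the two behaviors described above. Treating only `xml:id` as an explicit declaration is a simplifying assumption.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the predeclared xml: prefix under this namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

DOC = '<root><a id="n1"/><b xml:id="n2"/></root>'


def lookup_lenient(root: ET.Element, value: str) -> list:
    """Hypothetical policy: a plain `id` attribute also counts as an ID."""
    return [e.tag for e in root.iter() if value in (e.get("id"), e.get(XML_ID))]


def lookup_strict(root: ET.Element, value: str) -> list:
    """Hypothetical policy: only an explicit xml:id attribute counts as an ID."""
    return [e.tag for e in root.iter() if e.get(XML_ID) == value]


if __name__ == "__main__":
    root = ET.fromstring(DOC)
    print(lookup_lenient(root, "n1"))  # ['a']
    print(lookup_strict(root, "n1"))   # []
    print(lookup_strict(root, "n2"))   # ['b']
```

A differential oracle would flag the lookup of `n1` as a discrepancy even though both policies behave as their implementers intended, which is why such reports cannot be classified automatically.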
Authors:
(1) Shuxin Li, Southern University of Science and Technology, China (work done during an internship at the National University of Singapore) ([email protected]);
(2) Manuel Rigger, National University of Singapore, Singapore ([email protected]).