Table of Links
Abstract and 1 Introduction
2 Background
3 Approach and 3.1 Differential Testing for XML Processors
3.2 XPath Expression Generation
3.3 XML Generation
4 Evaluation
4.1 Effectiveness
4.2 Efficiency
4.3 Comparison to the State of the Art
4.4 Analysis of BaseX Historical Bug Reports
5 Related Work
6 Conclusion, Acknowledgments, and References
4.3 Comparison to the State of the Art
We are aware of only one automated testing approach for XML processors [41]. It tackled the test oracle problem with differential testing, comparing the results of Microsoft SQL Server with and without indexes. The approach was designed specifically to test SQL Server's index support and is not publicly available. Because of its narrow testing scope and the tool's unavailability, we could not conduct experiments to compare the two approaches directly. However, we extended our tool to support differential testing with index configurations. The two approaches are complementary: XPress can apply differential testing across various XML processors and can also create or omit indexes to find additional bugs.
Index support in BaseX, eXist-DB, Saxon, and libxml2. Database indexes are data structures built to speed up data retrieval [31] and are DBMS-specific. Not all XML processors are DBMSs; as in-memory processors, Saxon and libxml2 lack index support. BaseX and eXist-DB both enable structural indexes by default, for example an index that stores all distinct paths of nodes. As value indexes, which optimize queries on content values, BaseX automatically creates a text index and an attribute index, and users can define further indexes. In addition, BaseX provides token indexes, which apply to specific functions such as contains-token. eXist-DB supports range indexes, which can be defined for specific nodes or attributes to speed up comparison searches on their contents.
Methodology. We tested eXist's range index and BaseX's token index using the XPath expression generation approach described in Section 3.2. Because of the unfixed bugs we had already found in eXist, we conducted differential testing within eXist, comparing results with and without a range index definition. For BaseX, we defined a token index and compared its results directly with those of Saxon.
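The index-based oracle described above can be illustrated with a minimal Python sketch. It is not the XPress implementation: the two evaluators, their names, and the `id` attribute used to identify results are assumptions made for illustration, and both stand in for a real processor run with and without a token index. The point is only the oracle itself: run the same query under each configuration and flag any difference in the result sets.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical evaluator signature: (document, token) -> sorted ids of matching <M> elements.
Evaluator = Callable[[str, str], List[str]]


def scan_eval(doc: str, token: str) -> List[str]:
    """Index-free configuration: visit every <M> element and test its v attribute."""
    root = ET.fromstring(doc)
    return sorted(
        el.get("id", "")
        for el in root.iter("M")
        if token in el.get("v", "").split()
    )


def indexed_eval(doc: str, token: str) -> List[str]:
    """Indexed configuration: build a token -> element-id map, then look the token up."""
    root = ET.fromstring(doc)
    index = defaultdict(set)
    for el in root.iter("M"):
        for tok in el.get("v", "").split():
            index[tok].add(el.get("id", ""))
    return sorted(index.get(token, set()))


def differential_check(
    doc: str, token: str, evaluators: Dict[str, Evaluator]
) -> Dict[str, List[str]]:
    """Return every configuration's result if they disagree, else an empty dict."""
    results = {name: run(doc, token) for name, run in evaluators.items()}
    distinct = {tuple(r) for r in results.values()}
    return results if len(distinct) > 1 else {}


doc = '<R><M id="1" v="a b"/><M id="2" v="c"/><N id="3" v="a"/></R>'
configs = {"no_index": scan_eval, "token_index": indexed_eval}
report = differential_check(doc, "a", configs)
print("discrepancy" if report else "consistent", report)
```

A non-empty report only signals that the configurations disagree. Deciding which one is wrong still requires manual inspection or a third reference, which is the role Saxon plays for BaseX in the setup above.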
Results. With this testing method, we detected one additional bug in BaseX [10] and no additional bugs in eXist. We reported the bug, shown in Figure 9, to the BaseX developers, who quickly fixed it. The query selects all nodes in the document with tag name M whose attribute v contains the token “a”. Without a token index, BaseX returned node M, as expected; with the token index, it unexpectedly returned an empty result set. Overall, while the results suggest that creating or removing indexes can expose additional bugs, doing so had low effectiveness. A potential explanation is that our test-case generation approach does not take into account when indexes can be applied, which might result in low testing efficiency.
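To make the expected result concrete, the following Python sketch approximates the token-matching semantics of contains-token as we understand them from the XPath functions specification: each input string is split at whitespace, and the function is true if any resulting token equals the whitespace-trimmed search token. The helper name `contains_token` is hypothetical, and the sketch assumes plain codepoint comparison, ignoring collations.

```python
import re
from typing import Iterable

# XML whitespace: space, tab, carriage return, line feed.
_XML_WHITESPACE = re.compile(r"[ \t\r\n]+")


def contains_token(values: Iterable[str], token: str) -> bool:
    """Approximate contains-token: whitespace-tokenize each value and compare tokens."""
    wanted = token.strip(" \t\r\n")
    if not wanted:
        # A search token that is empty after trimming never matches.
        return False
    return any(wanted in _XML_WHITESPACE.split(value) for value in values)


print(contains_token(["a b"], "a"))   # whole-token match
print(contains_token(["ab"], "a"))    # substring only, not a token
print(contains_token(["b", " a "], " a "))  # search token is trimmed before comparing
```

Under these semantics, an element M whose attribute v contains the token “a” must be selected regardless of whether an index is present, which is why the empty result counts as a bug.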
[10] https://github.com/BaseXdb/basex/issues/2222
Authors:
(1) Shuxin Li, Southern University of Science and Technology, China; work done during an internship at the National University of Singapore ([email protected]);
(2) Manuel Rigger, National University of Singapore, Singapore ([email protected]).