Authors:
(1) Yanpeng Ye, School of Computer Science and Engineering, University of New South Wales, Kensington, NSW, Australia and GreenDynamics Pty. Ltd, Kensington, NSW, Australia (these authors contributed equally to this work);
(2) Jie Ren, GreenDynamics Pty. Ltd, Kensington, NSW, Australia and Department of Materials Science and Engineering, City University of Hong Kong, Hong Kong, China (these authors contributed equally to this work);
(3) Shaozhou Wang, GreenDynamics Pty. Ltd, Kensington, NSW, Australia ([email protected]);
(4) Yuwei Wan, GreenDynamics Pty. Ltd, Kensington, NSW, Australia and Department of Linguistics and Translation, City University of Hong Kong, Hong Kong, China;
(5) Imran Razzak, School of Computer Science and Engineering, University of New South Wales, Kensington, NSW, Australia;
(6) Tong Xie, GreenDynamics Pty. Ltd, Kensington, NSW, Australia and School of Photovoltaic and Renewable Energy Engineering, University of New South Wales, Kensington, NSW, Australia ([email protected]);
(7) Wenjie Zhang, School of Computer Science and Engineering, University of New South Wales, Kensington, NSW, Australia ([email protected]).
To structure the inference results into triples, we treat the three labels that identify a material as core labels: the core label serves as the head, the names of the other labels form the relations, and their values act as the tails. Among the core labels, “Formula” has the highest priority, followed by “Name” and then “Acronym”. Each head and tail node is also linked to the DOI of the source article from which the triple was extracted. This setup allows us to ascertain the provenance of the relation between any two nodes by examining the intersection of the entities connected to them, that is, the DOIs they share. We then load the triples into FMKG and store it in the graph database Neo4j. Neo4j supports subgraph matching, which naturally suits searching for materials that satisfy user-specified conditions. To facilitate access to the detailed information, we also make the dataset available in CSV format for straightforward data handling.
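The triple construction and DOI-based provenance lookup described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the in-memory dictionary used in place of the graph database, the "Application" label, and the DOIs are all hypothetical, and we assume that a lower-priority core label present in the same record is treated like any other label and becomes a relation.

```python
from typing import Dict, List, Set, Tuple

# Core labels in priority order, as stated in the text.
CORE_PRIORITY = ("Formula", "Name", "Acronym")

Triple = Tuple[str, str, str]  # (head, relation, tail)


def record_to_triples(record: Dict[str, str]) -> List[Triple]:
    """Convert one labelled inference result into triples.

    The head is the value of the highest-priority core label present.
    Every other non-empty label becomes a relation whose tail is its value
    (assumption: this includes lower-priority core labels).
    """
    head_label = next((lab for lab in CORE_PRIORITY if record.get(lab)), None)
    if head_label is None:
        return []  # no material identifier to anchor the triples on
    head = record[head_label]
    return [
        (head, label, value)
        for label, value in record.items()
        if label != head_label and value
    ]


def link_dois(triples: List[Triple], doi: str,
              node_dois: Dict[str, Set[str]]) -> None:
    """Attach the source DOI to every head and tail node of the triples."""
    for head, _relation, tail in triples:
        node_dois.setdefault(head, set()).add(doi)
        node_dois.setdefault(tail, set()).add(doi)


def provenance(node_a: str, node_b: str,
               node_dois: Dict[str, Set[str]]) -> Set[str]:
    """Return the DOIs shared by two nodes (the intersection of their links)."""
    return node_dois.get(node_a, set()) & node_dois.get(node_b, set())


if __name__ == "__main__":
    node_dois: Dict[str, Set[str]] = {}
    # Hypothetical record and DOI, for illustration only.
    record = {"Formula": "MAPbI3", "Acronym": "MAPI", "Application": "solar cell"}
    triples = record_to_triples(record)
    link_dois(triples, "10.0000/example-a", node_dois)
    print(triples)
    print(provenance("MAPbI3", "solar cell", node_dois))
```

In a graph database the same lookup would be a query for DOI nodes adjacent to both endpoints; the dictionary of sets here only mirrors that idea in memory.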