Federated SPARQL query benchmark

This benchmark is designed to test the federated SPARQL query extension that is being proposed by the W3C SPARQL Working Group. Its queries have to be executed across several of the Bio2RDF endpoints, and they correspond to problems that have been validated by biology domain experts. The queries increase incrementally in complexity, starting from a basic query; more details can be found in the paper listed at the bottom of the page.

First, we query the geneId endpoint and retrieve a set of gene ids. Using that set of gene ids, we get the MeSH descriptor from the Pubmed endpoint, which enriches the information about the genes obtained in the first query. We then use the MeSH descriptor obtained from these two queries to query the MeSH endpoint. This endpoint contains the National Library of Medicine's thesaurus, which serves as the reference for these descriptors in the RDF database and allows us to obtain more detailed information. In the next query, we ask the HHPID endpoint for the interactions between the genes from the first endpoint and the ones in HHPID. HHPID provides interactions with the HIV virus.
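The chain of endpoints described above can be sketched as a single SPARQL 1.1 query with one SERVICE block per endpoint. The Python snippet below only assembles and prints such a query string; it does not contact any endpoint. The endpoint URLs, the `ex:` vocabulary, and every predicate name are hypothetical placeholders chosen for illustration. They are not the actual Bio2RDF addresses or the benchmark's real queries.

```python
# Minimal sketch: build a federated SPARQL 1.1 query string with one
# SERVICE block per endpoint. Nothing is sent over the network.

# ASSUMPTION: placeholder endpoint URLs, not the real Bio2RDF endpoints.
ENDPOINTS = {
    "geneid": "http://example.org/geneid/sparql",
    "pubmed": "http://example.org/pubmed/sparql",
    "mesh": "http://example.org/mesh/sparql",
    "hhpid": "http://example.org/hhpid/sparql",
}

# ASSUMPTION: hypothetical triple patterns; the ex: predicates are invented
# for illustration and do not come from the benchmark.
STEPS = [
    ("geneid", "?gene a ex:Gene ."),
    ("pubmed", "?article ex:mentionsGene ?gene ; ex:meshDescriptor ?descriptor ."),
    ("mesh", "?descriptor ex:label ?descriptorLabel ."),
    ("hhpid", "?interaction ex:involvesGene ?gene ."),
]


def build_federated_query(endpoints, steps):
    """Return a SELECT query that joins the given steps through SERVICE blocks."""
    service_blocks = [
        f"  SERVICE <{endpoints[name]}> {{\n    {pattern}\n  }}"
        for name, pattern in steps
    ]
    return (
        "PREFIX ex: <http://example.org/vocab#>\n"
        "SELECT ?gene ?descriptor ?descriptorLabel ?interaction\n"
        "WHERE {\n" + "\n".join(service_blocks) + "\n}"
    )


query = build_federated_query(ENDPOINTS, STEPS)
print(query)
```

The shared variables (`?gene`, `?descriptor`) are what join the results of one SERVICE block to the next, mirroring how the gene ids and MeSH descriptors are carried from one endpoint to the following one.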

Files available:

Semantics and optimization of the SPARQL 1.1 federation extension, Carlos Buil Aranda, Marcelo Arenas, Oscar Corcho. To appear in Extended Semantic Web Conference (ESWC2011), Semantic Data Management track, 2011.




