Privacy Policy Disclaimer
  Advanced SearchBrowse




Journal Article

The data cyclotron query processing scheme


Goncalves,  Romulo
0 Pre-GFZ, Departments, GFZ Publication Database, Deutsches GeoForschungsZentrum;

Kersten,  Martin
External Organizations;

External Ressource
No external resources are shared
Fulltext (public)
There are no public fulltexts stored in GFZpublic
Supplementary Material (public)
There is no public supplementary material available

Goncalves, R., Kersten, M. (2011): The data cyclotron query processing scheme. - ACM Transactions on Database Systems, 36, 4, 1-35.

Cite as: https://gfzpublic.gfz-potsdam.de/pubman/item/item_5025250
A grand challenge of distributed query processing is to devise a self-organizing architecture which exploits all hardware resources optimally to manage the database hot set, minimize query response time, and maximize throughput without single point global coordination. The Data Cyclotron architecture [Goncalves and Kersten 2010] addresses this challenge using turbulent data movement through a storage ring built from distributed main memory and capitalizing on the functionality offered by modern remote-DMA network facilities. Queries assigned to individual nodes interact with the storage ring by picking up data fragments, which are continuously flowing around, that is, the hot set. The storage ring is steered by the Level Of Interest (LOI) attached to each data fragment, which represents the cumulative query interest as it passes around the ring multiple times. A fragment with LOI below a given threshold, inversely proportional to the ring load, is pulled out to free up resources. This threshold is dynamically adjusted in a fully distributed manner based on ring characteristics and locally observed query behavior. It optimizes resource utilization by keeping the average data access latency low. The approach is illustrated using an extensive and validated simulation study. The results underpin the fragment hot set management robustness in turbulent workload scenarios. A fully functional prototype of the proposed architecture has been implemented using modest extensions to MonetDB and runs within a multirack cluster equipped with Infiniband. Extensive experimentation using both microbenchmarks and high-volume workloads based on TPC-H demonstrates its feasibility. The Data Cyclotron architecture and experiments open a new vista for modern distributed database architectures with a plethora of new research challenges.