Retroscope: Retrospective Monitoring of Distributed Systems
Aleksey Charapko, Ailidani Ailijiang, Murat Demirbas, Sandeep Kulkarni
Abstract
Retroscope is a comprehensive, lightweight distributed monitoring tool that enables
users to query and reconstruct past consistent global states of the system. Retroscope achieves this by augmenting the system with
Hybrid Logical Clocks (HLC) and by streaming HLC-stamped event logs for storage and processing; these HLC timestamps are then used for constructing global (or
nonlocal) snapshots upon request. Retroscope provides
a rich querying language (RQL) to facilitate
searching for global predicates across past consistent states. The search is
performed by advancing through global states in small incremental steps,
greatly reducing the amount of computation needed to construct consistent states.
The Retroscope search algorithm is embarrassingly parallel and can employ many
worker processes (each processing up to 150,000 consistent snapshots per second)
to handle a single query. We
evaluate Retroscope’s monitoring capabilities
in two case studies: Chord and Apache ZooKeeper.
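To make the timestamping idea concrete, the following is a minimal sketch of a Hybrid Logical Clock in Python, following the standard HLC update rules. It is not Retroscope's own code: the class name `HybridLogicalClock`, its methods `now` and `update`, and the injectable `physical_clock` parameter are hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Tuple

# An HLC timestamp is a pair (l, c): l tracks the largest physical time
# seen so far, c is a counter that orders events sharing the same l.
Timestamp = Tuple[int, int]


def _wall_clock_ms() -> int:
    """Default physical clock: milliseconds since the Unix epoch."""
    return time.time_ns() // 1_000_000


class HybridLogicalClock:
    """Illustrative HLC (hypothetical class, not Retroscope's API)."""

    def __init__(self, physical_clock: Callable[[], int] = _wall_clock_ms):
        self._pt = physical_clock  # injectable so the clock can be tested
        self.l = 0
        self.c = 0

    def now(self) -> Timestamp:
        """Timestamp a local event or an outgoing message."""
        prev = self.l
        self.l = max(prev, self._pt())
        # Same l as before: bump the counter; otherwise reset it.
        self.c = self.c + 1 if self.l == prev else 0
        return (self.l, self.c)

    def update(self, remote: Timestamp) -> Timestamp:
        """Merge the timestamp carried by an incoming message."""
        rl, rc = remote
        prev = self.l
        self.l = max(prev, rl, self._pt())
        if self.l == prev and self.l == rl:
            self.c = max(self.c, rc) + 1
        elif self.l == prev:
            self.c += 1
        elif self.l == rl:
            self.c = rc + 1
        else:
            self.c = 0
        return (self.l, self.c)


if __name__ == "__main__":
    # Two nodes with skewed (fixed) physical clocks, for demonstration only.
    a = HybridLogicalClock(lambda: 10)
    b = HybridLogicalClock(lambda: 5)
    sent = a.now()            # node a stamps a send event
    received = b.update(sent) # node b merges it on receive
    print(sent, received)
```

Because timestamps are plain tuples, they compare lexicographically, so sorting HLC-stamped log entries yields an order consistent with causality. A snapshot at a chosen timestamp could then be taken, under this sketch's assumptions, as the set of logged events with timestamps no greater than that cut.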