Retroscope: Retrospective Monitoring of Distributed Systems

Aleksey Charapko, Ailidani Ailijiang, Murat Demirbas, Sandeep Kulkarni



Retroscope is a comprehensive, lightweight distributed monitoring tool that enables users to query and reconstruct past consistent global states of a distributed system. Retroscope achieves this by augmenting the system with Hybrid Logical Clocks (HLC) and by streaming HLC-stamped event logs for storage and processing; these HLC timestamps are then used to construct global (or nonlocal) snapshots upon request. Retroscope provides a rich query language (RQL) to facilitate searching for global predicates across past consistent states. The search advances through global states in small incremental steps, which greatly reduces the amount of computation needed to construct each consistent state. The Retroscope search algorithm is embarrassingly parallel and can employ many worker processes (each processing up to 150,000 consistent snapshots per second) to handle a single query. We evaluate Retroscope’s monitoring capabilities in two case studies: Chord and Apache ZooKeeper.
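
To make the role of HLC timestamps concrete, the following is a minimal sketch of the standard Hybrid Logical Clock update rules, written in Python for illustration. It is not Retroscope's own code: the class name `HybridLogicalClock`, the methods `now` and `update`, and the injected `physical_clock` callable are hypothetical names chosen for this example. Each timestamp is a pair (l, c), where l tracks the largest physical time observed so far and c is a counter that orders events sharing the same l.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Timestamp = Tuple[int, int]  # (l, c): logical wall-clock part, tie-breaking counter


@dataclass
class HybridLogicalClock:
    # Hypothetical injected clock source; returns the node's physical time as an int.
    physical_clock: Callable[[], int]
    l: int = 0
    c: int = 0

    def now(self) -> Timestamp:
        """Timestamp a local or send event."""
        prev_l = self.l
        self.l = max(prev_l, self.physical_clock())
        # If physical time did not advance past l, bump the counter instead.
        self.c = self.c + 1 if self.l == prev_l else 0
        return (self.l, self.c)

    def update(self, remote: Timestamp) -> Timestamp:
        """Timestamp a receive event, merging the sender's timestamp."""
        remote_l, remote_c = remote
        prev_l = self.l
        self.l = max(prev_l, remote_l, self.physical_clock())
        if self.l == prev_l and self.l == remote_l:
            self.c = max(self.c, remote_c) + 1
        elif self.l == prev_l:
            self.c += 1
        elif self.l == remote_l:
            self.c = remote_c + 1
        else:
            self.c = 0
        return (self.l, self.c)


# Two nodes with skewed physical clocks (fixed values, for illustration only).
node_a = HybridLogicalClock(physical_clock=lambda: 10)
node_b = HybridLogicalClock(physical_clock=lambda: 7)

sent = node_a.now()            # node A stamps a message
received = node_b.update(sent)  # node B merges the stamp on receipt
assert sent < received          # the receive is ordered after the send
```

Because the pairs compare lexicographically, a receive event is always stamped later than the matching send even when the receiver's physical clock lags behind. This ordering property is what allows HLC-stamped logs from different nodes to be cut at a common timestamp to form a consistent snapshot.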


