Implementation and Performance Evaluation of Fuzzy File Block Matching

In 2007 USENIX Annual Technical Conference, June 2007.

Bo Han and Pete Keleher

The fuzzy file block matching technique (fuzzy matching for short) was first proposed for opportunistic use of content-addressable storage. Fuzzy matching aims to increase the hit ratio at content-addressable storage providers, and can thus improve the performance of the underlying distributed file storage systems. In particular, fuzzy matching can potentially save significant network bandwidth and reduce file transmission costs. Fuzzy matching uses shingling to compute fuzzy hashes of file blocks for similarity detection, and error-correcting information to reconstruct the canonical content of a file block from similar blocks.
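To illustrate the similarity-detection step, here is a minimal Python sketch of shingling. It is not the paper's implementation: the 8-byte shingle width, the use of SHA-1, the 16-hash sketch size, and the function names `shingles`, `sketch`, and `resemblance` are all assumptions chosen for illustration. The error-correcting reconstruction step is not shown.

```python
import hashlib


def shingles(block: bytes, width: int = 8) -> set[bytes]:
    """Return the set of contiguous `width`-byte substrings of a block.

    The width of 8 is an illustrative assumption, not a value from the paper.
    """
    if len(block) < width:
        return {block}
    return {block[i:i + width] for i in range(len(block) - width + 1)}


def sketch(block: bytes, width: int = 8, size: int = 16) -> frozenset[int]:
    """Keep the `size` smallest shingle hashes as a compact fuzzy hash.

    SHA-1 truncated to 64 bits is an assumed hash choice for this example.
    """
    hashes = sorted(
        int.from_bytes(hashlib.sha1(s).digest()[:8], "big")
        for s in shingles(block, width)
    )
    return frozenset(hashes[:size])


def resemblance(a: bytes, b: bytes, width: int = 8) -> float:
    """Jaccard similarity of the two blocks' shingle sets, in [0, 1]."""
    sa, sb = shingles(a, width), shingles(b, width)
    return len(sa & sb) / len(sa | sb)
```

Under these assumptions, two blocks that differ by a small edit share most of their shingles and score close to 1, while unrelated blocks score close to 0. A storage provider could compare the compact `sketch` values instead of full shingle sets to find candidate similar blocks cheaply.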

In this paper, we present the implementation details of fuzzy matching and a preliminary evaluation of its performance. In particular, we show that fuzzy matching can recover new versions of GNU Emacs source from older ones.

	title = "Implementation and Performance Evaluation of Fuzzy File Block Matching",
	author = "Bo Han and Pete Keleher",
	booktitle = {2007 USENIX Annual Technical Conference},
	month = {June},
	year = {2007},
