We propose DiffuStereo, a novel system using only sparse cameras (8 in thiswork) for high-quality 3D human reconstruction. At its core is a noveldiffusion-based stereo module, which introduces diffusion models, a type ofpowerful generative models, into the iterative stereo matching network. To thisend, we design a new diffusion kernel and additional stereo constraints tofacilitate stereo matching and depth estimation in the network. We furtherpresent a multi-level stereo network architecture to handle high-resolution (upto 4k) inputs without requiring unaffordable memory footprint. Given a set ofsparse-view color images of a human, the proposed multi-level diffusion-basedstereo network can produce highly accurate depth maps, which are then convertedinto a high-quality 3D human model through an efficient multi-view fusionstrategy. Overall, our method enables automatic reconstruction of human modelswith quality on par to high-end dense-view camera rigs, and this is achievedusing a much more light-weight hardware setup. Experiments show that our methodoutperforms state-of-the-art methods by a large margin both qualitatively andquantitatively.