Meta AI Releases Sapiens2: A High-Resolution Human-Centric Vision Model for Pose, Segmentation, Normals, Pointmap, and Albedo
If you’ve ever watched a motion capture system struggle with a person’s fingers, or seen a segmentation model fail to distinguish teeth from gums, you already understand why human-centric computer vision is hard. Humans are not just objects: they come with articulated structure, fine surface detail, and enormous variation in pose, clothing, lighting, and ethnicity. Getting a model to understand all of that at once, across arbitrary real-world images, is genuinely difficult. The Meta AI research team introduced Sapiens2, the second generation of its foundation model family for human-centric vision. Trained on a newly curated dataset of 1 billion human images, spanning model sizes from 0.4B to 5B parameters, and designed to operate at native 1K resolution with hierarchical variants supporting 4K, Sapiens2 is a substantial leap over its predecessor across every benchmark the team evaluated.

What Sapiens2 is Trying to Solve

The original Sapiens model relied...

