On a quiet stretch of coastline near Xieyang Island in the Beibu Gulf, an artificial intelligence originally designed to identify objects in photos has learned to listen—really listen—to the ocean. There, researchers led by Zhuo Xiao of Guangxi Minzu University have trained the Segment Anything Model (SAM), a foundation AI built for image segmentation, to detect the elusive calls of Bryde’s whales with over 96% accuracy. The breakthrough hinges on a simple but powerful idea: turn sound into sight. By converting seismic recordings into spectrograms—visual maps of sound frequencies over time—whale calls become repeating patterns in an image, something AI like SAM can recognize as easily as it would a tree or a car.

This innovation matters because whales are vanishingly hard to track. They roam vast, dark oceans, often in remote or politically sensitive waters. Passive acoustic monitoring—listening without disturbing—has long been a cornerstone of marine biology, but it’s labor-intensive and prone to human error. Scientists reviewing hours of audio might miss faint or infrequent calls. Now, an AI system not trained on whale sounds at all can spot them with remarkable precision, even catching calls that human analysts overlooked in recordings from January 26 and July 11, 2021. These dates were chosen to capture seasonal shifts in behavior, and the data revealed something subtle but significant: Bryde’s whales call differently in winter and summer. In colder months, their acoustic pulses come faster, suggesting tighter social coordination, while in summer, the intervals stretch out, hinting at more solitary habits.

The real test came when the team pushed beyond their local data. Could SAM recognize fin whale calls recorded off the coast of Ireland? Blue whale vocalizations from Canada? It did. The model’s ability to generalize across species and oceans signals a major leap—not just for one research team, but for global cetacean conservation. “We were pleasantly surprised by the strong performance,” Xiao said. “I think this reflects the power of foundation models that are pre-trained on massive generic image datasets.” Still, challenges remain. Background noise, false positives, and missed detections persist. And despite AI’s growing sophistication, the field still lacks a comprehensive library of whale call spectrograms—like trying to teach a student to identify birds without a field guide. The team plans to refine the model further by integrating multi-modal data, such as seismic and oceanographic sensors, and fine-tuning it specifically for cetacean calls.

For now, the system doesn’t replace human experts—it enhances them. It’s a force multiplier, sifting through oceans of data so scientists can focus on interpretation and action. As oceans grow noisier and whale populations remain vulnerable, tools like this offer a quiet hope: that we might finally hear the full story of what’s happening beneath the waves.