Experimentally validated AI model predicts virulence of tomato yellow leaf curl virus

At Sungkyunkwan University in Seoul, Professor Balachandran Manavalan's team has built an artificial intelligence system that can predict which strains of tomato yellow leaf curl virus will devastate crops—before disease symptoms even appear. The model, called DeepTYLCV, reads viral genomes like a diagnostic code, identifying dangerous variants from DNA sequences alone.

Tomato yellow leaf curl virus is one of the world's most destructive agricultural pathogens. Severe strains trigger leaf curling, yellowing, stunted growth, and catastrophic yield losses. For decades, growers have relied on visible symptoms to identify infected plants, but by then the damage is done. Worse, highly virulent strains continue spreading globally, and some have already overcome the genetic resistance that breeders built into modern tomato varieties. Farmers need a way to spot trouble coming—before it arrives in their fields.

DeepTYLCV addresses this problem by analyzing viral genomes rather than waiting for plants to show physical signs of infection. The model combines two AI approaches: protein language model embeddings, which capture the overall architecture of viral sequences, and a hybrid system of Transformer encoders and multi-scale convolutional neural networks, which picks out the localized patterns most closely tied to virulence. This architecture allows the system to recognize both the big picture and the fine details that determine how aggressive a viral strain will be.
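To make the two-branch idea concrete, here is a toy, dependency-free sketch. It is not the DeepTYLCV implementation. It assumes that each sequence position already has an embedding vector (random numbers stand in for real protein language model embeddings), uses a single self-attention step with identity projections in place of a full Transformer encoder, and uses untrained random filters of widths 3, 5, and 7 for the multi-scale convolutions. All function names, sizes, and weights are hypothetical.

```python
import math
import random


def self_attention(x):
    """Single-head scaled dot-product self-attention with identity projections.

    x is a list of L embedding vectors of size d; every position attends to
    every other position, which gives each output vector global context.
    """
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w / total * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


def mean_pool(x):
    """Average the position vectors into one fixed-size vector."""
    return [sum(v[j] for v in x) / len(x) for j in range(len(x[0]))]


def conv1d_maxpool(x, kernel):
    """Slide one filter (k vectors of size d) along the sequence, apply ReLU,
    and keep the strongest response, so a local motif counts wherever it occurs."""
    k, d = len(kernel), len(x[0])
    best = 0.0  # ReLU floor
    for start in range(len(x) - k + 1):
        act = sum(kernel[i][j] * x[start + i][j] for i in range(k) for j in range(d))
        best = max(best, act)
    return best


def extract_features(embeddings, kernels_by_width):
    """Concatenate global (attention) and local (multi-scale convolution) features."""
    global_feats = mean_pool(self_attention(embeddings))
    local_feats = [conv1d_maxpool(embeddings, kern)
                   for width in sorted(kernels_by_width)
                   for kern in kernels_by_width[width]]
    return global_feats + local_feats


def predict_severe_probability(features, weights, bias):
    """Logistic output layer: probability that a strain is classed as severe."""
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical stand-in data: 12 positions with 4-dimensional embeddings.
rng = random.Random(0)
seq_len, dim = 12, 4
embeddings = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(seq_len)]
kernels_by_width = {
    w: [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(w)] for _ in range(2)]
    for w in (3, 5, 7)
}
features = extract_features(embeddings, kernels_by_width)
weights = [rng.uniform(-1, 1) for _ in features]
p = predict_severe_probability(features, weights, bias=0.0)
print(len(features), round(p, 3))
```

Because the weights are random, the printed probability means nothing biologically. The sketch only shows how a global summary and several local, motif-level responses can be concatenated into a single feature vector before classification.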

The research team, co-led by Dr. Nattanong Bupi, Hariharan Sangaraju, and Duong Thanh Tran, published their findings in Plant Communications. What sets the work apart is experimental validation. Rather than stopping at computational predictions, the researchers ran blind tests on 15 TYLCV isolates, some of them international reference strains and others collected from Korean fields. They then grew tomato plants, deliberately infected them with each viral variant, and measured what actually happened: symptom severity, viral DNA accumulation, and disease progression over three weeks.

The predictions held up. Across the 15 isolates tested, DeepTYLCV achieved 100% concordance between its predictions and real-world infection outcomes. Every strain it classified as mild behaved mildly in plants. Every strain it flagged as severe caused severe disease. This is not a theoretical exercise; the tool's predictions were confirmed in living plants.
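Concordance here is simply the share of isolates whose predicted class matches the class observed in infected plants. A minimal sketch of that arithmetic follows. The isolate names and labels are placeholders invented for illustration, not the study's data, and the function name is hypothetical.

```python
def concordance(predicted, observed):
    """Fraction of isolates whose predicted class matches the observed class."""
    if predicted.keys() != observed.keys():
        raise ValueError("predicted and observed must cover the same isolates")
    if not predicted:
        raise ValueError("no isolates to compare")
    matches = sum(predicted[name] == observed[name] for name in predicted)
    return matches / len(predicted)


# Placeholder labels for illustration only; these are not the study's isolates or results.
predicted = {"isolate_A": "severe", "isolate_B": "mild", "isolate_C": "severe", "isolate_D": "mild"}
observed = {"isolate_A": "severe", "isolate_B": "mild", "isolate_C": "mild", "isolate_D": "mild"}

score = concordance(predicted, observed)
print(f"{score:.0%}")  # 3 of 4 placeholder isolates match, so this prints 75%
```

A score of 100%, as reported for DeepTYLCV, means every isolate in the blind test landed in the same class under both prediction and experiment.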

This matters because DeepTYLCV avoids the main weakness of symptom-based approaches. Conventional field diagnosis depends on visible symptoms, which vary with environmental conditions and observer experience. Image-based AI models face similar limitations, since they too depend on symptoms that can be seen. DeepTYLCV needs only a genome sequence, which is becoming cheaper and faster to obtain each year. A farmer, researcher, or agricultural agency can now submit a viral sample for sequencing, feed the genome into this model, and get a reliable prediction within hours instead of weeks.

The work also represents a meaningful step forward from the team's previous effort. In 2023, they published IML-TYLCV, the first genome-based TYLCV severity prediction tool. But that model was trained primarily on Korean isolates, limiting its usefulness for the genetically diverse TYLCV strains circulating globally. DeepTYLCV overcomes this by learning from a more representative global dataset, making it applicable to new variants wherever they arise.

For agriculture facing mounting pressure from pests and pathogens, this convergence of AI, viral genomics, and plant pathology offers a concrete path toward precision disease management. Farmers can now see virulent threats coming not through a field scope, but through a genome sequence—and act before symptoms arrive.
