When you hand control of a car to an AI system, you feel the difference immediately. On a recent test drive of XPENG's P7 equipped with VLA 2.0 autonomous driving, what struck me most was not the technological prowess—it was how human the driving felt. The car anticipated curves before they came, adapted its braking pressure to road conditions, and made judgment calls that felt intuitive rather than mechanical. This is the ambition behind XPENG's latest system: not to replicate how humans drive, but to reflect how humans think.

The key to this shift lies in a fundamental rethinking of how AI processes the real world. Conventional vision-language-action pipelines route perception through a language model, translating visual information into text and then text into action, a cumbersome process that can produce rigid, overthinking responses. XPENG says VLA 2.0 eliminates that translation layer entirely. Instead, it operates on a direct visual-to-action pathway, much like a baseball player who doesn't analyze each muscle contraction before throwing. The result, according to the company, is a 33% reduction in prediction error and a system that handles complex, unpredictable scenarios with the calm of an experienced driver.

At the hardware core sits XPENG's proprietary Turing AI chip, available in the new GX vehicle with up to 3000 TOPS of computing power, significantly more than competing systems. But raw power tells only part of the story. Through optimized integration of chip, compiler, and model, XPENG reports that on-vehicle chip utilization is approximately four times that of traditional general-purpose chips paired with open-source models. The reported gains are large: a 51% boost in neural network computing speed, a 145% increase in information processing capacity, and a 300% increase in information throughput per second. This concentrated computing power means the car processes vastly more environmental data on board, without calling out to external servers, which enables faster, more intuitive responses.
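Percentage increases are easy to misread next to "times" figures, so it can help to put them on one scale. The short Python sketch below converts the three reported percentages into multipliers of the baseline. The percentages come from XPENG's figures as quoted here; the function and variable names are illustrative only, and the baseline itself is not specified in the source.

```python
def pct_increase_to_multiplier(pct: float) -> float:
    """Convert an 'X% increase' into a 'times the baseline' multiplier."""
    return 1.0 + pct / 100.0


# Reported gains, in percent, as quoted in the article (names are illustrative).
reported_gains = {
    "neural network computing speed": 51,
    "information processing capacity": 145,
    "information throughput per second": 300,
}

# Each multiplier is relative to the (unspecified) baseline system.
multipliers = {
    name: pct_increase_to_multiplier(pct) for name, pct in reported_gains.items()
}

for name, m in multipliers.items():
    print(f"{name}: {m:.2f}x baseline")
```

Read this way, a 300% increase means four times the baseline throughput, not three times, while a 51% boost is roughly one and a half times the baseline.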

XPENG also offers a sense of scale: it estimates that VLA 2.0's on-vehicle inference token consumption is roughly 80 times the daily "Digital AI" volume processed across all of China. The system supports what the company calls a "32x ultra-dense computing chain," which it credits with improving prediction accuracy across all driving scenarios.

What emerges from this architecture is something closer to genuine machine learning than pre-programmed automation. When a driver takes an aggressive approach during a test drive, the system adapts, initially matching the aggressive style before settling into smoother control as it learns the driver's preferences and the road conditions. When it encounters what XPENG calls "long-tail scenarios" (rare, complex situations that training data seldom covers), the system identifies risks preemptively rather than managing them reactively.

Built on "human-like first principles" and operating on a "what you see is what you get" basis, VLA 2.0 gains a crucial advantage: stronger generalization. The software can transfer across scenarios and geographies without retraining. When a driver moves from one country's roads to another's, the car doesn't need to relearn driving; it adapts. This portability matters enormously for global deployment of autonomous technology.

The ambition here extends beyond making cars safer—though that remains the core purpose. It's about building machines that interact with the physical world the way intelligent beings do: through observation, adaptation, and intuitive response rather than laborious analysis. In autonomous driving, that shift from mechanical to thoughtful could reshape how we move through cities and across continents.