Are 'Aryan' and 'Dravidian' real genetic categories or colonial inventions?
Both, sort of. The terms are colonial categories that don't map cleanly to biology, but they roughly correspond to real ancestral components that population genetics has identified in South Asians.
Modern South Asian populations are descended from a mix of several ancestral groups. The two largest contributions, first inferred from genome-wide studies of present-day populations and since refined through ancient DNA research (David Reich's lab at Harvard has done much of this work), are usually called:
- Ancestral North Indians (ANI): a population with significant input from Iranian agriculturalists and Steppe pastoralists who entered the subcontinent in waves between about 4500 and 1500 BCE.
- Ancestral South Indians (ASI): a population descended from the earlier hunter-gatherers and agriculturalists of the subcontinent, with deep roots going back tens of thousands of years.
Almost every South Asian population today is a mixture of ANI and ASI in different proportions. Punjabis tend toward more ANI ancestry and Tamils toward more ASI ancestry, but the gradient is continuous and there are no clean lines: virtually every modern Indian carries both.
The colonial framing - "Aryan invasion" of Dravidian natives - oversimplifies what was actually a long, complex, multi-wave history of migration, mixing, agriculture, and language change over four thousand years. The genetics broadly supports the existence of two major ancestral components but does not support the colonial narrative of a single invasion by one distinct people over another.
The vocabulary is loaded, the underlying genetics is real, and the politics of the topic has interfered with public understanding of the science. This is one of the most active research areas in South Asian genetics right now.
Arjun's answer is solid. From a non-scientist perspective: if you upload your DNA to enough services, this is one of the most striking parts of the results. Indians do not fall into a clean "North Indian" or "South Indian" cluster in the data; almost everyone lands somewhere along a smooth spectrum.
It is one of those moments when the genetic data quietly contradicts a lot of inherited categories. Worth sitting with.