Most AI models being trained on poor quality datasets: Rajeev Chandrasekhar
Says that is why you have the embarrassing sight of billion-dollar Gemini/ChatGPT on many occasions spewing nonsense
New Delhi: Most artificial intelligence (AI) models are being trained on poorly designed datasets, which has undermined the quality of the end products, former Union minister Rajeev Chandrasekhar said on Monday.
In a post on social media platform X, the former minister said large language models’ (LLMs) "bullshit content" comes from most models being trained on content/datasets that are, “to politely use the phrase, not quality assured”.
“That is why you have the embarrassing sight of billion-dollar Gemini/ChatGPT on many occasions spewing nonsense,” said Chandrasekhar.
He was reacting to Ethan Mollick, Associate Professor at the Wharton School of the University of Pennsylvania in the US, who posted that the idea that “LLMs bullshit - produce content without regard to truth - is not new”.
“But I don't think it is that helpful a framework, since, in the end, LLMs may be more accurate than humans in many tasks,” said the professor.
Chandrasekhar, former Minister of State for Electronics and IT, responded: “Garbage in, Garbage out is an old adage in programming especially if you are depending on scraping the internet”.
An X user posted that AI machines need to be fed information and are not intelligent enough to deconstruct or articulate ideas like a human.
“End of the day, the human mind still decides what the machine has to say,” she added.