Sundar Pichai, chief executive officer of Alphabet Inc., during the Google I/O Developers Conference in Mountain View, California, on Wednesday, May 10, 2023.

Google's new large language model, which the company announced last week, uses almost five times as much training data as its predecessor from 2022, allowing it to perform more advanced coding, math and creative writing tasks, CNBC has learned.

PaLM 2, the company's new general-use large language model (LLM) that was unveiled at Google I/O, is trained on 3.6 trillion tokens, according to internal documentation viewed by CNBC. Tokens, which are words or pieces of words, are an important building block for training LLMs, because they teach the model to predict the next word that will appear in a sequence.

Google's previous version of PaLM, which stands for Pathways Language Model, was released in 2022 and trained on 780 billion tokens.

While Google has been eager to showcase the power of its artificial intelligence technology and how it can be embedded into search, emails, word processing and spreadsheets, the company has been unwilling to publish the size or other details of its training data. OpenAI, the Microsoft-backed creator of ChatGPT, has also kept secret the specifics of its latest LLM, GPT-4.

The reason for the lack of disclosure, the companies say, is the competitive nature of the business. Google and OpenAI are rushing to attract users who may want to search for information using conversational chatbots rather than traditional search engines.

But as the AI arms race heats up, the research community is demanding greater transparency.

Since unveiling PaLM 2, Google has said the new model is smaller than prior LLMs, which is significant because it means the company's technology is becoming more efficient while accomplishing more sophisticated tasks. PaLM 2, according to internal documents, has 340 billion parameters, an indication of the complexity of the model. The initial PaLM had 540 billion parameters.
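
As a rough check on the figures reported above, the comparisons between the two models can be worked out directly. The short Python sketch below uses only the token and parameter counts cited in this story; the variable names are illustrative, and the tokens-per-parameter figure is a derived back-of-the-envelope number, not one that appears in the documents.

```python
# Figures as reported in this story (internal documentation viewed by CNBC).
PALM_TOKENS = 780e9      # original PaLM (2022): 780 billion training tokens
PALM2_TOKENS = 3.6e12    # PaLM 2: 3.6 trillion training tokens
PALM_PARAMS = 540e9      # original PaLM: 540 billion parameters
PALM2_PARAMS = 340e9     # PaLM 2: 340 billion parameters

# How much more training data PaLM 2 saw, and how much smaller the model is.
token_ratio = PALM2_TOKENS / PALM_TOKENS
param_ratio = PALM2_PARAMS / PALM_PARAMS

# Derived, illustrative metric: training tokens per model parameter.
tokens_per_param_palm = PALM_TOKENS / PALM_PARAMS
tokens_per_param_palm2 = PALM2_TOKENS / PALM2_PARAMS

print(f"Training tokens: {token_ratio:.1f}x the original PaLM")
print(f"Parameters: {param_ratio:.0%} of the original PaLM")
print(f"Tokens per parameter: {tokens_per_param_palm:.1f} -> {tokens_per_param_palm2:.1f}")
```

The token ratio works out to roughly 4.6, consistent with "almost five times as much training data," while the parameter count falls to a little under two-thirds of the original model's.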

Google didn't immediately provide a comment for this story.