Tether QVAC Launches Massive 148B-Token Dataset to Boost Open-Source AI Models

Tether Data’s AI research division, QVAC, released QVAC Genesis II on December 22, 2025, a major expansion of the world’s largest publicly available synthetic educational dataset for AI pre-training. The release adds 107 billion new tokens to the 41 billion in Genesis I, bringing the total to 148 billion tokens.

The dataset now covers 19 educational domains, including new additions such as chemistry, computer science, statistics, machine learning, astronomy, geography, econometrics, and electrical engineering. Genesis II introduces an option-level reasoning data generation method and a dual-method pipeline to enhance the logical reasoning and accuracy of large language models. 

The release also features a regenerated college physics domain to improve performance in that specific area. This initiative aims to democratize access to high-quality training data and support the development of open-source models capable of complex problem-solving outside of centralized corporate environments.
