Questionable AI Data Usage
Interestingly, the standards also talk about “data acquirers” having “transparency around the data they aim to acquire and a mechanism to determine whether to trust and use the data on offer.” This is particularly pertinent for companies working with AI ventures and building their own systems on LLMs provided by others.
Microsoft works closely with OpenAI, for example, and has recently been involved in a spat with Chinese upstart DeepSeek, which it accused of “improperly” obtaining OpenAI data to train its model.
IBM currently uses an open-source foundation for its watsonx platform, but has written a detailed blog post about the benefits of open-source versus proprietary LLMs. In it, IBM talks specifically about the dangers of “incomplete, contradictory, or inaccurate data.” But it also nods to the issues of ensuring that “training data was gathered with accountability” and that this data harvesting is “compliant with laws and regulations.”
Here, however, lies the issue: AI frameworks are still being churned out in the US at quite a clip. Worse, there has been huge friction between those arguing for safety nets and those pushing for innovation, OpenAI notable among the latter. Also notable: OpenAI is not at this table.