AI Definitions: Synthetic Data
Synthetic Data – This type of data is produced by a generative AI (GenAI) model. It can be created from scratch or derived from data that come from real-world systems. Some experts say we are running out of original human-generated data for training LLMs and propose using synthetic data in place of the real thing. If synthetic data can be made to work, it could sidestep the problem of using copyrighted material for training. Sceptics say that training on synthetically produced data will degrade a model’s performance. There is also the danger of synthetic GenAI data being misrepresented as real data, providing fertile ground for misconduct. Statistical techniques that were previously effective at spotting fraudulent data, such as detecting nonrandom digits, are being made obsolete by the emergence of synthetic data. This potential for hard-to-detect fraud is why some scientists consider its use to be unethical.
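To make the "nonrandom digits" check concrete, here is a minimal sketch of one common form of it: a chi-square test of whether the final digits of reported values are evenly spread across 0 to 9. It is an illustration under stated assumptions, not a forensic tool. The function name, the sample data, and the choice of a 5% significance level are all hypothetical and do not come from any particular study.

```python
import random
from collections import Counter

# Critical value of the chi-square distribution with 9 degrees of freedom
# at the 5% significance level (assumed threshold for this sketch).
CRITICAL_9DF_05 = 16.919


def last_digit_chi_square(values):
    """Return the chi-square statistic for uniformity of final digits."""
    digits = [abs(int(v)) % 10 for v in values]
    counts = Counter(digits)
    expected = len(digits) / 10  # each digit should appear equally often
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))


rng = random.Random(0)

# Hypothetical hand-fabricated data: the fabricator favours a few end digits.
fabricated = [10 * rng.randint(1, 50) + rng.choice([0, 5, 5, 0, 3, 7])
              for _ in range(500)]

# Hypothetical machine-generated data: final digits drawn uniformly.
generated = [rng.randint(100, 999) for _ in range(500)]

for label, data in [("fabricated", fabricated), ("generated", generated)]:
    stat = last_digit_chi_square(data)
    verdict = "flagged" if stat > CRITICAL_9DF_05 else "not flagged"
    print(f"{label}: chi-square = {stat:.1f} ({verdict})")
```

The hand-fabricated sample is flagged because several final digits never appear. Data drawn from a generator has no such digit preference, so it will usually pass a check like this, which is the concern raised above.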