The Data Blog
Driven by privacy laws and restrictions, the synthetic data generation market is shifting away from a large base of companies using legacy methods, which modify an existing database with Extract, Transform, Load (ETL) technologies, toward fully synthetic generation, which requires no production data at all. Fully synthetic technologies either use algorithms to generate the data outright or use AI/ML to analyze a production database and reproduce a facsimile of it. The complexity of fully synthetic data, and its fitness for the system under test, varies widely: free tools produce nonsensical randomly generated data, while premium solutions model highly complex system-of-systems databases and can generate statistically significant data for building confusion matrices and measuring system error rates. The fully synthetic data generation market is migrating toward this higher-complexity end, driven by the opportunity for high-revenue, high-profit enterprise sales and by clear benefits: better test objects that reduce system error rates while dramatically shortening software development cycles at lower cost than traditional methods.
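To make the distinction concrete, here is a minimal sketch contrasting the two approaches. The record fields, masking rules, and generation rules are hypothetical illustrations, not any vendor's actual method: the ETL-style path copies and obscures a real production record, while the fully synthetic path fabricates a record from scratch and never touches production data.

```python
import random
import string

# ETL-style (legacy): derive test data by masking an existing production record.
# The test record still originates from real data, which is why privacy laws
# increasingly constrain this approach.
def mask_record(record):
    """Copy a production record, obscuring the sensitive fields."""
    masked = dict(record)
    masked["name"] = "*" * len(record["name"])
    masked["ssn"] = "XXX-XX-" + record["ssn"][-4:]  # keep last 4 digits only
    return masked

# Fully synthetic: generate a plausible record from generation rules alone,
# with no production database involved.
def synthesize_record(rng):
    """Create a record from scratch using only random generation rules."""
    name = rng.choice(string.ascii_uppercase) + "".join(
        rng.choices(string.ascii_lowercase, k=6)
    )
    ssn = f"{rng.randint(100, 899):03d}-{rng.randint(1, 99):02d}-{rng.randint(1, 9999):04d}"
    return {"name": name, "ssn": ssn}

production = {"name": "Alice", "ssn": "123-45-6789"}
print(mask_record(production))                    # derived from real data
print(synthesize_record(random.Random(0)))        # no real data involved
```

Even this toy example shows why the fully synthetic path scales differently: generation rules can be made arbitrarily sophisticated (statistical distributions, cross-table referential integrity) without ever handling sensitive source data.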
A recent internet search revealed 43 companies participating in the test data generation market. The majority rely on traditional ETL methods, though the number of new companies generating fully synthetic data is growing impressively. Many of these newcomers combine AI/ML techniques with traditional ETL or lower-complexity algorithmic approaches. An example is Tonic, which appeared in the market within the last few years and has raised an impressive $35M in Series B venture funding. ExactData appears to remain the only company participating in the premium fully synthetic data generation market.