Data mastering is the process of unifying multiple independently constructed data sets about the same kind of entity, for example customers, suppliers, or parts. Every large enterprise holds this information in data silos and must unify it to get full value from its data.
In this talk, I argue that ALL mastering projects should move to the cloud for cost reasons. I then argue that a software-as-a-service (SaaS) architecture is the most cost-effective cloud mastering environment, and that cloud-native solutions that use the underlying services of each cloud provider are the best path forward. In addition, there is invariably a real-time mastering problem as changes occur in enterprise data, and mastering solutions must deal with this issue. Lastly, structured data in a mastering problem is often accompanied by some text, and I indicate the best way to handle it.
Professor, MIT CSAIL