Data-Centric AI
AI systems
Code and data are the foundations of AI systems, and both play an important role in developing a robust model. AI systems work by combining large data sets with intelligent, iterative processing algorithms that learn from patterns and features in the data they analyze. Each time an AI system runs a round of data processing, it tests and measures its own performance and develops additional expertise.
The components of the AI project cycle are:
1- Problem Scoping: Understanding the problem
2- Data Acquisition: Collecting accurate and reliable data
3- Data Exploration: Arranging the data uniformly
4- Modelling: Creating models from the data
5- Evaluation: Evaluating the project
What Is Data-Centric AI?
Data-centric AI is the discipline of systematically engineering the data needed to build a successful AI system.
In data-centric AI, the focus is on systematically iterating on the data to improve its quality so as to improve performance; it is a continuous process, something you do not only at the start but even after deployment into production.
Data-centric AI vs model-centric AI
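In a data-centric approach, you spend relatively more of your time labeling, managing, slicing, augmenting, and curating the data, while the model itself stays relatively fixed. In a model-centric approach, by contrast, the data stays relatively fixed while you iterate on the model's architecture and training procedure.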
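The contrast can be illustrated with a minimal sketch in which the model code never changes and only the training data is corrected. Everything here is an assumption made for illustration: the toy one-dimensional data, the nearest-centroid "model", and the `corrections` mapping that stands in for a human reviewer fixing labels.

```python
from statistics import mean

def train_centroids(examples):
    """Fixed 'model': one centroid (mean feature value) per label."""
    by_label = {}
    for x, label in examples:
        by_label.setdefault(label, []).append(x)
    return {label: mean(xs) for label, xs in by_label.items()}

def predict(centroids, x):
    """Assign x to the label whose centroid is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

def accuracy(centroids, examples):
    """Fraction of examples whose predicted label matches the given label."""
    correct = sum(predict(centroids, x) == label for x, label in examples)
    return correct / len(examples)

# Hypothetical training set: the points 8.0 and 9.0 are mislabeled as "small".
noisy_train = [
    (1.0, "small"), (2.0, "small"), (3.0, "small"),
    (8.0, "small"), (9.0, "small"),
    (7.0, "large"), (10.0, "large"),
]
test_set = [
    (1.5, "small"), (2.5, "small"),
    (6.0, "large"), (6.5, "large"), (9.0, "large"),
]

# Data-centric step: a reviewer corrects labels by index; the model code is untouched.
corrections = {3: "large", 4: "large"}
clean_train = [
    (x, corrections.get(i, label)) for i, (x, label) in enumerate(noisy_train)
]

acc_before = accuracy(train_centroids(noisy_train), test_set)
acc_after = accuracy(train_centroids(clean_train), test_set)
print(f"before: {acc_before:.2f}, after: {acc_after:.2f}")
```

On this toy data, correcting two labels moves test accuracy from 0.60 to 1.00 without touching the model. Real projects will rarely see such a clean jump; the point is only that the iteration happens on the data.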
Benefits of data-centric AI
Some of the world’s largest organizations have benefited from adopting data-centric AI. Using Snorkel Flow, a data-centric AI application development platform, organizations in banking, biotech, insurance, telecommunications, government, and other sectors have seen improvements in developing and deploying deep-learning-based solutions. Reported improvements from adopting a data-centric approach include:
1- Faster development: A Fortune 50 bank built a news analytics application 45x faster and with 25% higher accuracy than a previous system.
2- Higher accuracy: A global telco improved the quality of over 200,000 labels for network classification, resulting in a 25% improvement in accuracy over the ground truth baseline.
3- Cost savings: A large biotech firm saved an estimated $10 million on unstructured data extraction, achieving 99% accuracy using Snorkel Flow.
Data-Centric AI Impacts Performance
A data-centric AI approach involves building AI systems with quality data — with a focus on ensuring that the data clearly conveys what the AI must learn. Doing so helps teams reach the performance level required and removes unnecessary trial-and-error time spent on improving the model without changing inconsistent data.
Data-Centric AI Promotes Collaboration
Quality managers, subject matter experts, and developers can work together during the development process to:
1- Reach a consensus on defects and labels
2- Build a model
3- Analyze results
4- Make further optimizations
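As a rough illustration of the first step, reaching a consensus on labels, the sketch below aggregates several annotators' labels per item by majority vote and flags items with too little agreement for joint review. The item IDs, the label names, and the two-thirds `min_agreement` threshold are hypothetical choices, not part of any particular tool.

```python
from collections import Counter

def consensus(votes, min_agreement=2 / 3):
    """Return (label, agreement); label is None when agreement is too low."""
    counts = Counter(votes)
    label, top = counts.most_common(1)[0]
    agreement = top / len(votes)
    return (label if agreement >= min_agreement else None), agreement

# Hypothetical annotations: three reviewers label each image.
annotations = {
    "img_001": ["scratch", "scratch", "scratch"],
    "img_002": ["scratch", "dent", "scratch"],
    "img_003": ["dent", "scratch", "none"],
}

resolved = {}       # items with an agreed label
needs_review = []   # items the team should discuss together
for item_id, votes in annotations.items():
    label, agreement = consensus(votes)
    if label is None:
        needs_review.append(item_id)
    else:
        resolved[item_id] = label

print("resolved:", resolved)
print("needs review:", needs_review)
```

Items that land in `needs_review` are the ones where quality managers, subject matter experts, and developers would most usefully sit down together, often to refine the labeling guidelines themselves.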
Data-Centric AI Reduces Development Time
With such an approach, teams can work in parallel and directly influence the data used for the AI system. Removing unnecessary back-and-forth among groups and looping in human input at the point where it is needed most reduces development time.
Summary:
A common misconception about data-centric AI is that it is all about data pre-processing. It is actually about the iterative workflow of developing a machine learning system, in which the data is improved over and over again. One of the most powerful ways to improve an AI system is to engineer the data to fix the problems identified by error analysis, and then train the model again.
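One simple form of error analysis is to break evaluation errors down by data slice and direct the next round of data work at the worst slice. The sketch below assumes each evaluation record carries a hypothetical `slice` tag (here, lighting conditions) alongside the true and predicted labels; the records themselves are made up for illustration.

```python
from collections import defaultdict

def error_rate_by_slice(records):
    """Compute the fraction of wrong predictions within each data slice."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for r in records:
        totals[r["slice"]] += 1
        if r["predicted"] != r["label"]:
            errors[r["slice"]] += 1
    return {s: errors[s] / totals[s] for s in totals}

# Hypothetical evaluation records from a defect-detection model.
records = [
    {"slice": "daylight", "label": "defect", "predicted": "defect"},
    {"slice": "daylight", "label": "ok", "predicted": "ok"},
    {"slice": "daylight", "label": "ok", "predicted": "defect"},
    {"slice": "daylight", "label": "defect", "predicted": "defect"},
    {"slice": "low_light", "label": "defect", "predicted": "ok"},
    {"slice": "low_light", "label": "defect", "predicted": "ok"},
    {"slice": "low_light", "label": "ok", "predicted": "ok"},
    {"slice": "low_light", "label": "defect", "predicted": "ok"},
]

rates = error_rate_by_slice(records)
worst_slice = max(rates, key=rates.get)
print(rates)
print("focus data work on:", worst_slice)
```

Once the worst slice is known, the data-centric move is to collect, relabel, or augment examples for that slice and retrain the same model, then repeat the analysis.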
Reference
https://datacentricai.org/