We’ve seen no shortage of controversy when it comes to AI. In just the past year, we’ve seen robo-editors confuse mixed-race singers, an algorithm appear to penalize top-performing students from disadvantaged backgrounds, and an AI-powered tool appear to favor white faces over black faces. Companies need a plan for mitigating any risk associated with AI. How should organizations develop and implement AI products without falling into ethical pitfalls along the way? The answer boils down to data quality and explainable AI: being able to demonstrate what data an application used, knowing who trained the AI and on what criteria, and making it as clear as possible how the AI came to its answers.
Data quality fuels AI quality
Organizations must remember that AI is not unethical per se; it is limited by the information it is fed and the objectives it is given. AI algorithms are trained on a set of data that is used to inform or build the algorithm. If an algorithm is making inaccurate or unethical decisions, it may mean there was not sufficient data to train the model. Treating AI quality as a discipline alongside data quality is fundamental to ensuring compliance and ethical risks are spotted in time and properly addressed. It is essential that algorithms operate with enough high-quality data and within a well-defined and unbiased context.
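To make that concrete, here is a minimal sketch of the kind of pre-training data audit a team might run, assuming a pandas DataFrame with a label column and one or more demographic columns. Every name here is hypothetical, not a prescribed standard:

```python
import pandas as pd

def audit_training_data(df: pd.DataFrame, label_col: str, group_cols: list) -> dict:
    """Surface basic quality and representation issues before any training run."""
    report = {
        # Missing-value rates: gaps in the data often hit some groups harder.
        "missing_rates": df.isna().mean().to_dict(),
        # Label balance: a heavily skewed outcome can dominate what the model learns.
        "label_balance": df[label_col].value_counts(normalize=True).to_dict(),
    }
    # Representation: how much of the data each demographic group accounts for.
    for col in group_cols:
        report["representation_" + col] = df[col].value_counts(normalize=True).to_dict()
    return report

# Hypothetical usage on a loan-decision data set:
# report = audit_training_data(loans, label_col="approved", group_cols=["gender", "ethnicity"])
```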
Data and AI quality start with the people who do the work. A worker’s experience significantly impacts the work they deliver, and leadership should ensure the developer teams and data scientists collecting and analyzing the data are well-rounded and diverse. Unfortunately, businesses around the world are struggling on this front. According to research, 80 percent of AI professors are men, and women represent only 15 percent of AI researchers at Facebook and 10 percent at Google. When we don’t take steps to mitigate this, the result can be harmful feedback loops that trap people based on their origin, history, or a stereotype.
Regular training is also essential to data quality: designers need to be aware of everything that goes into collecting and analyzing quality data. Depending on the difficulty and complexity of a task, customized training may be required to ensure the continued skill development of the data worker, which results in higher-quality work. For simpler tasks, minimal training may be enough to deliver quality results.
Optimizing data quality
Building AI systems that battle bias is not only a matter of having more diverse and diversity-minded design teams. It also involves training the programs to behave inclusively. Many of the data sets used to train AI systems contain historical artifacts of bias. If those associations aren’t identified and removed, they will be perpetuated and reinforced. While AI programs learn by finding patterns in data, they need guidance from humans to ensure the software doesn’t jump to the wrong conclusions. A checklist, supported by leadership, should be created to guide data scientists, AI engineers and, more broadly, ‘Data Citizens’ in identifying sources of bias in data and checking AI models for fairness, as sketched below. Where possible, developers should work with simple, interpretable models, which also ease issues around explainability and sustainability.
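One concrete item such a checklist might include is a disparate-impact test. The sketch below assumes binary model predictions and a single protected attribute; the 0.8 threshold follows the common ‘four-fifths’ rule of thumb, and all the names are illustrative:

```python
import pandas as pd

def disparate_impact(preds: pd.Series, groups: pd.Series, favorable=1) -> float:
    """Ratio of favorable-outcome rates between the worst- and best-treated groups.

    Values near 1.0 suggest parity; values below ~0.8 are a common flag
    for further investigation, not an automatic verdict of bias.
    """
    rates = preds.eq(favorable).groupby(groups).mean()
    return rates.min() / rates.max()

# Hypothetical usage as part of a pre-release fairness check:
# ratio = disparate_impact(df["prediction"], df["ethnicity"])
# if ratio < 0.8:
#     raise ValueError(f"Potential disparate impact: ratio={ratio:.2f}")
```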
No element is more essential than quality training data, which refers to the initial data that is used to develop an AI model, from which the model creates and refines its rules. The quality of this data has profound implications for the model’s subsequent development, setting a powerful precedent for all future applications that use the same training data. Data labeling is the backbone of good AI training, because, in a simplistic way, it ‘tells’ the model what a good or bad outcome looks like.
An AI model learns from the labeled data, and when similar scenarios arise in the real world, it can apply what it has learned to produce suitable results. The best process for labeling quality training data is built for scale, with tight quality controls and clear parameters for task precision.
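As an illustration of one common quality control, a labeling pipeline might route each item to several annotators and accept a label only when they largely agree, escalating everything else for expert review. This is a sketch under those assumptions, not a description of any particular vendor’s process:

```python
from collections import Counter

def consensus_label(annotations, min_agreement=0.8):
    """Return the majority label if enough annotators agree, else None.

    None signals that the item is ambiguous and needs expert review
    rather than being fed into training with a noisy label.
    """
    label, votes = Counter(annotations).most_common(1)[0]
    return label if votes / len(annotations) >= min_agreement else None

print(consensus_label(["cat", "cat", "cat"]))  # "cat": unanimous, accepted
print(consensus_label(["cat", "dog", "cat"]))  # None: 2/3 agreement is below 0.8
```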
Establish a system of governance
While AI ethicists are important to getting AI right, they cannot succeed without proper implementation standards and technology support. Establishing a system of governance with clear owners and stakeholders for all AI projects is crucial. All involved parties have to play their role to make sure AI is not only successful but also fair and controlled. This includes monitoring data regularly to ensure bias is not creeping in and the models continue to operate as intended. Whether it’s the data scientist or a dedicated hands-on AI ethicist, someone has to be ultimately responsible for AI policies and protocols.
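What that regular monitoring might look like in code: the hedged sketch below compares the live distribution of each numeric feature against the training baseline using a two-sample Kolmogorov-Smirnov test from scipy, flagging features that have drifted. The data frames, feature names, and threshold are assumptions for illustration:

```python
from scipy.stats import ks_2samp

def drift_alerts(baseline, live, features, p_threshold=0.01):
    """Flag features whose live distribution no longer matches the training data.

    A small p-value means the model is now seeing inputs it was not trained
    on, so its behavior and fairness checks should be re-validated.
    """
    alerts = []
    for feature in features:
        statistic, p_value = ks_2samp(baseline[feature], live[feature])
        if p_value < p_threshold:
            alerts.append((feature, statistic, p_value))
    return alerts

# Hypothetical usage in a scheduled monitoring job:
# alerts = drift_alerts(train_df, last_week_df, features=["income", "age"])
```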
The call for AI ethicists is only growing as technology leaders publicly acknowledge that their products may be harmful to employment, privacy, and human rights. Several organizations have taken the lead on hiring, while others have recommended filling the role under different titles such as chief trust officer, ethical AI lead, or trust and safety policy advisor. Regardless of the title, a clear role should be established to make sure the output and performance of AI stay within a given ethical framework.
Turn statements into action
No data model is perfect. That means human beings need to constantly monitor the systems for potential ethical problems. Mere accountability isn’t enough; all parties involved need to actively track the AI’s actions and interactions with humans, making adjustments as necessary to ensure the technology does not cross any ethical boundaries.
Organizations must identify those boundaries, how to enforce them and, if necessary, how to change them. As professionals and individuals, we are expected to act ethically, and the same should be expected of AI. Just as we’ve always needed robust governance for standard analytics, we have the same responsibility for the data that feeds our AI systems, with potentially far greater consequences as AI starts to ask the questions and define the answers itself.