The response to the algorithm Twitter uses to crop images in tweets is worth a closer look, given the current situation in the United States, where racism is very much in the news and a highly divisive issue.
The purpose of Twitter’s algorithm is, in principle, very simple: when somebody includes an image in a tweet and it can’t be shown in its entirety, the fragment displayed in the user’s timeline should capture the image’s meaning, or summarize its essence, as well as possible. To do this, the algorithm tends to select the most salient parts of the image: if it recognizes text, for example, it focuses on the text; if it detects a face, it logically focuses on that. So far, so good.
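The core idea can be sketched in a few lines. This is an illustrative toy, not Twitter’s actual code: assume each region of the image already has a saliency score (from face or text detection), and the crop is simply the window with the highest total score.

```python
# Toy sketch of saliency-based cropping (not Twitter's real algorithm):
# given a 2D grid of per-region saliency scores, pick the horizontal
# band of a fixed height whose total saliency is highest.

def best_crop(saliency, crop_h):
    """Return the top row of the crop_h-tall window with the highest
    total saliency. `saliency` is a list of rows of numeric scores."""
    best_top, best_score = 0, float("-inf")
    for top in range(len(saliency) - crop_h + 1):
        score = sum(sum(row) for row in saliency[top:top + crop_h])
        if score > best_score:
            best_top, best_score = top, score
    return best_top

# A toy 6-row "image": a high-saliency region (say, a detected face)
# sits in rows 3-4, so a 2-row crop should land there.
toy = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
    [5, 9, 5],
    [5, 9, 5],
    [0, 0, 0],
]
print(best_crop(toy, 2))  # -> 3
```

Whatever produces those saliency scores is where any bias would enter: the cropping step itself just follows the numbers.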
The problem arises when people realize that the algorithm almost always chooses a white face over a black one. Try it yourself: compose an image with two faces, one white and one black, far enough apart that both cannot be shown at once, then tweet it, and you’ll see.
The same applies to dogs and to cartoon characters. It doesn’t matter which side the black or white person’s picture is on: left, right, top, bottom… the algorithm always chooses the white person’s picture. The background? It doesn’t seem to make much difference. Adults or children? No. Everything indicates that the Twitter algorithm, for some reason, is racist.
Obviously, that first conclusion, like so many others reached lightly, is completely wrong. For anyone who has even the slightest knowledge of Twitter, or has had any contact with its managers, the idea that it is a racist organization is utter nonsense. After all, algorithms only reflect the data used to train them, often through patterns that cannot initially be seen. The company’s response was transparent, and some of its executives, horrified and genuinely puzzled by the whole thing, used Twitter itself to offer an explanation: the algorithm had been tested repeatedly with different images, but nobody thought to test whether it could turn out to be racist, as this was neither a desirable nor a plausible outcome. For anyone (like me) who has been in touch with the company or knows some of its founders and executives personally, the whole thing makes no sense: Twitter, racist? Impossible.
Apparently, some of the latent variables in the data used to train the algorithm were to blame, combined with a complex set of weights and variable compositions. The problem is that pinpointing which variable or which weight made the algorithm behave the way it did is anything but obvious. Algorithms of this type tend to group highly correlated combinations of variables into indicators, which are then used as latent variables in sometimes very complex mathematical procedures; this can make it difficult to understand where a particular bias comes from. It is possible that the image collections used to train the algorithm simply contained more images of white people than of black people, and that this effect went undetected in the tests carried out before the algorithm was rolled out.
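How a skewed training set produces a skewed model can be shown with a deliberately oversimplified sketch. Here each “face” is reduced to a single invented feature (brightness), and the “model” just scores candidates by closeness to the mean of its training data; all numbers are made up for the demo.

```python
# Toy illustration of training-set imbalance skewing a learned score.
# Each "face" is one hypothetical feature (average brightness); the
# model scores candidates by similarity to its training-set prototype.

def train(brightness_samples):
    """'Learn' a prototype: just the mean of the training samples."""
    return sum(brightness_samples) / len(brightness_samples)

def score(prototype, candidate):
    # Higher score = closer to the learned prototype.
    return -abs(candidate - prototype)

# Imbalanced training data: 90 light faces (brightness ~0.8) for every
# 10 dark faces (brightness ~0.3). The numbers are invented.
training = [0.8] * 90 + [0.3] * 10
prototype = train(training)  # lands at 0.75, close to the majority

light_face, dark_face = 0.8, 0.3
print(score(prototype, light_face) > score(prototype, dark_face))  # -> True
```

The learned prototype sits near the over-represented group, so the model systematically scores that group higher even though nothing in the code mentions race; real models bury this same effect under thousands of interacting weights.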
How do you deal with a problem of this type? The fundamental thing, of course, is to be transparent: the result was not intentional, it arose from some kind of problem in the algorithm’s development, and it will be solved as soon as possible. If you can open the algorithm up to examination by a larger number of eyes, you will certainly have a better chance of understanding what happened, and of fixing it sooner. It is essential to understand that this can happen in almost any context, at almost any time, usually early in an algorithm’s use; none of this undermines machine learning, it simply highlights the need for a longer training period.
The company is also planning to offer its users more control over how their images appear in a tweet, and will try out several options for doing so. For the moment, it probably makes sense to let users decide, at least until the algorithm is well trained.
The best way to make sense of an episode like this is to understand how machine learning and its algorithms actually work. Their potential is huge, but they require full oversight. What happened at Twitter is a perfect illustration. But let’s be ready for the worst, no doubt about it, because we will certainly discover many more biases like these in many other algorithms, in many other places, in the foreseeable future.
This post was previously published on medium.com .