A few weeks ago I published the introductory piece to this series, “The Robots are Coming! (Why HR Should care about machine learning).” In the next few posts we are going to leap into more of what you need to know to embrace this change and make it a part of your future. If you are a nerd like me, you occasionally ponder how to survive the future workforce. I’m not sure if it is going to look more like Terminator, I, Robot, or Wall-E. No matter what, we should pursue whatever we can do to make the future a lot less Terminator-y.
Since the robot references related to automation can get a bit old, I’m going to change it up a bit. The Jurassic World 2: Fallen Kingdom poster was just released, and the movie will be out in a year. I’ll use the franchise to point out one of the first key risks to be aware of as an HR professional dabbling in machine learning.
You may say “But I don’t understand advanced statistical programs, or want to learn how to do machine learning, or programming or any of that. Besides, HR doesn’t really have a seat at the table when it comes to topics like machine learning.”
Fair enough. But you have a choice. You can be the lawyer in the original Jurassic Park and hide in a bathroom stall until the dinosaur eats you. Or you can be like Owen in Jurassic World: become the “Alpha” leader in your organization, maintain eye contact, and live on to another sequel.
Doesn’t seem like much of a choice, does it?
Here is a big lesson (and risk) that you need to know about machine learning:
The robots can learn from data they’re not supposed to have.
Imagine this scenario. Some vendor comes to present to your team and describes their giant, fancy “deep-learning” artificial intelligence system to predict which employees have the underlying characteristics to be promoted. The claim is that because it’s a model, not managers making judgements, it is more accurate and not prone to human bias. You can spot great talent using the data, without relying on managers who might be holding that talent back.
“That’s a great thing,” says your tech-inspired, risk-averse leader. “Bias is poisonous to the workforce. We should let the data speak for itself, and find that next generation of talent.”
The rep smells opportunity. “We can load in your employee demographics, but leave out race, gender, and age since we wouldn’t want the model to be influenced by those variables.”
Time for your smart question.
“How do you know that all this data you’re collecting and training the machine on isn’t biased? If there is some underlying bias in the behavior you are tracking, how do you know that it’s not picking that up?”
The answer you get will probably be like the one in Jurassic Park, when the protagonists asked the DNA-splicing scientist how the park kept the dinosaurs from breeding. The response was, very logically and confidently, that they only release female dinosaurs so that they can’t reproduce.
Just as Jeff Goldblum’s character famously stated, “Life Finds a Way.” In the data science world, the data can find a way.
Let’s talk about how normal employee data can accidentally be biased by race or age. The underlying premise is that other data points can often be very good proxies for those demographic factors. Those other variables don’t need to be causal. If there is a relationship between a benign data point and something like race, the model can accidentally learn to be racist if that correlated variable influences a rule.
Unfortunately, most sensitive demographic attributes have ordinary-looking variables that correlate with them.
Race might be correlated with zip code, region of the country, or even someone’s social network if that is part of an analysis. Race may also unintentionally pop up in something like college choice. Suppose you had a large number of graduates from the University of Alabama and others from Alabama A&M University. If those two groups had been treated differently in the workforce, the computer would not know that one of those schools is an HBCU, or that school choice is therefore strongly correlated with race; it would simply learn the difference.
Age is easily derived from past work experience, and it’s not hard to see how something as simple as a personal email address might indicate age (how many millennials have an @AOL.com email address?).
Factors like gender may have correlations to full-time or part-time status or with breaks in service or any number of other differences that may trend in HR data.
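If you want to see the proxy idea in miniature, here is a rough sketch rather than anything a real vendor would ship. Everything in it is made up for illustration: the eight records, the field names, and the throwaway `badge_color` column that stands in for a truly unrelated data point. It asks one question of each field: if gender is hidden, how often could you guess it from that field alone?

```python
from collections import Counter, defaultdict

# Hypothetical, hand-made records for illustration only (not real HR data).
# "gender" is the field the vendor promises to leave out of the model.
RECORDS = [
    {"gender": "F", "status": "part-time", "badge_color": "blue"},
    {"gender": "F", "status": "part-time", "badge_color": "green"},
    {"gender": "F", "status": "part-time", "badge_color": "blue"},
    {"gender": "F", "status": "full-time", "badge_color": "green"},
    {"gender": "M", "status": "full-time", "badge_color": "blue"},
    {"gender": "M", "status": "full-time", "badge_color": "green"},
    {"gender": "M", "status": "full-time", "badge_color": "blue"},
    {"gender": "M", "status": "part-time", "badge_color": "green"},
]


def baseline_accuracy(records, protected):
    """Accuracy of always guessing the single most common protected value."""
    counts = Counter(row[protected] for row in records)
    return counts.most_common(1)[0][1] / len(records)


def guess_accuracy(records, feature, protected):
    """Accuracy of guessing the protected value from one feature alone.

    For each value of the feature, guess the protected value seen most
    often with it, then count how many records that guess gets right.
    """
    groups = defaultdict(Counter)
    for row in records:
        groups[row[feature]][row[protected]] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in groups.values())
    return correct / len(records)


floor = baseline_accuracy(RECORDS, "gender")
for feature in ("status", "badge_color"):
    score = guess_accuracy(RECORDS, feature, "gender")
    print(f"{feature}: {score:.2f} (baseline {floor:.2f})")
```

In this toy data, full-time or part-time status guesses gender correctly 75% of the time against a 50% coin-flip baseline, while the unrelated column does no better than the coin flip. A field that beats the baseline by a wide margin is a candidate proxy, and dropping the gender column does nothing to remove it.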
Remember, too, that machine learning starts with the machine learning from a set of scenarios and outcomes. If the machine is learning about “who is a good employee” from your employment rating system, and you had managers who were biased against a certain gender, race, or age of employee – then those outcomes would be viewed by the machine as being correct, even if they were biased. And if you say let’s use something more “objective” like historical career track (who progressed the most quickly), you might still be perpetuating any underlying biases that impact that career track. In effect, the computer may just codify the things you were afraid your managers were doing all along.
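Here is a small sketch of that “codifying” effect, under some loud assumptions. The `HISTORY` records are invented, the groups are just labeled A and B, and the “model” is nothing more than the historical promotion rate for each employment status, which is far simpler than any real product. The point is only that the model is never shown the group column, yet its scores still split along group lines because the history it learned from did.

```python
from collections import defaultdict

# Hypothetical promotion history, invented for illustration.
# Assume past managers favored group A, and that group B is mostly part-time.
HISTORY = [
    {"group": "A", "status": "full-time", "promoted": 1},
    {"group": "A", "status": "full-time", "promoted": 1},
    {"group": "A", "status": "full-time", "promoted": 1},
    {"group": "A", "status": "part-time", "promoted": 0},
    {"group": "B", "status": "part-time", "promoted": 0},
    {"group": "B", "status": "part-time", "promoted": 0},
    {"group": "B", "status": "part-time", "promoted": 1},
    {"group": "B", "status": "full-time", "promoted": 1},
]


def train(history, feature, label):
    """Learn the historical rate of `label` for each value of `feature`.

    The protected "group" column is never read here.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in history:
        totals[row[feature]] += 1
        positives[row[feature]] += row[label]
    return {value: positives[value] / totals[value] for value in totals}


def mean_score_by_group(history, model, feature, group):
    """Average the model's score within each group, as an after-the-fact audit."""
    scores = defaultdict(list)
    for row in history:
        scores[row[group]].append(model[row[feature]])
    return {name: sum(vals) / len(vals) for name, vals in scores.items()}


model = train(HISTORY, "status", "promoted")
by_group = mean_score_by_group(HISTORY, model, "status", "group")
print(model)
print(by_group)
```

With this made-up history the model scores full-time employees at 1.0 and part-time employees at 0.25. Averaged by group, that works out to about 0.81 for group A and 0.44 for group B, even though group was never an input. The audit step at the end is the kind of check you can ask a vendor to show you.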
Before you say “That’s an outrageous hypothetical you came up with” – I’d challenge everyone interested in this area to watch this TED talk.
Go back to Jurassic Park. Late in the movie, the team accidentally stumbles onto a nest of dinosaur eggs. Despite the earlier reassurances that no male dinosaurs were released, there is the sudden realization that by using frog DNA to fill in the gaps in the dinosaur DNA, the scientists gave some of the dinosaurs the ability to change sex and breed: “Life finds a way.” That makes for a good movie twist, but it is a bad outcome if it is an oversight on your part in your predictive learning system.
I give all these examples not to suggest that every machine learning system will have these problems, or that you can’t correct these problems. The architects of machine learning products, who are brilliant enough to build these programs and models, are also smart enough to control for these biases if they are asked to do so. But if the super-users and consumers of these products – yes, that’s you – are not even aware of the potential unintended consequences, then how likely is it to be requested?
You won’t be eaten by a bad algorithm. But being sued isn’t out of the question.
Maybe robots, dinosaurs, and HR work are all completely different things, but if the movies have taught me anything, it’s that almost anything brilliant that we build can get away from us if we aren’t careful.