Inside GEN, the nonprofit combatting human trafficking with big data

With assistance from Splunk, the Global Emancipation Network built Minerva - a platform that uses big data to help liquidate the operations of worldwide human trafficking rings
ID 113601896 © Alexlmx |


The vast majority of law enforcement agencies are woefully underfunded for tackling human trafficking. According to statistics from the Global Emancipation Network (GEN), the trade claims roughly 21 million men, women and children as victims every year, who between them are worth as much as US$50 billion to organised crime.

Law enforcement agencies have traditionally been hindered by painstaking and mentally arduous manual methods that take their toll on investigators, who are also at risk of becoming locked into for-profit solutions that are limited in their scope. They are struggling to keep pace with the worldwide networks that trade in human beings, which are shockingly open in their advertisements.

Sherrie Caltagirone is founder and executive director of the Global Emancipation Network. She has a career-long record in combatting trafficking, having previously held legislative and other advisory posts to help governments and NGOs bolster their policy approaches in countering the crimes.

But she wanted to measure the actual impact that these efforts had. After some time working at a rescue operation group then called Orphan Secure - with partners in the cyber intelligence space such as FireEye - she began to wonder why the cutting-edge techniques used in other intel fields were not being deployed to fight trafficking.

"I was doing all of the policy and intelligence work for their operations [at Orphan Secure]," she says, speaking with Computerworld UK. "They were obviously using a lot of cyber intelligence tools. On a more personal note, my husband comes from the intelligence community more broadly - so I was aware of these other methods for joining these different data points and making sense of the noise.

"I began to be really frustrated then that we weren't deploying any of the cool techniques I was seeing in other fields: image or video analysis? Why was it I was having to pull up multiple screens to look into a phone number?"


Big data analytics business Splunk's 'Splunk Pledge', part of its Splunk4Good initiative, committed the company to donating the equivalent of $100 million over a 10-year period to nonprofits and higher education institutions, in financial contributions, licences and assistance.

The Global Emancipation Network, with Splunk's help, built a tool called Minerva - an investigative analytics platform designed for ease of use for investigators of human trafficking.

Caltagirone says the aim was to create a platform that was simple for a user to operate within a single pane of glass, to identify both victims and traffickers in a seamless way - reducing the overall time to action while also being able to measure the results of those actions.

"Traditionally nonprofits have been working in a capacity building or prevention sort of space," says Caltagirone. "Victim services, providing shelter or legal aid to victims of human trafficking - all of that is reaction - something has already happened. Or they are working on prevention, trying to make the public aware, not to fall for some Romeo's scheme on Facebook."

But a serious problem was that the data on the scale of human trafficking was, simply, bad. The data model GEN is using helps to measure the impact of trafficking in real time, and to assess whether its operations are making a difference or whether it needs to "take another tack".

"Whenever there's a new law that comes on or a website that has been shut down by some officials, we are able to measure the flow of the traffic from one website to another, and really drill down on the question to ask was that worthwhile," says Caltagirone. "It's an enormously informative thing. We were able to realise that just purely shutting down a website and playing whack-a-mole with the problem is not very good: it's much better to get intelligence about the users on the [web] board to bring down an entire network, before you do something like shut down the website."

Otherwise, she adds, the traffic just shifts from one place to another.
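The before-and-after comparison Caltagirone describes can be pictured with a minimal sketch. This is not GEN's data model or a Splunk query; the function name, the site labels and the advert counts are all hypothetical, chosen only to show how displacement differs from a real reduction.

```python
def traffic_shift(before, after):
    """Compare advert counts per site before and after a takedown.

    `before` and `after` map a site label to the number of adverts
    observed in an equal-length window (hypothetical inputs).
    Returns the per-site change and the net change across all sites.
    """
    sites = sorted(set(before) | set(after))
    change = {s: after.get(s, 0) - before.get(s, 0) for s in sites}
    net = sum(after.values()) - sum(before.values())
    return change, net


# Made-up counts: site_a is shut down and most of its volume reappears on site_b.
before = {"site_a": 100, "site_b": 20}
after = {"site_a": 0, "site_b": 110}

change, net = traffic_shift(before, after)
print(change)  # per-site change in advert volume
print(net)     # net change across all monitored sites
```

In this made-up case the net change is small compared with the volume that moved, which is the "whack-a-mole" pattern described above: the adverts were displaced rather than removed.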

Minerva, powered by Splunk Enterprise, monitors open data as well as the dark web, including password-protected spaces. All the data GEN can get on sex, labour and other kinds of trafficking is "hoovered up" and then stored in Microsoft Azure. Microsoft, another early partner of the NGO, donates all of the cloud resources it uses.

"The data is then pushed up into Splunk, where it is processed and made sense of essentially," says Caltagirone, adding that when the data arrives it's unstructured and messy. "Splunk make that job very easy."


Splunk engineers have worked closely with GEN, sitting in on weekly development meetings, checking code bases and writing the Splunk queries for the nonprofit.

"It's a very, very tight working relationship," she says. "Once the data is in Splunk we have some external enrichments we are using as well: things like image analysis programmes, because for every sex advertisement there's an average of eight photos of the victim - so we have an eight to one ratio of text and images.

"We concentrate very heavily on image analysis, and we have external APIs for things like public records, to search for information about who owns the phone numbers, who owns the addresses, and we do some things on blockchain cryptocurrency analysis."

The platform can also connect information on everything from burner phones to facial recognition to reverse image searches on the open web. Movement patterns, common modes of communication and behavioural analytics are all joined up, and Splunk Enterprise helps cut through the noise and surface the most important data.

This is all accessible via a custom user interface on Minerva which was designed to be "as easy as a Google search bar" - so no need for special queries or search strings.


Shockingly, Caltagirone says that traffickers remain brazen about their activities. They have been able to operate with such impunity that they barely bother to mask what they're doing.

"It's knowing the right places to look," she says. "It doesn't take any more than a Google search to look at an adult website. You can pull up thousands. And hidden in the noise of consensual sex workers is usually sex trafficking."

Because the traffickers have been so brazen it can be a relatively simple process to separate out the consensual workers from the victims.

One method of identifying the pimps is to build up a lexicon of the coded phrases that traffickers use.

"You really don't have to look very hard at all to find it," Caltagirone says. "That's part of why it felt so terrifying. But it's also really promising that the very technologies traffickers are using to bring their product to market, which is sadly in this case human beings, is the very same technologies that we can use against them to combat trafficking."

However, there are some techniques traffickers try to use to circumvent detection, and Caltagirone anticipates that this could increase in parallel to the success of shutting these networks down.

"Already we're seeing that they are embedding less information in text within the advertisements themselves, and just putting the titles and images only," she says.

"Even within images they are starting to embed information, so we are having to do some optical character recognition techniques... and of course we are very careful about publicly talking about the methods we use, hoping to not help them anymore to know what we're doing."

With all these data points built up, GEN can use Minerva to monitor traffic flows between one website and another so that they know where to concentrate their efforts.

The NGO uses machine learning tools within the platform to monitor language as well.

Caltagirone points to a disturbing example: "Say they are using a word like 'fresh' to denote an underage victim - and this is a well-known term so I don't mind saying it - but over time they've started to shift away from the word 'fresh' and they're using other new words to denote the same thing.

"So if we're able to create the algorithms to help us identify the two words that are linked together consistently, we can put two and two together and say this other new word, symbol, unicode, or image has the exact same meaning as 'fresh'. We have really skilled ML and AI folks helping us with those sorts of problems."
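The idea of linking a new term to a known one can be illustrated with a minimal sketch. It assumes a simple distributional approach (terms that appear among the same neighbouring words probably mean similar things) and is not a description of GEN's actual algorithms, which the article does not detail. The function names, the neutral placeholder tokens `term_a` and `term_b`, and the toy texts are all hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(docs, window=2):
    """Count, for every token, the tokens seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for doc in docs:
        tokens = doc.lower().split()
        for i, tok in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[tok][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_terms(vectors, known, top_n=3):
    """Rank other terms by how closely their contexts match the known term's."""
    target = vectors[known]
    scored = [(cosine(target, vec), term)
              for term, vec in vectors.items() if term != known]
    scored.sort(reverse=True)
    return [term for _, term in scored[:top_n]]


# Toy texts: term_b appears in exactly the contexts where term_a used to appear.
docs = [
    "call now term_a available tonight",
    "call now term_b available tonight",
    "used bicycle parts for sale",
]
print(similar_terms(context_vectors(docs), "term_a", top_n=1))
```

On real data a production system would more likely use learned embeddings and human review of the candidates. The sketch only shows why a consistently shared context is enough to flag a replacement term for an analyst to check.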

Liquidating networks - a success story

The team at GEN was involved in an investigation surrounding one particular password-protected escort review forum located on the deep web, based in the USA. Using data analysis techniques they were able to identify nearly 1,000 trafficking pimps and victims in just a two-month time span.

In 2017 alone there were 100,000 victims identified, says Caltagirone - twice as many as in the previous year. So in that two-month span, Caltagirone says, the team was able to identify the equivalent of one percent of that annual total in the USA alone.

"As we continue to grow and scale you'll be able to see how effective this method is," Caltagirone says.

The organisation is already beginning to harvest data from outside the USA and Europe, starting to look at prominent websites in Asia and South America, which the team hopes will help them build a clearer picture of the world-wide networks.


The mass collection of data points has been a contentious issue, especially in recent years, underscored most publicly by Edward Snowden's NSA leaks, which revealed a global conspiracy to collect any and all data and store it in the data banks of one of America's most secretive intelligence agencies.

Caltagirone agrees, however, that GEN's work is goal- and outcome-driven, in contrast to the controversial catch-all policies of many government intelligence agencies.

"We are very careful to tread the line between privacy and security," says Caltagirone. "We do have a large sum of personally identifiable information, but there's a very overwhelming public interest in order to collect this information - and rescue victims at scale.

"We are very careful with who we expose the data to as well, so it is thoroughly vetted, and we do a lot to monitor how the users are taking advantage of the information. It's a very good example of how to [use] data in a very powerful way - but you have to do so very responsibly and take care of people and their personal information."

Caltagirone laughs when asked if the privately backed NGO is generally ahead of law enforcement in counter-trafficking efforts.

"This is the sort of question that can get me in trouble," she says. "We partner very heavily with law enforcement agencies, with government groups around the world. We're very keen to collaborate with them and put tools in their hands.

"That said, most of them are very under-resourced and under-staffed, especially if you're talking about the resources that are deliberately spent on counter human trafficking work. It's very, very little."

GEN provides Minerva and its other tools to law enforcement and government partners on a nonprofit basis.

"The tools that are even remotely similar to ours - a data-based approach to counter human trafficking - there are only two others," Caltagirone says. "One of them is free and one of them is for profit. The budgets don't really support the for-profit method at all. It's very manual for them, particularly within the image analysis space."

One partner, she adds, still manually sifts through each image.

"When you're talking about child abuse imagery that's really, really hard on people," Caltagirone says. "So the more you can automate it and reduce the amount of time a human has to spend looking at the image, it's very beneficial to people's mental health as well."

The organisation is currently exploring how it might partner with companies that run airports so that it can apply its video analysis to CCTV footage - or cross-check it against other biometric data.

Being able to join information about airline ticket purchases with CCTV imagery or visa applications could further help the organisation paint a picture of the traffickers that it is targeting.

