Predicting the Unpredictable

We now know black swans exist, but Europeans once believed that spying one of their kind would be like stumbling across a unicorn in the woods—impossible. Then, Willem de Vlamingh spotted black swans in Australia, and this black bird, which once represented the impossible to Europeans, shifted to represent the unpredictable. One company now dons the name “Black Swan.” Find out how it aims to predict what we currently consider to be unpredictable.

Transcript

Ginette: “Submerse yourself in early 1600s London culture for a minute. Shakespeare’s alive and in his late career. The first permanent English settlement in the Americas just happened. Oxygen hasn’t been discovered yet. But a lesser-known cultural idiosyncrasy has to do with a large white bird, the swan. In Europe, the only swans anyone had seen or heard about were white, so of course, in their minds, a swan couldn’t be any other color. From this concept, a popular saying develops, originally stemming from a poem. You use it when you want to make a point that something either doesn’t exist or couldn’t happen. You’d say something like this: ‘you’re not going to find out because it’s about as likely as seeing a black swan,’ meaning that the thing or event was impossible.

“But then a discovery blows everyone’s minds. Dutch explorer Willem de Vlamingh is sent on a highly important rescue mission. A lost ship with 325 people on it probably ran aground near Australia, and they needed him to go rescue these people and the goods on board. While Willem and the three ships under his command go and search Australia for this lost ship, they find lots of fish; unique trees; quokka, a cat-sized kangaroo-like creature; and . . . black swans. This last discovery inevitably permanently shifts the meaning of this saying. After this, people start using it more to say when something’s highly unlikely or an unpredictable moment.

“Now this concept of an unpredictable moment is why Steve King named his company Black Swan, because they predict the seemingly unpredictable.”

Ginette: “I’m Ginette.”

Curtis: “And I’m Curtis.”

Ginette: “And you are listening to Data Crunch.”

Curtis: “A podcast about how data and prediction shape our world.”

Ginette: “A Vault Analytics production.”

Steve King: “I am Steve King; I’m the CEO of Black Swan. Black Swan is 250 people who focus on trying to predict consumer behavior using data science, artificial intelligence, and big data. We have lots of large clients. We mostly work with big companies that have big problems to solve. Our work sort of splits across the US and the UK. Black Swan is absolutely full of stories. A lot of the work we really do is finding a hard problem that no one’s really solved before and then using data science to crack it, but they’re always quite interesting stories because, you know, they’re stories of a little bit of adventure, luck, and skill.”

Ginette: “The UK’s Sunday Times has consistently placed Black Swan on its lists: in 2014, it was on the ‘Ones to Watch’ list in its Tech Track. In 2015, it was ranked number one on the Start-Up Track. And in 2016, it was ranked number one in the Export Track 100, because it had the fastest growing international sales for the UK’s small to medium enterprises.

“So what’s the secret sauce to the rapid growth and success of Black Swan, a company that solves problems for large companies in many different industries? It turns out, they aim to be better than anyone else at accessing and crunching a specific data source.”

Steve: “The reason we’re quite broad is it actually sits on one simple idea, and the simple idea really is that the Internet is really the world’s biggest data source, and we call, we call the Internet the world’s biggest focus group. So pretty much every opinion of a consumer or the open data that governments are laying out is all there for you to consume, but the, the trick is can you consume it in a way to help you find patterns so you can make predictions, so the one thing that Black Swan does really, really well is access the Internet for all its data, and then using data science techniques, understands it very well in order to make good predictions of what to do next.”

Curtis: “So what types of mysteries can Black Swan solve with this infinite source of information? They’ve been able to solve unique cultural ones, they’ve been able to help treat illness in healthcare, and Steve was even able to harness his company’s technology to help prevent a family tragedy. First, let’s talk about barbecues.”

Steve: “Actually, one of my favorite projects is a very UK-based project, and to some listeners this will probably sound incredibly dull, but in the UK, we love barbecues. There are unfortunately, due to our weather constraints, only five or six opportunities to have a barbecue in the UK every year, if we’re very lucky, and retailers around the UK have an interesting problem because it’s not just about having sunny weather. Yeah, of course, you know, having sunny weather is a bit of a standard you need to have to have a barbecue, but there also seems to be this idea of when people are ready to have this first big bbq of the year, so meat sales double for this one weekend in a year over any other weekend, and it’s a really interesting problem for retailers because, essentially, you know the weather is going to be good because you get your weather prediction which is reasonably accurate on the Tuesday, but you also have to order your meat on the Tuesday night. So, you know you’re going to have a sunny weekend, but you have no idea whether it’s going to be that weekend when everyone is ready.

“So we wrote some technology with a retailer in the UK, which essentially looked at people’s propensity to want to have a barbecue that weekend, if I could put it into a sentence. And it was really interesting because it scanned social media, so Twitter is a really good indicator. People even saying they’re not going to go to work on Monday because they intend to have a big one on the weekend is another big indicator as well. Weather is obviously a huge indicator. Sort of, Google search and shopping patterns from the past are, you know, really big indicators. We managed to get five years of sales data from a few retailers, and we overlaid them then with this noise from the Internet, like the propensity to want to barbecue, and we began to see a groundswell that would happen on the Tuesday before the weekend for people who were ready, and when you overlaid that sunny weather with the trigger for people being ready to barbecue, you can actually guess with 96 percent accuracy how many burgers you’re going to sell on a Saturday the Tuesday before.”

Ginette: “To further illustrate this point, imagine that you’re going to have a big dinner party at your house some Saturday this month, but you don’t know which Saturday people are going to show up, and you don’t know how many people are going to come. So how do you prepare? If you choose the wrong Saturday to prepare for this party, a ton of people could end up on your doorstep, and you’ll have nothing to feed them. That’s how these retailers feel about barbecues in the UK.”

Steve: “It’s obviously a huge supply chain relief, because if you order on a Tuesday and you’re wrong, you’ve either got far too much stock, you know, a lot of dead meat, um, enough lettuce to go around the M-25 of London twice, which isn’t going to be used. If you get it wrong, and you haven’t got enough stock, you actually break supply chain for weeks and weeks after because they can never really keep up. Of course customers get quite upset when they walk in and your shelves are empty. So it’s a real, real problem, but, like, kind of a funny solution.

“What’s interesting to note about it, and probably all of Black Swan’s stuff is that marriage with Internet data with our customer’s own sales data, so we can see the correlation, so we can see the patterns working together, is really, really important. So the first time we do a big project like this, we always need to work closely with our customer to train the model to understand what’s a sales event, you know, on top of what the triggers are we find on the Internet.

“There’s a lot of human skill in working out what features are going to be put into the model at the beginning. Once you’ve done that once, it’s pretty much the same features every single time, so you get to learn what the patterns are.”

Curtis: “So the Internet can be used to calculate the propensity of people in the UK to barbecue on any given weekend, to the great relief of many retailers—and also the people barbecuing, because now they know they can get enough meat at the stores when they want to have a barbecue. But their ability to use the Internet doesn’t stop there. In a project involving vitamins, they actually learned how to use mommy blog comments and Twitter comments to ultimately figure out how to predict when the flu would hit specific towns—and they predicted it better than Google Flu Trends.”

Steve: “My, my other joint favorite project then is actually on healthcare. We were working with GSK.”

Curtis: “GSK stands for GlaxoSmithKline, a large pharmaceutical company.”

Steve: “And I can talk publicly about this because it’s won a couple of awards, and we were helping them with vitamin sales. So if you imagine vitamins are reasonably difficult to get excited about. The line of sales is pretty flat. You know, there’s not really many events that do it. But one of them that tends to do it is actually, you know, cold and flu. When people realize they are coming down with something, they tend to, you know, suddenly want to eat healthy and get a little bit better, so it’s a time when people are a little more likely to commit themselves to, you know, vitamins and minerals. Um, so with that little brief, we started looking at some work, and Google had done some interesting work saying they could predict cold and flu, which actually turned out to be wrong, they actually got it wrong, but the idea was absolutely right.”

Ginette: “What Steve’s talking about is Google Flu Trends. You may or may not remember the massive waves that Google created in 2009, when it published an article in the highly influential journal Nature. Google claimed it could accurately predict flu in each region of the United States on a weekly basis with a lag time of one day. In 2013, it turns out these predictions started to fail, and so they stopped publishing predictions, but while Google didn’t get the execution of the idea right, Steve says Google had the right concept, and it turns out, Black Swan figured out a better way to approach it.”

Steve: “So we went around, and we got a load of prescription data from the UK, which is public data. We got a load of sales data from GSK, our clients, on when they are selling cold and flu products, and vitamin products and the like. We then overlaid it with again Twitter data, blog data from things like Mumsnet or Netmums where people were saying, ‘oh, my daughter’s coming down with something.’ And really interesting then, we were able to produce a model that was able to locally predict when cold and flu would hit four days before any other local authority in the world.

“And that model then was interestingly put to use by being plugged into some of the largest ad servers in the world, and essentially what we did is when we saw cold and flu coming towards a town, we would take the existing advertising spend and spend in that town like crazy. And by using APIs, we’d then sort of up the spend, and if you can imagine the creativity, it was like this really scary, scary kind of cloud of dark flu coming towards a picture of your town with the percentage of likelihood of it hitting it. Your children underneath screaming and running away from it, and then we sell vitamins, which was good. And that showed, that showed as you’d expect, the return on investment on that advertisement was well over 100 percent more than had been previously, so that model’s been incredibly successful.”

Ginette: “But the story doesn’t end with them merely using a clever algorithm to sell vitamins to consumers. It’s now being used in the UK’s A&E, which stands for ‘Accident and Emergency.’”

Steve: “For me it talks about how algorithms, you can be commercial with them, you know, because you need to make money, you need to live, but how easy they are to turn around and do something a bit different. So that same algorithm with permission from GSK, we then approached the NHS, which is the UK’s healthcare service, which is always . . . hasn’t got much money. If you go into accident emergency in the UK, expect to be there for three hours at least before you get seen by a doctor.

“Now, interestingly, cold and flu, if you come down with influenza, and you’re 70 years old, you’re told to go straight to A&E because it’s incredibly dangerous for that kind of age, but of course, what’s actually happening is you’re going to the end of a three-hour queue. Now this algorithm is being rolled out across the NHS services now, so they can know when cold and flu is hitting to allow them to get specialist nurses, which don’t necessarily need to be expensive doctors who are super trained, but people who can know there is going to be a high volume of old people coming in with influenza so they can be treated very, very quickly indeed. And that’s exactly the same algorithm that we built and trained with GSK, which is now being used for the NHS for free for a much better use than persuading people to sort of down vitamins.

“I just love this story because (A) it was great that our customer was willing to come on that journey with us to build it in the first place, but then to be so open to say, ‘right, okay now we’ve achieved some use from this prediction that we’re able to you know, use it for some good.’ And I think that when we look at the jobs we do as data scientists and small companies, it’s really important to keep your eye open to make a difference of a positive nature, you know, not just a commercially positive nature.”

Curtis: “One of the things we really liked about interviewing Black Swan is this emphasis they put on helping people and using their algorithms to make a positive impact in the world. And this is a great example of how they’ve done that in healthcare to help people get the care they need as quickly as possible. And on top of that, I also think it’s a fascinating story because one of the most interesting parts of data science is the creative nature of it, and this is a great example of how they creatively figured out how they could use things like mommy blog comments to predict valuable things. In the next story Steve’s going to share, this idea of helping and creativity actually come to bear in a very personal way.”

Steve: “My sister, about 12 years ago, started to get quite sick. And unfortunately she went from being really, really active in her 20’s to just a year and a half ago, we were told that she wasn’t going to make it through the summer.

“Essentially, what would happen is she’d start the day off pretty well. She’d be up on her feet and everything, but within 20 minutes, she was falling over, and within an hour, she was in the wheelchair for the rest of the day. You know, by the evening, she couldn’t feed herself. We’d need to take her to the toilet, and it would just go round day-by-day, right. And then she’d be up the next day, and then she’d be in the wheelchair, and then you’d have to see her every single day, um, and the doctor said, ‘eventually she’s going to swallow her tongue because she has no control over her bodily functions.’

“We just felt so desperate we wanted to do something.”

Ginette: “What Steve ended up doing is using some advanced Natural Language Processing techniques that his company previously developed. For anyone interested, we’ll dive into more details at the very end of this episode after the credits. Now back to the story.”

Steve: “I’m a useless coder, but I was able to use the tools we put together in enough of a way that we built an NLP, which allowed us to find places on the Internet where people were having these kind of conversations, so rather than looking for keywords, we were looking for features of Julie’s, my sister’s, problem.

“What we were able to do, interestingly, was we were able to find a load of newsgroups where people were having very similar conversations, which you wouldn’t have found without using this search technique. And we took a load of data for stats, how she was for the day, what she ate through time, and her personal story about how she felt, and then for each one of the blogs, which there were well over 10,000, I wrote a script bot, which then put this together as a package, and it posted it in every single thread I could find where people were having similar conversations.”

Ginette: “Some of you will already know what a bot is, but for those who don’t, a bot is basically code you write to do repetitive tasks for you over the Internet. In this case, instead of Steve having to go find every single website that mentioned the features of his sister’s illness, he had his bot find them and post a package of information in the comments section.

“So after the bot lays down all this information on each website, people start responding.”

Steve: “And the response was amazing. So we had over 45,000 responses. People were trying to help. They’d watch the video, they’d read the story, and they’d seen someone like it or something like that. So what we then did is we knew we had a list of rare diseases which we had managed to get from the NHS from earlier work we’d done.

“And we looked at the amount of mentions for diseases mapped over their frequency. You know, so how rare they were against the frequency they were mentioned by people. And that actually boiled it down to four different things we’d never seen before, even though the doctor had thought of everything and kind of given up.
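A rough sketch of what "mentions mapped over frequency" could look like in code follows. It is only one possible reading of Steve's description, not Black Swan's actual method: the condition names, the mention counts, the prevalence figures, and the scoring rule (mentions divided by prevalence) are all invented for illustration.

```python
# Hypothetical data: how often each condition was named in the responses,
# and how common each condition is (cases per 100,000 people).
# Every name and number here is made up for the example.
MENTIONS = {
    "common_condition_a": 900,
    "rare_condition_b": 120,
    "rare_condition_c": 15,
    "rare_condition_d": 50,
}
PREVALENCE_PER_100K = {
    "common_condition_a": 3000,
    "rare_condition_b": 2,
    "rare_condition_c": 5,
    "rare_condition_d": 1,
}


def rank_by_rarity_weighted_mentions(mentions, prevalence, top_n=4):
    """Rank conditions by mentions relative to how common they are.

    A rare condition that is mentioned often scores high; a common
    condition needs far more mentions to reach the same score.
    """
    scores = {
        name: count / prevalence[name]
        for name, count in mentions.items()
        if prevalence.get(name, 0) > 0  # skip unknown or zero prevalence
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


shortlist = rank_by_rarity_weighted_mentions(MENTIONS, PREVALENCE_PER_100K, top_n=2)
print(shortlist)
```

Under this assumed scoring rule, a condition that is both rare and frequently mentioned rises to the top of the shortlist, which matches the spirit of boiling tens of thousands of responses down to a handful of candidates.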

“So I remember rumbling into the doctor. I showed him this paper with four, four rare diseases and explained what I’d done, and he kind of like looked at me. So, anyway, he was brilliant, right. He took her onto a specialist for a few things. The first thing he took her onto was a thing called Parkinson’s Dystonia, which is what Michael J. Fox suffers with. And he sent her for a test in a nearby town. You know, we come from a small village in, in Wales. We went up to the local city, and had this test.

“I went back up to London then, and I was on the train to work, and I had a phone ringing, and this is just two days after I’d left, and it was my sister, and I couldn’t put my finger on what it was, and it was just, her voice was so clear, because normally she starts to really slur after an hour or two in the day, and I was like, ‘what’s going on.’ And she’s like, ‘it’s me, I’m, I’m walking around. I’ve been up for three hours. I’ve cleaned the house, I’ve ironed all the shirts, and I’m still going.’ I had to get off the train. I bawled my eyes out. It was Parkinson’s Dystonia, you know. It’s, it’s um treatable with drugs. Unfortunately, I’m having to run a marathon with her in the summer because she’s so fit and healthy, so it’s not a totally happy story for me.”

Ginette: “This story of Steve helping discover his sister’s illness is amazing. It’s an incredible example of data science at work in powerful ways. These three increasingly impressive examples exemplify how big data can help predict what is seemingly unpredictable, which is what Black Swan aims to do.”

Curtis: “This experience with Steve’s sister inspired him to start White Swan, the nonprofit arm of Black Swan, where Black Swan volunteers use their talents to help the world in similar ways.”

Steve: “There’s 50 volunteers in Black Swan who volunteer their time in the evenings and weekends to use our tech and algorithms which we agreed to our customers to use for the power of good for either no cost or at least affordable cost to try and make a bit of a difference.”

Ginette: “We were really impressed with Black Swan. If you were too and want to know more about them, go to www.blackswan.com.

“As always, go to www.vaultanalytics.com/datacrunch for our show notes, links to some of our sources, much of our music, and leaving comments. Also, if you like what we do, please go to iTunes and leave us a review, it really helps other people to find the show. If you have feedback about the show or have ideas of topics you would like us to cover, you can either send us a tweet @vaultanalytics or send me an email at ginette@vaultanalytics.com (that’s G-I-N-E-T-T-E @ vaultanalytics.com). And a huge thanks to Steve King for taking the time to speak with us!

“Our next episode is an important one, covering a topic that everyone should be aware of and how they individually can fight it. We’ll be interviewing someone who is working to educate people on modern day slavery and eliminate it from existence using art and data science. This will be one you should definitely share with your friends—the more awareness this topic has, the better we can fight it.

“For those who want to know more about what Steve did with word vectors, Curtis is up next to explain it.”

Curtis: “What Steve means by looking for features instead of keywords is worth a brief explanation because it highlights a powerful way to analyze text using a method called word vectors.

“Let’s contrast this method with something you do all the time: searching for a simple keyword. Searching for a keyword is easy—it’s what you do if you are looking for a specific word in a document. You want to find the word ‘Mesopotamia’ in the article you’re reading, you just type it in the find field, and bam, ‘Mesopotamia’ shows up, highlighted in yellow or green everywhere. But what if you don’t know the word for ‘Mesopotamia’? Or what if you have a pest in your attic, and you only know how to describe the noise it makes and the sawdust it leaves, but you don’t know exactly what it is? This is one application where a direct word search might fail you and word vectors can help.

“Here’s what a word vector looks like: Imagine for a second a ginormous spreadsheet. This spreadsheet’s job is to act as a key for the computer. It tells a computer what something is and what it isn’t by ranking that thing against a list of characteristics. In the first column, you see a list of characteristics, each one listed in a separate row, words like ‘animal’ and ‘vegetable’ as topics. At the top of the next column, you find the word ‘llama.’ Now you go down the column and compare it to these characteristics. Is the llama an animal? Yes, so it scores high in the animal category. The second characteristic on this list is ‘vegetable.’ Is the llama a vegetable? No, so here llama scores low because a llama clearly isn’t a vegetable. And let’s say the third characteristic is ‘fuzzy and lovable,’ in which case, ‘llama’ scores high again. This keeps going on for hundreds or thousands of rows in tons of categories. Now let’s contrast a llama against a ‘pit viper.’ The pit viper would score high for ‘animal,’ low for ‘vegetable,’ and low for ‘fuzzy and lovable.’ So in this case, a pit viper differs from the llama in the ‘fuzzy and lovable’ category.
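The spreadsheet picture above can be sketched in a few lines of Python. This is a toy illustration only: the three characteristics come from the episode, but the scores are invented, the extra word "carrot" is added just for contrast, and real word vectors have hundreds of dimensions that are learned rather than hand-labelled.

```python
import math

# The "rows" of the spreadsheet: one characteristic per position.
CHARACTERISTICS = ["animal", "vegetable", "fuzzy and lovable"]

# The "columns": one vector of made-up scores (0 = low, 1 = high) per word.
WORD_VECTORS = {
    "llama":     [0.9, 0.0, 0.8],
    "pit viper": [0.9, 0.0, 0.1],
    "carrot":    [0.0, 0.9, 0.1],  # hypothetical extra word for contrast
}


def cosine_similarity(a, b):
    """Return how closely two vectors point in the same direction (0 to 1 here)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def biggest_difference(word_a, word_b):
    """Return the characteristic on which two words' scores differ the most."""
    diffs = [abs(x - y) for x, y in zip(WORD_VECTORS[word_a], WORD_VECTORS[word_b])]
    return CHARACTERISTICS[diffs.index(max(diffs))]


print(biggest_difference("llama", "pit viper"))
print(cosine_similarity(WORD_VECTORS["llama"], WORD_VECTORS["pit viper"]))
print(cosine_similarity(WORD_VECTORS["llama"], WORD_VECTORS["carrot"]))
```

With these invented scores, the llama and the pit viper differ most on "fuzzy and lovable," and the llama sits closer to the pit viper than to the carrot, because the two animals agree on most rows.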

“As you go through this list and see how each word ranks in each category, you would start to see some patterns and get a pretty good idea of what llamas and pit vipers actually are, how they are the same, and how they’re different. Then, even if you can’t remember the word for ‘llama’ or ‘pit viper,’ you could do a search for their characteristics online with their word vector, and probably find a bunch of web pages talking about what you are looking for. When word vectors are

“To save yourself a ton of time, you could even tell the computer to read through something like Wikipedia and create its own list of characteristics for various words based on that information.”
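As a minimal sketch of letting the computer build its own list of characteristics, the snippet below counts which words appear near each other in a tiny made-up corpus and uses those counts as vectors. The four sentences stand in for something like Wikipedia, "pit viper" is treated as two separate tokens for simplicity, and simple co-occurrence counting is an assumption chosen for clarity; the episode does not say which technique Black Swan uses.

```python
import math
from collections import Counter, defaultdict

# A tiny invented corpus standing in for a large text source.
CORPUS = [
    "the llama is a fuzzy animal that eats grass",
    "the alpaca is a fuzzy animal that eats grass",
    "the pit viper is a venomous animal that eats mice",
    "the carrot is an orange vegetable that grows underground",
]


def build_vectors(sentences, window=4):
    """Map each word to counts of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def most_similar(word, vectors):
    """Return the other word whose context counts look most like `word`'s."""
    others = (w for w in vectors if w != word)
    return max(others, key=lambda w: cosine(vectors[word], vectors[w]))


vectors = build_vectors(CORPUS)
print(most_similar("llama", vectors))
```

Here the "characteristics" are just neighbouring words, so "llama" ends up closest to "alpaca" because the two appear in the same kind of sentence. Nobody had to fill in a spreadsheet by hand.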

Ginette: “Now you know so much more! See you next time.”

Some Sources:

https://www.pehub.com/2016/07/3342292/#

http://www.blackswan.com/blog/black-swan-tops-sunday-times-sage-start-up-track-15/

http://techcitynews.com/2016/07/07/black-swan-raises-6-2m-accelerate-company-growth/

http://venturebeat.com/2016/03/21/data-science-firm-black-swan-lands-4-3-million-for-u-s-and-japanese-expansion/

http://www.wired.co.uk/article/steve-king-black-swan-ai-data

https://techcrunch.com/2016/07/07/black-swan-data/

https://en.wikipedia.org/wiki/Black_Swan_Data

http://www.standard.co.uk/business/entrepreneurs-how-black-swan-data-turned-a-backofabeermat-idea-into-a-sales-force-to-be-reckoned-a3314461.html

http://www.devonlive.com/black-swan-data-open-new-regional-office-exeter/story-22898594-detail/story.html

http://www.growthbusiness.co.uk/uk-data-science-startup-secures-3m-in-japanese-investment-2511541/

http://blog.algorithmia.com/introduction-natural-language-processing-nlp/

https://en.wikipedia.org/wiki/Natural_language_processing

https://www.cnet.com/how-to/what-is-a-bot/

http://www.recode.net/2016/4/11/11586022/what-are-bots

https://www.techopedia.com/definition/24063/internet-bot

http://www.privateequitywire.co.uk/2016/03/21/237637/black-swan-secures-major-gbp3-million-investment-led-mitsui

http://www.fasttrack.co.uk/league-tables/tech-track-100/ones-to-watch/?leagueyear=2014

http://www.fasttrack.co.uk/league-tables/sme-export-track-100/

http://www.fasttrack.co.uk/league-tables/sme-export-track-100/league-table/?leagueyear=2016

https://www.blackswan.com/blog/wp-content/uploads/2015/11/Sunday-Times-Sage-Start-up-15.pdf

http://mdanderson.libanswers.com/faq/26159

https://www.quora.com/What-is-the-minimal-impact-factor-that-an-academic-journal-can-have-for-you-to-consider-it-a-good-venue-for-your-research

http://curt-rice.com/2013/02/06/why-you-cant-trust-research-3-problems-with-the-quality-of-science/

http://www.nature.com/news/when-google-got-flu-wrong-1.12413

https://www.wired.com/2015/10/can-learn-epic-failure-google-flu-trends/

https://www.google.org/flutrends/about/

https://www.forbes.com/sites/stevensalzberg/2014/03/23/why-google-flu-is-a-failure/#58624a085535

https://en.wikipedia.org/wiki/Black_swan_emblems_and_popular_culture

https://en.wikipedia.org/wiki/List_of_Latin_phrases_(R)

https://en.wikipedia.org/wiki/Black_swan_theory

https://en.wikipedia.org/wiki/The_Black_Swan_(Taleb_book)

https://en.wikipedia.org/wiki/Juvenal

https://en.wikipedia.org/wiki/Willem_de_Vlamingh

https://en.wikipedia.org/wiki/Quokka

https://en.wikipedia.org/wiki/1600s_in_England

https://en.wikipedia.org/wiki/Charter_of_1606

https://en.wikipedia.org/wiki/Jamestown,_Virginia

http://www.shakespeare-online.com/keydates/playchron.html

https://en.wikipedia.org/wiki/Roanoke_Colony

Music and Sound Sources

Music

“Angevin B” Kevin MacLeod (incompetech.com)

Licensed under Creative Commons: By Attribution 3.0 License

http://creativecommons.org/licenses/by/3.0/

 

“Thatched Villagers” Kevin MacLeod (incompetech.com)

Licensed under Creative Commons: By Attribution 3.0 License

http://creativecommons.org/licenses/by/3.0/

 

“Scheming Weasel Slower” Kevin MacLeod (incompetech.com)

Licensed under Creative Commons: By Attribution 3.0 License

http://creativecommons.org/licenses/by/3.0/

 

“Sneaky Snitch” Kevin MacLeod (incompetech.com)

Licensed under Creative Commons: By Attribution 3.0 License

http://creativecommons.org/licenses/by/3.0/

Sounds

http://www.freesound.org/people/nebulousflynn/sounds/269089/ by nebulousflynn in https://creativecommons.org/licenses/by/3.0/

http://www.freesound.org/people/soundbytez/sounds/111074/ by soundbytez in https://creativecommons.org/licenses/by/3.0/

20140525_attackloop_78BPM_4/4_Analog_2_Kit_ConstructionKit » 20140525_attackloop_78BPM_4_4_Analog_2_Kit_mixdown7.flac by Jovica in https://creativecommons.org/licenses/by/3.0/