Not So Stupid Cupid? Dating Sites and Data Science
by Neil Huggan
Or How Data Science is being used to find Your Perfect Match
As we approach Valentine’s Day, many people will turn to the myriad of online dating sites and apps in order to find their ‘Perfect Match’. These match-making sites are making increasingly complex use of data science to help their clients find their ideal mate, but how exactly do they turn raw data into dating success?
The big players in this space are eHarmony, Match.com, OKCupid and Tinder, the latter three all owned by the same company, Match Group Inc. The CEO of Match Group, Sam Yagan, acknowledges that whilst dating sites are useful for helping you find people you would (or wouldn’t) be interested in, the industry is “decades away” from predicting the essential chemistry that makes or breaks a relationship. OKCupid was founded by four maths majors from Harvard University and was built from the ground up on an algorithmic, data-based approach, while Match.com dates back to 1995.
The complexity of these matching algorithms varies widely between online dating services: some are based merely on geographic proximity, whilst others make use of deeper metrics.
However, the use of algorithms doesn’t guarantee a love match: multiple studies have produced conflicting results on their effectiveness in the real world. Some of this ‘confusion’ seems to revolve around what constitutes a ‘match’, and around what relationship success actually looks like today.
Even within the industry itself, there’s a range of opinions on the effectiveness of individual dating services’ algorithms, with the CEO of eHarmony, Neil Clark Warren, accusing Tinder of relying only on “…superficial, almost accidental compatibility. Compatibility is a serious matter, and it’s very deep and very important to figure out.”
Dating Data – eHarmony’s Approach
In a 2014 presentation at MongoDB World, “Big Dating at eHarmony,” Thod Nguyen, eHarmony’s CTO, revealed the key components that helped create the unique eHarmony offering:
- Compatibility Matching Processor (CMP application): the CMP creates around 3 billion potential matches per day from almost 25 terabytes of user data in the matching database. It supports more than 60 million queries daily, searching across more than 250 attributes. The system also stores and manages over 200 simple criteria, as well as a million photos occupying more than 15 terabytes of photo storage, and it handles more than 4 billion relationship questionnaires!
- Compatibility Matching System (the underlying CMS models): eHarmony’s complex, three-tier “secret sauce”, made up of:
  - Compatibility: derived from 29 dimensions of personality and psychological traits. A bidirectional system ensures that each user’s preferences are met by the other person, and vice versa. It draws on simple criteria such as age, distance, religion, ethnicity, income, education and employment, plus more sophisticated personality traits harvested from questionnaires.
  - Affinity: predicts the probability of communication between two people.
  - Match distribution: delivering the right matches to the right user at the right time, and as many matches as possible across the entire active network.
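As a rough illustration of what an affinity model might look like, the sketch below scores a pair of users with a logistic function, turning a weighted sum of pair features into a probability that they will communicate. The `PairFeatures` fields and the weights are hypothetical and hand-picked; the talk doesn’t say which features or model type eHarmony actually uses, and a real system would learn its weights from historical messaging data.

```python
import math
from dataclasses import dataclass


@dataclass
class PairFeatures:
    """Hypothetical features describing a pair of users (not eHarmony's schema)."""
    compatibility_score: float  # e.g. a compatibility model's output, 0 to 1
    distance_km: float
    age_gap_years: float
    both_active_last_week: bool


# Illustrative, hand-picked weights. A real affinity model would learn these
# from historical data on which matched pairs actually exchanged messages.
WEIGHTS = {
    "bias": -1.0,
    "compatibility_score": 3.0,
    "distance_km": -0.01,
    "age_gap_years": -0.1,
    "both_active_last_week": 1.2,
}


def sigmoid(z: float) -> float:
    """Numerically stable logistic function, squashing any real z into 0..1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def affinity(p: PairFeatures) -> float:
    """Estimated probability that the two users communicate (logistic model)."""
    z = (
        WEIGHTS["bias"]
        + WEIGHTS["compatibility_score"] * p.compatibility_score
        + WEIGHTS["distance_km"] * p.distance_km
        + WEIGHTS["age_gap_years"] * p.age_gap_years
        + WEIGHTS["both_active_last_week"] * float(p.both_active_last_week)
    )
    return sigmoid(z)
```

Under these made-up weights, two equally compatible users who live 5 km apart and were both active last week get a much higher affinity than a pair 300 km apart who weren’t both active, which is the kind of signal a match distribution step could use when deciding whom to show first.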
Summarizing eHarmony’s approach, Nguyen explained that the company’s “…CMS Models are the ‘secret sauce’ and created by running complex multi-attribute queries to identify potential matches for the client. We only retain the candidates where the criteria are met both ways, bidirectionally. As a second step, we take the remaining candidates, and we run them through a slew of compatible models that we have accumulated over the last 14 years. Only those candidates who pass the threshold set by the CMS models are retained and positioned as potential compatible matches for the client.”
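The two-step process Nguyen describes (a bidirectional filter on simple criteria, then a threshold on model scores) can be sketched in a few lines of Python. This is a toy under stated assumptions: the `Profile` fields, the `trait_similarity` model and the choice to average model scores against a single threshold are all hypothetical stand-ins, not eHarmony’s actual CMS models.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Profile:
    user_id: str
    age: int
    religion: str
    # What this user requires of a partner (hypothetical "simple criteria").
    min_age: int
    max_age: int
    accepted_religions: set[str] = field(default_factory=set)  # empty means "any"
    # Questionnaire-derived trait scores, assumed to lie in [0, 1].
    traits: dict[str, float] = field(default_factory=dict)


def meets_criteria(seeker: Profile, candidate: Profile) -> bool:
    """True if the candidate satisfies the seeker's simple criteria."""
    if not seeker.min_age <= candidate.age <= seeker.max_age:
        return False
    if seeker.accepted_religions and candidate.religion not in seeker.accepted_religions:
        return False
    return True


def mutually_acceptable(a: Profile, b: Profile) -> bool:
    """Bidirectional check: each user must meet the other's criteria."""
    return meets_criteria(a, b) and meets_criteria(b, a)


# A compatibility model maps a pair of profiles to a score in [0, 1].
CompatibilityModel = Callable[[Profile, Profile], float]


def trait_similarity(a: Profile, b: Profile) -> float:
    """Toy model: 1 minus the mean absolute difference over shared traits."""
    shared = a.traits.keys() & b.traits.keys()
    if not shared:
        return 0.0
    return 1.0 - sum(abs(a.traits[k] - b.traits[k]) for k in shared) / len(shared)


def find_matches(client: Profile, pool: list[Profile],
                 models: list[CompatibilityModel],
                 threshold: float) -> list[tuple[str, float]]:
    if not models:
        raise ValueError("need at least one compatibility model")
    # Step 1: keep only candidates where the criteria are met both ways.
    candidates = [c for c in pool
                  if c.user_id != client.user_id and mutually_acceptable(client, c)]
    # Step 2: average the model scores; keep candidates at or above the threshold.
    results = []
    for c in candidates:
        score = sum(m(client, c) for m in models) / len(models)
        if score >= threshold:
            results.append((c.user_id, score))
    # Best-scoring matches first.
    return sorted(results, key=lambda r: r[1], reverse=True)
```

For example, if the client is 34 and accepts partners aged 30 to 40, a 36-year-old whose own range includes 34 survives step one, while a 33-year-old whose range starts at 35 is dropped even though they meet the client’s criteria. That asymmetry is exactly what the bidirectional check is there to catch.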
eHarmony’s use of technology, maths, algorithms and data is, on the face of it, not a particularly romantic way of finding love. However, we increasingly live in a world where the right data and the right algorithms can make your quest to find Your Perfect Match that little bit easier. In 2017, Cupid may not be so stupid after all.
How are you using data to help your business connect with customers? We’d love to know! 🙂
Warning!! For Techie Eyes Only…!
<jargon>Nguyen also gave an insight into the programming languages eHarmony uses: “We use a lot of Scala (I’m sure a lot of you know) as a functional programming language to implement our CMS and affinity matching models. We also use a lot of Hadoop. And with Hive, we also started exploring Spark as the interactive data analytics on top of YARN for massive data mining and data processing. And we also use a lot of R … R is a revolution as the programming language for predictive analytics in our machine learning models. Additionally, we use a lot of Node.js with HTML5 to implement our public-facing eHarmony web applications for both the mobile web and the desktop and a slew of other technologies that we’re using right now.”</jargon>