July 26, 2023
6 minutes
From Insights to Indicators: Leveraging Open-Ended Data to Identify Fraud in Conventional Survey Panels

Featuring: Tom Wells
Verasight Guest-Author and former researcher at Uber, Meta, Nielsen, and Knowledge Networks

In the world of survey research, data quality is paramount. Our latest article, written by our guest-author and industry expert Tom Wells, explores the challenges of fraudulent respondents and highlights the importance of open-ended responses for detecting poor data quality. Tom is a survey researcher with 15+ years of experience in the survey research industry and has worked at Uber, Meta, Nielsen, and Knowledge Networks. While some survey panels may struggle with accuracy and integrity, Verasight takes a different approach. Our robust methodologies and stringent quality control measures ensure reliable data collection, including open-ended responses that provide valuable insights.

Online survey panels have played a useful role in survey research over the past 25 years. At the same time, survey professionals have long been aware of issues with traditional survey panels, such as professional respondents, fraudulent respondents, panel conditioning, and questionable accuracy of responses.

Fraudulent respondents are a particular issue with opt-in survey panels. These panels are not based on probability sampling; rather, they recruit volunteers, advertise on the internet, and emphasize monetary incentives. Because of that, they can be infested with bots and bad actors, including people who don't use a specific product or service but report that they do in order to take surveys and collect incentives (sometimes multiple times).

Survey researchers take active measures to root out fraudulent responses and respondents from survey data. Conventional practices include using a combination of paradata (time stamps, IP addresses), closed-ended responses (straight-lining), and open-ended responses (gibberish). A recent Public Opinion Quarterly article (Kennedy et al. 2021) emphasized the relative importance of open-ended responses for detecting fraudulent respondents, since those questions are harder for bad actors and survey bots to fabricate convincingly. Most conventional data quality checks are quantitative in nature; if researchers focus only on those, fraudulent respondents may go undetected.
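As a concrete illustration, the conventional checks listed above can be sketched as simple heuristics. The thresholds, field names, and gibberish rules below are illustrative assumptions, not the actual rules used in these case studies:

```python
import re

# Illustrative thresholds -- real studies tune these to the questionnaire.
MIN_SECONDS = 120        # flag respondents who finish implausibly fast
STRAIGHTLINE_ITEMS = 8   # minimum grid length for the straight-lining check

def flag_speeder(start_ts, end_ts, min_seconds=MIN_SECONDS):
    """Paradata check: completion time below a plausible minimum."""
    return (end_ts - start_ts) < min_seconds

def flag_straightliner(grid_answers):
    """Closed-ended check: identical answers across an entire grid."""
    return len(grid_answers) >= STRAIGHTLINE_ITEMS and len(set(grid_answers)) == 1

def flag_gibberish(text):
    """Open-ended check: empty text, keyboard-mashing, or no real words."""
    text = text.strip().lower()
    if not text:
        return True
    if re.fullmatch(r"[^aeiou\s]{6,}", text):  # long run with no vowels
        return True
    return len(re.findall(r"[a-z]{2,}", text)) == 0

def quality_flags(resp):
    """Combine the checks into one record of per-respondent flags."""
    return {
        "speeder": flag_speeder(resp["start_ts"], resp["end_ts"]),
        "straightliner": flag_straightliner(resp["grid"]),
        "gibberish": flag_gibberish(resp["open_end"]),
    }
```

In practice, researchers rarely drop a case on a single flag; combinations of flags (e.g., a speeder who also submits gibberish) make removal decisions more defensible.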

We investigate this further with two case studies, based on online surveys we conducted with specialized samples from two different well-known, commercial opt-in survey panels. We applied conventional data quality checks and, in particular, looked closely at open-ended responses in an effort to flag and remove problematic respondents.

The first survey was conducted with 1,000 self-reported US rideshare drivers. The survey itself was brief, mobile-friendly, and included three open-ended questions. Once data were collected, we compared panelist results to results from previous internal surveys with actual rideshare drivers and noticed that some of the panel data looked suspicious: the demographics, the means of quantitative responses, and the open-ended responses all differed from what we had seen from actual drivers.

For example, we asked drivers a simple one-word open-ended question on what status level they had attained in a particular driver rewards program. Drivers are acutely aware of this (since it has monetary implications), so we utilized this question as a validation question. 35% of panelists correctly named one of the actual levels; however, over 50% reported an extremely vague or outlandish, non-existent level.
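A validation question like this can be scored with a simple normalized lookup against the known answers. The tier names below are hypothetical, since the actual program levels are not disclosed here:

```python
# Hypothetical tier names -- the article does not name the actual program levels.
VALID_LEVELS = {"blue", "gold", "platinum", "diamond"}

def names_real_level(answer: str) -> bool:
    """Check whether a one-word open-ended answer matches a known level."""
    return answer.strip().lower().strip(".!") in VALID_LEVELS

def validation_pass_rate(answers):
    """Share of respondents naming an actual level (e.g., the 35% figure)."""
    return sum(names_real_level(a) for a in answers) / len(answers)
```

Normalizing case and trailing punctuation before the lookup avoids penalizing genuine respondents for trivial typing differences.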

In addition, as the final question in the survey, we asked an open-ended question about general rideshare driving experiences. Responses from actual drivers tend to be lengthy, detailed, honest, and sometimes negative. In contrast, responses from panelists tended to be generic (unconvincing as to whether they came from real drivers), short (suggesting an attempt to get through the survey quickly), and overwhelmingly positive (perhaps in an attempt to receive more invitations to product surveys). Ultimately, suspicious open-ended responses along with speeding behavior led to the removal of 13% of panel cases.
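Heuristics for the "short, generic, overwhelmingly positive" pattern might look like the sketch below; the word-count threshold and phrase list are illustrative assumptions rather than the study's actual rules:

```python
# Illustrative stock replies often seen from low-effort or fraudulent panelists.
GENERIC_REPLIES = {"good", "very good", "nice", "great experience", "i like it"}

def flag_suspicious_open_end(text: str, min_words: int = 8) -> bool:
    """Flag replies that are very short or match a stock generic phrase."""
    t = text.strip().lower().rstrip(".!")
    return t in GENERIC_REPLIES or len(t.split()) < min_words
```

A flag like this is a screening aid, not a verdict: short answers from genuine respondents exist, which is why the case studies paired open-ended evidence with speeding behavior before removing cases.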

The second survey was conducted with 600 self-reported New York City taxi riders. The survey was brief, mobile-friendly, and included one open-ended question. Again, some of the responses received from panelists looked suspicious. In this case, geolocation data indicated disproportionately high numbers of respondents far from the New York area. In particular, 32 of 600 cases (5%) were traced back to Seattle, 2,500 miles away. In addition, the open-ended question about general experiences with taxi and rideshare rides produced many suspicious responses: these were conversational in nature and consisted of full sentences and phrases, but made no sense for the question being asked. Ultimately, suspicious open-ended responses along with location data led to the removal of 7% of panel cases, rather than only 2% based on location data alone.
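A geolocation screen of this kind reduces to a great-circle distance check against the target market. The coordinates and "local" radius below are illustrative assumptions, not the study's actual cutoff:

```python
from math import radians, sin, cos, asin, sqrt

NYC = (40.7128, -74.0060)  # approximate city-center coordinates
MAX_MILES = 150            # illustrative radius for a "local" respondent

def haversine_miles(a, b):
    """Great-circle distance between two (lat, lon) points, in miles."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 3958.8 * asin(sqrt(h))  # 3958.8 = Earth's radius in miles

def flag_remote(ip_latlon, center=NYC, max_miles=MAX_MILES):
    """Flag a respondent whose IP geolocates far outside the target market."""
    return haversine_miles(ip_latlon, center) > max_miles
```

Applied to a Seattle geolocation, this check returns a distance of roughly 2,400 miles, well beyond any plausible radius for a New York City taxi rider.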

Open-ended responses are useful for collecting valuable feedback from users in their own words, sometimes about issues that may have been unanticipated. This research also demonstrates their utility for detecting fraudulent panel respondents. Compared to the open-ended responses typically received, fraudulent open-ended responses tended to be either completely nonsensical or short, generic, not actionable, and overwhelmingly positive. If not analyzed critically, they may produce upwardly biased results and present an overly optimistic picture of the user experience.

Finally, in these two case studies, data from fraudulent respondents were handled after data collection, through data cleaning and data scrubbing. A better approach is to prevent fraudulent respondents from entering the sample in the first place. This can be accomplished by using a good sample from the actual target population, preventing duplicates and bots from entering the survey, and using good survey design (e.g., avoiding long, boring surveys and burdensome questions).

Open-ended responses play a pivotal role in assessing data quality and uncovering fraudulent respondents. We are grateful to Tom for sharing his passion for this topic with our audience. At Verasight, we are committed to delivering trustworthy and actionable insights through our comprehensive approach to survey research. Our rigorous methodologies, transparent practices, and commitment to data integrity empower researchers to make well-informed decisions. We prioritize quality throughout the entire research process, from respondent recruitment to incentivizing honest responses and creating exceptional survey experiences. Contact us today to experience the Verasight difference.