
Sharing new political research data

DIGISURVOR – Linking digital footprint data with survey data

The DIGISURVOR project, supported by the Smart Data Research UK accelerator scheme, explores the technical, scientific and ethical challenges to making valuable new sources of sensitive smart data available for open research.

The challenge

Understanding what people think about politics has always been tricky. Traditional surveys ask questions like “Do you trust the government?” or “What news do you watch?”, but people don’t always remember well or answer honestly.

Today, researchers have access to something much richer: smart data, meaning actual records of what people post on social media and which websites they visit. This data reveals real behaviour, not just self-reported opinions.

But combining someone’s survey answers with their browsing history creates highly sensitive data that could easily identify them. Such data cannot be shared safely, so it is not generally available to the wider research community.

The DIGISURVOR project, led by Professor Rachel Gibson at the University of Manchester, is working to solve this dilemma.

What they’re doing

The team is developing methods to transform sensitive linked data (surveys + smart data) into anonymised, shareable datasets that protect privacy while preserving research value.

The data they’re working with

The project uses two main types of linked datasets collected by YouGov during recent elections:

  • Twitter/X feeds linked to surveys from the 2020 and 2024 US Presidential elections
  • Web browsing histories linked to surveys from the 2022 UK elections

The datasets were originally designed by political communication scholars to answer questions like: What political content do people consume online? How does this affect their political views and voting behaviour?

Their approach

Instead of sharing raw data, such as original tweets or lists of the 100 specific websites a respondent visited, the team is generating standardised, anonymised variables that capture behaviours and attitudes of substantive interest. Examples include the ideological outlook of the news sites a respondent visited, or whether they posted content on a particular topic of political interest. A resulting variable might read: “User consumed moderate amounts of left-leaning news content”.
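To make the idea concrete, here is a minimal Python sketch of how a raw list of visited domains might be reduced to a coarse, shareable variable. It is an illustration only, not the project's code: the lookup table `DOMAIN_LEANING`, the function `summarise_news_diet`, the example domains and the thresholds are all hypothetical assumptions.

```python
from collections import Counter

# Hypothetical lookup from news domain to ideological leaning.
# A real project would use a validated rating source; these are placeholders.
DOMAIN_LEANING = {
    "left-news.example": "left",
    "centre-news.example": "centre",
    "right-news.example": "right",
}


def summarise_news_diet(visited_domains):
    """Reduce a respondent's raw visits to a coarse amount and leaning.

    The 10% and 40% cut-offs are illustrative assumptions only.
    """
    total = len(visited_domains)
    leanings = Counter(
        DOMAIN_LEANING[d] for d in visited_domains if d in DOMAIN_LEANING
    )
    news_visits = sum(leanings.values())
    if total == 0 or news_visits == 0:
        return {"news_amount": "none", "news_leaning": "none"}

    # Share of all visits that went to known news domains.
    share = news_visits / total
    if share < 0.1:
        amount = "low"
    elif share < 0.4:
        amount = "moderate"
    else:
        amount = "high"

    # Report a leaning only when one category has a clear majority.
    top, top_count = leanings.most_common(1)[0]
    leaning = top if top_count / news_visits > 0.5 else "mixed"
    return {"news_amount": amount, "news_leaning": leaning}


visits = ["left-news.example"] * 3 + ["shop.example"] * 7
print(summarise_news_diet(visits))
```

Only the two coarse labels leave the function. The list of individual sites, which is what could identify a respondent, is never part of the output.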

The project is divided into two main phases:

Phase one

Proof of concept: the team is working to show that it is possible to extract useful patterns from social media and browsing data, such as how much political content someone views or what type of news sources they prefer, without revealing who they are, and then to link these insights to their survey responses.
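The linking step can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions, not the team's pipeline: the function `link_and_strip`, the key name `respondent_id` and the example fields are hypothetical. Dropping the identifier is also only one small part of disclosure control, and a real release would need further checks on the derived variables themselves.

```python
def link_and_strip(survey_rows, derived_rows, key="respondent_id"):
    """Join derived smart-data variables to survey answers, then drop the key.

    Both inputs are lists of dicts sharing a (hypothetical) pseudonymous key.
    Respondents without derived variables are left out of the result.
    """
    derived_by_id = {row[key]: row for row in derived_rows}
    linked = []
    for row in survey_rows:
        extra = derived_by_id.get(row[key])
        if extra is None:
            continue
        merged = {**row, **extra}
        merged.pop(key)  # the linking key is not part of the shared record
        linked.append(merged)
    return linked


survey = [
    {"respondent_id": 1, "trust_gov": "low"},
    {"respondent_id": 2, "trust_gov": "high"},
]
derived = [{"respondent_id": 1, "news_leaning": "left"}]
print(link_and_strip(survey, derived))
```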

Phase two

Next, the team tests the value of these combined datasets and examines how different data sources might introduce bias or help correct for each other’s limitations.

Why this matters

For researchers

Expanding conventional election studies to include a range of anonymised smart data from social media feeds and browsing histories opens up an exciting new research agenda, and offers the chance to build a more nuanced picture of contemporary political and social attitudes and behaviours.

Traditional surveys are limited by what fits on a questionnaire and what people remember or choose to report. Adding anonymised online behaviour data would allow researchers to:

  • Compare stated opinions with actual behaviour
  • Track real-time reactions to political events
  • Build more accurate pictures of how people form political views

Currently, the sensitivity of these linked datasets and the potential for respondent identification mean that they cannot be archived in their ‘raw’ form for secondary analysis. Where they have been made available for re-analysis, only the most basic metadata are reported.

DIGISURVOR aims to bridge this gap by demonstrating how a wide range of substantively interesting but anonymised variables can be extracted from individuals’ Twitter and web browsing activities, appended to their survey responses, and made accessible to the broader research community.

For society

Improving our understanding of public opinion isn’t only of academic interest. It brings with it the potential to improve people’s lives. When policymakers can more accurately gauge public reactions to policies and understand the information environments that shape opinions, they can make more informed decisions that genuinely reflect citizens’ needs and concerns.

Resources

The team is working to create:

  • Scientific guidelines outlining best practices for sharing combined survey and online data
  • Recommended variables that can be reliably extracted from social media and browsing data
  • Open-source code for generating these variables
  • Ethical protocols co-designed with the research community for sharing sensitive data

All resources will be freely available on the project’s GitHub site.

Building a community

In February 2025, the team hosted an international workshop where researchers facing similar challenges came together to discuss common problems. A follow-up workshop is scheduled for early 2026 to share findings and refine best practices.

The goal is to ensure these valuable research methods don’t remain locked away in academic silos but become accessible tools for understanding political and social behaviour in the digital age.

Social media workshop at the University of Manchester, Feb 2025

Get involved

Researchers interested in joining the DIGISURVOR community can contact Professor Rachel Gibson at the University of Manchester or visit the project’s GitHub repository for updates, resources, and code templates.

We spoke to Conor Gaughan and Marta Cantijoch from the team during the Digital Footprints Conference 2025:

Smart Data Research UK would love to hear about your research!

Do you have a smart data case study to share? Contact our team at smartdataresearch@ukri.org to showcase your work.
