IMAXT’s Nic Walton: uniting across disciplines to share learnings in data science

29 July 2021

On the surface, astronomy has little overlap with bioinformatics, mutational analysis and other areas of cancer biology. But, as described by IMAXT’s Nic Walton, an astronomer based at the University of Cambridge, there’s much to learn from how different disciplines approach and overcome similar data science challenges.

In July 2021, Nic and other members of the Cancer Grand Challenges community held their first data analysis workshop – an opportunity to present computing and software approaches, identify common issues and discuss future ways to collaborate. Here, Nic describes the value of uniting across disciplines and his takeaways from the workshop.

My research focuses on the late stages of stellar evolution, and on the structure and formation of the Milky Way more generally. I’m heavily involved in the European Space Agency’s Gaia mission, which is currently mapping the Milky Way in exquisite detail, providing highly accurate positions and motions of nearly 2 billion stars in our galaxy. With these measurements we can determine the chemical make-up and other properties of stars and build an annotated, 3D galaxy map – which we can use for a wide range of science, from mapping our local environment, to determining the birthplace of our Sun, to learning how our galaxy has accreted many smaller local galaxies over its lifetime.

This may seem a long way from cancer research, but the Gaia mission and the IMAXT programme share many similar aims and ambitions. Both provide annotated 3D maps – for Gaia it’s mapping the stars, their motion and chemistry; for IMAXT it’s mapping tumours at subcellular resolution, molecularly annotating each cell to determine its nature, its interactions and its microenvironment, and so gain insight into cancer evolution.

Many of the challenges we experience when analysing and integrating the huge datasets from Gaia and other large sky surveys are directly applicable to the data challenges presented in IMAXT. Over the past four years, we’ve found novel ways to overcome some of these challenges, including constructing a sophisticated system to integrate a wide range of imaging data, enabling us to build our 3D tumour atlas, and an innovative VR-based system to allow the immersive visualisation of these complex 3D datasets.  

Similarly, the six other Cancer Grand Challenges teams are making great progress against their respective challenges, gaining valuable insights and building a range of systems to both handle and analyse their data. Prompted by our collective progress against shared challenges, we decided that now is an excellent time to start sharing knowledge between teams – leading to the first data analysis workshop.

Uniting across disciplines to resolve common obstacles

This was the first time we’ve brought together those across the Cancer Grand Challenges teams who are interested in data and software, representing an opportunity to present different approaches, discuss specific techniques and challenges, and explore important topics: data infrastructure, image analysis, machine learning and AI, and FAIR principles.  

What’s interesting is that we’re all addressing a range of cancer challenges, from a range of angles – but we clearly share common issues. These include describing data with relevant metadata, and the associated complexities of making data more widely discoverable and available for possible re-use. Several of the teams have also developed advanced workflow systems to implement their data analysis – something others may find interesting to learn from.

Looking ahead, we’re keen to build on our discussions and will investigate how we can better share our data science products with the wider research community, and how we could support the scale-up of data infrastructure needed to meet the increased generation of experimental raw data. We’ll also explore the use of machine learning techniques, for instance in segmentation and feature characterisation, across teams.  

An enthusiasm to tackle cancer’s toughest challenges 

This was the first network meeting bringing together a subset of Cancer Grand Challenges teams outside of the annual summit – and I’m pleased that it was very successful. Organising a workshop across multiple time zones will always have its challenges, but I was delighted to see 80 participants registered and calling in from Canada, the East and West coasts of the US, the UK and Europe. A number of people in California even joined at 6am local time – a great example of the Cancer Grand Challenges community’s enthusiasm when it comes to finding solutions and making progress against some of the toughest challenges in cancer research!

With thanks to the organising committee (listed below), and all the workshop speakers and participants.  

  • Eduardo Gonzalez, IMAXT (Institute of Astronomy, University of Cambridge) 
  • Tristan Whitmarsh, IMAXT (Institute of Astronomy, University of Cambridge) 
  • Jon Teague, Mutographs (Wellcome Sanger Institute)  
  • David Gibbs, STORMing Cancer (Institute for Systems Biology)  
  • Yuqi Tan, STORMing Cancer (Stanford University)  
  • Curtis Huttenhower, OPTIMISTICC (Harvard TH Chan School of Public Health)  
  • Brian Menegaz, PRECISION (Baylor College of Medicine)
  • Tycho Bismeijer, PRECISION (The Netherlands Cancer Institute)  
  • Misha Sheinman, PRECISION (The Netherlands Cancer Institute) 
  • Alexander Dexter, Rosetta (National Physical Laboratory, London) 

As told to Emily Farthing. Learn more about the IMAXT team.