(Image: University of Missouri)

In a major leap forward for genetic and biomedical research, two scientists at the University of Missouri have developed a powerful new artificial intelligence tool that can predict the 3D shape of chromosomes inside individual cells — helping researchers gain a new view of how our genes work.

Chromosomes are the tiny storage boxes that hold our DNA. Since each cell has about six feet of DNA packed inside it, the DNA must be folded up tightly to fit. This folding not only saves space; it also controls which genes are active or inactive. But when the DNA doesn’t fold the right way, it can disrupt normal cell functions and lead to serious diseases, including cancer.

Jack Cheng (Image: University of Missouri)

Historically, scientists have relied on data that averaged results from millions of cells at once. That makes it almost impossible to see the unique differences between individual cells. But the new AI model developed by Yanli Wang and Jianlin “Jack” Cheng at Mizzou’s College of Engineering changes that.

“This is important because even cells from the same part of the body can have chromosomes folded in very different ways,” Wang, a graduate student and lead author of the study, said. “That folding controls which genes are turned on or off.”

Studying single cells is tricky because the data is often messy or incomplete. But the new AI tool is specially designed to work with those challenges. It’s smart enough to spot weak patterns in noisy data, and it knows how to estimate a chromosome’s 3D shape even when some information is missing.

Here is an exclusive Tech Briefs interview, edited for length and clarity, with Wang.

Tech Briefs: What was the biggest technical challenge you faced while developing this AI tool?

Wang: The central contribution of our work lies in addressing key challenges in reconstructing 3D chromosome structures from sparse single-cell Hi-C data by pioneering the use of equivariant graph neural networks (GNNs). This is the first study to successfully adapt equivariant GNNs to this domain, overcoming technical barriers that have long limited progress in the field and achieving state-of-the-art performance.

Traditionally, chromosome structure reconstruction from Hi-C data has relied heavily on optimization-based methods, with limited integration of machine learning. To date, only one previous attempt has been made to explore machine learning for this task. Our method, HiCEGNN, not only broadens the application of machine learning in this area but also clearly outperforms both classical approaches and the sole existing ML-based method, establishing a new benchmark for future research.

To address the specific challenges of this task, we introduced several key technical innovations. We engineered meaningful node features tailored for graph-based modeling of chromosomal data, and we utilized eigenvalue decomposition for 3D initialization, thereby avoiding the instability and inconsistency of random starting points. Most critically, we tackled the problem of data sparsity in single-cell Hi-C, which contains far fewer contacts than bulk Hi-C. By leveraging the symmetry-preserving properties of equivariant GNNs, our method robustly reconstructs accurate 3D structures even under these data constraints, a capability previously unattained.

We believe these contributions not only solve core technical challenges but also provide a solid foundation for expanding machine learning's role in 3D genome structure inference.

Tech Briefs: Can you explain, in simple terms, how it works?

Wang: We start with single-cell Hi-C data, which tells us how different parts of a chromosome are interacting inside a cell. From this data, we create something called a 2D contact map — you can think of it like a grayscale image, where each pixel shows how strong the contact is between two regions of the chromosome. We scale all the values in this map to be between 0 and 1 to help the model learn better.
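
To make the scaling step concrete, here is a minimal Python sketch. The interview does not say which scaling the team uses, so min-max scaling is an assumption here, and the function name `scale_contact_map` and the toy counts are made up for illustration.

```python
def scale_contact_map(counts):
    """Min-max scale a square contact map (list of lists) into [0, 1].

    Assumption: min-max scaling; the interview only says values end up
    between 0 and 1.
    """
    flat = [v for row in counts for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant map carries no contrast, so return all zeros.
        return [[0.0 for _ in row] for row in counts]
    return [[(v - lo) / (hi - lo) for v in row] for row in counts]


# Toy 4-bin map of raw contact counts (hypothetical numbers, symmetric).
raw = [
    [0, 4, 1, 0],
    [4, 0, 2, 0],
    [1, 2, 0, 8],
    [0, 0, 8, 0],
]
scaled = scale_contact_map(raw)
for row in scaled:
    print(row)
```

The strongest contact in the toy map becomes 1.0 and empty cells stay at 0.0, which matches the "grayscale image" picture Wang describes.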

Then, we turn this contact map into a graph, kind of like a network. Each point (or “bin”) along the chromosome becomes a node, and if there’s a contact between two bins (meaning their interaction is not zero), we draw a line (or edge) between those two nodes. So the contact map tells us who’s connected to whom.
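
The map-to-graph step can be sketched in a few lines of Python. This is an illustration of the rule Wang states (one node per bin, one edge per nonzero contact), not the team's code; the function name `contact_map_to_edges` and the toy values are hypothetical.

```python
def contact_map_to_edges(contact_map):
    """List undirected edges (i, j, weight) for every nonzero contact.

    Each bin index is a node. Only pairs with i < j are visited, so a
    symmetric map yields each edge once and self-contacts are skipped.
    """
    n = len(contact_map)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            weight = contact_map[i][j]
            if weight != 0:
                edges.append((i, j, weight))
    return edges


# Toy scaled contact map for 4 bins (hypothetical values).
toy_map = [
    [0.0, 0.5, 0.125, 0.0],
    [0.5, 0.0, 0.25, 0.0],
    [0.125, 0.25, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
edges = contact_map_to_edges(toy_map)
for i, j, weight in edges:
    print(f"bin {i} -- bin {j} (contact {weight})")
```

In sparse single-cell data most entries are zero, so the edge list is typically far shorter than the full map.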

Next, we give each node some features using a method called LINE, which helps the model understand the structure of the chromosome just from the contact map. We also estimate some initial 3D positions for each node using a math trick called eigenvalue decomposition — this gives us a rough idea of where each part of the chromosome might be in space.
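
The interview does not spell out how the eigenvalue decomposition is turned into coordinates, so the following Python sketch shows one plausible reading rather than the HiCEGNN procedure: take the three eigenvectors of the symmetric contact map with the largest-magnitude eigenvalues and scale each by the square root of that magnitude. The function name `spectral_init` and the toy map are assumptions. The LINE node features are not shown.

```python
import numpy as np


def spectral_init(contact_map, dim=3):
    """Deterministic starting coordinates from an eigendecomposition.

    Assumption: a spectral embedding of the contact map itself; the
    authors' exact recipe is not given in the interview.
    """
    c = np.asarray(contact_map, dtype=float)
    c = (c + c.T) / 2.0  # enforce symmetry so eigh applies
    eigvals, eigvecs = np.linalg.eigh(c)
    # Keep the `dim` eigenpairs with the largest absolute eigenvalues.
    order = np.argsort(np.abs(eigvals))[::-1][:dim]
    return eigvecs[:, order] * np.sqrt(np.abs(eigvals[order]))


# Toy scaled contact map for 5 bins (hypothetical values).
toy_map = [
    [0.0, 0.9, 0.4, 0.1, 0.0],
    [0.9, 0.0, 0.8, 0.3, 0.1],
    [0.4, 0.8, 0.0, 0.7, 0.2],
    [0.1, 0.3, 0.7, 0.0, 1.0],
    [0.0, 0.1, 0.2, 1.0, 0.0],
]
coords = spectral_init(toy_map)
print(coords.shape)  # one (x, y, z) row per bin
```

Because nothing here is random, the same contact map always produces the same starting positions, which is the property Wang contrasts with random starting points.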

Finally, we use a special deep learning model called an Equivariant Graph Neural Network (EGNN). It takes all this information — the nodes, edges, features, and rough 3D positions — and learns to adjust the 3D positions in a way that better reflects the real shape of the chromosome. In the end, we get a full 3D structure that’s much more accurate.
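
To show what "equivariant" means in practice, here is a simplified NumPy sketch of a single E(n)-equivariant message-passing step in the general style of published EGNN layers. It is not the HiCEGNN architecture: the weights are random rather than trained, the small `tanh` layers stand in for learned networks, and the names `egnn_layer`, `W_e`, `W_x`, and `W_h` are illustrative assumptions.

```python
import numpy as np


def egnn_layer(h, x, edges, params):
    """One simplified equivariant update of features h and coordinates x.

    h: (n, f) node features; x: (n, 3) coordinates;
    edges: directed (i, j) pairs; params: (W_e, W_x, W_h) weight matrices.
    """
    W_e, W_x, W_h = params
    src = np.array([i for i, _ in edges])
    dst = np.array([j for _, j in edges])
    diff = x[src] - x[dst]                           # relative positions
    dist2 = np.sum(diff ** 2, axis=1, keepdims=True)  # rotation-invariant
    # Messages depend only on features and squared distances.
    m = np.tanh(np.concatenate([h[src], h[dst], dist2], axis=1) @ W_e)
    # Move each node along its relative-position vectors, scaled per edge.
    x_new = x.copy()
    np.add.at(x_new, src, diff * np.tanh(m @ W_x))
    # Sum incoming messages per node and update the features.
    agg = np.zeros((h.shape[0], m.shape[1]))
    np.add.at(agg, src, m)
    h_new = h + np.tanh(np.concatenate([h, agg], axis=1) @ W_h)
    return h_new, x_new


rng = np.random.default_rng(0)
n, f, k = 5, 4, 8
h = rng.normal(size=(n, f))
x = rng.normal(size=(n, 3))
undirected = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2)]
edges = undirected + [(j, i) for i, j in undirected]
params = (
    rng.normal(size=(2 * f + 1, k)),  # W_e: edge message weights
    rng.normal(size=(k, 1)),          # W_x: coordinate update weights
    rng.normal(size=(f + k, f)),      # W_h: feature update weights
)
h1, x1 = egnn_layer(h, x, edges, params)

# Equivariance check: transform the input, then compare the outputs.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal matrix
t = rng.normal(size=(1, 3))                   # random translation
h2, x2 = egnn_layer(h, x @ Q + t, edges, params)
print(np.allclose(h2, h1), np.allclose(x2, x1 @ Q + t))
```

The final check rotates (or reflects) and shifts the input structure and confirms that the output coordinates move the same way while the node features stay unchanged. That symmetry is built into the layer, so the model does not have to learn it from sparse data.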

Tech Briefs: The article I read says, “The team has made the software free and available to scientists around the world. That means researchers can now use it to better understand how genes function, how diseases start and how to design better treatments.” My question is: What made you decide to make it free?

Wang: As researchers, our main goal is to make a meaningful contribution to the scientific community. We chose to make our work freely accessible because we believe it can help push the field forward and serve as a useful resource for others working on similar problems. By sharing our methods and results openly, we hope to offer insights that others can build on. Open access encourages collaboration, transparency, and innovation — all of which are key to driving progress. In the end, we simply want to support the broader research community and help create an environment where ideas can grow and move science forward.

Tech Briefs: Do you have any set plans for further research or related work? If not, what are your next steps?

Wang: We don’t have any concrete plans yet, but we do have some ideas for the next steps. Building the entire genome structure means we need to establish connections across all chromosomes for a given species. To achieve this, we’re planning to construct a super graph that represents the complete genome of each species. On top of that, we intend to apply a customized E(n) Equivariant Topological Neural Network (ETNN) to these super graphs, enabling the model to be sensitive to species-specific characteristics. This direction reflects our vision for the next generation of research in this area.

Tech Briefs: Do you have any advice for researchers aiming to bring their ideas to fruition (broadly speaking)?

Wang: If you have ideas, don’t wait or hesitate to act on them. Taking action speaks louder than just having ideas. If you keep your ideas to yourself, no one else will see them, and progress will slow down. The world moves forward when people turn their ideas into reality.