My ICLR 2025 Experience: From Computation Trees to Singapore's Skyline
Published: April 2025
Attending ICLR 2025 in Singapore felt like stepping into the beating heart of the machine learning research community.
As a second-year undergraduate from IIT Delhi, walking through the Singapore EXPO from April 24-28 and seeing my poster
displayed among the world's best research was nothing short of surreal.
Our Research: Bonsai - Redefining Graph Condensation
Our paper, "Bonsai: Gradient-Free Graph Condensation for Node Classification," addressed a fundamental scalability challenge
in graph neural network training. Traditional graph condensation methods suffer from a critical flaw: they require training
a full GNN on the original dataset to extract gradients, defeating the very purpose of condensation.
Bonsai introduces a novel gradient-free methodology that constructs condensed graphs by selecting representative computation trees.
These trees capture how GNNs propagate information during message-passing operations. By leveraging the Weisfeiler-Lehman kernel
to measure structural similarity between computation trees, we developed a greedy selection algorithm that identifies exemplar trees.
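To make this concrete, here is a minimal sketch of the idea, assuming a Weisfeiler-Lehman-style relabeling as the
signature of each node's computation tree. The function names and the frequency-based greedy rule are illustrative
simplifications for exposition, not the released Bonsai implementation.

from collections import Counter
import networkx as nx

def wl_labels(G: nx.Graph, depth: int) -> dict:
    """WL-style string label summarizing each node's depth-limited computation tree."""
    labels = {v: str(G.degree(v)) for v in G}
    for _ in range(depth):
        labels = {
            v: labels[v] + "|" + ",".join(sorted(labels[u] for u in G.neighbors(v)))
            for v in G
        }
    return labels

def greedy_exemplars(G: nx.Graph, depth: int, budget: int) -> list:
    """Keep one representative node per most frequent computation-tree class."""
    labels = wl_labels(G, depth)
    by_label = {}
    for v, lbl in labels.items():
        by_label.setdefault(lbl, []).append(v)
    ranked = Counter({lbl: len(vs) for lbl, vs in by_label.items()})
    return [by_label[lbl][0] for lbl, _ in ranked.most_common(budget)]

G = nx.karate_club_graph()
print(greedy_exemplars(G, depth=2, budget=5))  # five exemplar nodes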
The results were compelling. Across seven benchmark datasets including Reddit, PubMed, and OGB-ArXiv, Bonsai consistently
outperformed state-of-the-art baselines while being at least 7x faster in condensation time. The method requires no GPU acceleration,
consumes 17x less energy, and generalizes seamlessly across different GNN architectures like GCN, GAT, and GIN.
Special Encounters and Unexpected Connections
One of the most memorable moments came when I noticed Prof. Michael Bronstein
and his group positioned right across from our poster #186. After a brief conversation, he walked over to examine our work on Bonsai.
Standing there, explaining our gradient-free approach to one of the pioneers of geometric deep learning, felt like a defining moment of the conference.
The industry presence was overwhelming in the best possible way. Corporate booths from Jump Trading, Citadel,
Jane Street, Google, Meta, and dozens of other companies created a vibrant ecosystem where academic research met
real-world applications.
Singapore: The Perfect Backdrop
Between sessions, Singapore revealed its wonders. Gardens by the Bay provided a surreal backdrop for evening strolls, where conversations
about neural architecture search continued under the towering Supertrees. The marriage of technology and nature in that space felt
symbolic of the conference itself.
ICLR 2025 was more than a conference; it was a glimpse into the future of artificial intelligence research. As I returned to Delhi,
I carried with me not just memories of presentations and conversations, but a transformed understanding of what it means to be part
of the global AI research community.
Authors: Mridul Gupta, Samyak Jain, Vansh Ramani, Hariprasad Kodamana, Sayan Ranu
Paper: Bonsai: Gradient-Free Graph Condensation for Node Classification
Code: Available on GitHub
Understanding Graph Distillation: A Comprehensive Guide
Published: March 2025
Graph Neural Networks (GNNs) have revolutionized how we process relational data, but training on massive graphs
can be computationally prohibitive. Graph distillation emerges as a powerful solution to this challenge.
What is Graph Distillation?
Graph distillation seeks to create a smaller, high-quality version of a graph that retains essential information—allowing
for faster, more efficient training without needing extensive computational resources. Think of it as creating a
"summary" of your graph that maintains the learning capacity of the original dataset.
The Problem with Existing Methods
Most existing methods require a full training pass on the original graph dataset, which can undermine the goal
of efficient distillation. This is where our work on Bonsai comes in.
Introducing Bonsai: A Novel Approach
Bonsai is a unique, linear-time, model-agnostic graph distillation algorithm that overcomes traditional limitations
by distilling graphs 22x faster than previous methods. It achieves state-of-the-art results on 14 out of 18 benchmark scenarios.
Key Technical Innovations:
• Model-Agnostic Distillation: A single distilled dataset works across GNN architectures (GCN, GAT, GraphSAGE, GIN)
• CPU-Optimized: Designed to run efficiently without relying on expensive GPUs
• Linear-Time Performance: Bonsai selects directly from the input space, bypassing full-dataset GNN training
Implementation Details
The core insight behind Bonsai lies in leveraging computation trees and the Weisfeiler-Lehman kernel to measure
structural similarity. Our greedy selection algorithm identifies exemplar trees with high representative power and diversity.
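For reference, the snippet below computes a textbook Weisfeiler-Lehman subtree kernel between two small graphs by
comparing label histograms across refinement rounds. This is the standard kernel definition; the exact variant used
inside Bonsai may differ.

from collections import Counter
import networkx as nx

def wl_kernel(G1: nx.Graph, G2: nx.Graph, iters: int = 2) -> int:
    """WL subtree kernel: sum of label-histogram dot products per round."""
    def refine(G, labels):
        return {
            v: labels[v] + "|" + ",".join(sorted(labels[u] for u in G.neighbors(v)))
            for v in G
        }
    l1 = {v: str(G1.degree(v)) for v in G1}
    l2 = {v: str(G2.degree(v)) for v in G2}
    k = 0
    for _ in range(iters + 1):
        c1, c2 = Counter(l1.values()), Counter(l2.values())
        k += sum(c1[lbl] * c2[lbl] for lbl in c1)  # only shared labels contribute
        l1, l2 = refine(G1, l1), refine(G2, l2)
    return k

print(wl_kernel(nx.path_graph(4), nx.cycle_graph(4)))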
For the complete technical implementation and code, visit our
GitHub repository.
Graph Neural Networks for Molecular Property Prediction
Published: February 2025
Predicting molecular properties is crucial in drug discovery and materials science. Traditional methods rely on
expensive experimental data or quantum chemical calculations. Our approach using Graph Neural Networks offers
a more efficient alternative.
The MolMerger Approach
We developed a novel "MolMerger" algorithm that creates virtual bonds between solute and solvent molecules,
allowing the neural network to learn from the structural nuances of molecular interactions without expensive
quantum calculations.
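As a rough illustration (an assumed sketch, not the published MolMerger code), one can take the disjoint union of
the solute and solvent graphs and add labeled virtual edges between chosen interaction sites. In practice the site
pairing would be driven by chemistry, such as polar atoms; here it is supplied by hand.

import networkx as nx

def mol_merge(solute: nx.Graph, solvent: nx.Graph, pairs) -> nx.Graph:
    """Disjoint union of two molecular graphs plus virtual solute-solvent bonds."""
    merged = nx.union(solute, solvent, rename=("u-", "v-"))
    for a, b in pairs:  # (solute_atom, solvent_atom) interaction pairs
        merged.add_edge(f"u-{a}", f"v-{b}", bond="virtual")
    return merged

# Toy usage with generic connectivity, not real molecules:
solute = nx.Graph([(0, 1), (1, 2)])  # toy three-atom solute chain
water = nx.Graph([(0, 1), (0, 2)])   # water: O bonded to two H
g = mol_merge(solute, water, pairs=[(2, 0)])  # virtual bond: atom 2 to O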
Key Results:
• R² score of 0.94 for aqueous solubility prediction
• R² of 0.767 and MAE of 0.78 on test set
• Average MAE of 0.79 across 65 different solvents
• SHAP analysis revealing key molecular features
Technical Implementation
Our framework combines Graph Neural Networks with attention mechanisms and GRUs. The key innovation lies in
how we represent solute-solvent pairs through virtual molecular bonds, enabling the model to capture complex
intermolecular interactions.
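A minimal sketch of such a network, assuming PyTorch Geometric, placeholder dimensions, and a generic
attention-plus-GRU message-passing scheme (an illustrative stand-in, not our exact published model):

import torch
from torch import nn
from torch_geometric.nn import GATConv, global_add_pool

class SolvationGNN(nn.Module):
    """Attention-based message passing with a GRU node-state update."""
    def __init__(self, in_dim: int, hidden: int = 64, steps: int = 3):
        super().__init__()
        self.embed = nn.Linear(in_dim, hidden)
        self.conv = GATConv(hidden, hidden)    # attention-weighted neighbor messages
        self.gru = nn.GRUCell(hidden, hidden)  # shared across message-passing steps
        self.out = nn.Linear(hidden, 1)
        self.steps = steps

    def forward(self, x, edge_index, batch):
        h = torch.relu(self.embed(x))
        for _ in range(self.steps):
            m = torch.relu(self.conv(h, edge_index))
            h = self.gru(m, h)                 # update each node's hidden state
        return self.out(global_add_pool(h, batch)).squeeze(-1)

# Usage: x holds atom features of a merged solute-solvent graph, edge_index
# covers both real and virtual bonds, batch maps nodes to molecule pairs.
# model = SolvationGNN(in_dim=x.size(1)); pred = model(x, edge_index, batch)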
Published in the Journal of Chemical Theory and Computation (ACS).
Read the full paper | GitHub Code
Tackling the Curse of Dimensionality: Advanced Nearest Neighbor Search
Published: January 2025
During my internship at the University of Copenhagen, I worked on developing Panorama, an exact k-nearest neighbor
search algorithm designed to mitigate the curse of dimensionality.
The Challenge
High-dimensional data complicates similarity search operations, making traditional kNN and RkNN computationally
expensive. This is particularly problematic in applications like location-based services, social networks, and
recommendation systems.
Our Solution: Panorama
Panorama achieves O(n log d) query complexity by progressively pruning distance computations in high-dimensional
spaces (d ≈ 10⁶). The algorithm demonstrates a 10× speedup over ANNOY and HNSW baselines on image classification
and location-based service datasets.
Key Technical Innovations:
• DCT energy compaction to concentrate each vector's energy in its leading coefficients
• Cauchy-Schwarz-style bounds for cheap lower bounds on partially computed distances (see the sketch after this list)
• Parallel architecture achieving k-fold speedup with k workers
• Particularly effective for transient data in ML distillation pipelines
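To ground the first two items, here is a toy, self-contained sketch: each row is DCT-transformed so its energy
concentrates in the leading coefficients, and a candidate is abandoned as soon as a lower bound built from its
unseen tail norm exceeds the best distance found so far. The names, block size, and exact bound are illustrative
assumptions, not the Panorama implementation.

import numpy as np
from scipy.fft import dct

def dct_index(X: np.ndarray):
    """Orthonormal DCT per row (preserves Euclidean distances) + tail norms."""
    Xd = dct(X, norm="ortho", axis=1)
    tails = np.sqrt(np.cumsum((Xd ** 2)[:, ::-1], axis=1))[:, ::-1]  # ||x[j:]||
    return Xd, tails

def exact_nn(Xd, tails, q, block=64):
    """Exact 1-NN with early abandonment via a tail-norm lower bound."""
    qd = dct(q, norm="ortho")
    qtail = np.sqrt(np.cumsum((qd ** 2)[::-1]))[::-1]
    d = len(qd)
    best_d, best_i = np.inf, -1
    for i, x in enumerate(Xd):
        partial, pruned = 0.0, False
        for s in range(0, d, block):
            e = min(s + block, d)
            partial += float(np.sum((x[s:e] - qd[s:e]) ** 2))
            # Unseen tail obeys ||x_t - q_t||^2 >= (||x_t|| - ||q_t||)^2,
            # the Cauchy-Schwarz / reverse-triangle lower bound.
            if e < d and partial + (tails[i, e] - qtail[e]) ** 2 >= best_d:
                pruned = True
                break
        if not pruned and partial < best_d:
            best_d, best_i = partial, i
    return best_i, best_d

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 256))
Xd, tails = dct_index(X)
print(exact_nn(Xd, tails, X[42]))  # -> (42, 0.0)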
This work is currently under review at ICML 2025.