Publications.
Peer-reviewed publications from the MINE lab. For the most up-to-date list, see Prof. Zhang's Google Scholar and homepage.
My Favorite Streamer is an LLM: Discovering, Bonding, and Co-Creating in AI VTuber Fandom
CHI 2026 (Honourable Mention Award)
Preference Leakage: A Contamination Problem in LLM-as-a-Judge
ICLR 2026 (Best Paper Award at the ICML 2025 DIG-BUG Workshop)
Benchmarking Large Language Models on Safety Issues in Scientific Labs
Nature Machine Intelligence, Jan 2026
Highlighted by New Scientist and Science News.
AdaReasoner: Adaptive Reasoning Enables More Flexible Thinking
NeurIPS 2025
DyFlow: Dynamic Workflow Framework for Agentic Reasoning
NeurIPS 2025
Adaptive Distraction: Probing LLM Contextual Robustness with Automated Tree Search
NeurIPS 2025
TraffiDent: A Dataset for Understanding the Interplay Between Traffic Dynamics and Incidents
NeurIPS 2025 Datasets and Benchmarks Track
BenchmarkCards: Standardized Documentation for Large Language Model Benchmarks
NeurIPS 2025 Datasets and Benchmarks Track
Unveiling the Dynamics of Multi-Dimensional Filter Bubbles in News Recommendation
IEEE Big Data 2025 (Short Paper)
New Paradigm for Evaluating Scholar Summaries: A Facet-aware Metric and a Meta-evaluation Benchmark
ACM Transactions on Information Systems, Vol. 43, Issue 4, 2025
Towards Generalized Urban Computing: Pretraining a Spatial-Temporal Model for Diverse Urban Tasks
IEEE Transactions on Mobile Computing, Vol. 24, Issue 10, October 2025
Machine learning for 2D material-based devices
Materials Science and Engineering R, Vol. 166, 2025
Quest2DataAgent: Automating End-to-End Scientific Data Collection
EMNLP 2025 (System Demonstrations)
Dissecting Logical Reasoning in LLMs: A Fine-Grained Evaluation and Supervision Study
EMNLP 2025 Findings
Invisible Entropy: Towards Safe and Efficient Low-Entropy LLM Watermarking
EMNLP 2025
Proto-Yield: An Uncertainty-Aware Prototype Network for Yield Prediction in Real-world Chemical Reactions
CIKM 2025 (Full Paper)
Towards Few-shot Chemical Reaction Outcome Prediction
CIKM 2025 (Full Paper)
Jailbreaking LLMs via Misalignment on Out-of-Distribution Inputs
CIKM 2025 (Short Paper)
Think it Image by Image: Multi-Image Moral Reasoning of Large Vision-Language Models
CIKM 2025 (Short Paper)
Fair Online Influence Maximization
TMLR 2025
Exposing and Patching the Flaws of Large Language Models in Social Character Simulation
COLM 2025
TRUSTEVAL: A Dynamic Evaluation Toolkit on Trustworthiness of Generative Foundation Models
NAACL 2025 Demo
CLIPErase: Efficient Unlearning of Visual-Textual Associations in CLIP
ACL 2025
SEUF: Is Unlearning One Expert Enough for Mixture-of-Experts LLMs?
ACL 2025
Beyond Single-Value Metrics: Evaluating and Enhancing LLM Unlearning with Cognitive Diagnosis
ACL 2025 Findings
Cross-Lingual Pitfalls: Automatic Probing Cross-Lingual Weakness of Multilingual Large Language Models
ACL 2025
Shaping the Safety Boundaries: Understanding and Defending Against Jailbreaks in Large Language Models
ACL 2025
Artificial Intelligence in Spectroscopy: Advancing Chemistry from Prediction to Generation and Beyond
IJCAI 2025 Survey Track
Evaluating and Mitigating Bias in AI-based Medical Text Generation
Nature Computational Science, 2025
WildlifeLookup: A Chatbot Facilitating Wildlife Management with Accessible Data and Insights
WSDM 2025 (Demo)
Improving Reaction Prediction through Chemically Aware Transfer Learning
Digital Discovery, 2025
DataGen: Unified Synthetic Dataset Generation via Large Language Models
ICLR 2025
Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge
ICLR 2025
UICopilot: Automating UI Synthesis via Hierarchical Code Generation from Webpage Designs
TheWebConf 2025
WebCode2M: A Real-World Dataset for Code Generation from Webpage Designs
TheWebConf 2025
Towards Fair Graph Learning without Demographic Information
AISTATS 2025
Unlocking the Potential of Black-box Pre-trained GNNs for Graph Few-shot Learning
AAAI 2025
Knowledge Distillation on Graphs: A Survey
ACM Computing Surveys (To appear)
Unveiling the power of language models in chemical research question answering
Communications Chemistry, Vol. 8, Article 4, 2025
Machine learning assisted plasmonic metascreen for enhanced broadband absorption in ultra-thin silicon films
Light: Science & Applications, 14, 42 (2025)
Can LLMs Solve Molecule Puzzles? A Multimodal Benchmark for Molecular Structure Elucidation
NeurIPS 2024 Datasets and Benchmarks Track (Spotlight)
Defending Jailbreak Prompts via In-Context Adversarial Game
EMNLP 2024 Main
Zero-Shot Relational Learning for Multimodal Knowledge Graphs
IEEE Big Data 2024 (Regular Paper)
Data-Efficient, Chemistry-Aware Machine Learning Predictions of Diels–Alder Reaction Outcomes
Journal of the American Chemical Society, Vol. 146, Issue 23
Application of Large Language Models in Chemistry Reaction Data Extraction and Cleaning
CIKM 2024 Short Research Papers Track
FaDE: A Face Segment Driven Identity Anonymization Framework For Fair Face Recognition
CIKM 2024 Full Research Paper
TrustLLM: Trustworthiness in Large Language Models
ICML 2024 (Hugging Face Daily Paper No. 1; highlighted by the United States Department of Homeland Security (DHS); invited talk at IBM Research)
Large Language Model Based Multi-Agents: A Survey of Progress and Challenges
IJCAI 2024 (Survey Track)
Are We Making Much Progress? Revisiting Chemical Reaction Yield Prediction from an Imbalanced Regression Perspective
TheWebConf 2024 (Short Paper)
A Property-Guided Diffusion Model for Generating Molecular Graphs
IEEE ICASSP 2024
Causality-Based Fair Multiple Decision by Response Functions
ACM Transactions on Knowledge Discovery from Data (TKDD), 2024
Gradient-Based Local Causal Structure Learning
IEEE Transactions on Cybernetics, 2024
Personalized Federated Few-Shot Learning
IEEE Transactions on Neural Networks and Learning Systems, 2024
Write Summary Step-by-Step: A Pilot Study of Stepwise Summarization
IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024
Forgetting User Preference in Recommendation Systems with Label-Flipping
IEEE International Conference on Big Data 2023 (Regular Paper)
What can large language models do in chemistry? A comprehensive benchmark on eight tasks
NeurIPS 2023 Datasets and Benchmarks track
Compositional Mathematical Encoding for Math Word Problems
Findings of ACL 2023
Few-shot Low-resource Knowledge Graph Completion with Reinforced Task Generation
Findings of ACL 2023
Graph-based Molecular Representation Learning
IJCAI 2023 (Survey Track)
A Topic-aware Summarization Framework with Different Modal Side Information
SIGIR 2023 (Full Paper)
LogicRec: Recommendation with Users' Logical Requirements
SIGIR 2023 (Short Paper)
Few-shot News Recommendation via Cross-lingual Transfer
The Web Conference 2023
Learning MLPs on Graphs: A Unified View of Effectiveness, Robustness, and Efficiency
ICLR 2023 (Notable Top 25%)
AdvCat: Domain-Agnostic Robustness Assessment for Cybersecurity-Critical Applications with Categorical Inputs
IEEE Big Data 2022 (Regular Paper)
QuoGNN: Quotient Graph Neural Network for Urban Flow Forecasting
IEEE Big Data 2022 (Short Paper)
Few-shot Heterogeneous Graph Learning via Cross-domain Knowledge Transfer
KDD 2022 Research Track
Data-Driven Oracle Bone Rejoining: A Dataset and Practical Self-Supervised Learning Scheme
KDD 2022 Applied Data Science Track
Few-Shot Learning on Graphs: A Survey
IJCAI 2022 (Survey Track)
MWP-BERT: Numeracy-Augmented Pre-training for Math Word Problem Solving
Findings of NAACL 2022
Graph Alignment with Noisy Supervision
TheWebConf 2022
Follow the Timeline! Generating Abstractive and Extractive Timeline Summary in Chronological Order
ACM Transactions on Information Systems (TOIS), February 2022
HGATE: Heterogeneous Graph Attention Auto-Encoders
IEEE Transactions on Knowledge and Data Engineering (TKDE), 2022
Uniting Heterogeneity, Inductiveness, and Efficiency for Graph Representation Learning
IEEE Transactions on Knowledge and Data Engineering (TKDE), 2021
Low resistance asymmetric III-nitride tunnel junctions designed by machine learning
Nanomaterials, 2021, 11(10), 2466
Data-Efficient Language Shaped Few-shot Image Classification
Findings of EMNLP 2021
GF-VAE: A Flow-based Variational Autoencoder for Molecule Generation
CIKM 2021 (Full Paper)
Set-aware Entity Synonym Discovery with Flexible Receptive Fields
IEEE Transactions on Knowledge and Data Engineering (TKDE)
PAGE: A Simple and Optimal Probabilistic Gradient Estimator for Nonconvex Optimization
ICML 2021 (Long Presentation)
Capturing Relations between Scientific Papers: An Abstractive Model for Related Work Section Generation
ACL-IJCNLP 2021 Main Conference
Rise and Fall of the Global Conversation and Shifting Sentiments During the COVID-19 Pandemic
Humanities and Social Sciences Communications, 2021
The soundscape of the Anthropocene ocean
Science, Vol 371, Issue 6529, 05 February 2021
Graph Embedding for Recommendation against Attribute Inference Attacks
The Web Conference 2021 (WWW 2021)
Self-Supervised Multi-Channel Hypergraph Convolutional Network for Social Recommendation
The Web Conference 2021 (WWW 2021)
Deterministic and probabilistic deep learning models for inverse design of broadband acoustic cloak
Physical Review Research, Vol. 3, Issue 1, 013142, 2021
DDHH: A Decentralized Deep Learning Framework for Large-scale Heterogeneous Networks
IEEE ICDE 2021 (Short Paper)
PINE: Universal deep embedding for graph nodes via partial permutation invariant set functions
IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), 2021
DeepIDA: Predicting Isoform-Disease Associations by Data Fusion and Deep Neural Networks
IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2021
Approximately Counting Butterflies in Large Bipartite Graph Streams
IEEE Transactions on Knowledge and Data Engineering (TKDE), 2021
Multiview Multi-Instance Multilabel Active Learning
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2021
Point-of-Interest Recommendation with Global and Local Context
IEEE Transactions on Knowledge and Data Engineering (TKDE), 2021
Flexible Cross-Modal Hashing
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2021
Representation Learning with Multi-level Attention for Activity Trajectory Similarity Computation
IEEE Transactions on Knowledge and Data Engineering (TKDE), 2021
CrowdWT: Crowdsourcing via Joint Modeling of Workers and Tasks
ACM Transactions on Knowledge Discovery from Data (TKDD), 2021
CMAL: Cost-effective Multi-label Active Learning by Querying Subexamples
IEEE Transactions on Knowledge and Data Engineering (TKDE), 2021
CRSAL: Conversational Recommender Systems with Adversarial Learning
ACM Transactions on Information Systems (TOIS), 2021
T-PAIR: Temporal Node-pair Embedding for Automatic Biomedical Hypothesis Generation
IEEE Transactions on Knowledge and Data Engineering (TKDE), 2020
Partial Multi-Label Learning using Label Compression
IEEE ICDM 2020 (Regular Paper)
Deep Incomplete Multi-View Multiple Clusterings
IEEE ICDM 2020 (Regular Paper)
Meta-path Hierarchical Heterogeneous Graph Convolution Network for High Potential Scholar Recognition
IEEE ICDM 2020 (Short Paper)
Multi-type Objects Multi-view Multi-instance Multi-label Learning
IEEE ICDM 2020 (Short Paper)
Decentralized Embedding Framework for Large-Scale Networks
DASFAA 2020 (Best Student Paper Award)
Multi-modal Network Representation Learning
KDD 2020 Tutorial
Jointly Learning Representations of Nodes and Attributes for Attributed Networks
ACM Transactions on Information Systems (TOIS), 2020
Accurately Estimating User Cardinalities and Detecting Super Spreaders over Time
IEEE Transactions on Knowledge and Data Engineering (TKDE), 2020
Risk Convergence of Centered Kernel Ridge Regression with Large Dimensional Data
IEEE Transactions on Signal Processing, 2020
Attention-Aware Answers of the Crowd
SDM 2020
GraPASA: Parametric Graph Embedding via Siamese Architecture
Information Sciences, 512: 1442-1457 (2020)
Gaussian Mixture Embedding of Multiple Node Roles in Networks
World Wide Web Journal, 23(2): 927-950 (2020)
Collaborative Graph Walk for Semi-supervised Multi-Label Node Classification
IEEE ICDM 2019 (Regular Paper)
Dataset Recommendation via Variational Graph Autoencoder
IEEE ICDM 2019 (Regular Paper)
Cross-modal Zero-shot Hashing
IEEE ICDM 2019 (Regular Paper)
AMENDER: an Attentive and Aggregate Multi-layered Network for Dataset Recommendation
IEEE ICDM 2019 (Short Paper)
Multi-View Multiple Clustering
IJCAI 2019
Individuality and Commonality based Multi-View Multi-Label Learning
IEEE Transactions on Cybernetics, 2019
Privacy Risk Analysis and Mitigation of Analytics Libraries in the Android Ecosystem
IEEE Transactions on Mobile Computing, 2019
Co-Embedding Attributed Networks
WSDM 2019
Mining top-k Popular Datasets via a Deep Generative Model
IEEE Big Data 2018
SNOD: A Fast Sampling Method of Exploring Node Orbit Degrees for Large Graphs
Knowledge and Information Systems (KAIS), 61(1): 301-326, 2019
Mining Streaming and Temporal Data: From Representation to Knowledge
IJCAI-ECAI 2018 (Early Career Spotlight)
Transfer Collaborative Filtering from Multiple Sources via Consensus Regularization
Neural Networks, Volume 108, December 2018, Pages 287-295
Protecting Multi-party Privacy in Location-Aware Social Point-of-Interest Recommendation
World Wide Web Journal, April 2018
Use of Unmanned Aerial Vehicles for Efficient Beach Litter Monitoring
Marine Pollution Bulletin, Vol. 131, Part A, 2018
A Privacy-Preserving Framework for Trust-Oriented Point-of-Interest Recommendation
IEEE Access, Vol. 6, 2018
CreditCoin: A Privacy-Preserving Blockchain-based Incentive Announcement Network for Communications of Smart Vehicles
IEEE Transactions on Intelligent Transportation Systems, Vol. 19, No. 7, 2018
Detecting Android Malicious Apps and Categorizing Benign Apps with Ensemble of Classifiers
Future Generation Computer Systems, Volume 78, 2018
Efficient Task Assignment in Spatial Crowdsourcing with Worker and Task Privacy Protection
GeoInformatica, Volume 22, Issue 2, 2018
Efficient Evaluation of Shortest Travel-Time Path Queries through Spatial Mashups
GeoInformatica, Volume 22, Issue 1, 2018
Discovering and Understanding Android Sensor Usage Behaviors with Data Flow Analysis
World Wide Web Journal, Volume 21, Issue 1, 2018
Exploiting Reject Option in Classification for Social Discrimination Control
Information Sciences, Volume 425, 2018
Abstracting Massive Data for Lightweight Intrusion Detection in Computer Networks
Information Sciences, Volumes 433-434, 2018
Approximately Counting Triangles in Large Graph Streams Including Edge Duplicates with a Fixed Memory Usage
VLDB 2017, Volume 11, Issue 2
MOSS-5: A Fast Method of Approximating Counts of 5-Node Graphlets in Large Graphs
IEEE Transactions on Knowledge and Data Engineering (TKDE), 30(1): 73-86, 2017
Delve: A Dataset-Driven Scholarly Search and Analysis System
SIGKDD Explorations, Vol. 19, Issue 2, 2017
An Up-to-date Comparison of State-of-the-art Classification Algorithms
Expert Systems with Applications, 82: 128-150, 2017
Web-ADARE: A Web-aided Data Repairing System
Neurocomputing, Volume 253, 2017
Characterizing Android Apps' Behavior for Effective Detection of Malapps at Large Scale
Future Generation Computer Systems, Volume 75, 2017
Privacy-preserving Task Assignment in Spatial Crowdsourcing
Journal of Computer Science and Technology, Volume 32, Issue 5, 2017
KDE-Track: An Efficient Dynamic Density Estimator for Data Streams
IEEE Transactions on Knowledge and Data Engineering (TKDE), Vol. 29, No. 3, 2017
An Effective Suggestion Method for Keyword Search of Databases
World Wide Web Journal, Volume 20, Issue 4, 2017
The Interaction between Schema Matching and Record Matching in Data Integration
IEEE Transactions on Knowledge and Data Engineering (TKDE), Vol. 29, No. 1, 2017
Flash Flood Detection in Urban Cities Using Ultrasonic and Infrared Sensors
IEEE Sensors Journal, Vol. 16, No. 19, 2016
Large Margin Classification with Indefinite Similarities
Machine Learning Journal, 103(2): 215-237, 2016
Modeling and Predicting AD Progression by Regression Analysis of Sequential Clinical Data
Neurocomputing, Volume 195, 2016
Optimizing Cost of Continuous Overlapping Queries over Data Streams by Filter Adaptation
IEEE Transactions on Knowledge and Data Engineering (TKDE), 28(5): 1258-1271, 2016
Is Attribute-Based Zero-Shot Learning an Ill-Posed Strategy?
ECML PKDD 2016
TRIP: An Interactive Retrieving-Inferring Data Imputation Approach
IEEE Transactions on Knowledge and Data Engineering (TKDE), 27(9): 2550-2563, 2015
Exploring Permission-Induced Risk in Android Applications for Malicious Application Detection
IEEE Transactions on Information Forensics and Security (TIFS), 9(11): 1869-1882, 2014
Autonomic Intrusion Detection: Adaptively Detecting Anomalies over Unlabeled Audit Data Streams in Computer Networks
Knowledge-Based Systems, Volume 70, November 2014, Pages 103-117
Anti-discrimination Analysis Using Privacy Attack Strategies
ECML PKDD 2014
Maximum Error-Bounded Piecewise Linear Representation for Online Stream Approximation
The VLDB Journal, Volume 23, Issue 6, December 2014
Data Stream Clustering with Affinity Propagation
IEEE Transactions on Knowledge and Data Engineering (TKDE), 26(7): 1644-1656, 2014
TideWatch: Fingerprinting the Cyclicality of Big Data Workloads
IEEE INFOCOM 2014
Cost Reduction for Web-based Data Imputation
DASFAA 2014
Foreword to the Special Focus on Mathematics, Data and Knowledge
Mathematics in Computer Science, Volume 7, Issue 4, 2013
Controlling Attribute Effect in Linear Regression
IEEE ICDM 2013
Automated Mining of Disease-Specific Protein Interaction Networks Based on Biomedical Literature
Book Chapter in "Biological Data Mining and Its Applications in Healthcare", World Scientific, 2013
Securing Recommender Systems against Shilling Attacks Using Social-based Clustering
Journal of Computer Science and Technology, Volume 28, Issue 4, July 2013
Video Quality Prediction over Wireless 4G
PAKDD 2013
Decision Theory for Discrimination-aware Classification
IEEE ICDM 2012
Virtual Machine Migration in an Over-committed Cloud
IEEE NOMS 2012
Understanding and Analyzing Network Traffic
IEEE Network, January 2012
Network Traffic Monitoring, Analysis and Anomaly Detection
IEEE Network, May 2011
Scaling Analysis of Affinity Propagation
Physical Review E, Vol. 81, 066102, 2010
Adaptively Detecting Changes in Autonomic Grid Computing
IEEE GRID 2010 Workshop
Contributions to Large Scale Data Clustering and Streaming with Affinity Propagation. Application to Autonomic Grids.
Ph.D. Thesis, INRIA and Université Paris-Sud 11, 2010
Constructing Attribute Weights from Audit Data for Effective Intrusion Detection
Journal of Systems and Software, Vol. 82, No. 12, 2009
G-StrAP: A 2-level Real-time Grid Monitoring System
CAp 2009
Multi-scale Realtime Grid Monitoring with Job Stream Mining
IEEE/ACM CCGrid 2009
Fast Intrusion Detection Based on a Non-negative Matrix Factorization Model
Journal of Network and Computer Applications, Vol. 32, No. 1, 2009
Data Streaming with Affinity Propagation
ECML PKDD 2008
Frugal and Online Affinity Propagation
CAp 2008
Modelling the Jobs of a Grid System
RFIA 2008
Processing of Massive Audit Data Streams for Real-Time Anomaly Intrusion Detection
Computer Communications, Vol. 31, No. 1, 2008
Toward Behavioral Modeling of a Grid System: Mining the Logging and Bookkeeping Files
IEEE ICDM Workshop DSMM 2007
Tomography Experiment of an Integrated Circuit Specimen Using 3 MeV Electrons in the Transmission Electron Microscope
Review of Scientific Instruments, Vol. 78, 013701, 2007
Profiling Program Behavior for Anomaly Intrusion Detection Based on the Transition and Frequency Property of Computer Audit Data
Computers & Security, Vol. 25, No. 7, 2006