Yibo Wen's Personal Website

Yibo Wen

Computer Science PhD Student @ Northwestern

I am a second-year Computer Science Ph.D. student at Northwestern University, advised by Han Liu. My current research focuses on developing virtual cell models in collaboration with the Chan Zuckerberg Biohub and advancing AI-driven drug discovery in partnership with AbbVie. I also completed my undergraduate studies at Northwestern, with prior research experience in computer graphics at the USC Institute for Creative Technologies and Sony Research.

I am particularly interested in computational molecular and antibody design, structure-based drug discovery, and building virtual cell models for cell reprogramming and broader biomedical applications. Beyond research, I am passionate about tennis, digital design and startups.

News

[2026/05]

Two papers were accepted to ICML 2026.

[2026/04]

Two papers were accepted to ACL 2026.

[2025/09]

Our paper AlignAb was accepted to NeurIPS 2025.

[2025/06]

I joined the Chan Zuckerberg Biohub Chicago as a research intern.

[2024/12]

I presented our work on model alignment at AbbVie.

[2024/09]

I started my Ph.D. journey in the Computer Science Department at Northwestern University.

[2022/05]

I joined the Vision & Graphics Lab at the USC Institute for Creative Technologies as a research intern.

Publications

Virtual Cells Need Context, Not Just Scale

Payam Dibaeinia, Sudarshan Babu, Mei Knudson, Ali ElSheikh, Yibo Wen, Han Liu, Jason Perera, Aly A. Khan

ICML 2026

arXiv

We present a position paper arguing that building accurate Virtual Cell models requires contextual diversity and causal representation learning, not simply larger models or more data. Through analysis of a state-of-the-art model on a 22-million-cell immunology dataset, we show that simple baselines match sophisticated architectures within a given biological context, while current models fail to generalize across contexts. We connect this failure mode to the causal inference literature on transportability and call for the field to prioritize causal coverage over scale.

Genome-Factory: An Integrated Library for Tuning, Deploying, and Interpreting Genomic Models

Weimin Wu^*, Xuefeng Song^*, Yibo Wen^*, Qinjie Lin, Zhihan Zhou, Jerry Yao-Chieh Hu, Zhong Wang, Han Liu

ICML 2026

arXiv

Genome-Factory is an integrated Python library that streamlines the end-to-end workflow for genomic foundation models—from data acquisition and quality control to tuning, inference, benchmarking, and interpretation. It supports full-parameter, LoRA, and adapter tuning across diverse models, provides embedding extraction and DNA sequence generation, and includes two built-in benchmarks with a plug-in interface for new tasks. A sparse autoencoder-based interpreter maps near-monosemantic units to biologically meaningful features via external readouts. We validate compatibility, benchmark performance, and interpretability, demonstrating practical utility for real-world genomic analysis.

POLO: Preference-Guided Multi-Turn Reinforcement Learning for Lead Optimization

Ziqing Wang^*, Yibo Wen^*, William Pattie, Xiao Luo, Weimin Wu, Jerry Yao-Chieh Hu, Abhishek Pandey, Han Liu, Kaize Ding

KDD 2026 (Oral)

arXiv

We propose POLO, an RL approach that trains LLMs to optimize drug candidates by learning from full optimization trajectories. The method introduces Preference-Guided Policy Optimization, which combines trajectory-level reinforcement with turn-level preference learning to use each oracle evaluation more effectively. POLO achieves 84% success on single-property tasks and 50% on multi-property tasks with 500 evaluations, improving sample efficiency in lead optimization.

AlignAb: Pareto-Optimal Energy Alignment for Designing Nature-Like Antibodies

Yibo Wen, Chenwei Xu, Jerry Yao-Chieh Hu, Han Liu

NeurIPS 2025

arXiv

We propose a three-stage framework combining language model pre-training, diffusion-based optimization, and energy-aligned fine-tuning. By extending Direct Preference Optimization for multi-objective alignment and introducing an iterative learning paradigm, our method achieves state-of-the-art performance in generating stable, high-affinity, and structurally realistic antibody candidates.

Cell-JEPA: Latent Representation Learning for Single-Cell Transcriptomics

Ali ElSheikh, Rui-Xi Wang, Weimin Wu, Yibo Wen, Payam Dibaeinia, Jennifer Yuntong Zhang, Jerry Yao-Chieh Hu, Mei Knudson, Sudarshan Babu, Shao-Hua Sun, Aly A. Khan, Han Liu

In Submission

arXiv

We introduce Cell-JEPA, a joint-embedding predictive architecture for single-cell transcriptomics that learns by predicting cell-level embeddings from partial observations rather than reconstructing sparse, noisy counts. By exploiting the redundant encoding of cell identity across genes, our method learns dropout-robust representations that achieve a 36% relative improvement in zero-shot cell-type clustering over scGPT.

Options Flow

Daily snapshot of unusual, high-notional option trades with features like auto-tagged whale activity, volume ≫ OI detection, near-term expiry tracking, and actionable tables—helping traders quickly spot where big money is moving in the market.

Convoice AI

Convoice enables you to build and deploy custom voicebots with ease, automating both inbound and outbound calls.

Built an engineering team of 4 people from scratch, designed all branding and interfaces, and collaborated with large call centers and health industry leaders to automate customer service and sales support.

Ctrl+F Platform

The mission of Ctrl+F Platform is to connect students with on-campus academic opportunities. Working with 4 other USC students, I helped connect students with 100+ research and teaching assistant opportunities on campus.

We won first place in the 2022 Viterbi ABC Innovation Prize and received $6,000 in prize money from the Viterbi School of Engineering.

Website Design & Development

Personal Projects

Modeling & Rendering

Student Union @ Shanghai High School

Investment Brochure for Xujing District, Shanghai

If you need help with digital design or website development, feel free to contact me!
