Zhongyi (James) Guo 过仲懿🌈



Get In Touch

outreach at gzy1s dot me

M+P-AMA-DG1J (text me to set up a call)

Skillset


🤖 Machine Learning

👨🏻‍💻 Data Science

📊 Statistical Modeling

📈 Data Visualization

📝 Grant Proposal Writing

🖥️ High Performance Computing (HPC)

Certifications


SAS

Coursera Badges & Courses

DeepLearning.AI

IBM AI Engineering

Neural Networks with PyTorch (In progress)

Stanford

Writing in the Sciences (In progress)

Languages


  • English
  • Chinese (Mandarin)

Log


03/26/2026: Added animation and redesigned the homepage layout by vibe coding with Claude.

11/20/2024: Updated research experience. I will update project experience and work experience shortly.

11/12/2024: Minor updates applied to the certification section.

08/12/2024: I redesigned my website. Now I'm slowly migrating project experience and work experience over.

About Me


My name is James Guo 过仲懿. I am a Wuxi native (无锡人)🇨🇳 living in the Bay Area, California 🇺🇸.

I am a data scientist with 5 years of experience turning complex data into clear, actionable insights through mining and modeling.

Outside of work, I love bodybuilding and swimming. See more on my Instagram ↗!

Let's keep in touch!

Education



Case Western Reserve
2019 – 2020
Biochemistry
GPA: 4.0
Honors: Dean's List
Core Courses:
  • Microeconomics
  • Calculus I, II, III
  • College Writing
  • Intro-level pre-med

Cornell
2020 – 2023
BS in Statistics and Biology
GPA: 3.57
Honors: Dean's List
Core Courses:
  • Data Science
  • Data Mining & Machine Learning
  • Statistical Computing
  • Computing using Python
  • Object-oriented Programming and Data Structures
  • UNIX Tools and Scripting
  • Linear Algebra
  • Linear Models with Matrices
  • Theory of Statistics
  • Probability Models and Inference
  • Probability
  • Macroeconomics
  • Medical Statistics
  • Biological Statistics
  • Advanced Epidemiology
  • Quantitative Genomics and Genetics
  • Population Genetics
  • Lecture and lab in Genetics and Genomics

Stanford
2023 – 2025
MS in Epidemiology and Clinical Research
GPA: 3.86
Core Courses:
  • Deep Learning
  • Data Management and Analysis in SAS
  • Data Visualization
  • Biostatistics
  • Probability
  • Causal Inference
  • Bioinformatics
  • Design and Conduct of Clinical Trials
  • Epidemiologic Research Methods
  • Meta-analysis

Experience


Incoming Biostatistics Intern
(est.) 06/2026 – 08/2026
Corcept Therapeutics  ·  Redwood City, CA
Statistical Programming · SAS
  • To be updated...
Incoming Election Aide Worker (temporary)
(est.) 04/2026 – 06/2026
Registrar of Voters Office, County of Santa Clara  ·  San Jose, CA
Public Service
  • To be updated...
Data Scientist I (part-time)
03/2026 – Present
W74  ·  Remote
Data Science · Energy Consulting · Building Energy Consumption · Python · Time-series · XGBoost · Random Forest · Deep Neural Network · LSTM · Transformer · Sklearn · Pytorch · Github Codespace · Claude CLT
  • Provided statistical and deep learning expertise to inform clients' decisions on energy consumption.
Computational Researcher
07/2025 – Present
the Gladstone Institutes  ·  San Francisco, CA
R Package Dev · Differential Gene Expression Analysis · Variant Effect Prediction · Multi-omics · scRNA-seq · scATAC-seq · R · Python · Permutation Testing · Pseudobulk · Case-control · Transformer · devtools · Seurat · ArchR · tidyverse · ggplot2 · AlphaGenome · Synapse
  • Pioneered the development of an R package (permuteDE) with a permutation testing-based pipeline to reduce false positives in differential gene expression analysis in Alzheimer's disease.
  • Collected and integrated eight massive-scale public scRNA-seq datasets from Synapse to demonstrate the package's usability.
  • Applied AlphaGenome ↗ to study how MAPT gene variants might alter splicing patterns.
  • Removed doublets and performed dimensionality reduction and clustering on scATAC-seq data.
  • Communicated complex statistical findings to diverse audiences through clear, effective data visualizations.
Student Life Assistant
02/2024 – 01/2025
Stanford Online High School  ·  Remote
Administrative Support · Schedule Planning · Event Coordination · Student Records Management · Presentation Slide Creation · Google Sheet/Excel · Google Slide/PowerPoint
  • Supported event planning, including meeting agendas, student activities, and arrival/departure bus schedules.
  • Designed engaging slides for weekly themed class meetings.
  • Maintained student records in Google Sheets, using data entry and Excel functions with strong attention to detail.
Graduate Student Researcher
10/2023 – 06/2025
Stanford University and UCSF  ·  Stanford, CA
Biomedical Research · Disease Modeling · Multi-tasking · scRNA-seq · scATAC-seq · LC-MS Metabolomics · R · Python · Statistical Testing · Deep Learning · Seurat · ArchR · tidyverse · ggplot2 · MetaboAnalystR
Mobile (iOS) Development Intern
07/2022 – 08/2022
Match Group  ·  Remote
iOS App Dev · Code Optimization · Swift · Protocol-oriented Programming · SwiftUI
  • Revised the "Enhanced Interests" feature in SwiftUI, allowing users to personalize their tags with text and emojis for richer self-expression.
  • Improved code efficiency by migrating UIKit components to SwiftUI.
Undergraduate Teaching Assistant
01/2021 – 05/2023
Cornell University  ·  Ithaca, NY
Teaching · Grading · Canvas · Gradescope
  • Taught a variety of courses.
  • Led discussions, graded homework and assignments, held office hours, and bridged communication between professors and students.
  • Maintained course materials and student records in Canvas and Gradescope.
Student Food Assistant
02/2022 – 05/2022
Cornell Dining  ·  Ithaca, NY
Customer Service · Dietary Protocol
  • Replenished food in the Morrison dining hall.
  • Made pizzas and chow mein to feed hungry students.
  • Learned about dietary guidelines and protocols.
Data Analyst Project Intern
07/2021 – 09/2021
Tencent  ·  Remote
Machine Learning · Prediction · E-commerce Sales Data · Python · Exploratory Data Analysis · Regression · Data Visualization · numpy · pandas · Sklearn · matplotlib · Beautiful Soup
  • Scraped e-commerce sales data with Python and built linear regression models via Sklearn to forecast sales trends across product categories.
  • Delivered evidence-based marketing reports with matplotlib visualizations.
Computational Researcher  ·  the Gladstone Institutes (the Corces Lab)  ·  07/2025 – Present
permuteDE: Permutation Testing for Differential Gene Expression Analysis
R Package Dev · Case-control · Statistical Testing · Differential Gene Expression Analysis · sc/snRNA-seq · R · Pseudobulk · Permutation Testing · Seurat · BPCells · edgeR · DESeq2 · tidyverse · ggplot2
Figures: permuteDE logo · permuteDE Volcano Plot · permuteDE Histogram

Differential expression analyses are susceptible to false positives. permuteDE uses permutation testing to identify which comparisons have a higher number of significant differentially expressed features than would be expected by chance.

  • Developed the permuteDE R package, overseeing maintenance, debugging, and feature development.
  • Collected and integrated eight large-scale public snRNA-seq datasets from Synapse, performing rigorous data cleaning, metadata curation, and structuring data into Seurat objects and BPCells matrices.
  • Applied the permuteDE pipeline across eight datasets to demonstrate its real-world utility.

Note: Currently we are finalizing the package and analyses; permuteDE will be released upon manuscript publication.
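
As a rough illustration of the permutation idea described above, the Python sketch below shuffles case/control labels, recounts the "significant" features after each shuffle, and compares the observed count against that null distribution. This is not permuteDE's implementation (the package is written in R and has not been released yet). The function names, the Welch t-statistic with a fixed cutoff standing in for a real differential expression test, and the toy data layout (one row per feature, one column per sample) are all assumptions made for illustration.

```python
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch t-statistic for two samples (assumed stand-in for a real DE test)."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def count_hits(matrix, labels, cutoff=2.5):
    """Count features (rows) whose |t| between cases (1) and controls (0) exceeds the cutoff."""
    hits = 0
    for row in matrix:
        case = [x for x, g in zip(row, labels) if g == 1]
        ctrl = [x for x, g in zip(row, labels) if g == 0]
        if abs(welch_t(case, ctrl)) > cutoff:
            hits += 1
    return hits


def permutation_p_value(matrix, labels, n_perm=200, seed=0):
    """Return the observed hit count and an empirical p-value from label shuffling."""
    rng = random.Random(seed)
    observed = count_hits(matrix, labels)
    shuffled = list(labels)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any true link between label and expression
        if count_hits(matrix, shuffled) >= observed:
            as_extreme += 1
    # The +1 terms count the observed labeling as one of the permutations.
    return observed, (as_extreme + 1) / (n_perm + 1)
```

Calling `permutation_p_value(matrix, labels)` on a features-by-samples list of lists returns the observed number of hits together with the fraction of shuffles that produced at least as many. A small fraction suggests the comparison yields more significant features than chance alone would explain.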

Graduate Student Researcher  ·  Stanford University (the Kundaje Lab)  ·  04/2024 – 06/2025
Deep Learning + Single-Cell Multi-Omics to Study Schizophrenia & Bipolar Disorder
Multi-omics Integration · Case-control · Statistical Testing · Differential Expression & Accessibility Analyses · Variant Effect Prediction · scRNA-seq · scATAC-seq · R · Python · Pseudobulk · Wilcoxon Rank-sum Test · Wald Test · MAST · Seurat · ArchR · DESeq2 · ChromBPNet · DeepLIFT/SHAP · TF-MoDISco · finemo-gpu
Figures: Chromatin Accessibilities across Cell Types · ChromBPNet Model Architecture · Disruptive Effect of a Variant

Figure Source: Emani et al., Science 2024  ·  Pampari et al., bioRxiv preprint 2025  ·  Ameen et al., Cell 2022


I joined the PsychENCODE project in Prof. Anshul Kundaje's lab and led this project, analyzing scRNA-seq and scATAC-seq multiome data from three brain regions. The aim was to investigate the cell type-specific chromatin accessibility and gene expression patterns underlying schizophrenia and bipolar disorder, and so better understand their pathogenesis.

  • Preprocessed high-dimensional multiome data through quality control, dimensionality reduction, clustering, doublet detection, marker identification, and cell type annotation.
  • Performed differential gene expression & peak accessibility analyses (pseudobulk: Wald test; single-cell: Wilcoxon + MAST) across multiple cell types.
  • Trained cell type-specific ChromBPNet models to identify causal genomic variants and their downstream effects on chromatin accessibility and gene regulation.
Graduate Student Researcher  ·  UCSF (the Graff Lab)  ·  10/2023 – 06/2025
Black–White Metabolomic Disparities in Prostate Cancer
Health Disparities · Case-control · Statistical Testing · LC-MS Metabolomics · Metabolon Panels · R · t-test · linear regression · logistic regression · PCA · PLS-DA · Random Forest · Support Vector Machine · Pathway Analysis · ChemRICH · WGCNA · MetaboAnalystR · tidyverse · ggplot2
Figures: PCA Incidence Rates · PCA Incidence and Mortality Rates · WGCNA

Figure Source: National Cancer Institute 2019  ·  Lowder et al., Cancer Letters 2022  ·  Langfelder & Horvath, BMC Bioinformatics 2008


Black men face higher prostate cancer incidence and mortality than White men. This project leverages LC-MS metabolomics on a cohort of 34 men (17 Black, 17 White) from the IRONMAN Registry ↗ to characterize metabolites associated with such disparities. I presented this project as my Master's thesis.

  • Applied t-test, PCA, PLS-DA, random forest, logistic & linear regression, chemical similarity enrichment analysis, pathway analysis, and WGCNA to LC-MS metabolomics data, characterizing metabolites associated with Black-White prostate cancer disparities.
  • Interpreted complex statistical findings and communicated them through clear scientific writing and effective data visualizations; defended before Stanford Epidemiology faculty and students.
Class Project  ·  Stanford University  ·  10/2024 – 12/2024
Interactive Data Visualization Dashboards: Global Disparities in Central Death Rates
Data Visualization · Interactive Website · Socio-economic Progress · Demographic Data · Javascript · D3
Figures: Global Central Death Rates x Developmental Indicators · Correlation between Mortality and Developmental Conditions · Dynamic Bubble Chart over Time

Website Link: https://dkristheltorres.github.io/CS-448B/ ↗

This project visualized the interplay between life expectancy (measured by central death rates) and economic development, healthcare access, energy production, and consumption patterns.

  • Developed interactive data visualization dashboards (global heatmaps, cross-country correlation analyses, and time-evolving bubble charts) using JavaScript D3.
Class Project  ·  Stanford University  ·  10/2023 – 12/2023
Subarachnoid Hemorrhage Covariate Association and Risk Modeling
Data Management · Disease Modeling · Case-control · Statistical Testing · Clinical and Demographic Data · SAS · t-test · chi-square · Univariate and multivariate logistic regression · Mantel–Haenszel test · SAS OnDemand
Figures: Boxplot of Body Mass Index · Summary Table 1 · Summary Table 2

This project evaluated associations between multiple exposures (age [categorical and continuous], sex, race, and smoking) and subarachnoid hemorrhage (SAH), and assessed alcohol as an effect modifier of the smoking–SAH relationship.

  • Cleaned and managed case-control datasets in SAS OnDemand: resolved duplicate records, standardized missing values, and created derived variables.
  • Conducted descriptive analyses (chi-square tests and t-tests) and applied univariate and multivariable logistic regression to estimate crude and adjusted odds ratios for smoking.
  • Performed stratified analysis using Mantel–Haenszel testing to adjust for alcohol use as a confounder.
  • Generated summary tables and data visualizations for reporting.
Beta Tester & Teaching Assistant
INFO 2950: Introduction to Data Science
Cornell University
01/2023 – 05/2023

Led discussions, graded homework and exams, held office hours, and proofread problem sets before release.

Grader
BTRY 3080: Probability Models and Inference
Cornell University
08/2022 – 08/2022

Graded homework and exams.

Teaching Assistant (Summer)
Introductory Biology & Physics I
JNC Study Abroad Platform
07/2022 – 08/2022

Led discussions, graded homework and assignments, held office hours, and bridged communication between professors and students.

Teaching Assistant
BIOMG 2801: Laboratory in Genetics and Genomics
Cornell University
01/2021 – 05/2021

CRISPR/Cas9 knockout work on fruit flies; guided 20 students in mutation analysis on the UCSC Genome Browser.

Publications


[1] Shin, J.‡, Brady, E.‡, Chen, C., Lauderdale, K., Agrawal, A., Zhang, Y., Jiang, X., Nambiar, P., Herbert, J., Mallen, D., Ly, K., Guo, Z., Sant, C., Thomas, R., Miller, S., Cobos, I., Palop, J. APOE4 and Aβ synergize to drive neuronal network dysfunction and lysosomal-ER proteostasis dysregulation in the preclinical stages of Alzheimer's disease, Nature Neuroscience, 2025. Submitted ↗

[2] Poster session presenter and first author, Causal effect of type II diabetes on prostate cancer in the East Asian population: A two-sample Mendelian randomization study, AACR Special Conference: Aging and Cancer, 2022. Published ↗


Manuscripts in preparation

[1] (Aiming for BMC Medicine) Guo, Z.‡, Chen, D.‡, Stopsack, K. H., Soule, P., Ajit, D., Ramamoorthy, P., Hoffmann, T. J., Chan, J. M., Mucci, L. A., Graff, R. E. Metabolomic Disparities Between Black and Non-Hispanic White Men with Metastatic Hormone-Sensitive Prostate Cancer: A Pilot Study. (Title is tentative.)

[2] (Aiming for Cell) Qu, P., Wang, T., Jessa, S., Guo, Z., Guo, H., Purmann, C., Monte, E., Jiang, L., Yang, X., Zhou, B., Kundu, S., Kundaje, A., Wong, W., Hallmayer, J. F., Urban, A. E., Snyder, M. P. Multi-modal functional genomics analysis of bipolar disorder and schizophrenia. (Title is tentative.)

[3] (Aiming for Nature Neuroscience) Sant, C., Guo, Z., Corces, M. R. Preventing false discoveries in Alzheimer's disease single-cell sequencing data using permutation testing. (Title is tentative.)

‡ indicates co-first authorship.

Technical Summary


Programming Languages

Markup/Styling Languages

IDEs

Command-Line Tools

Version Control

Visual Display