Chengyuan Liu

DS & Biostatistics Student

Welcome to my personal website! I am Chengyuan Liu, a Master's student in Data Science & Biostatistics at Cornell University, deeply engrossed in the world of data analysis and its applications in real-world scenarios. My journey in data science is marked by diverse experiences, ranging from groundbreaking microbiome research at Weill Cornell Medicine to strategic financial analyses at China Fortune Securities. Skilled in Python, SQL, and various data analysis tools, I am currently seeking exciting opportunities in data science internships for Summer 2024. Join me as I explore the intersection of data, technology, and innovation.

Professional Experiences

Research Assistant @ Weill Cornell Medicine

New York City, New York -- Nov 2023 - Present

  • Handling datasets with over 1,000 samples and 30,000+ variables when applying AlphaFold to microbiome research.
  • Employing hierarchical clustering and classification using Scikit-Learn and PyTorch, analyzing high-dimensional data.
  • Engaging in the implementation of interpretable learning modules to understand AlphaFold's predictions.
  • Conducting extensive literature reviews to stay current with the latest developments in AlphaFold applications and microbiome data analysis.
  • Collaborating in a multidisciplinary team, contributing to discussions and strategies for future research directions.

Data Analyst Intern @ Kingdom Consulting

East Windsor, NJ -- Dec 2022 – Aug 2023

  • Gathered and cleaned user engagement data in SQL, conducted EDA, performed statistical analysis, created data visualizations, and built and tested XGBoost and K-means models in Python to predict customer behavior, with over 90% AUC-ROC
  • Aligned and tracked key performance indicators (KPIs) with business objectives, facilitating data-driven decision-making and performance monitoring
  • Constructed and tested a data pipeline and fully automated, interactive Tableau dashboards for internal stakeholders, enabling future research on product optimization
  • Collaborated closely with cross-functional product and engineering team to perform rigorous data analysis for validating business hypotheses, delivering evidence-based insights and guiding the formulation of strategic recommendations

Data Scientist Intern @ Societe Generale

New York City, NY -- Jun 2022 – Sept 2022

  • Performed SQL queries to build data collection, storage, and processing infrastructure. Analyzed business problems and converted report data to actionable items; delivered simplified and visualized analytical outputs
  • Conducted statistical analysis (e.g., Pearson's chi-squared test) and built predictive models (e.g., Random Forest, Logistic Regression, and SVM) in Python to detect fraud, resulting in a 30% increase in detection accuracy
  • Designed and implemented ETL solutions for over 200 GB of data. Maintained data infrastructure, including building more than 40 data tables, and restructured the database, improving query efficiency by about 50%
  • Conducted and analyzed 10+ A/B testing experiments to evaluate policy efficiency improvement

Data Analyst Intern @ China Fortune Securities Co Ltd

Beijing, CN -- May 2021 – Aug 2021

  • Analyzed historical share transactions and market fluctuations to simulate risk for share pledge negotiations with a Fortune 500 company in China
  • Facilitated collaboration between private equity and quantitative funds, optimizing trading strategies and coordination
  • Used Matplotlib in Python to visualize financial data for diverse investors, delivering presentations to over 50 investors and establishing a streamlined presentation process

Research Assistant @ New York University

New York, NY -- Jun 2021 - Jan 2022

  • Developed an LSTM model to forecast healthcare sector stock prices, using Min-Max normalization when computing cell states to mitigate vanishing and exploding gradients, achieving a prediction accuracy of 90% for the next five days
  • Designed and implemented a customized stock-picking strategy that significantly enhanced investment performance within the healthcare sector
  • Established a robust validation framework to ensure the reliability and resilience of the forecasting model

Publications & Projects

Publication

This study uses the Long Short-Term Memory (LSTM) algorithm to predict short-term returns of six stable, high-volume healthcare sector stocks, based on five years of data, assessing accuracy with root mean square error (RMSE).
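For readers unfamiliar with the metric, RMSE is the square root of the mean squared difference between predicted and actual values, so lower is better and the result is in the same units as the returns. A minimal sketch in Python follows; the function name `rmse` and the sample return values are illustrative assumptions and are not taken from the study.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences of numbers."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one observation is required")
    # Average the squared errors, then take the square root.
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical five-day returns, used only to show the call.
actual_returns = [0.010, -0.020, 0.015, 0.000, 0.005]
predicted_returns = [0.012, -0.015, 0.010, 0.002, 0.004]
print(f"RMSE: {rmse(actual_returns, predicted_returns):.4f}")
```

Because the errors are squared before averaging, a single large miss raises RMSE more than several small ones.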

Interactive Book Search and Rating Console Program

Our program is a console-based application for book searches and ratings. It uses a text file database and provides a user-friendly terminal interface for searching by author or title, with clear messages for no results. Users can also rate books, with the ratings displayed in the book list. A demo video is available online.

Publication

Amidst COVID-19's impact on the food industry, this study focuses on predicting the weekly customer demand at Byen Bakeri, a modern Nordic Cafe in Seattle, for Thanksgiving and Christmas 2022. Utilizing probability concepts, the research offers two planning methods for inventory management during the holiday season based on past sales data.

Publication

This study presents an advanced mathematical model to better understand HIV dynamics, focusing on the interaction between HIV and CD4+ T cells and how it can inform treatment strategies. It critiques a simpler model and suggests improvements to more accurately represent viral reproduction.

—— More About Me ——

Tech Skills

Email

chl4034@med.cornell.edu