Chengyuan Liu
DS & Biostatistics Student
Welcome to my personal website! I am Chengyuan Liu, a Master's student in Data Science & Biostatistics at Cornell University, deeply engrossed in the world of data analysis and its applications in real-world scenarios. My journey in data science is marked by diverse experiences, ranging from groundbreaking microbiome research at Weill Cornell Medicine to strategic financial analyses at China Fortune Securities. Skilled in Python, SQL, and various data analysis tools, I am currently seeking exciting opportunities in data science internships for Summer 2024. Join me as I explore the intersection of data, technology, and innovation.
Professional Experiences
Research Assistant @ Weill Cornell Medicine
New York City, NY -- Nov 2023 – Present
- Handling datasets with over 1,000 samples and 30,000+ variables, applying AlphaFold to microbiome research.
- Employing hierarchical clustering and classification using Scikit-Learn and PyTorch, analyzing high-dimensional data.
- Engaging in the implementation of interpretable learning modules to understand AlphaFold's predictions.
- Conducting extensive literature reviews to track the latest developments in AlphaFold applications and microbiome data analysis.
- Collaborating in a multidisciplinary team, contributing to discussions and strategies for future research directions.
Data Analyst Intern @ Kingdom Consulting
East Windsor, NJ -- Dec 2022 – Aug 2023
- Gathered and cleaned user engagement data in SQL, conducted EDA, performed statistical analysis, created data visualizations, and built and tested XGBoost & K-means models in Python to predict customer behavior, with over 90% AUC-ROC
- Aligned and tracked key performance indicators (KPIs) with business objectives, facilitating data-driven decision-making and performance monitoring
- Constructed and tested a data pipeline and fully automated, interactive Tableau dashboards for internal stakeholders, enabling future research on product optimization
- Collaborated closely with cross-functional product and engineering teams to perform rigorous data analysis for validating business hypotheses, delivering evidence-based insights and guiding the formulation of strategic recommendations
Data Scientist Intern @ Societe Generale
New York City, NY -- Jun 2022 – Sept 2022
- Performed SQL queries to build data collection, storage, and processing infrastructure. Analyzed business problems and converted report data to actionable items; delivered simplified and visualized analytical outputs
- Conducted statistical analysis (e.g., Pearson's chi-squared test) and built predictive models (e.g., Random Forest, Logistic Regression, and SVM) in Python to detect fraud, resulting in a 30% increase in detection accuracy
- Designed and implemented ETL solutions for over 200 GB of data. Maintained data infrastructure, including building more than 40 data tables, and restructured the database, improving query efficiency by about 50%
- Ran and analyzed 10+ A/B tests to evaluate improvements in policy efficiency
Data Analyst Intern @ China Fortune Securities Co Ltd
Beijing, CN -- May 2021 – Aug 2021
- Analyzed historical share transactions and market fluctuations to simulate risk for share pledge negotiations with a Fortune 500 company in China
- Facilitated collaboration among private equity quantitative funds, optimizing trading strategies and coordination
- Used Matplotlib in Python to visualize financial data for diverse investors, delivering presentations to over 50 investors and establishing a streamlined presentation process
Research Assistant @ New York University
New York, NY -- Jun 2021 – Jan 2022
- Developed an LSTM algorithm to forecast healthcare sector stock prices, with Min-Max Normalization to compute cell states to mitigate vanishing and exploding gradients, achieving a prediction accuracy of 90% for the next five days
- Designed and implemented a customized stock-picking strategy that significantly enhanced investment performance within the healthcare sector
- Established a robust validation framework to ensure the reliability and resilience of the forecasting model
Publications & Projects
—— More About Me ——
Tech Skills
chl4034@med.cornell.edu