During my master's program, I focused on text generation techniques, including large language models, hierarchical clustering, and hierarchical attention.
I published two papers in top journals and led a research team to achieve top performance in international challenges. My skills include Python, C++, JavaScript, and deep learning frameworks like PyTorch and TensorFlow.
I am currently seeking opportunities in related fields. If you have any relevant positions or projects, please contact me at leoyang881122@gmail.com.
Recent Work
Using LLM + RAG to Solve Hallucination Issues
A database is built through web scraping. For each query, the most relevant vector data is retrieved and sent to the LLM to expand its context and prevent hallucinations. The image shows the results.
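The retrieve-then-prompt flow described above can be sketched as follows. This is a minimal illustration rather than the project's actual pipeline: the bag-of-words `embed` function stands in for a real embedding model, and the names `retrieve`, `build_prompt`, and the sample documents are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: bag-of-words counts (assumption: stands in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, documents, k=2):
    """Put the retrieved passages in front of the question so the LLM answers from them."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Hypothetical scraped documents, for illustration only.
docs = [
    "WSL2 lets Windows run a Linux kernel.",
    "LoRA fine-tunes large models with low-rank adapters.",
    "Pix2Pix is a conditional GAN for image translation.",
]
prompt = build_prompt("How does LoRA fine-tune models?", docs, k=1)
```

In a real system the scraped pages would be embedded once and stored in a vector database, and only the query would be embedded at request time.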
Connecting to Windows WSL2 from Mac via SSH
A guide to setting up WSL2 on Windows and connecting to it from a Mac via SSH, so that the Windows machine's CUDA capabilities can be used remotely. Read more.
Early Diagnosis with Brain Network Transformer
Utilized Brain Network Transformer at Taipei Veterans General Hospital for early diagnosis of major depression, bipolar disorder, and schizophrenia by analyzing fMRI brain network data. The image shows the attention heatmap.
BioLaySumm 2023 Shared Task: Lay Summarization of Biomedical Research Articles
This shared task focuses on the abstractive summarization of biomedical articles, with an emphasis on controllability and catering to non-expert audiences. We won 1st place on the leaderboard. Read more.
SemEval 2023 Task 8: Causal Medical Claim Identification and related PICO Frame Extraction from Social Media Posts
This task involved causal medical claim identification and PICO frame extraction from social media posts, where we secured 2nd place on the leaderboard. Read more.
Train a Chinese Medical QA Large Language Model with Llama Using LoRA
Medical QA data is scraped and used to train Llama 2 with LoRA for application in medical QA. The example shows the results after translation and conversion. Read more.
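For readers unfamiliar with LoRA, the core idea can be shown with plain Python lists: the pretrained weight W stays frozen, and only two small matrices B and A are trained, giving an effective weight W + (alpha / r) * B A. The sketch below is illustrative only; the matrix sizes, values, and function names are assumptions and are not taken from the project.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha, r):
    """Return W + (alpha / r) * (B @ A), the weight a LoRA-adapted layer applies."""
    delta = matmul(b, a)
    scale = alpha / r
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Frozen 3x3 pretrained weight (hypothetical values).
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Rank r = 1 adapter: A is 1x3, B is 3x1. B starts at zero, so training begins from W.
a = [[0.1, 0.2, 0.3]]
b_init = [[0.0], [0.0], [0.0]]
w_start = lora_effective_weight(w, a, b_init, alpha=2.0, r=1)

# After some hypothetical training steps, B is no longer zero.
b_trained = [[1.0], [0.0], [0.0]]
w_adapted = lora_effective_weight(w, a, b_trained, alpha=2.0, r=1)

full_params = len(w) * len(w[0])                      # 9 weights in W
lora_params = len(a) * len(a[0]) + len(b_init) * 1    # 6 weights in A and B
```

At this toy size the saving is small, but for a d x d layer the adapter has 2 * d * r parameters instead of d * d, which is why LoRA makes fine-tuning large models affordable.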
Psychpark Multi-Label Classification Task
The dataset consists of students' articles expressing their moods, with each article assigned multiple labels drawn from 21 different emotions. Fine-tuning and novel prompt-tuning techniques were used to achieve better results than related research.
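Multi-label classification differs from ordinary classification in that each label gets its own independent probability, so one article can carry several emotions at once. A minimal sketch of that output stage, assuming a model that emits one logit per label, is given below. The three label names, the 0.5 threshold, and the function names are hypothetical, and the project itself uses 21 labels.

```python
import math

# Hypothetical subset of labels; the project described above uses 21 emotions.
EMOTIONS = ["joy", "sadness", "anger"]


def sigmoid(x):
    """Map a logit to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits, threshold=0.5):
    """Keep every label whose probability reaches the threshold."""
    return [label for label, z in zip(EMOTIONS, logits) if sigmoid(z) >= threshold]


def bce_loss(logits, targets):
    """Mean binary cross-entropy over labels, a common multi-label training loss."""
    eps = 1e-12  # avoids log(0)
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        total += -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
    return total / len(logits)


example_logits = [2.0, -1.5, 0.3]
predicted = predict_labels(example_logits)   # ["joy", "anger"]
loss = bce_loss(example_logits, [1, 0, 1])
```

A softmax would force the labels to compete for a single probability mass, whereas per-label sigmoids let any number of emotions be active.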
Chinese Medical QA System with ASR
Integrates National Yang Ming Chiao Tung University's Chinese ASR system to convert spoken Chinese into text in real time, applicable to any voice input scenario.
Calligraphy Font Generation with GANs
Utilizes the Pix2Pix architecture, a type of Conditional GAN, to generate calligraphy fonts. Pix2Pix improves upon traditional GANs by using a U-Net based Generator with skip connections and a PatchGAN Discriminator, enhancing image quality and accuracy.
CAPTCHA Recognition Using CNN
Utilizes a CNN model for CAPTCHA image recognition. The model extracts features through multiple convolutional and pooling layers, flattens the feature maps, applies dropout to prevent overfitting, and passes the result through 9 dense layers, using a softmax activation to output character probabilities.
More projects coming soon...
Publications
- Chao-Yi Chen, Jen-Hao Yang, and Lung-Hao Lee. 2023. NCUEE-NLP at BioLaySumm Task 2: Readability-Controlled Summarization of Biomedical Articles Using the PRIMERA Models. In The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks, pages 586–591, Toronto, Canada. Association for Computational Linguistics.
- Lung-Hao Lee, Yuan-Hao Cheng, Jen-Hao Yang, and Kao-Yuan Tien. 2023. NCUEE-NLP at SemEval-2023 Task 8: Identifying Medical Causal Claims and Extracting PIO Frames Using the Transformer Models. In Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), pages 312–317, Toronto, Canada. Association for Computational Linguistics.