LLMs as Developer Assistants

Accessing Local Language Models as Developer Assistance Tools and Evaluating Their Performance.

Course ECS 260: Software Engineering (UC Davis)

Instructor Prof. Vladimir Filkov

Team Muhammad Hassnain, Zeerak Babar, Nafiz Imtiaz Khan, Prem Gorde, Disha

Quarter Winter 2024

Large Language Models (LLMs) have shown great promise on a variety of NLP tasks. Developers already use them to write code, generate documentation, and even write tests. However, the field is dominated by commercial, closed-source LLMs, which are expensive for budget-constrained users and not accessible to everyone. We therefore explore how well freely available local LLMs can perform as developer assistants.

We start by collecting data from popular sites where developers ask questions (StackOverflow, Reddit, StackExchange, etc.). We keep only questions that were answered and whose answers the original poster confirmed to be helpful. We then evaluate local LLMs, such as Mistral run via Ollama, on this dataset and compare their answers against the gold-standard human answers, using statistical analysis to measure how closely the local LLMs match human performance.
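The comparison step above can be sketched in code. The project's actual similarity metrics are not listed here, so this minimal sketch uses a simple token-overlap F1 score between a model's answer and the accepted human answer as an illustrative stand-in:

```python
def token_f1(model_answer: str, gold_answer: str) -> float:
    """Token-overlap F1 between a model answer and the gold human answer.

    Illustrative metric only: the project may use other measures
    (e.g. embedding similarity, ROUGE, or human ratings).
    """
    model_tokens = model_answer.lower().split()
    gold_tokens = gold_answer.lower().split()
    if not model_tokens or not gold_tokens:
        return 0.0

    # Count how many model tokens are matched in the gold answer,
    # consuming each gold token at most once.
    gold_counts: dict[str, int] = {}
    for tok in gold_tokens:
        gold_counts[tok] = gold_counts.get(tok, 0) + 1
    common = 0
    for tok in model_tokens:
        if gold_counts.get(tok, 0) > 0:
            common += 1
            gold_counts[tok] -= 1
    if common == 0:
        return 0.0

    precision = common / len(model_tokens)
    recall = common / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Scores like this, computed per question for each local model, are what the statistical comparison against human answers would be run over.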

Extensions

After the final submission, we extended this project to investigate how local LLMs can be improved. We applied techniques such as retrieval-augmented generation (RAG) and fine-tuning, evaluated the resulting models on a StackExchange dataset, and measured how much performance improvement these techniques yield. This work was submitted to MSR 2025 and is currently under review. Please check back later for more details.
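To make the RAG idea concrete, here is a minimal sketch of the retrieve-then-prompt pattern. The corpus, the word-overlap scoring, and the prompt template are illustrative assumptions, not the project's actual pipeline; a real setup would typically use embedding-based retrieval and send the assembled prompt to a local LLM:

```python
def retrieve(question: str, corpus: list[str], k: int = 1) -> list[str]:
    """Rank corpus documents by word overlap with the question.

    Toy retriever for illustration; real RAG pipelines usually rank by
    embedding similarity instead of raw word overlap.
    """
    q_words = set(question.lower().split())
    scored = sorted(
        corpus,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]


def build_prompt(question: str, corpus: list[str]) -> str:
    """Prepend the top retrieved document as context for the local LLM."""
    context = "\n".join(retrieve(question, corpus))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Hypothetical two-document corpus for demonstration.
docs = [
    "Python lists support append and extend operations.",
    "Git branches let you work on features in isolation.",
]
prompt = build_prompt("How do I append to a Python list?", docs)
```

The augmented `prompt` would then be passed to the same local models evaluated earlier, so any score difference isolates the effect of retrieval.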

Todo:

Add more details and references.