Zero-Shot Code Generation and Repair using ChatGPT
- Evaluated ChatGPT on questions sampled from competitive coding benchmark datasets and measured a low success rate of 30%.
- Implemented a Python pipeline that generates code using the ChatGPT API, runs it against the given test cases, and turns failed test cases into feedback signals that ChatGPT uses to iteratively repair non-functioning code.
Python
PyTorch
ChatGPT
Large Language Models
Natural Language Processing
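The generate, test, and repair loop described in this project can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the project's actual pipeline: `generate` is a hypothetical stand-in for the ChatGPT API call, the candidate code is assumed to define a function named `solve`, and the feedback message format is invented for the example.

```python
from typing import Callable, List, Optional, Tuple

TestCase = Tuple[tuple, object]  # (positional arguments, expected result)


def run_tests(code: str, tests: List[TestCase]) -> List[str]:
    """Execute candidate code and return one message per failed test."""
    namespace: dict = {}
    try:
        exec(code, namespace)  # NOTE: unsandboxed, for illustration only
        solve = namespace["solve"]  # assumed entry-point name
    except Exception as exc:
        return [f"code failed to load: {exc!r}"]
    failures = []
    for args, expected in tests:
        try:
            actual = solve(*args)
        except Exception as exc:
            failures.append(f"solve{args} raised {exc!r}")
            continue
        if actual != expected:
            failures.append(f"solve{args} returned {actual!r}, expected {expected!r}")
    return failures


def generate_and_repair(
    problem: str,
    tests: List[TestCase],
    generate: Callable[[str, List[str]], str],
    max_rounds: int = 3,
) -> Optional[str]:
    """Ask the model for code, feeding failed-test messages back each round."""
    feedback: List[str] = []
    for _ in range(max_rounds):
        code = generate(problem, feedback)
        feedback = run_tests(code, tests)
        if not feedback:
            return code  # every test passed
    return None  # no passing candidate within the round budget


def fake_model(problem: str, feedback: List[str]) -> str:
    """Hypothetical stub: buggy on the first call, fixed once it sees feedback."""
    if not feedback:
        return "def solve(a, b):\n    return a - b"
    return "def solve(a, b):\n    return a + b"
```

Calling `generate_and_repair("add two numbers", [((1, 2), 3)], fake_model)` returns the repaired source on the second round; a real setup would replace `fake_model` with an API call and run candidates in a sandboxed subprocess.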
Adversarial Perturbations for Robustness of Large Language Models
- Fine-tuned BERT and GPT-2 on the Author Sentiment Analysis task using the PerSent dataset and evaluated their performance on adversarially perturbed and unperturbed datasets.
- Showed that simple character- and word-level perturbations were sufficient to reduce the accuracy and F1-score of state-of-the-art language models by almost 4%, indicating a lack of robustness in these models.
Python
PyTorch
Transformers
TextAttack
Large Language Models
Natural Language Processing
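To illustrate what simple character- and word-level perturbations can look like, here is a minimal standard-library sketch. It does not use TextAttack's API and is not the project's exact recipe: the two perturbation types (adjacent-character swap, word deletion), the rates, and the function names are all assumptions chosen for the example.

```python
import random


def swap_adjacent_chars(word: str, rng: random.Random) -> str:
    """Swap two adjacent interior characters, keeping first and last fixed."""
    if len(word) < 4:
        return word  # too short to have two interior characters
    i = rng.randrange(1, len(word) - 2)
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def perturb(text: str, char_rate: float = 0.2, drop_rate: float = 0.1, seed: int = 0) -> str:
    """Apply word deletion and character swaps at the given (assumed) rates."""
    rng = random.Random(seed)  # seeded so the perturbed set is reproducible
    out = []
    for word in text.split():
        if rng.random() < drop_rate:
            continue  # word-level perturbation: delete the word
        if rng.random() < char_rate:
            word = swap_adjacent_chars(word, rng)  # character-level perturbation
        out.append(word)
    return " ".join(out)
```

Evaluating a fine-tuned classifier on both `text` and `perturb(text)` for every test example, then comparing accuracy and F1-score, gives the kind of robustness gap the bullet above reports.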
Deep Neural Machine Translation for Indian Languages
- Trained a word-level LSTM Neural Machine Translation system on a 70,000-line English-Hindi corpus, then transferred the weights and fine-tuned on 30,000 lines of English-Marathi text, reaching a final BLEU score of 0.43.
Python
TensorFlow
Pandas
Keras
Matplotlib
LSTM
Natural Language Processing
Hindi Word2Vec Embeddings
- Generated 128-dimensional embeddings for the 843,415 most common words in a 5,000,000-line monolingual Hindi corpus from CFILT, IIT Bombay, by training Skip-gram and CBOW Word2Vec models from scratch in TensorFlow.
Python
TensorFlow
Scikit-Learn
Matplotlib
Deep Learning
Natural Language Processing
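The difference between the Skip-gram and CBOW objectives comes down to how training examples are cut from a sentence. The sketch below shows only that data-preparation step, not the TensorFlow models; the window size of 2 and the function names are assumptions, since the project description does not state them.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Skip-gram: one (center, context) pair per neighbour within the window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """CBOW: one (context words, center) example per position."""
    examples = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        if context:  # skip single-token sentences, which have no context
            examples.append((context, center))
    return examples
```

Skip-gram predicts each context word from the center word, so it yields many pairs per position; CBOW predicts the center word from the pooled context, so it yields one example per position.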
Model vs Modalities
- Explored the effect of the statistical properties of data on the performance of learning models as a function of their complexities.
- Compared Vector AutoRegression (linear dynamics), Dynamic Mode Decomposition (linear approximation of non-linear dynamics) and LSTM (non-linear dynamics) on short-term intraday stock prices and long-term GDP rates.
Python
TensorFlow
Statsmodels
Scikit-Learn
Matplotlib
Regression
LSTM
DMD
Extractive Text Summarization
- Applied the TextRank algorithm to GloVe-embedded sentence vectors and combined the resulting scores in a weighted average with sentence-level Feature Terms to account for both the amount of information each sentence conveys and its relevance.
Python
NLTK
spaCy
NetworkX
NLP
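The scoring step of this summarizer can be sketched as below: TextRank as power iteration over a cosine-similarity graph of sentence vectors, followed by a weighted average with per-sentence feature scores. This is a minimal sketch, not the project's code. It assumes the sentence vectors (for example, averaged GloVe vectors) and the feature scores are already computed; the damping factor, iteration count, and mixing weight `alpha` are hypothetical defaults.

```python
import math
from typing import List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def textrank(vectors: List[List[float]], damping: float = 0.85, iters: int = 50) -> List[float]:
    """Rank sentences by power iteration on a weighted similarity graph."""
    n = len(vectors)
    # Edge weights: non-negative cosine similarity, with no self-loops.
    sim = [
        [max(cosine(vectors[i], vectors[j]), 0.0) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_weight = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iters):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to edge weight.
            rank = sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if out_weight[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    return scores


def combine(rank_scores: List[float], feature_scores: List[float], alpha: float = 0.5) -> List[float]:
    """Weighted average of TextRank scores and sentence-level feature scores."""
    return [alpha * r + (1 - alpha) * f for r, f in zip(rank_scores, feature_scores)]
```

The summary is then the top-k sentences by combined score, emitted in their original document order.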
Image Compression using Principal Component Analysis
- Built an RMarkdown notebook documenting the research and implementation of Principal Component Analysis for dimensionality reduction, applied to image compression.
Python
NLTK
spaCy
NetworkX
NLP