Zero-Shot Code Generation and Repair using ChatGPT
  • Applied ChatGPT to questions sampled from competitive-programming benchmark datasets and found a low success rate of 30%.
  • Implemented a Python pipeline that generates code with the ChatGPT API, runs it against the given test cases, and converts failed test cases into feedback signals that ChatGPT uses to iteratively repair non-functioning code.
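
A minimal sketch of the generate, test, and repair loop described above, assuming test cases given as (stdin, expected stdout) pairs, as is typical for competitive-programming datasets. The `generate` callable is a hypothetical stand-in for the ChatGPT API call (no client code is shown), and the prompt wording and `max_rounds` limit are illustrative assumptions, not the project's settings.

```python
import subprocess
import sys
from typing import Callable, List, Optional, Tuple

# A test case is (stdin_text, expected_stdout).
TestCase = Tuple[str, str]


def run_tests(code: str, tests: List[TestCase], timeout: float = 5.0) -> List[str]:
    """Run `code` as a Python script on each test; return one feedback message per failure."""
    failures = []
    for i, (stdin_text, expected) in enumerate(tests):
        try:
            result = subprocess.run(
                [sys.executable, "-c", code],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            failures.append(f"Test {i}: timed out after {timeout}s")
            continue
        if result.returncode != 0:
            failures.append(f"Test {i}: runtime error:\n{result.stderr.strip()}")
        elif result.stdout.strip() != expected.strip():
            failures.append(
                f"Test {i}: input {stdin_text!r}: expected {expected.strip()!r}, "
                f"got {result.stdout.strip()!r}"
            )
    return failures


def generate_and_repair(
    problem: str,
    tests: List[TestCase],
    generate: Callable[[str], str],  # hypothetical wrapper around a ChatGPT API call
    max_rounds: int = 3,
) -> Optional[str]:
    """Request code, then feed failed-test signals back until all tests pass or rounds run out."""
    prompt = f"Write a Python program that reads from stdin and solves:\n{problem}"
    for _ in range(max_rounds):
        code = generate(prompt)
        failures = run_tests(code, tests)
        if not failures:
            return code
        # Turn the failing tests into the repair signal for the next round.
        prompt = (
            f"{problem}\n\nYour previous code:\n{code}\n\n"
            "It failed these tests:\n" + "\n".join(failures) + "\nPlease fix it."
        )
    return None


# Offline stub for the model: a buggy first attempt, then a corrected one.
attempts = iter([
    "a, b = map(int, input().split())\nprint(a - b)",
    "a, b = map(int, input().split())\nprint(a + b)",
])
solution = generate_and_repair(
    "Print the sum of two integers.",
    [("1 2\n", "3\n"), ("5 7\n", "12\n")],
    generate=lambda prompt: next(attempts),
)
print(solution)
```

The stub ignores its prompt so the loop can run without an API key; with a real model, the failure messages built in the loop are what would steer the second attempt. Running each candidate in a separate process with a timeout keeps a crashing or non-terminating program from stalling the pipeline.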

Python

PyTorch

ChatGPT

Large Language Models

Natural Language Processing

Adversarial Perturbations for Robustness of Large Language Models
  • Fine-tuned BERT and GPT-2 on the author sentiment analysis task using the PerSent dataset and evaluated their performance on both adversarially perturbed and unperturbed data.
  • Simple character- and word-level perturbations reduced the accuracy and F1-score of these state-of-the-art language models by almost 4%, demonstrating their lack of robustness.
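
TextAttack provides its own attack recipes; the sketch below does not use its API. It is a plain-Python illustration of the two perturbation families named above, assuming a swap of adjacent inner characters as the character-level edit and synonym substitution as the word-level edit. `SYNONYMS`, `perturb`, and the perturbation rate are hypothetical.

```python
import random

# Hypothetical synonym table for illustration; real attacks typically draw on WordNet or embedding neighbours.
SYNONYMS = {"good": ["fine", "decent"], "bad": ["poor", "awful"], "movie": ["film"]}


def char_swap(word: str, rng: random.Random) -> str:
    """Character-level edit: swap two adjacent inner characters, keeping the first and last."""
    if len(word) < 4:
        return word
    i = rng.randrange(1, len(word) - 2)
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def word_substitute(word: str, rng: random.Random) -> str:
    """Word-level edit: replace the word with a listed synonym, or return it unchanged."""
    options = SYNONYMS.get(word.lower())
    return rng.choice(options) if options else word


def perturb(sentence: str, rate: float = 0.3, seed: int = 0) -> str:
    """Apply a randomly chosen character- or word-level edit to roughly `rate` of the words."""
    rng = random.Random(seed)
    out = []
    for word in sentence.split():
        if rng.random() < rate:
            edit = rng.choice([char_swap, word_substitute])
            word = edit(word, rng)
        out.append(word)
    return " ".join(out)


original = "the movie was good but the ending was bad"
print(perturb(original, rate=0.5))
```

In an evaluation of this kind, the same fine-tuned model is scored on the clean test set and on its perturbed copy, and the drop in accuracy and F1-score measures the loss of robustness.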

Python

PyTorch

Transformers

TextAttack

Large Language Models

Natural Language Processing

Deep Neural Machine Translation for Indian Languages
  • Trained a word-level LSTM neural machine translation system on a 70,000-line English-Hindi corpus, then transferred its weights and fine-tuned on 30,000 lines of English-Marathi text, reaching a final BLEU score of 0.43.
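
The bullet reports a BLEU score of 0.43 without naming the BLEU variant or tooling. As a reference point, here is a minimal sketch of standard corpus-level BLEU (clipped n-gram precisions up to 4-grams, uniform weights, brevity penalty), assuming one tokenized reference per sentence; established implementations add smoothing and tokenization options that are not shown here.

```python
import math
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references: List[List[str]], hypotheses: List[List[str]], max_n: int = 4) -> float:
    """Corpus-level BLEU with one reference per hypothesis and uniform n-gram weights."""
    matches = [0] * max_n
    totals = [0] * max_n
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, max_n + 1):
            hyp_ngrams, ref_ngrams = ngram_counts(hyp, n), ngram_counts(ref, n)
            # Clipping: each hypothesis n-gram is credited at most as often as it occurs in the reference.
            matches[n - 1] += sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # the geometric mean is zero if any n-gram order has no matches
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: output that is shorter overall than the references is penalised.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)


ref = "the cat sat on the mat".split()
print(corpus_bleu([ref], [ref]))                       # identical output
print(corpus_bleu([ref], ["the cat sat on".split()]))  # precise but too short
```

The second call shows why the brevity penalty matters: every n-gram in the short hypothesis matches, yet the score falls to exp(1 - 6/4) because the output covers only part of the reference.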

Python

TensorFlow

Pandas

Keras

Matplotlib

LSTM

Natural Language Processing

Hindi Word2Vec Embeddings
  • Generated 128-dimensional embeddings for the 843,415 most common words in a 5,000,000-line monolingual Hindi corpus from CFILT, IIT Bombay, by training Skip-gram and CBOW Word2Vec models from scratch in TensorFlow.
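
Skip-gram and CBOW differ in the direction of prediction over a context window: Skip-gram predicts each context word from the center word, while CBOW predicts the center word from its surrounding context. A minimal sketch of how the training examples for each model could be built, assuming whitespace tokenization and a symmetric window (the window size used in the project is not stated); the embedding layer and the TensorFlow training loop are omitted.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Skip-gram: one (center, context) pair for every context word within the window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """CBOW: one (context words, center) example per position."""
    examples = []
    for i, center in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + window + 1]
        examples.append((context, center))
    return examples


sentence = "मैं घर जा रहा हूँ".split()
print(skipgram_pairs(sentence, window=1)[:4])
print(cbow_examples(sentence, window=1)[1])
```

Each word in these examples is then mapped to a row of a shared embedding matrix (here 843,415 × 128), which Word2Vec models typically train with negative sampling or a sampled softmax rather than a full softmax over the vocabulary.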

Python

TensorFlow

Scikit-Learn

Matplotlib

Deep Learning

Natural Language Processing

Model vs Modalities
  • Explored how the statistical properties of data affect the performance of learning models of varying complexity.
  • Compared Vector Autoregression (linear dynamics), Dynamic Mode Decomposition (linear approximation of non-linear dynamics), and LSTM (non-linear dynamics) on short-term intraday stock prices and long-term GDP rates.
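
DMD fits a single best-fit linear operator A with x[t+1] ≈ A x[t] from snapshot data and forecasts through A's eigenvalues and modes. A minimal NumPy sketch of the standard "exact DMD" formulation, assuming snapshots are stored as the columns of `X` and a truncation rank chosen by hand; this is not the project's code, and the toy damped rotation stands in for the stock and GDP series.

```python
import numpy as np


def dmd(X: np.ndarray, rank: int):
    """Exact DMD: approximate X[:, 1:] ≈ A @ X[:, :-1] using a rank-`rank` SVD of X[:, :-1]."""
    X1, X2 = X[:, :-1], X[:, 1:]
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    U, s, V = U[:, :rank], s[:rank], Vh[:rank].conj().T
    # Reduced operator A_tilde = U^H X2 V S^-1 (dividing by s scales column j by 1/s_j).
    A_tilde = U.conj().T @ X2 @ V / s
    eigvals, W = np.linalg.eig(A_tilde)
    modes = X2 @ V / s @ W  # exact DMD modes Phi = X2 V S^-1 W
    return eigvals, modes


def dmd_forecast(X: np.ndarray, rank: int, steps: int) -> np.ndarray:
    """Reconstruct the observed snapshots and extrapolate `steps` more: x_k ≈ Phi diag(lambda^k) b."""
    eigvals, modes = dmd(X, rank)
    b = np.linalg.lstsq(modes, X[:, 0], rcond=None)[0]  # mode amplitudes from the first snapshot
    k = np.arange(X.shape[1] + steps)
    return (modes @ (b[:, None] * eigvals[:, None] ** k)).real


# Toy linear system (a damped rotation) standing in for real time series.
A = np.array([[0.9, -0.2], [0.2, 0.9]])
X = np.empty((2, 20))
X[:, 0] = [1.0, 0.0]
for t in range(1, 20):
    X[:, t] = A @ X[:, t - 1]

eigvals, _ = dmd(X, rank=2)
forecast = dmd_forecast(X, rank=2, steps=5)
print(eigvals)
print(forecast.shape)
```

Because the toy data are exactly linear, DMD recovers the eigenvalues of `A` and extrapolates without error; on non-linear data such as intraday prices it yields only the best linear approximation, which is the contrast the comparison above is built on.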

Python

TensorFlow

Statsmodels

Scikit-Learn

Matplotlib

Regression

LSTM

DMD

Extractive Text Summarization
  • Applied the TextRank algorithm to GloVe-embedded sentence vectors and combined the resulting scores, via a weighted average, with sentence-level feature terms that capture the amount of information each sentence conveys and its relevance.
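
A minimal NumPy sketch of this scoring scheme, assuming each sentence is already represented by a GloVe-based vector (for example the mean of its word vectors) and that the feature terms have been reduced to one number per sentence. PageRank is computed by power iteration here rather than through NetworkX, and the weight `alpha`, the feature values, and the helper names are hypothetical.

```python
import numpy as np


def textrank_scores(embeddings: np.ndarray, damping: float = 0.85, iters: int = 100) -> np.ndarray:
    """PageRank over a sentence graph weighted by cosine similarity of sentence embeddings."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sim = np.clip(unit @ unit.T, 0.0, None)  # drop negative similarities
    np.fill_diagonal(sim, 0.0)               # no self-loops
    n = len(sim)
    row_sums = sim.sum(axis=1, keepdims=True)
    # Row-normalise into a transition matrix; isolated sentences jump uniformly.
    trans = np.where(row_sums > 0, sim / np.where(row_sums > 0, row_sums, 1.0), 1.0 / n)
    scores = np.full(n, 1.0 / n)
    for _ in range(iters):
        scores = (1 - damping) / n + damping * trans.T @ scores
    return scores


def summarize(sentences, embeddings, feature_scores, k=2, alpha=0.7):
    """Weighted average of normalised TextRank and feature scores; top-k sentences in original order."""
    tr = textrank_scores(embeddings)
    tr = tr / tr.max()
    feat = np.asarray(feature_scores, dtype=float)
    feat = feat / feat.max() if feat.max() > 0 else feat
    combined = alpha * tr + (1 - alpha) * feat
    top = sorted(np.argsort(-combined)[:k])
    return [sentences[i] for i in top]


sentences = ["Sentence A.", "Sentence B.", "Sentence C."]
embeddings = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])  # toy 2-d stand-ins for GloVe vectors
features = [0.0, 0.0, 1.0]  # hypothetical feature-term scores
print(textrank_scores(embeddings))
print(summarize(sentences, embeddings, features, k=2))
```

Normalising both score vectors to a maximum of 1 before averaging keeps `alpha` interpretable as the relative weight of graph centrality versus the feature terms.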

Python

NLTK

spaCy

NetworkX

Natural Language Processing

Image Compression using Principal Component Analysis
  • Built an R Markdown notebook that explains and implements Principal Component Analysis for dimensionality reduction, applied to image compression.
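
The notebook itself was written in R Markdown; for illustration, here is a NumPy sketch of the same idea, assuming a grayscale image whose rows are treated as samples and whose columns as features (one common formulation). Keeping only the top `k` principal components means storing the scores, the component matrix, and the column means instead of every pixel; the function names and the random stand-in image are hypothetical.

```python
import numpy as np


def pca_compress(image: np.ndarray, k: int):
    """Treat image rows as samples and keep the top-k principal components."""
    mean = image.mean(axis=0)
    centered = image - mean
    # Eigendecomposition of the column covariance matrix (symmetric, so eigh applies).
    cov = centered.T @ centered / (image.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:k]  # indices of the k largest eigenvalues
    components = eigvecs[:, order]         # (width, k)
    scores = centered @ components         # (height, k): the compressed representation
    return scores, components, mean


def pca_reconstruct(scores: np.ndarray, components: np.ndarray, mean: np.ndarray) -> np.ndarray:
    """Map the compressed scores back to pixel space."""
    return scores @ components.T + mean


rng = np.random.default_rng(0)
image = rng.random((64, 48))  # stand-in for a 64x48 grayscale image
scores, components, mean = pca_compress(image, k=10)
stored = scores.size + components.size + mean.size
print(f"stored {stored} values instead of {image.size}")  # stored 1168 values instead of 3072
```

On a real photograph most of the variance concentrates in the first few components, so a small `k` keeps the image recognizable; the random array here has no such structure and only exercises the code.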

Python
