Jasmine Collins


Protein-Ligand Scoring Meets Machine Learning

Advisor: Dr. David Koes, Assistant Professor of Computational and Systems Biology (University of Pittsburgh)
Collaborators: Matt Ragoza and Noah Bastola

Accurate prediction of protein-ligand interaction potency is an essential part of effective computational drug discovery. Despite the large amount of activity data available from high-throughput screens, most existing scoring functions have not been parameterized using this data, because it lacks structural information and contains a large amount of noise. In this project I used machine learning methods such as linear and logistic regression to develop novel protein-ligand scoring functions from our large and challenging data source. As an exciting next step, we are applying convolutional neural networks to the same problem.
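To give a rough feel for what "a scoring function from logistic regression" means, here is a minimal sketch. It is not the project's code: the two features, the synthetic data, and the names `train_logistic` and `score` are all assumptions made up for illustration. The idea is only that each protein-ligand pose is described by a few numeric terms, and the fitted weights turn those terms into a probability that the ligand is active.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def score(w, b, x):
    # Predicted probability of "active" for one feature vector.
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def train_logistic(X, y, lr=0.5, epochs=300):
    # Plain full-batch gradient descent on the logistic loss.
    n = len(X)
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = score(w, b, xi) - yi
            for j, xij in enumerate(xi):
                grad_w[j] += err * xij
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Synthetic stand-in data (hypothetical): two made-up interaction terms
# per pose, with noisy active/inactive labels.
rng = random.Random(0)
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
y = [1 if x[0] + 0.5 * x[1] + rng.gauss(0, 0.2) > 0 else 0 for x in X]

w, b = train_logistic(X, y)
predictions = [1 if score(w, b, x) >= 0.5 else 0 for x in X]
accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)
print("weights:", w, "bias:", b, "training accuracy:", accuracy)
```

Real screening data is far noisier and much more imbalanced than this toy set, which is a large part of what makes the actual problem challenging.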

Below is a screencast that I made a while ago as a submission for the 2016 NCWIT Collegiate Award (which I actually ended up winning - woohoo!!). In it, I give an overview of the project and the results I achieved with linear and logistic regression.

Matt and me with our poster at the 251st American Chemical Society National Meeting & Exposition in San Diego, CA (March 2016):