Primary Purpose

I am creating this blog to archive and publicize my thought processes as I explore career paths. As a recent statistics PhD graduate from a top-10 university, I have focused my research on probabilistic models for text in social networks. The data sets I have worked with most extensively include a corpus of political blog posts and a tree of Reddit comments.

For my fifth and final year of graduate school, I divided my work hours as follows. During the fall semester, I spent approximately 15 hours a week TAing and 25 hours split among finishing a paper from the preceding summer, a statistics consulting class, and dissertation writing. During the spring semester, I spent about 25 hours a week on dissertation writing and 15 on TAing, with a few hours of defense preparation sometimes replacing dissertation work. To give my presentation more purpose, I signed up to give talks about my research at the university's network analysis center and machine learning lab.

This blog will be dedicated to pursuing interests that are separate from my job. These include sports analytics, addiction, brain-machine interfaces, and social networks. My plan is to use the skills I have acquired through my academic training to share insights about these areas. This will include correcting misconceptions and sharing strategy through data summaries and visualization.

Brain-to-text will be the first topic I cover. Here the goal is to develop an algorithm that takes as input sequences of brain activity, recorded via EEG, and outputs the corresponding intended sequence of text. A reasonable starting point may be to read and summarize work done by NeuroSky. I ultimately expect that some kind of multivariate sequence-to-sequence model will be helpful, perhaps implemented as a recurrent neural network.
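To make the shape of the problem concrete, here is a minimal sketch of a recurrent forward pass in plain Python. Everything in it is an assumption for illustration: the four-channel synthetic "EEG" frames, the tiny alphabet, the layer sizes, and the random untrained weights. It also emits one symbol per frame, which is a simplification, since real text will not line up one-to-one with EEG frames and a trained model would need some way to handle that alignment.

```python
import math
import random

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def softmax(scores):
    """Turn raw scores into probabilities that sum to one."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def random_matrix(rows, cols, rng):
    """Small random weights; a real model would learn these from data."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def rnn_decode(frames, w_in, w_rec, w_out, alphabet):
    """Elman-style recurrent pass: pick the most probable symbol per frame."""
    hidden = [0.0] * len(w_rec)
    symbols = []
    for frame in frames:
        # The new hidden state mixes the current frame with the previous state.
        pre = [a + b for a, b in zip(matvec(w_in, frame), matvec(w_rec, hidden))]
        hidden = [math.tanh(p) for p in pre]
        probs = softmax(matvec(w_out, hidden))
        symbols.append(alphabet[probs.index(max(probs))])
    return "".join(symbols)

rng = random.Random(0)
alphabet = "abc "                      # hypothetical output symbols
n_channels, n_hidden = 4, 8            # hypothetical sizes
w_in = random_matrix(n_hidden, n_channels, rng)
w_rec = random_matrix(n_hidden, n_hidden, rng)
w_out = random_matrix(len(alphabet), n_hidden, rng)
# Synthetic stand-in for six multichannel EEG frames.
frames = [[rng.gauss(0, 1) for _ in range(n_channels)] for _ in range(6)]
decoded = rnn_decode(frames, w_in, w_rec, w_out, alphabet)
print(decoded)
```

Because the weights are untrained, the printed string is meaningless. The point is only the data flow: a multichannel sequence goes in, and a symbol sequence comes out.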

NBA analytics will be the second topic I cover. The main data of interest here are SportVU player tracking data, box scores, possession logs, shot logs, and rebound logs. Two existing areas of research in this category are possession classification and expected point valuation. A solution that recommends a possession type for a given game clock and score context would be of use to coaches, as would a solution that classifies and characterizes the movements that lead to increases in expected point value.
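As a rough sketch of the possession-type recommendation, one simple baseline is to bucket each possession by clock and score context, average the points scored per possession type within each bucket, and recommend the type with the highest average. The record layout, the bucket boundaries, the possession type names, and the data below are all made up for illustration and do not come from any real log format.

```python
from collections import defaultdict

def context_of(seconds_left, score_margin):
    """Bucket game clock and score margin into a coarse context label."""
    clock = "late" if seconds_left <= 120 else "early"
    if score_margin < 0:
        score = "trailing"
    elif score_margin > 0:
        score = "leading"
    else:
        score = "tied"
    return (clock, score)

def mean_points(possessions):
    """Average points per (context, possession type) cell."""
    totals = defaultdict(lambda: [0.0, 0])
    for seconds_left, margin, ptype, points in possessions:
        cell = totals[(context_of(seconds_left, margin), ptype)]
        cell[0] += points
        cell[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

def recommend_type(possessions, seconds_left, margin):
    """Return the possession type with the highest mean points in this context."""
    ctx = context_of(seconds_left, margin)
    means = mean_points(possessions)
    options = {ptype: value for (c, ptype), value in means.items() if c == ctx}
    if not options:
        return None  # no history for this context
    return max(options, key=options.get)

# Hypothetical log rows: (seconds left, score margin, possession type, points).
possessions = [
    (90, -3, "pick_and_roll", 2),
    (60, -2, "pick_and_roll", 3),
    (45, -4, "isolation", 3),
    (30, -1, "isolation", 0),
    (100, -5, "isolation", 0),
    (600, 4, "post_up", 2),
]
print(recommend_type(possessions, 75, -2))
```

With so few rows per cell the averages are very noisy, so a serious version would need far more data and some form of smoothing or modeling across contexts.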

Regarding addiction, I offer this advice for recovery. Find a reason to quit that matters to you (for me, not wanting to lead my brother into addiction) and remind yourself of it consistently. Be with a support system when quitting, whether a hospital or family. Tell doctors, psychologists, and psychiatrists about the struggle, and take medication as directed. Keep a regular bedtime and wake-up time. Eat three balanced meals a day. Go to at least three recovery meetings per week. Exercise at least three times per week. Take time away from work if necessary.

Finally, in social networks, I am interested in an app that recommends whom to talk with about different topics. Recommendations can take into account amount of experience, path length (1 = friend, 2 = friend of friend), and availability to meet. LinkedIn data would suit work-related topics, and Facebook data would suit non-work topics. Wikipedia articles or Reddit posts would be good corpora from which to learn a topic hierarchy and the top words for each topic. These pages could also be used to automate the composition of conversation among users.
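Here is a minimal sketch of how such a recommendation could be scored. It assumes a small friendship graph stored as adjacency lists, a table of years of experience per topic, and a set of users who are free to meet. The scoring rule (years of experience divided by path length) is just one plausible choice, and all names and numbers are invented.

```python
from collections import deque

def path_lengths(graph, source, max_depth=2):
    """Breadth-first search: hop count from source to each user within max_depth."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        user = queue.popleft()
        if dist[user] == max_depth:
            continue  # do not expand beyond friends of friends
        for friend in graph.get(user, ()):
            if friend not in dist:
                dist[friend] = dist[user] + 1
                queue.append(friend)
    del dist[source]
    return dist

def recommend(graph, experience, available, source, topic, top_k=3):
    """Rank available users within two hops by years of experience / path length."""
    scored = []
    for user, hops in path_lengths(graph, source).items():
        years = experience.get(user, {}).get(topic, 0)
        if years > 0 and user in available:
            scored.append((years / hops, user))
    # Highest score first; break ties alphabetically for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [user for _, user in scored[:top_k]]

# Hypothetical network: ana's friends are ben and cy.
graph = {
    "ana": ["ben", "cy"],
    "ben": ["ana", "dee"],
    "cy": ["ana", "eli"],
    "dee": ["ben"],
    "eli": ["cy", "fay"],
    "fay": ["eli"],
}
experience = {
    "ben": {"statistics": 2},
    "cy": {"basketball": 5},
    "dee": {"statistics": 10},
    "eli": {"statistics": 3},
    "fay": {"statistics": 20},
}
available = {"ben", "dee", "fay"}
print(recommend(graph, experience, available, "ana", "statistics"))
```

In this toy network, dee (a friend of a friend with ten years of experience) outranks ben (a direct friend with two years), while fay is excluded for being three hops away and eli for being unavailable.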

With recent experience as a data scientist at a water technology company, I am better prepared to contribute successfully in an industry setting. Its emphases on statistical learning, reproducible research reports, and software development have me aiming for a job at a startup or a government-contracted small business. Please wish me luck!


My well-being partly depends on gifts from generous supporters. Please send whatever you can to help by PayPal. Thanks for any help!