Social Network Data Science Research

Hi friends. I recently earned a PhD in statistical science from Duke. For much of the last four years, I’ve researched text data in social networks… with a passion. Early in life, I learned virtual conversations transcend space and time, as I maintained interaction over multiple days, with people who lived across town. I just remembered some amusing screen names on AIM, haha.

I believe conversations like these matter. Our messages, and the evolving network of friendships they contribute to, represent a part of our personalities and values. Collectively, they represent an aspect of planet Earth’s society. Two examples from my life are finding a college best friend on Facebook before our first class together and meeting my current partner on Coffee Meets Bagel before our first dinner.

My purpose in this realm is to summarize overwhelming amounts of data, to clarify the topics that matter most, and to characterize the communities that unite people with similar wills. I do this by writing code, implementing models, and presenting the results. For now I summarize and visualize a network of political bloggers and their conversation topics.

Social Network Analysis:

Analyzing over 100,000 political blog posts, written on hundreds of websites, produced a network visualization. Information from January 2012 is shown below. More details about how I made it are here: link_block_lda_results. The methods I’ve applied from the literature include the mixed membership stochastic block model for networks, the latent Dirichlet allocation topic model, and variants of both.


These results were obtained using a probabilistic graphical model developed with David Banks at Duke.

Text Mining:

Using simulation, we assign each blog post to a group such that bloggers in the same group wrote about the same set of topics and shared hyperlinks similarly. The model is implemented in Python and the results are presented in R. Here is a summary of some topics learned:


Hopefully this has provided some insight into the capabilities of social network research and text mining using publicly available data. This kind of work might be applicable in digital marketing and political campaigns. If you’re interested in talking or working with me on either of these topics, feel free to text or call 541-633-5550, or email me. Peace be with us!


My well-being partly depends on gifts from generous supporters. Please send a friend whatever you can to help by PayPal. Thanks for any help!

A Reasonable Wealth Distribution

What follows is a hand-wavy articulation, with approximations and assumptions, to arrive at an idealistic wealth distribution. I hope you will bear with me.

How Much Money to Go Around?

Let’s begin by assuming there is a finite amount of money in the US, right now. After subtracting debts from assets, we’ll go with it being around $100 trillion [5]. There are also about 300 million people [4]. Therefore, the average amount of wealth per person is about $333 thousand, or roughly $300 thousand. With total equality, or socialism, this is how much each person would get. I don’t have that much. This is the United States though, where we believe in incentives and opportunity for growth.

Positing a Trajectory

Let’s start with a premise that a person is born with essentially no money. Some people will object to this, since one might inherit from a wealthy parent or two, but at birth they are at zero, without clothes on their back or a dollar in their piggy bank. This goes with the quote “time is money”. Over time, their parents give them some. Later, maybe other family and friends give some. The person eventually works and learns how to earn. Accumulating money as age increases creates an appealing sense of progress. An old person could then achieve a dream of obtaining wealth.

Age Distribution of our Population

Another distribution we can think about is how many people are at each age. Using the plot below, provided on the Calculated Risk blog [2] and based on a forecast using US Census data, we see about 6 or 7 percent of the population in each five-year age group up to age 65. The remaining age groups, in order, hold around 5, 4, 3, 2, 1, 0.5, 0.3, and 0.1 percent.

Age distribution on year

What Would a Linear Increase be Like?

Theoretically, if a majority agree on it, we could choose a linear increase with age, so that the additional amount of money acquired every five years is constant. This would mean those with extra donating and those with too little getting blessed. For this assumption to be comfortable, it helps to consider each person an individual who is initially dependent on others and gains independence with age, becoming self-sufficient and able to provide. I have written code to simulate distributing wealth to all ~300 million people, according to this linear increase, using an approximation of the population distribution above, until reaching ~$100 trillion. This is shown in the rainbow-colored plot.
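A minimal sketch of this calculation, not the original simulation code. It solves for the constant five-year step directly instead of distributing money iteratively. The 6.5 percent share for each group under 65 is an assumed midpoint of "6 or 7 percent", and giving the youngest group one step (rather than zero) is also an assumption. All function and variable names are hypothetical.

```python
# Approximate population share (percent) of each 5-year age group:
# thirteen groups covering ages 0-64 at an assumed 6.5% each,
# then the eight older groups listed in the text.
SHARES_PCT = [6.5] * 13 + [5, 4, 3, 2, 1, 0.5, 0.3, 0.1]

TOTAL_WEALTH = 100e12  # ~$100 trillion
POPULATION = 300e6     # ~300 million people


def linear_wealth_by_group(shares_pct, total_wealth, population):
    """Return (per-person wealth, head count) for each age group.

    Per-person wealth rises by one constant step per age group, and the
    step is chosen so that all groups together hold total_wealth.
    """
    total_share = sum(shares_pct)  # shares are rough, so normalize them
    people = [population * s / total_share for s in shares_pct]
    # Assumption: group k (0-based) holds k + 1 steps per person.
    weighted_people = sum(n * (k + 1) for k, n in enumerate(people))
    step = total_wealth / weighted_people
    per_person = [step * (k + 1) for k in range(len(shares_pct))]
    return per_person, people


per_person, people = linear_wealth_by_group(SHARES_PCT, TOTAL_WEALTH, POPULATION)
for k, wealth in enumerate(per_person):
    print(f"age group starting at {5 * k:3d}: ${wealth:,.0f} per person")
```

Changing the assumed shares or the starting step shifts every group's figure, so the printed numbers are illustrative rather than a reproduction of the plot.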

Reasonable Wealth by Age Group

My Money Situation

A little about me: I am 28 and have survived on around $24,000/year for the last few years, with an additional few thousand of support from family. This is in the cities of Durham, NC and Richmond, VA, each with around 250,000 people and a livable balance of urban and rural environment. Being in school during this time was enjoyable, and I worked hard on answering questions from professors. In my bank account there is around $2,000, with around $6,000 in retirement and about $2,000 in an HSA. I’d have close to the $25,000 or so associated with my age group above if I had not been paying off my student loans.

How Far Off Are We?

The purpose of this post is providing a theoretically reasonable wealth distribution – maybe one to gradually aim for. The actual wealth distribution is given below [3], and it appalls me. The top 1% have more than the bottom 80%. If you’re in the top 1%, please share your money. An accompanying video is available in the citation, and it deserves resurfacing.

Actual Distribution of Wealth in the United States

One criticism is that people who live in cities, who have disabilities, or who support children have more expensive lifestyles and therefore need more money. This is understandable. Based on average rent prices for one-bedroom apartments in 50 major US cities [1], the median city costs about $1000/month. However, in San Francisco, where the cost of living is highest, the rent is approximately $4000/month or $48,000/year, but that is only four times the number quoted above, not 40.

Let’s Get Back on Course!

How do we make this theoretically appealing wealth distribution more realistic? I believe a hundred-millionaire tax would greatly improve the lives of the vast majority, well over 99% of those living in the United States. I think a millionaire tax may eventually help, but considering many politicians have at least a million dollars, we had better start with something that would monetarily benefit almost all of us, including lawmakers. Thanks for reading this.




Citation Nation:

[1] Andrew Lasane. The Average Cost of One Bedroom Apartments in 50 Major U.S. Cities. June 8, 2016.

[2] Bill McBride. By Request: U.S. Population by Age (earlier was by distribution), 1900 through 2060. Calculated Risk. August 14, 2013.

[3] Madison Moore. Watch: The Absolutely Shocking Truth Most People Don’t Know About Wealth Inequality In America. February 24, 2014.

[4] US Census. U.S. and World Population Clock. August 26, 2019.

[5] Wikipedia – Financial Position of the US. August 26, 2019.


Primary Purpose

I am creating this blog to archive and publicize my thought processes as I explore career paths. As a recent statistics PhD graduate from a top 10 university, I have focused on researching probabilistic models for text in social networks. Data sets I have worked extensively with include a corpus of political blog posts and a tree of Reddit comments.

For my fifth and final year of graduate school, I divided my work hours up as follows. During fall semester I spent approximately 15 hours a week TAing and divided the other 25 between finishing a paper from this summer, a statistics consulting class, and dissertation writing. During spring semester, I spent about 25 hours a week on dissertation writing and 15 on TAing. Sometimes a few hours of defense preparation replaced dissertation work. To add more purpose to my presentation, I signed up to give talks about my research in the university network analysis center and machine learning lab.

This blog will be dedicated to pursuing interests which are separate from my job. These include sports analytics, addiction, brain machine interface, and social networks. My plan is to utilize the skills I have acquired through my academic training to share insights about these areas. This will include correcting misconceptions and sharing strategy through data summary and visualization.

Brain-to-text will be the first topic I cover. Here the goal is as follows: develop an algorithm which takes as input sequences of brain activity, recorded via EEG, and outputs the corresponding intended sequence of text. A reasonable first step may be to read and summarize work done by Neurosky. I ultimately expect some kind of multivariate sequence-to-sequence classifier will be helpful, perhaps a recurrent neural network.

NBA analytics will be the second topic I cover. The main data of interest here are SportVU player tracking data, box scores, possession logs, shot logs, and rebound logs. Within this category, two existing areas of research are possession classification and expected point valuation. A solution recommending a possession type for a given game clock and score context would be of use to coaches, as would a solution which classifies and characterizes movements that lead to increases in expected point value.

Regarding addiction, I offer this advice for recovery. Find a reason to quit that matters (mine: I don’t want to lead my brother into addiction) and remind yourself of it consistently. Be with a support system when quitting, whether a hospital or family. Tell doctors, psychologists, and psychiatrists about the struggle, and take medication as directed. Keep a regular bedtime and wake-up time. Eat three balanced meals a day. Go to at least three recovery meetings per week. Exercise at least three times per week. Take time away from work if necessary.

Finally, in social networks, an app which recommends who to talk with about different topics is interesting. Recommendations can take into account amount of experience, path length (1 = friend or 2 = friend of friend), and availability to meet. LinkedIn data would be good for work-related purposes, and Facebook data for non-work purposes. Wikipedia articles or Reddit threads would be good corpora from which to learn a topic hierarchy and top words. These pages can be used to automate the composition of conversation among users.
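A minimal sketch of how path length and experience could be combined into a ranking, under stated assumptions. The friendship graph, the experience scores, the function names, and the scoring rule (experience divided by path length) are all hypothetical illustrations. Availability to meet is left out.

```python
from collections import deque


def path_lengths(friends, start, max_hops=2):
    """Breadth-first search: hops from start to each user within max_hops."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        user = queue.popleft()
        if dist[user] == max_hops:
            continue  # do not expand beyond friends of friends
        for other in friends.get(user, ()):
            if other not in dist:
                dist[other] = dist[user] + 1
                queue.append(other)
    del dist[start]  # a user is not recommended to themselves
    return dist


def recommend(friends, experience, start, topic):
    """Rank reachable users by topic experience discounted by path length."""
    hops = path_lengths(friends, start)
    scored = [(experience.get(u, {}).get(topic, 0) / h, u) for u, h in hops.items()]
    return [u for score, u in sorted(scored, reverse=True) if score > 0]


# Toy data: a chain of friendships ana - ben - cam - dee.
friends = {"ana": ["ben"], "ben": ["ana", "cam"], "cam": ["ben", "dee"], "dee": ["cam"]}
experience = {"ben": {"statistics": 2}, "cam": {"statistics": 10}, "dee": {"statistics": 50}}
print(recommend(friends, experience, "ana", "statistics"))  # ['cam', 'ben']
```

Here dee has the most experience but sits three hops from ana, so the two-hop limit excludes her. A real app would need to tune both the hop limit and the discount.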

With recent experience as a data scientist at a water technology company, I am better prepared to contribute successfully in an industry setting. The emphases on statistical learning, reproducible research reports, and software development have me aimed toward a job at a startup or in a government-contracted small business. Please wish me luck!

