About the Project

With elections around the corner, we wondered how machine learning could help us understand the voting populace. Election predictions are typically based on polling data, but polls often fail to capture the nuances of how people feel. Given the difficulty and complexity of predicting election results, we decided to explore social media comments about presidential candidates to find additional insights into voters' decision-making.

We used a pre-trained natural language processing (NLP) model to predict the sentiment of individual Reddit comments that included the keywords "Joe Biden" or "Donald Trump".

This process scored the sentiment of individual words in a Reddit comment on a scale from -1 (negative) to 1 (positive), then averaged those scores to get an overall sentiment for each comment.
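As a rough illustration of the scoring step described above, here is a minimal Python sketch. It is not the pre-trained model we used: the tiny word list, its scores, and the function names are hypothetical and made up for this example. It also assumes that words missing from the word list are skipped when averaging, which may differ from how a real model behaves.

```python
import re

# Hypothetical toy lexicon: word -> score in [-1, 1].
# A real pre-trained model would supply these scores; the values
# below are invented purely for illustration.
TOY_LEXICON = {
    "great": 0.8,
    "good": 0.5,
    "bad": -0.5,
    "terrible": -0.9,
}

# Keywords used to select comments, as described in the text.
KEYWORDS = ("joe biden", "donald trump")


def mentions_candidate(comment: str) -> bool:
    """Return True if the comment contains one of the candidate keywords."""
    lowered = comment.lower()
    return any(keyword in lowered for keyword in KEYWORDS)


def comment_sentiment(comment: str, lexicon=TOY_LEXICON) -> float:
    """Average the scores of the words in the comment that appear in the lexicon.

    Assumption: words not in the lexicon are skipped, and a comment with
    no scored words gets a neutral score of 0.0.
    """
    words = re.findall(r"[a-z']+", comment.lower())
    scores = [lexicon[word] for word in words if word in lexicon]
    if not scores:
        return 0.0
    return sum(scores) / len(scores)


if __name__ == "__main__":
    comments = [
        "Joe Biden gave a great speech",
        "The weather is good today",  # no candidate keyword, filtered out
        "Donald Trump had a good rally but a terrible debate",
    ]
    for comment in comments:
        if mentions_candidate(comment):
            print(f"{comment_sentiment(comment):+.2f}  {comment}")
```

Averaging keeps every comment on the same -1 to 1 scale regardless of its length, so long and short comments can be compared directly.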

Sentiment analysis is used to understand the general feelings that people have about a certain topic or brand in order to drive process changes.

Our work can be used to understand how people feel about the presidential candidates in order to:

  • Change campaign tactics to shift people who are on the fence
  • Find common ground and identify what's important to voters
  • Fill in information gaps left behind by polling


Project Contributors

Aranza Ballesteros

Aranza, short for Aranzazu, is a project manager who focuses on operational effectiveness and efficiency. She believes that understanding our current landscape through clear data is the first step towards making meaningful changes in our processes and systems.

Her interest in data arose during her time as a journalist, from her admiration of data activists' work to drive government accountability. Now, she uses her love for storytelling to tell data stories about people, from diversity in the workplace to the spread of misinformation on social media.

Aranza deployed this project's sentiment analysis and served as the Tableau consultant.

Sara Rosario

With a background in chemistry and a passion for liquid chromatography mass spectrometry and proteomics, Sara manages a research laboratory focused on cardiovascular disease prevention.

Sara developed a solid understanding of statistics and strong data visualization skills through her work in research. Her deep desire to understand large datasets led her to develop her machine learning and big data skills, with the goal of gaining better data-informed insights into chemistry, disease, and humanity.

Sara served as the lead researcher and machine learning model subject matter expert for this project.


Helen Nunez

Helen is an Engineering Program Manager and a problem-solver at heart. She likes to make decisions based on data and objectivity. Because she manages simultaneous projects for multiple stakeholders, being able to visualize data is critical for fast and accurate decision-making. She has acquired programming and data analytics skills to be more effective with the data captured in her projects. Next, she is interested in learning how machine learning can help her make predictions and add more value to the projects in her role.

For this project, Helen served as the polling data researcher and lead Tableau designer.

Laura Nicholson

Laura is lost without a problem to solve. Her focus is on getting data right and using it effectively to understand the world and to tell stories. She brings to the team a master's degree in environmental science and management, a bachelor of arts in interdisciplinary science, and years of experience in renewable energy, energy efficiency, and new technologies for a smarter electric grid. Her connection to this project lies in trying to understand language, big data, and modeled propensities, to name a few.

Laura served as the lead website designer and lead debugger for this project and supported model-training efforts.