I have downloaded comments from a website which asked people whether they supported or opposed the implementation of a certain political policy related to immigration.

I would like to get any resources or ideas on how to extract aggregate support/opposition to this policy.

In particular, I need a method that correctly identifies all of the following comments as "anti-immigration" in some sense:

  1. We need to give Americans first priority in the job market.
  2. We should not let Americans suffer more unemployment on account of immigrants.
  3. Immigrants should not be allowed to take American jobs.

Similarly, the method should be able to identify "pro-immigration" comments, such as:

  1. Providing this service to immigrants will be good for the economy.
  2. The American economy will suffer if immigrants aren't allowed to continue to work here.
  3. I do not think that passing this law will be detrimental to American jobs.
eliasah
ved
  • So is the goal to create a general algorithm that will tell you whether a statement supports or opposes an arbitrary topic? Or is the goal to have your algorithm work for a single topic (immigration, for example)? – Armen Aghajanyan Jan 05 '16 at 18:33
  • Let's go for a single topic. FYI, the topic is more specific than simply immigration: it is a bill that affects students in STEM fields with F1 visas in the USA. – ved Jan 05 '16 at 21:15

2 Answers

I agree with @thebiro - you could start by manually classifying a sample of the comments as either opposed to or in support of the policy. If you only need basic for/against classification, you can do binary classification (e.g. against = 0, for = 1). If you need to specify the degree to which a statement is for or against the policy, you could instead define a scale (e.g. strongly against = -3, strongly for = 3). Once you have chosen your scale and manually classified your sample comments, you have a dataset that you can use to train a model.

Once you have your training data, you need to come up with a numerical representation of each comment. There are too many possible approaches to enumerate here, but two basic ones are bag of words and word vectors (aka word embeddings). This Kaggle tutorial may help explain those concepts.
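To make the bag-of-words idea concrete, here is a minimal sketch using only the Python standard library. The `tokenize` and `bag_of_words` helpers are hypothetical names made up for this illustration, and the tokenizer is deliberately crude; in practice a library vectorizer would normally do this step.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a comment and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(comments):
    """Return (vocabulary, count_vectors) for a list of comments.

    Each vector has one entry per vocabulary word, holding the number
    of times that word occurs in the corresponding comment.
    """
    tokenized = [tokenize(comment) for comment in comments]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


# Two comments taken from the question.
comments = [
    "Immigrants should not be allowed to take American jobs.",
    "Providing this service to immigrants will be good for the economy.",
]
vocabulary, vectors = bag_of_words(comments)

# Show the non-zero entries of each comment's vector.
for comment, vector in zip(comments, vectors):
    present = {word: n for word, n in zip(vocabulary, vector) if n}
    print(comment, "->", present)
```

Note that this representation throws away word order, so it cannot tell "will be detrimental" from "will not be detrimental"; that is one reason people move on to n-grams or word vectors.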

Finally, you need to train a model that takes the numerical representation of each comment as input and outputs a sentiment score (0 or 1 for binary, or a number on your scale). You have a lot of choices for the type of model you use. If you are working in Python, you can use one of the supervised learning methods in Scikit-learn. Scikit-learn also has a tutorial on working with text. You train your model on the sample you manually classified, passing both the inputs and your labels (something like model.fit(training_data_inputs, training_data_labels) in Scikit-learn), and then predict the output for the rest of your dataset (model.predict(test_data_inputs) in Scikit-learn).
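As a rough sketch of that workflow, the following assumes Scikit-learn is installed and uses the six comments from the question as a toy labelled sample (against = 0, for = 1). The choice of `CountVectorizer` plus `LogisticRegression` is just one possible combination, and the two "unlabelled" comments are invented for illustration. Six examples are far too few to train anything reliable, so treat the predictions as a demonstration of the API and not as a result.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Manually labelled sample: 0 = against the policy, 1 = for the policy.
train_comments = [
    "We need to give Americans first priority in the job market.",
    "We should not let Americans suffer more unemployment on account of immigrants.",
    "Immigrants should not be allowed to take American jobs.",
    "Providing this service to immigrants will be good for the economy.",
    "The American economy will suffer if immigrants aren't allowed to continue to work here.",
    "I do not think that passing this law will be detrimental to American jobs.",
]
train_labels = [0, 0, 0, 1, 1, 1]

# The pipeline turns raw text into bag-of-words counts, then fits a classifier.
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(train_comments, train_labels)

# Hypothetical unlabelled comments standing in for the rest of the dataset.
unlabeled_comments = [
    "Immigrants are taking jobs away from Americans.",
    "Letting immigrants work here is good for the economy.",
]
predictions = model.predict(unlabeled_comments)

# Aggregate support: the fraction of comments predicted as "for".
support_share = predictions.mean()
print(predictions, support_share)
```

With a realistically sized labelled sample you would also hold out part of it to check accuracy before trusting the aggregate number.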

Some people also use neural networks for sentiment analysis. Keras is a great Python library for building neural networks and has sentiment analysis examples available on GitHub.

Andrew

You first need to classify these comments as pro- or anti-immigration.

The next step is then to analyze your documents with text mining tools.

TheBiro