In this post I will show how natural language processing can be used to extract keywords (aspects) from product reviews. The idea is essentially to replicate what Amazon does with its reviews. For example, in the image below you can see that Amazon extracts keywords from the reviews of a given product and then lets users search the comments by these keywords.
Aspect Based Search in Amazon.in
I will try to replicate the process by which these keywords are generated. They can then be used for a variety of tasks, ranging from aspect-based search to aspect-based sentiment analysis. Let’s get started.
NLP: What is it?
Before we get to the task of aspect extraction, let’s understand what Natural Language Processing (NLP) is. NLP is an attempt to make a computer understand human language. Computers can easily understand programming languages, but how do we make sure they are able to understand human language? To understand NLP, let’s look at the major tasks that can be classified as NLP tasks.
1. Tokenization: Humans can read and understand languages because we can easily identify words, sentences, paragraphs, etc. in a given document. Most NLP frameworks allow computers to identify which parts of a text are words, sentences or paragraphs.
2. Part-of-Speech Tagging: Another characteristic of language understanding is the ability to identify the grammatical elements of a language. For example, in a given sentence we can easily find out which word acts as a verb, a noun, a pronoun, etc. NLP frameworks allow computers to identify the grammatical function of each word in a text.
3. Dependency Parsing: When we look at a sentence, we are able to identify not only the grammatical elements but also how they are related to each other, for example which word is the “subject” and which is the “object”. We also understand what the noun phrases in a sentence are and how they relate to the other phrases and words in that sentence. NLP toolkits help with this task as well.
4. Co-reference Resolution: Humans are able to easily work out which nouns the pronouns in a text refer to. For example, in the text:
“Modi accuses opposition of double standards. He made this accusation in parliament today”
we know that “He” in the second sentence refers to Modi. Using NLP frameworks, one can build rules to work out which noun or noun phrase each pronoun in a text refers to.
5. Named Entity Recognition: It’s very natural for us to figure out whether a word in a sentence refers to a person, a place, a date, a corporate entity, etc. Even when we have not seen a word before, we are able to correctly guess which kind of entity it refers to. For example, in the sentence below:
“Concordia announces $30 Million dividend for its shareholders”
Although we may never have heard of a company called “Concordia”, we can still reasonably infer from the context of the sentence that it is a corporate entity. NLP frameworks also help a computer to understand which kind of ‘entity’ a given word refers to.
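To make these tasks a little more concrete, here is a minimal sketch of how several of them could be run with spaCy. The function names (`token_annotations`, `named_entities`, `analyse`) are my own for illustration, and the sketch assumes that spaCy and its small English model `en_core_web_sm` are installed. Co-reference resolution is left out because that model does not include a component for it.

```python
def token_annotations(doc):
    """Return (text, part of speech, dependency label, head word) per token."""
    return [(tok.text, tok.pos_, tok.dep_, tok.head.text) for tok in doc]


def named_entities(doc):
    """Return (entity text, entity label) pairs found in the document."""
    return [(ent.text, ent.label_) for ent in doc.ents]


def analyse(text, model="en_core_web_sm"):
    """Run a spaCy pipeline over `text` and print what it found."""
    # Imported here so the two helpers above work on any Doc-like object.
    import spacy

    # Assumption: the model was installed beforehand with
    #   python -m spacy download en_core_web_sm
    nlp = spacy.load(model)
    doc = nlp(text)  # tokenization, tagging, parsing and NER happen here

    print("Sentences:", [sent.text for sent in doc.sents])
    for row in token_annotations(doc):
        print(row)
    print("Entities:", named_entities(doc))
```

Calling `analyse("Concordia announces $30 Million dividend for its shareholders")` prints one row per token followed by the entities that the model detected. The exact tags and labels depend on the model version, so treat the output as something to inspect rather than a fixed result.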
Extracting Keywords (Aspects)
To replicate what Amazon does, I will show how to extract keywords. I will rely on a rule-based approach that exploits the grammatical structure of reviews. For this approach to work well, we have to assume that the comments are, in general, written in a manner that respects the rules of grammar. The rule that we will use is:
“The most frequent nouns in a text from which commonly used words have been removed will reveal the keywords (aspects) of that text.”
To implement this rule over a corpus of product review comments, the following pre-processing steps are needed:
1. Extract word tokens from the corpus.
2. Remove common words.
3. Extract all the nouns.
4. Find the top 5 most frequent nouns; these will be the keywords/aspects.
I used spaCy to implement my NLP pipeline.
Below is the function I wrote to extract aspects from reviews of a particular product, a very popular brand of mobile phone. The reviews were collected by a team of students working on a term project at Jigsaw Academy, Bangalore (https://www.jigsawacademy.com/).
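As an illustration of the four steps listed earlier, here is a minimal sketch of what such a function could look like with spaCy. It is a sketch under assumptions rather than the exact code behind the demo: the names `count_aspects` and `extract_aspects` are hypothetical, the `en_core_web_sm` model is assumed to be installed, and counting lemmas (so that "batteries" and "battery" are counted together) is my own choice rather than part of the rule above.

```python
from collections import Counter


def count_aspects(tokens, top_n=5):
    """Steps 2 to 4: drop common words, keep nouns, return the most frequent.

    `tokens` is any iterable of objects exposing spaCy's token attributes
    `lemma_`, `pos_`, `is_stop` and `is_alpha`.
    """
    counts = Counter(
        tok.lemma_.lower()          # assumption: count lemmas, not raw words
        for tok in tokens
        if tok.pos_ == "NOUN"       # common nouns only; proper nouns are PROPN
        and tok.is_alpha            # skip numbers and punctuation
        and not tok.is_stop         # skip commonly used words
    )
    return [word for word, _ in counts.most_common(top_n)]


def extract_aspects(reviews, top_n=5, model="en_core_web_sm"):
    """Step 1 plus the rest: tokenize every review with spaCy, then count."""
    # Imported here so count_aspects can be used without spaCy installed.
    import spacy

    # Assumption: the model was installed beforehand with
    #   python -m spacy download en_core_web_sm
    nlp = spacy.load(model)
    tokens = (tok for doc in nlp.pipe(reviews) for tok in doc)
    return count_aspects(tokens, top_n)
```

`extract_aspects` takes a list of review strings and returns up to `top_n` nouns, most frequent first. Keeping the counting in a separate function makes it easy to swap in a different stop-word list or to also accept proper nouns if brand names should count as aspects.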
Live Demo:
You can see a live demo here: http://ec2-18-222-173-193.us-east-2.compute.amazonaws.com:8051/ (This is a relatively small machine, so don’t tax it too much!)
Next steps?
Once you are able to identify aspects in product reviews, you can try to build an aspect-based search or even attempt an aspect-based sentiment analysis. Aspect-based sentiment analysis can be used to find out how people feel about different features of a product, for example whether people are generally happy with the battery life of a mobile phone. You can extend this code to do that.
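As a starting point for the aspect-based search, here is a minimal sketch that maps each aspect to the reviews that mention it. The function names `build_aspect_index` and `search` are hypothetical, and the matching is a plain exact-word lookup, so it is only an assumption about how such a search might begin, not how Amazon or the demo does it.

```python
import re
from collections import defaultdict


def build_aspect_index(reviews, aspects):
    """Map each aspect to the positions of the reviews that mention it."""
    index = defaultdict(list)
    for position, review in enumerate(reviews):
        # Lower-cased set of words in this review, punctuation stripped.
        words = set(re.findall(r"[a-z]+", review.lower()))
        for aspect in aspects:
            if aspect.lower() in words:
                index[aspect].append(position)
    return dict(index)


def search(reviews, index, aspect):
    """Return the reviews that mention `aspect`, in their original order."""
    return [reviews[i] for i in index.get(aspect, [])]
```

Because the lookup matches exact words, a review that only says "batteries" will not be found under "battery". Matching on lemmas from the NLP pipeline instead would be one way to close that gap.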