Named Entity Recognition and Knowledge Graph for Natural Language Understanding
Data science tasks that involve textual descriptions can draw on many techniques, including topic modeling, text classification, and sentiment analysis. However, the results of these methods often need additional explanation before an audience can fully grasp them, particularly an audience without technical expertise.
This post introduces a method that presents the relationships between elements in an easily understandable way. The approach combines named entity recognition and knowledge graphs, offering a more accessible way to understand connections within the data.
1 Named Entity Recognition
Named Entity Recognition (NER) is a subfield of natural language processing (NLP) that deals with the identification and classification of named entities in text. Named entities are real-world objects or concepts that are mentioned in text, such as people, places, organizations, and products.
Here are a few well-known NER solutions available in Python:
- spaCy: spaCy is a widely used and efficient library for natural language processing in Python. It provides a pre-trained statistical model for named entity recognition and allows customization of NER pipelines.
- NLTK (Natural Language Toolkit): NLTK is a comprehensive library for working with human language data. It includes various tools and modules for NER, although it may require more manual configuration than spaCy.
- Hugging Face Transformers: Hugging Face Transformers is a popular library providing various pre-trained language models for NER and other NLP tasks. It's known for its extensive collection of state-of-the-art models and easy-to-use pipelines.
- AllenNLP: AllenNLP is a powerful NLP library built on PyTorch, offering pre-built models and tools for various NLP tasks, including NER.
- Flair: Flair is an NLP library that allows for easy use of pre-trained word embeddings and state-of-the-art NLP models. It provides functionality for NER among other tasks.
- Stanford NER: Stanford NER is a widely used Java-based tool that can also be called from Python through wrappers. It uses a statistical model to recognize named entities in text.
Here, spaCy is used to perform NER on the texts below:
texts = [
"Apple is planning to release a new iPhone model next month.",
"Samsung unveiled its latest flagship smartphone at the tech conference.",
"Google is investing heavily in artificial intelligence research.",
"Microsoft announced a major update for its Windows operating system.",
"Amazon launched a new line of smart home devices.",
"Google faced criticism over its data privacy policies.",
"Tesla's CEO Elon Musk tweeted about the company's future plans.",
"Uber is expanding its services to additional cities worldwide.",
"Netflix released a teaser for its upcoming original series.",
"Apple introduced new features for its social media platform."
]
The spaCy pipeline is then applied to each text, and the results are converted into a pandas DataFrame.
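A minimal sketch of this step is shown below, assuming spaCy v3 and pandas are installed. The helper names load_pipeline and run_ner are illustrative and not taken from the original post; the column names follow the output table.

```python
import pandas as pd
import spacy


def load_pipeline(model_name="en_core_web_md"):
    """Load a pre-trained spaCy pipeline (the model must already be downloaded)."""
    return spacy.load(model_name)


def run_ner(texts, nlp):
    """Run the pipeline over each text and collect entities and their labels."""
    rows = []
    for index, doc in enumerate(nlp.pipe(texts)):
        rows.append(
            {
                "index": index,
                "text": doc.text,
                # doc.ents holds the entity spans found by the pipeline
                "entities": [ent.text for ent in doc.ents],
                "labels": [ent.label_ for ent in doc.ents],
            }
        )
    return pd.DataFrame(rows, columns=["index", "text", "entities", "labels"])


# Usage (assumes the `texts` list defined above):
#   nlp = load_pipeline()
#   ner_df = run_ner(texts, nlp)
#   print(ner_df)
```

The entities and labels that come back depend on the model and its version, so results may differ slightly from the table in this post.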
If you have not downloaded en_core_web_md before, run this command in the terminal first:
python -m spacy download en_core_web_md
This is the output:
| | index | text | entities | labels |
---|---|---|---|---|
0 | 0 | Apple is planning to release a new iPhone mode... | [Apple, iPhone, next month] | [ORG, PRODUCT, DATE] |
1 | 1 | Samsung unveiled its latest flagship smartphon... | [Samsung] | [ORG] |
2 | 2 | Google is investing heavily in artificial inte... | [Google] | [ORG] |
3 | 3 | Microsoft announced a major update for its Win... | [Microsoft, Windows] | [ORG, PRODUCT] |
4 | 4 | Amazon launched a new line of smart home devices. | [Amazon] | [ORG] |
5 | 5 | Google faced criticism over its data privacy p... | [Google] | [ORG] |
6 | 6 | Tesla's CEO Elon Musk tweeted about the compan... | [Tesla, Elon Musk] | [ORG, PERSON] |
7 | 7 | Uber is expanding its services to additional c... | [Uber] | [ORG] |
8 | 8 | Netflix released a teaser for its upcoming ori... | [Netflix] | [ORG] |
9 | 9 | Apple introduced new features for its social m... | [Apple] | [ORG] |
2 Graph Analytics
2.1 Create the Relationship DataFrame
This step creates a DataFrame that holds the source and target of each relationship.
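One way to build such a DataFrame is sketched below. It assumes the NER results are held in a DataFrame with the columns index, text, entities, and labels; the name ner_df, the helper build_relationship_df, and the two-row sample are illustrative.

```python
import pandas as pd

# Two rows of the NER output, used here as a small sample.
ner_df = pd.DataFrame(
    {
        "index": [0, 3],
        "text": [
            "Apple is planning to release a new iPhone model next month.",
            "Microsoft announced a major update for its Windows operating system.",
        ],
        "entities": [["Apple", "iPhone", "next month"], ["Microsoft", "Windows"]],
        "labels": [["ORG", "PRODUCT", "DATE"], ["ORG", "PRODUCT"]],
    }
)


def build_relationship_df(ner_df):
    """Emit one row per (text, entity) pair: the text index is the source, the entity the target."""
    rows = []
    for idx, text, entities, labels in zip(
        ner_df["index"], ner_df["text"], ner_df["entities"], ner_df["labels"]
    ):
        # entities and labels are parallel lists, so zip pairs each entity with its label
        for entity, label in zip(entities, labels):
            rows.append(
                {"Index": idx, "Text": text, "Entities": entity, "Relationship": label}
            )
    return pd.DataFrame(rows, columns=["Index", "Text", "Entities", "Relationship"])


relationship_df = build_relationship_df(ner_df)
print(relationship_df)
```

A plain loop is used rather than a multi-column explode so that the sketch does not depend on a particular pandas version.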
The resulting relationship_df is shown below; the Index column serves as the source and the Entities column as the target.
| | Index | Text | Entities | Relationship |
---|---|---|---|---|
0 | 0 | Apple is planning to release a new iPhone mode... | Apple | ORG |
1 | 0 | Apple is planning to release a new iPhone mode... | iPhone | PRODUCT |
2 | 0 | Apple is planning to release a new iPhone mode... | next month | DATE |
3 | 1 | Samsung unveiled its latest flagship smartphon... | Samsung | ORG |
4 | 2 | Google is investing heavily in artificial inte... | Google | ORG |
5 | 3 | Microsoft announced a major update for its Win... | Microsoft | ORG |
6 | 3 | Microsoft announced a major update for its Win... | Windows | PRODUCT |
7 | 4 | Amazon launched a new line of smart home devices. | Amazon | ORG |
8 | 5 | Google faced criticism over its data privacy p... | Google | ORG |
9 | 6 | Tesla's CEO Elon Musk tweeted about the compan... | Tesla | ORG |
10 | 6 | Tesla's CEO Elon Musk tweeted about the compan... | Elon Musk | PERSON |
11 | 7 | Uber is expanding its services to additional c... | Uber | ORG |
12 | 8 | Netflix released a teaser for its upcoming ori... | Netflix | ORG |
13 | 9 | Apple introduced new features for its social m... | Apple | ORG |
2.2 Create Graphs
Given the relationships above, a graph can be built with one edge per row.
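A possible implementation with NetworkX is sketched below. It assumes an undirected graph and reuses the edge attribute names relationship and desc that appear in the output; the helper build_graph and the three-row sample are illustrative.

```python
import networkx as nx
import pandas as pd

# Three rows of relationship_df, used here as a small sample.
relationship_df = pd.DataFrame(
    [
        (0, "Apple is planning to release a new iPhone model next month.", "Apple", "ORG"),
        (0, "Apple is planning to release a new iPhone model next month.", "iPhone", "PRODUCT"),
        (9, "Apple introduced new features for its social media platform.", "Apple", "ORG"),
    ],
    columns=["Index", "Text", "Entities", "Relationship"],
)


def build_graph(relationship_df):
    """Add one edge per row, from the text index (source) to the entity (target)."""
    graph = nx.Graph()  # assumption: undirected, so shared entities link texts together
    for source, text, target, relationship in zip(
        relationship_df["Index"],
        relationship_df["Text"],
        relationship_df["Entities"],
        relationship_df["Relationship"],
    ):
        # int() turns the NumPy integer into a plain Python int for cleaner printing
        graph.add_edge(int(source), target, relationship=relationship, desc=text)
    return graph


G = build_graph(relationship_df)
print("Nodes:", list(G.nodes()))
print("Edges:")
for edge in G.edges(data=True):
    print(edge)
```

Because the entity text itself is the node key, two texts that mention the same entity end up attached to the same node.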
The output is:
Nodes: [0, 'Apple', 'iPhone', 'next month', 1, 'Samsung', 2, 'Google', 3, 'Microsoft', 'Windows', 4, 'Amazon', 5, 6, 'Tesla', 'Elon Musk', 7, 'Uber', 8, 'Netflix', 9]
Edges:
(0, 'Apple', {'relationship': 'ORG', 'desc': 'Apple is planning to release a new iPhone model next month.'})
(0, 'iPhone', {'relationship': 'PRODUCT', 'desc': 'Apple is planning to release a new iPhone model next month.'})
(0, 'next month', {'relationship': 'DATE', 'desc': 'Apple is planning to release a new iPhone model next month.'})
(1, 'Samsung', {'relationship': 'ORG', 'desc': 'Samsung unveiled its latest flagship smartphone at the tech conference.'})
(2, 'Google', {'relationship': 'ORG', 'desc': 'Google is investing heavily in artificial intelligence research.'})
(3, 'Microsoft', {'relationship': 'ORG', 'desc': 'Microsoft announced a major update for its Windows operating system.'})
(3, 'Windows', {'relationship': 'PRODUCT', 'desc': 'Microsoft announced a major update for its Windows operating system.'})
(4, 'Amazon', {'relationship': 'ORG', 'desc': 'Amazon launched a new line of smart home devices.'})
(5, 'Google', {'relationship': 'ORG', 'desc': 'Google faced criticism over its data privacy policies.'})
(6, 'Tesla', {'relationship': 'ORG', 'desc': "Tesla's CEO Elon Musk tweeted about the company's future plans."})
(6, 'Elon Musk', {'relationship': 'PERSON', 'desc': "Tesla's CEO Elon Musk tweeted about the company's future plans."})
(7, 'Uber', {'relationship': 'ORG', 'desc': 'Uber is expanding its services to additional cities worldwide.'})
(8, 'Netflix', {'relationship': 'ORG', 'desc': 'Netflix released a teaser for its upcoming original series.'})
(9, 'Apple', {'relationship': 'ORG', 'desc': 'Apple introduced new features for its social media platform.'})
2.3 Visualization
The linkage can then be visualized and grouped into connected components.
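The sketch below shows one way to do this with NetworkX, with Matplotlib assumed for the drawing. The edge sample, the helper draw_graph, and the output file name are illustrative; the layout and styling choices are arbitrary.

```python
import networkx as nx

# A few (source, target, relationship) edges, used here as a small sample.
edges = [
    (0, "Apple", "ORG"),
    (0, "iPhone", "PRODUCT"),
    (0, "next month", "DATE"),
    (2, "Google", "ORG"),
    (5, "Google", "ORG"),
    (9, "Apple", "ORG"),
]

G = nx.Graph()
for source, target, relationship in edges:
    G.add_edge(source, target, relationship=relationship)

# Each connected component groups the texts that share at least one entity.
components = list(nx.connected_components(G))
print("Connected components:")
for number, component in enumerate(components, start=1):
    print(f"Component {number}: {component}")


def draw_graph(graph, output_path="ner_graph.png"):
    """Draw the graph with edge labels and save it to an image file."""
    # Imported here so the component analysis above works without Matplotlib.
    import matplotlib.pyplot as plt

    pos = nx.spring_layout(graph, seed=42)  # fixed seed for a reproducible layout
    nx.draw_networkx(graph, pos, node_color="lightblue", node_size=1200, font_size=8)
    edge_labels = nx.get_edge_attributes(graph, "relationship")
    nx.draw_networkx_edge_labels(graph, pos, edge_labels=edge_labels, font_size=7)
    plt.axis("off")
    plt.savefig(output_path, bbox_inches="tight")
    plt.close()


# Usage: draw_graph(G)
```

Sets are unordered, so the elements inside each printed component may appear in a different order from run to run.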
The output is:
Connected components:
Component 1: {0, 9, 'next month', 'Apple', 'iPhone'}
Component 2: {1, 'Samsung'}
Component 3: {2, 5, 'Google'}
Component 4: {'Microsoft', 3, 'Windows'}
Component 5: {4, 'Amazon'}
Component 6: {'Elon Musk', 'Tesla', 6}
Component 7: {'Uber', 7}
Component 8: {8, 'Netflix'}
3 Conclusion
Unlike the other NLP techniques mentioned at the start, the NER + graph approach surfaces the key entities within each text and uses a graph structure to link texts that share them. The method tends to be more deterministic: an edge exists simply because an entity appears in a text, so the audience faces a short learning curve and has less room to dispute the interpretation.
Moreover, numerous graph algorithms can be employed to delve deeper into the connections among these elements. These explorations will be detailed in subsequent posts.