NATURAL LANGUAGE PROCESSING

We conduct research on the “language” that people use every day.

AINU LANGUAGE

Research Background and Issues

It is believed that about 90% of the world’s languages, including the Ainu language, are in danger of extinction. The Ainu language, the language of the Ainu people living in Okhotsk, has no writing system of its own. Its records, whether oral or transcribed, have been passed down from generation to generation, which makes the language difficult to preserve.

Research Objective

The aim of this project is to support the maintenance and revitalization of endangered languages such as Ainu.

Initiatives to Solve Issues

Development of technology for Ainu language learning and research

Collection and digitization of Ainu language materials
Development of optimal language processing technology for the Ainu language
Public disclosure of developed technologies

Achievements to Date

Our aim is to cover the full path from “dictionary construction” to “machine translation” in natural language processing. At present, more than 1,750,000 Ainu texts and their Japanese counterparts have been collected as digital data and are used to develop natural language processing techniques for Ainu. This has led to the successful development of tools for Ainu notation and word segmentation, grammatical analysis (part-of-speech tagging), and word-level automatic translation.
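
As a rough illustration of how a dictionary can feed word segmentation and word-level translation, the sketch below splits unspaced text by greedy longest match against a small lexicon and then looks up a gloss for each word. The lexicon `GLOSSES`, the function names, and the English glosses are assumptions made for this example (the project itself pairs Ainu with Japanese); this is not the laboratory's actual method.

```python
from typing import Dict, List

# Toy lexicon, for illustration only: Ainu word -> English gloss.
GLOSSES: Dict[str, str] = {
    "kamuy": "god",
    "kotan": "village",
    "cise": "house",
    "wakka": "water",
    "pirka": "good",
}


def segment(text: str, lexicon: Dict[str, str]) -> List[str]:
    """Split unspaced text into words by greedy longest match."""
    max_len = max(len(word) for word in lexicon)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first, then shorter ones.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                match = text[i:j]
                break
        if match is None:
            # Unknown material: emit a single character and move on.
            match = text[i]
        tokens.append(match)
        i += len(match)
    return tokens


def translate_words(tokens: List[str], lexicon: Dict[str, str]) -> List[str]:
    """Word-level translation: look up each token, marking unknown words."""
    return [lexicon.get(token, "<unk>") for token in tokens]


if __name__ == "__main__":
    words = segment("pirkakotan", GLOSSES)
    print(words, translate_words(words, GLOSSES))
```

Greedy longest match is only one possible baseline. It fails when a shorter word is the correct reading, which is one reason real systems combine dictionaries with statistical or neural models.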

Future Perspective

Construction and maintenance of a natural language processing environment for the Ainu language
Collection and analysis of Ainu texts
Construction of a digitalized corpus of the Ainu language including word and part-of-speech information
Development of technology for analyzing Ainu texts

HARMFUL INFORMATION DETECTION

Research Background and Issues

With the spread of the Internet, “cyberbullying” has become a social problem. Harmful words posted on the Internet can signal harm already being caused by cyberbullying, or threats, such as criminal threats, that could become serious in a short period of time.

Research Objective

This project aims to develop assistive technologies for Internet patrol activities.

Initiatives to Solve Issues

Development of technology for automatic detection of harmful posts

Reduction of the burden of Internet patrol activities
Quantification of harmfulness of online postings
Detection of posts that exceed a certain level of harmfulness

Achievements to Date

Our goal is to detect sentences that contain harmful words with high accuracy by applying natural language processing and deep learning techniques. So far, we have proposed a category-based relevance maximization method, which determines the harmfulness of the words in a sentence by considering three categories of words: defamatory words, words that provoke violence, and obscene words. Using deep learning, we have also identified harmful posts with approximately 88% accuracy.
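
To make the idea of quantifying harmfulness by word category and flagging posts above a certain level more concrete, here is a minimal sketch. It is a plain lexicon count, not the category-based relevance maximization method or the deep learning model described above. The word lists, the names `CATEGORY_WORDS`, `harm_scores`, and `is_harmful`, and the threshold of 0.2 are all assumptions chosen for illustration.

```python
import re
from typing import Dict, Set

# Hypothetical word lists for the three categories (placeholders only).
CATEGORY_WORDS: Dict[str, Set[str]] = {
    "defamatory": {"idiot", "loser"},
    "violent": {"kill", "punch"},
    "obscene": {"obsceneword"},  # stand-in token, not a real entry
}


def harm_scores(post: str) -> Dict[str, float]:
    """Return, per category, the fraction of tokens found in that category."""
    tokens = re.findall(r"[a-z']+", post.lower())
    if not tokens:
        return {category: 0.0 for category in CATEGORY_WORDS}
    return {
        category: sum(token in words for token in tokens) / len(tokens)
        for category, words in CATEGORY_WORDS.items()
    }


def is_harmful(post: str, threshold: float = 0.2) -> bool:
    """Flag a post when its highest category score reaches the threshold."""
    return max(harm_scores(post).values()) >= threshold


if __name__ == "__main__":
    for post in ["You are an idiot and a loser", "Have a nice day"]:
        print(post, harm_scores(post), is_harmful(post))
```

A score of this kind gives patrol staff a ranking to work from rather than a final verdict, so the threshold would need tuning against labeled posts.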

Future Perspective

Development of an automatic judgment method
Application to the real world and verification of effectiveness

COMPUTATIONAL MODEL OF METAPHOR

Research Background and Issues

Even synonymous words may be interpreted differently in different countries because of cultural and historical differences. We therefore focus on the fact that the image of a target word (query) can be expressed in a broad sense by tracing the metaphorical relationships between words. By doing this in multiple languages on the Internet, we compare the semantic interpretations of queries and analyze the differences.

Research Objective

This project aims to eliminate interpretation discrepancies due to language differences.

Initiatives to Solve Issues

Application of Metaphorical Drawing Techniques in English

Determination of Metaphor Index
Determination of search engines to be used as corpora
Grammar-aware collection of knowledge fragments
Comparison with multilingual methods

Achievements to Date

We have successfully built a knowledge acquisition system (Murasaki) using a computational model of metaphor, implemented Murasaki in multiple languages (Japanese, Vietnamese, Korean, Chinese, and English), and conducted a comparative analysis of “word images” in multiple languages. We have also conducted extraction experiments using English figurative indexes, re-extracted the correct figurative expressions from among the extracted expressions, and investigated expressions that should be excluded, such as personal names and pronouns.
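
The following sketch shows one way such an extraction experiment could look in code. It assumes that a figurative index is a surface marker such as the English “as ... as” construction, which is an interpretation made for this example and not a description of how Murasaki works. The pattern, the pronoun list, and the function name `extract_similes` are likewise hypothetical.

```python
import re
from typing import List, Tuple

# Assumed exclusion list: pronouns that should not count as comparison targets.
PRONOUNS = {"i", "me", "you", "he", "him", "she", "her",
            "it", "we", "us", "they", "them"}

# Assumed figurative index: "as <property> as [a|an|the] <target>".
PATTERN = re.compile(r"\bas (\w+) as (?:an? |the )?(\w+)", re.IGNORECASE)


def extract_similes(text: str) -> List[Tuple[str, str]]:
    """Return (property, target) pairs, skipping pronoun targets."""
    pairs: List[Tuple[str, str]] = []
    for match in PATTERN.finditer(text):
        prop, target = match.group(1).lower(), match.group(2).lower()
        if target in PRONOUNS:
            continue  # excluded, like the pronouns mentioned above
        pairs.append((prop, target))
    return pairs


if __name__ == "__main__":
    print(extract_similes("She is as brave as a lion but not as tall as him."))
```

Filtering personal names would need a name list or a named-entity recognizer, which this sketch leaves out.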

Future Perspective

Development of a model to quantify figurativeness of language
Development of figurative drawing methods
Development of web applications based on figurative drawing
Multilingualization of figurative drawing techniques
Analysis of figurative expressions using figurative drawing techniques

EMOTIONAL INFORMATION PROCESSING

Research Background and Issues

With the spread of smartphones, sharing information as text has become commonplace. Under these circumstances, interest in the automatic analysis of user behavior and intentions is growing. We are developing unprecedented open software for sentiment analysis, aiming for a technology that can appropriately recognize user sentiment from text in real time.

Research Objective

This project aims to eliminate discrepancies in interpretation by automatically analyzing user behavior and intentions.

Initiatives to Solve Issues

Development of Emotional Information Open Software

Analyze input text
Evaluate the emotional meaning of each word
Assign appropriate emotional information
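
The three steps above can be sketched as a small lexicon-based annotator: tokenize the input, look up an emotion for each word, and assign the most frequent emotion to the whole text. The lexicon `EMOTION_LEXICON`, the emotion labels, and the function name `annotate` are assumptions for this example; the laboratory's own software may work quite differently.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical emotion lexicon: word -> emotion label.
EMOTION_LEXICON: Dict[str, str] = {
    "happy": "joy",
    "glad": "joy",
    "sad": "sadness",
    "cry": "sadness",
    "angry": "anger",
    "scared": "fear",
}


def annotate(text: str) -> Dict[str, object]:
    """Analyze text, evaluate each word, and assign an overall emotion."""
    # Step 1: analyze the input text (simple lowercase tokenization).
    tokens = re.findall(r"[a-z]+", text.lower())
    # Step 2: evaluate the emotional meaning of each word.
    word_emotions: List[Tuple[str, str]] = [
        (token, EMOTION_LEXICON[token])
        for token in tokens
        if token in EMOTION_LEXICON
    ]
    # Step 3: assign the most frequent emotion, or "neutral" if none was found.
    counts = Counter(emotion for _, emotion in word_emotions)
    label = counts.most_common(1)[0][0] if counts else "neutral"
    return {"words": word_emotions, "label": label}


if __name__ == "__main__":
    print(annotate("I am so happy and glad, not sad"))
```

A word count like this ignores negation and context ("not sad" still counts toward sadness), which is the kind of limitation a fuller system has to address.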

Achievements to Date

The emotion analysis system ML-Ask originated in this laboratory. It has provided a baseline for similar studies and is used as an experimental tool. The laboratory has also applied it to research on sentiment analysis of posts on Twitter, and has developed a basic tweet acquisition program.

Future Perspective

Application to corpus creation with emotional information
Improvement and expansion of in-system dictionaries