Application of a text analysis approach in O2O customers service records categorization
Zhang, Yichi (2018)
Degree Programme in Computer Sciences
Faculty of Natural Sciences
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Date of approval
2018-07-31
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:uta-201808162367
Abstract
As the O2O (online-to-offline) industry enters an oligarchic era, many O2O companies in China, such as DiDi, face the challenge of acquiring useful information to improve service quality and identify new requirements. As big data processing and machine learning gain popularity, one direct way to meet DiDi's needs is to extract this information from customer service records. For that purpose, automatic classification of the customer service record data is the initial step.
This thesis aims to find a solution that lets DiDi categorize its customer service record data into predefined categories. The solution is based on the traditional text categorization flow and introduces a way to build an enhancement feature collection that is used in place of the original feature collection. Classic supervised learning algorithms used in the traditional text categorization flow are also presented in the thesis.
Because customer service records follow regular patterns of expression, predefined syntax patterns are introduced to build the enhancement feature collection. A text analysis approach, the dependency parser, is first used to obtain the syntax patterns from each document.
Tests are conducted to compare the performance of most of the algorithms discussed in the thesis, and the best-performing ones are chosen for the final solution. Performance tests are also carried out to show that the enhancement feature collection yields better performance than the original feature collection.
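As a rough illustration of the idea of an enhancement feature collection, the sketch below assumes that a dependency parser has already produced (head, relation, dependent) triples for a record, and adds one feature for each triple that matches a predefined pattern. The relation labels, the pattern set, the function names, and the toy record are all hypothetical choices made for this example; they are not taken from the thesis.

```python
from collections import Counter

# Hypothetical predefined syntax patterns, expressed as dependency relations
# of interest. These labels are illustrative assumptions, not the thesis's own.
PATTERN_RELATIONS = {"nsubj", "dobj"}


def original_features(tokens):
    """Bag-of-words counts, standing in for the original feature collection."""
    return Counter(tokens)


def enhancement_features(tokens, dependencies):
    """Bag-of-words counts plus one feature per matching dependency triple.

    `dependencies` is a list of (head, relation, dependent) triples that is
    assumed to come from a dependency parser run beforehand.
    """
    features = original_features(tokens)
    for head, relation, dependent in dependencies:
        if relation in PATTERN_RELATIONS:
            features[f"{relation}({head},{dependent})"] += 1
    return features


# Toy record, already tokenized and parsed (values made up for illustration).
tokens = ["driver", "cancelled", "the", "order"]
dependencies = [
    ("cancelled", "nsubj", "driver"),
    ("cancelled", "dobj", "order"),
    ("order", "det", "the"),
]
features = enhancement_features(tokens, dependencies)
```

The resulting counts could then be passed to any supervised classifier in place of the plain bag-of-words counts, which is the kind of comparison the performance tests describe.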