Develop a Part-of-Speech Tagger and a Tagger-Maker
Algorithms, Implementations, Results, and APIs
- Publisher:
LAP Lambert Academic Publishing
- EAN:
9783659376221
- ISBN:
3659376221
- Pages:
- 68
- Format:
- Paperback
- Language:
- German
Description: Develop a Part-of-Speech Tagger and a Tagger-Maker
This project aims to build an efficient, scalable, portable, and trainable part-of-speech tagger. Using 98% of Penn Treebank-3 as training data, it first builds a raw tagger based on Bayes' theorem, a hidden Markov model, and the Viterbi algorithm. A reinforcement machine learning algorithm and contextual transformation rules are then applied to increase the tagger's accuracy. The tagger's final accuracy on the testing data is 96.51%, and its speed is about 26,000 words per second on a computer with two gigabytes of random access memory and two 3.00 GHz Pentium duo processors. The tagger's portability and trainability are demonstrated by the tagger-maker's success in building a new tagger from a corpus annotated with a tagset different from that of Penn Treebank.
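
To make the raw-tagger stage concrete, here is a minimal sketch of Viterbi decoding over a bigram hidden Markov model. It is not the book's implementation: the function name `viterbi`, the `floor` smoothing constant, and the three-tag toy model are assumptions made for illustration, and the probabilities are invented rather than estimated from Penn Treebank. The reinforcement learning step and the contextual transformation rules are not covered.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most probable tag sequence for `words` under a bigram HMM.

    `floor` is an illustrative stand-in for smoothing: unseen events get a
    tiny probability instead of zero so that logarithms stay finite.
    """
    if not words:
        return []

    def lp(p):
        return math.log(max(p, floor))

    # best[i][t]: log-probability of the best tag path ending in tag t at word i
    best = [{t: lp(start_p.get(t, 0.0)) + lp(emit_p[t].get(words[0], 0.0))
             for t in tags}]
    # back[i][t]: the previous tag on that best path
    back = [{}]

    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + lp(trans_p[p].get(t, 0.0))) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + lp(emit_p[t].get(words[i], 0.0))
            back[i][t] = prev

    # Follow the back-pointers from the best final tag to recover the path.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path


# Toy model with made-up probabilities (hypothetical, for illustration only).
TAGS = ["DT", "NN", "VB"]
START_P = {"DT": 0.6, "NN": 0.3, "VB": 0.1}
TRANS_P = {
    "DT": {"DT": 0.05, "NN": 0.9, "VB": 0.05},
    "NN": {"DT": 0.1, "NN": 0.3, "VB": 0.6},
    "VB": {"DT": 0.6, "NN": 0.3, "VB": 0.1},
}
EMIT_P = {
    "DT": {"the": 0.9},
    "NN": {"dog": 0.5, "barks": 0.1},
    "VB": {"dog": 0.05, "barks": 0.6},
}

print(viterbi("the dog barks".split(), TAGS, START_P, TRANS_P, EMIT_P))
```

In a trained tagger, the start, transition, and emission tables would be relative-frequency estimates from the annotated corpus. Working in log space avoids numeric underflow on long sentences.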