Text segmentation
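Text segmentation is the process of dividing a piece of text into a number of units that are each meaningful on their own, so that the text can be analyzed further or otherwise processed. Common cases include splitting a passage into sentences or into individual words.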
For example[1]:
- Input: San Pedro is a town on the southern part of the island of Ambergris Caye in the Belize District of the nation of Belize, in Central America. According to 2015 mid-year estimates, the town has a population of about 16,444. It is the second-largest town in the Belize District and largest in the Belize Rural South constituency.
- Output (split into separate sentences):
  - San Pedro is a town on the southern part of the island of Ambergris Caye in the Belize District of the nation of Belize, in Central America.
  - According to 2015 mid-year estimates, the town has a population of about 16,444.
  - It is the second-largest town in the Belize District and largest in the Belize Rural South constituency.
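A sentence split like the one above can be roughly approximated with a simple rule. The following is only a minimal sketch, not the method of the cited paper: it assumes that a sentence ends at ".", "!" or "?" followed by whitespace and an uppercase letter, and the function names `split_sentences` and `split_words` are hypothetical helpers introduced for illustration. A rule this naive will misfire on abbreviations such as "Dr. Smith".

```python
import re

# Assumed heuristic: a boundary is whitespace that follows '.', '!' or '?'
# and precedes an uppercase ASCII letter. Abbreviations will break this rule.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def split_sentences(text: str) -> list[str]:
    """Split text into sentences using the naive boundary rule above."""
    return [s for s in _BOUNDARY.split(text.strip()) if s]


def split_words(sentence: str) -> list[str]:
    """Split a sentence into word-like tokens, keeping internal hyphens and apostrophes."""
    return re.findall(r"\w+(?:[-']\w+)*", sentence)


if __name__ == "__main__":
    text = (
        "San Pedro is a town on the southern part of the island of Ambergris Caye "
        "in the Belize District of the nation of Belize, in Central America. "
        "According to 2015 mid-year estimates, the town has a population of about 16,444. "
        "It is the second-largest town in the Belize District and largest in the "
        "Belize Rural South constituency."
    )
    for sentence in split_sentences(text):
        print(sentence)
```

Note that the word-level helper splits "16,444" into two tokens, because the comma is not treated as part of a word. Practical segmenters usually need language-specific rules or trained models to handle such cases.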
References
1. Freddy Y. Y. Choi (2000). "Advances in domain independent linear text segmentation". Proceedings of the 1st Meeting of the North American Chapter of the Association for Computational Linguistics (ANLP-NAACL-00). pp. 26–33.