What is Text Preprocessing?

Text is a kind of unstructured data. Before doing any NLP modeling, we have to clean it up and put it into a consistent form. This step is called text preprocessing. Its aim is to prepare raw text so that interesting, non-trivial knowledge can be extracted from it and retrieved to satisfy a user’s need for information.

My preferred steps for preprocessing text are noise removal, text cleansing, text normalization, and tokenization.
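To make the four steps above concrete, here is a minimal sketch using only the Python standard library. The function names (`remove_noise`, `cleanse`, `normalize`, `tokenize`, `preprocess`) and the choice of what counts as noise (HTML tags, URLs, punctuation) are my own assumptions for illustration; real projects often draw these lines differently or use a dedicated NLP library.

```python
import html
import re
import unicodedata


def remove_noise(text):
    """Noise removal: drop HTML tags and URLs, then decode HTML entities."""
    text = re.sub(r"<[^>]+>", " ", text)       # strip tags such as <p> or </div>
    text = re.sub(r"https?://\S+", " ", text)  # strip http/https links
    return html.unescape(text)                 # turn &amp; into &, and so on


def cleanse(text):
    """Text cleansing: keep word characters and whitespace only."""
    text = re.sub(r"[^\w\s]", " ", text)       # replace punctuation and symbols
    return re.sub(r"\s+", " ", text).strip()   # collapse repeated whitespace


def normalize(text):
    """Text normalization: unify Unicode forms and lowercase."""
    return unicodedata.normalize("NFKC", text).lower()


def tokenize(text):
    """Tokenization: a simple whitespace split (an assumed, naive tokenizer)."""
    return text.split()


def preprocess(text):
    """Run the four steps in order and return a list of tokens."""
    return tokenize(normalize(cleanse(remove_noise(text))))


raw = "<p>Hello, World! Visit https://example.com &amp; enjoy.</p>"
print(preprocess(raw))  # ['hello', 'world', 'visit', 'enjoy']
```

The order matters in this sketch: URLs are removed before punctuation is stripped, since otherwise a link would be broken into fragments that survive as tokens.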

#text #preprocessing

