
What is text annotation?

Text annotation is the process of adding descriptive tags to raw text so that NLP models can learn from it. It involves marking words, phrases, or whole documents with labels such as category, sentiment, intent, or named entities. This turns the original text into training data for NLP models.

In commerce and search, text annotation powers use cases like catalog curation (extracting brand, color, and size from titles and descriptions), search and product recommendations (training ranking models on queries, reviews, and clicks), product comparison (aligning like-for-like attributes), and multilingual assistants (intent and entity tagging across languages).

Common types:

  1. Text classification: label an email or review as “support request,” “spam,” or “feature feedback.”
  2. Sentiment analysis: mark a sentence as positive, neutral, or negative, sometimes by aspect (delivery, quality, price).
  3. Named Entity Recognition (NER): highlight spans like brands, people, products, places, dates.
  4. Intent and relations: capture what the user wants to do, or how two entities are related.
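As a rough illustration of how these label types can sit together on one piece of text, here is a minimal Python sketch. The class names (`EntitySpan`, `AnnotatedText`), the field layout, the label strings, and the brand "Acme" are all assumptions made for this example, not a standard schema or any particular tool's format.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class EntitySpan:
    start: int  # inclusive character offset into the text
    end: int    # exclusive character offset into the text
    label: str  # entity type, e.g. "BRAND" (hypothetical label name)


@dataclass
class AnnotatedText:
    text: str
    category: Optional[str] = None   # text classification label
    sentiment: Optional[str] = None  # document-level sentiment
    intent: Optional[str] = None     # what the user wants to do
    entities: List[EntitySpan] = field(default_factory=list)

    def add_entity(self, surface: str, label: str) -> EntitySpan:
        """Tag the first occurrence of `surface` in the text with `label`."""
        start = self.text.index(surface)  # raises ValueError if not found
        span = EntitySpan(start, start + len(surface), label)
        self.entities.append(span)
        return span

    def entity_texts(self) -> List[Tuple[str, str]]:
        """Return (surface text, label) pairs recovered from the offsets."""
        return [(self.text[s.start:s.end], s.label) for s in self.entities]


review = AnnotatedText(
    text="The Acme running shoes arrived late but feel great.",
    category="feature feedback",
    sentiment="positive",
    intent="product feedback",
)
review.add_entity("Acme", "BRAND")
review.add_entity("running shoes", "PRODUCT")
print(review.entity_texts())
```

Storing character offsets rather than only the matched string keeps each entity tied to an exact position, which matters when the same word appears more than once in a text.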

Teams use text annotation for search relevance, catalog enrichment, product recommendations, customer support routing, and compliance monitoring. Projects work best when you start with a clear label list (taxonomy), simple rules, examples of edge cases, and a review step to keep quality high.
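One simple way to picture the taxonomy and review step is a check that rejects labels outside the agreed list and flags items where two annotators disagree. The sketch below is only an illustration under assumed names (`SENTIMENT_TAXONOMY`, `validate_labels`, `needs_review`, `agreement_rate`); real projects often use richer agreement measures and tooling.

```python
from typing import List, Sequence, Set

# Hypothetical taxonomy: the only sentiment labels annotators may use.
SENTIMENT_TAXONOMY: Set[str] = {"positive", "neutral", "negative"}


def validate_labels(labels: Sequence[str], taxonomy: Set[str]) -> None:
    """Raise ValueError if any label is not part of the taxonomy."""
    unknown = sorted(set(labels) - taxonomy)
    if unknown:
        raise ValueError(f"labels outside taxonomy: {unknown}")


def needs_review(labels_a: Sequence[str], labels_b: Sequence[str]) -> List[int]:
    """Return indices of items where the two annotators disagree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both annotators must label the same items")
    return [i for i, (a, b) in enumerate(zip(labels_a, labels_b)) if a != b]


def agreement_rate(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Fraction of items on which the two annotators agree."""
    if not labels_a:
        raise ValueError("no items to compare")
    disagreements = needs_review(labels_a, labels_b)
    return 1 - len(disagreements) / len(labels_a)


annotator_a = ["positive", "negative", "neutral", "positive"]
annotator_b = ["positive", "neutral", "neutral", "positive"]

validate_labels(annotator_a, SENTIMENT_TAXONOMY)
validate_labels(annotator_b, SENTIMENT_TAXONOMY)

print(needs_review(annotator_a, annotator_b))    # indices to send to a reviewer
print(agreement_rate(annotator_a, annotator_b))  # share of matching labels
```

Items returned by `needs_review` are natural candidates for a second look, and recurring disagreements on the same kind of text usually point to a rule or edge case that the guidelines should spell out.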


Example

Scenario: You want to improve search and conversions on an e-commerce site.

  1. Take 10,000 product reviews and 5,000 product titles.
  2. Annotators highlight entities in titles: brand, color, size, material.
  3. They label each review with sentiment, and add aspect sentiment where useful, for example “comfort: positive,” “delivery: negative.”
  4. They classify the review’s intent, such as “pre-purchase question,” “product feedback,” or “shipping issue.”

Result: You get two JSONL files. One contains entity spans with normalized values that match your catalog. The other contains sentiment and intent labels with confidence scores. Models trained on this data let search recognize real product attributes, so filters work better, recommendations improve, and support tickets route to the right queue.
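To make the output format concrete, here is a minimal sketch of what records in the two JSONL files might look like. The field names (`entities`, `value`, `aspects`, `confidence`), the product title, the review ID, and the confidence number are all invented for illustration; an actual export depends on the annotation tool and your catalog schema.

```python
import json
from typing import Dict, Iterable, Union


def make_span(text: str, surface: str, label: str, value: str) -> Dict[str, Union[int, str]]:
    """Build an entity span with character offsets and a normalized value."""
    start = text.index(surface)  # raises ValueError if surface is missing
    return {"start": start, "end": start + len(surface), "label": label, "value": value}


def to_jsonl(records: Iterable[dict]) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


title = "Acme Trail Runner Shoes - Blue, Size 10"  # made-up product title

entity_records = [
    {
        "text": title,
        "entities": [
            make_span(title, "Acme", "brand", "acme"),
            make_span(title, "Blue", "color", "blue"),
            make_span(title, "Size 10", "size", "10"),
        ],
    }
]

review_records = [
    {
        "review_id": "r-001",  # hypothetical identifier
        "sentiment": "negative",
        "aspects": {"comfort": "positive", "delivery": "negative"},
        "intent": "shipping issue",
        "confidence": 0.9,  # illustrative value, not a measured score
    }
]

entities_jsonl = to_jsonl(entity_records)
reviews_jsonl = to_jsonl(review_records)

print(entities_jsonl, end="")
print(reviews_jsonl, end="")
```

Because each line is a complete JSON object, the files can be read back one record at a time with `json.loads`, which keeps large annotation exports easy to stream.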