The Hierarchy of NLP Tasks

Many things are considered “NLP” today. The call for papers at ACL contains over 27 tracks that do not seem to follow any structure. Or do they?

Traditional NLP

A system is considered an NLP system if it uses knowledge of language to process language data. While “counting the number of characters in a text file” is not an NLP task, counting the number of words in that same file is, because it requires knowing what counts as a “word” in the language at hand. So if all NLP problems, and thus all NLP systems, require linguistic knowledge, we may be able to organize them by the type of knowledge they require. To do this, we can go back to the field of linguistics (or perhaps just one school of it), which organizes its knowledge roughly by how “big” a unit the problem at hand is concerned with. It goes by the following levels (quoted from SLP2, p.4):

Phonetics and phonology: knowledge about linguistic sounds (close to speech processing today?)
Morphology: knowledge about meaningful components of words (nowadays called “subwords”; for example, the prefix “de-” means reversal, “n’t” means not, and “-s/-es” marks plurality)
Syntax: knowledge about structural relationships between words (e.g., English follows the S-V-O structure.)
Semantics: knowledge of meaning (quite broad?)
Pragmatics: knowledge of the relationship of meaning to the goals and intentions of the speaker. (Intention is the keyword here.)
Discourse: knowledge about linguistic units larger than a single utterance (such as essays and dialogues)
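
To make the earlier character-versus-word distinction concrete, here is a minimal Python sketch. The function names and the two example sentences are my own illustrations, not anything from SLP2. Counting characters needs no linguistic assumption, while each word counter silently commits to one definition of “word”, and the two definitions disagree.

```python
import re


def count_characters(text: str) -> int:
    """Counts code points; no knowledge of language is involved."""
    return len(text)


def count_words_whitespace(text: str) -> int:
    """Assumes a word is a maximal run of non-whitespace characters."""
    return len(text.split())


def count_words_regex(text: str) -> int:
    """Assumes a word is a maximal run of letters, digits, or underscores."""
    return len(re.findall(r"\w+", text))


# Illustrative inputs (hypothetical examples chosen for this sketch).
english = "Don't panic, it's fine."
chinese = "我爱自然语言处理"  # written without spaces between words

for label, text in [("english", english), ("chinese", chinese)]:
    print(
        label,
        count_characters(text),
        count_words_whitespace(text),
        count_words_regex(text),
    )
```

On the English sentence the whitespace definition finds 4 words while the regex definition finds 6, because it splits “Don't” and “it's” at the apostrophe. On the Chinese sentence both definitions find a single “word”, even though a speaker would see several. Deciding which answer is right is exactly where linguistic knowledge (here, morphology and word segmentation) comes in.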

This hierarchy is what NLP before 2010 looked like. Almost every problem fell somewhere in this framework. My advisor worked on coreference resolution for his 2002 PhD thesis, which belongs to discourse and pragmatics (because it goes beyond sentence-level understanding and links expressions with real-world entities). The SLP2 textbook, written in 2008, was organized pretty much around this framework. Notice how humble (but exciting) the Applications section was at that time.

(Figure: SLP2 table of contents)

Contemporary NLP

But taxonomies are not fixed. Rather, they are the epistemologists’ interpretations of their own research area. Over time, such areas evolve. So do the taxonomies.

Let’s look at the 27 areas in the ACL’25 call for papers:

    Computational Social Science and Cultural Analytics
    **Dialogue and Interactive Systems**
    **Discourse and Pragmatics**
    Efficient/Low-Resource Methods for NLP
    Ethics, Bias, and Fairness
    Generation
    Human-Centered NLP
    *Information Extraction*
    Information Retrieval and Text Mining
    Interpretability and Analysis of Models for NLP
    Language Modeling
    Linguistic theories, Cognitive Modeling and Psycholinguistics
    Machine Learning for NLP
    *Machine Translation*
    Multilinguality and Language Diversity
    Multimodality and Language Grounding to Vision, Robotics and Beyond
    NLP Applications
    **Phonology, Morphology and Word Segmentation**
    *Question Answering*
    Resources and Evaluation
    **Semantics: Lexical and Sentence-Level**
    Sentiment Analysis, Stylistic Analysis, and Argument Mining
    **Speech recognition, text-to-speech and spoken language understanding**
    *Summarization*
    **Syntax: Tagging, Chunking and Parsing**
    Special Theme: Generalization of NLP Models

Note that these areas are listed in alphabetical order. Why doesn’t the ACL group them in a more meaningful way to make the field less confusing to the novice? I guess it is because such groupings are subjective and hard to defend. (Time to appreciate textbook writers!) I annotated the list by bolding the foundational tasks (i.e., those that appear as foundational tasks in SLP2), italicizing the old applications (i.e., those that appear in the SLP2 Applications chapters), and leaving everything else as is. We can see that quite a few of these “other” areas don’t yet fit into the textbook:

    Computational Social Science and Cultural Analytics
    Efficient/Low-Resource Methods for NLP
    Ethics, Bias, and Fairness
    Generation
    Human-Centered NLP
    Information Retrieval and Text Mining
    Interpretability and Analysis of Models for NLP
    Language Modeling
    Linguistic theories, Cognitive Modeling and Psycholinguistics
    Machine Learning for NLP
    Multilinguality and Language Diversity
    Multimodality and Language Grounding to Vision, Robotics and Beyond
    NLP Applications
    Resources and Evaluation
    Sentiment Analysis, Stylistic Analysis, and Argument Mining
    Special Theme: Generalization of NLP Models

These “other” topics are not new: many of them already appeared in the ACL’08 call for papers. In fact, the call for papers that year did not put everything into a flat alphabetical list, but grouped the topics in a meaningful way. I want to analyze those groupings, but maybe not in this post.

All of these differences and this evolution in taxonomies are what make research (specifically NLP) interesting. Reality does not care how the heck researchers organize their problem space. The only thing that matters is whether things get done, pain points get resolved, and humans suffer less. When real-world demand shifts and our technical understanding evolves, these taxonomies also change. Conference proceedings are the best reflection of the active research areas, where our understanding has not yet met real-world demand. Textbooks, meanwhile, are a record of knowledge, and thus also cover past advances in areas where demand has since dropped.

As an anecdote on epistemology, I was once sitting in my AI class but not following the lecture. Instead, I was scribbling about the organization of research areas in AI based on the table of contents of Russell and Norvig’s textbook. My friend Sharon saw that and later asked me why I did so instead of focusing on the lecture. I didn’t know exactly why, but I think it is important to understand such organization. To become a thought leader in a field, perhaps the first step is to reverse engineer how the field’s knowledge came to be organized the way it is. The second step is to stay grounded in reality, listen to new problems, and formulate new research directions accordingly. This is exactly what position papers do, which makes them one of the most interesting types of papers; they are usually written by thought leaders in a field.

P.S. I may not have used the word “epistemology” correctly here. I want to say “the organization of knowledge”. But anyway, just saying the word makes me feel fabulous.