What Is TF-IDF?

Written by Coursera Staff • Updated on May 19, 2025

TF-IDF is a statistical measure that helps a computer identify the words that matter most for interpreting a document. Explore how you can use this metric for various purposes.

Term frequency-inverse document frequency (TF-IDF) is a statistical measure that helps artificial intelligence (AI) models gauge how relevant each word is to what a given text is about. It’s a widely used metric with applications that include ranking search engine results and preparing text for machine learning tasks such as natural language processing (NLP). Learn more about TF-IDF, how to calculate it, and careers that use this technique.

What is TF-IDF?

Term frequency-inverse document frequency (TF-IDF) is a metric you can use in machine learning to create a numerical representation of the words in a text document that demonstrates how relevant those words are within the text as a whole. TF-IDF is an important component of information retrieval and natural language processing because it can help AI models analyze the keywords and phrases within documents, providing a more nuanced understanding of what the text says.

Term frequency refers to how often a term is used in a body of text. Inverse document frequency looks at how many documents in a body of documents contain that term, and it is higher when fewer documents contain it. To use a metaphor, term frequency considers how many times the word appears in the book, and inverse document frequency looks at how many books in the library use that term. In your local library, nearly every book would use words like “the” or “and” frequently. These words would have a high term frequency (TF) but a low inverse document frequency (IDF), so their TF-IDF scores would be low, and an AI model would understand that those words weren’t particularly helpful for understanding what the document is about.

However, a word like “skydiving” would not appear in as many books. An AI model could use TF-IDF to determine that “skydiving” is a more important word for deciding what a document is about. Books with a higher TF for “skydiving” would be more relevant for people who want to check out a book about preparing for a skydive.
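The library metaphor can be made concrete with a small sketch. The three-"book" library below is invented for illustration, and the natural logarithm is an assumption, since the choice of base only rescales the scores.

```python
import math

# Hypothetical three-"book" library; each book is a list of lowercase words.
library = [
    "the pilot checked the weather and the plane".split(),
    "the chef chopped the onions and the garlic".split(),
    "the instructor explained skydiving and the skydiving gear".split(),
]

def idf(term, docs):
    """log(N / number of docs containing the term); natural log assumed.

    Assumes the term appears in at least one document.
    """
    containing = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / containing)

idf_the = idf("the", library)              # appears in every book
idf_skydiving = idf("skydiving", library)  # appears in only one book

print(f"idf('the') = {idf_the:.3f}")              # idf('the') = 0.000
print(f"idf('skydiving') = {idf_skydiving:.3f}")  # idf('skydiving') = 1.099
```

Because “the” appears in every book, its IDF is zero and it contributes nothing to a TF-IDF score, while “skydiving” keeps a positive weight.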

What is TF-IDF used for?

You can use TF-IDF for three primary purposes: information retrieval, keyword extraction, and machine learning.

Information retrieval: TF-IDF allows an AI model to retrieve information from a massive text library. For example, a search engine uses TF-IDF to determine which documents to provide as the result of a search query.

Keyword or feature extraction: TF-IDF provides a mechanism for an AI model to determine the most important words in a text, which the model can then use to summarize the document.

Machine learning: TF-IDF is an important part of natural language processing, the field of machine learning that allows computers and AI models to interpret human language. TF-IDF scores turn text into numeric features that models can train on to learn the patterns behind human language and the importance of individual words.
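The three uses above can be sketched with one small example. This is a minimal illustration rather than how any particular search engine works: the document names, the texts, and the helper functions `tfidf_scores`, `rank`, and `top_keywords` are all hypothetical, and the scoring uses log(1 + frequency) multiplied by log(N / document count) with the natural logarithm assumed.

```python
import math
from collections import Counter

# Hypothetical mini-corpus: document name -> list of lowercase words.
docs = {
    "skydiving_guide": "skydiving gear and skydiving safety for the first jump".split(),
    "cooking_basics": "the basics of cooking pasta and the right sauce".split(),
    "travel_notes": "the first trip and the best travel gear".split(),
}

def tfidf_scores(doc_tokens, all_docs):
    """Return {term: TF-IDF score} for one document."""
    n_docs = len(all_docs)
    scores = {}
    for term, freq in Counter(doc_tokens).items():
        doc_freq = sum(1 for d in all_docs if term in d)
        scores[term] = math.log(1 + freq) * math.log(n_docs / doc_freq)
    return scores

corpus = list(docs.values())
# Machine learning: each dict is a sparse numeric feature vector for a document.
weights = {name: tfidf_scores(tokens, corpus) for name, tokens in docs.items()}

def rank(query, weights):
    """Information retrieval: order documents by summed score of the query terms."""
    totals = {
        name: sum(w.get(term, 0.0) for term in query.split())
        for name, w in weights.items()
    }
    return sorted(totals, key=totals.get, reverse=True)

def top_keywords(name, weights, k=3):
    """Keyword extraction: the k highest-scoring terms in one document."""
    ranked = sorted(weights[name].items(), key=lambda kv: kv[1], reverse=True)
    return [term for term, _ in ranked[:k]]

ranking = rank("skydiving gear", weights)
keywords = top_keywords("skydiving_guide", weights, k=1)
print(ranking)
print(keywords)
```

Summing the scores of the query terms is one simple ranking choice. Production systems typically combine TF-IDF-style weights with other signals.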

Calculate term frequency and inverse document frequency

If you wanted to calculate TF-IDF, you would first need to calculate term frequency, then inverse document frequency, and finally TF-IDF. The equations for these calculations include three variables:

t: term
d: document
D: set of documents

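The equation to find term frequency is tf(t, d) = log(1 + freq(t, d)), where freq(t, d) is the number of times the term t appears in document d.

The equation to find inverse document frequency is idf(t, D) = log(N / count(d ∈ D : t ∈ d)), where N is the total number of documents in D and count(d ∈ D : t ∈ d) is the number of documents that contain t.

To put them together and find TF-IDF, the equation is tfidf(t, d, D) = tf(t, d) × idf(t, D).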

Your final result measures how important the term is to a particular document relative to the entire body of documents. The higher the score, the more relevant the AI model will consider that term to that document.
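The calculation can be sketched in a few lines of Python. The example documents are invented, the natural logarithm is an assumption (the equations do not fix a base), and returning 0 for a term that appears in no document is a convenience added here to avoid dividing by zero.

```python
import math
from collections import Counter

def tf(term, doc):
    """tf(t, d) = log(1 + freq(t, d))"""
    return math.log(1 + Counter(doc)[term])

def idf(term, docs):
    """idf(t, D) = log(N / number of documents in D containing t)"""
    n_containing = sum(1 for d in docs if term in d)
    if n_containing == 0:
        return 0.0  # assumption: unseen terms get no weight
    return math.log(len(docs) / n_containing)

def tfidf(term, doc, docs):
    """tfidf(t, d, D) = tf(t, d) * idf(t, D)"""
    return tf(term, doc) * idf(term, docs)

# Hypothetical corpus of three tokenized documents.
docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the bird sang".split(),
]

score_the = tfidf("the", docs[0], docs)  # in all 3 documents -> idf is 0
score_cat = tfidf("cat", docs[0], docs)  # in 2 of 3 documents
score_mat = tfidf("mat", docs[0], docs)  # in 1 of 3 documents

for term, score in [("the", score_the), ("cat", score_cat), ("mat", score_mat)]:
    print(f"{term}: {score:.4f}")
```

In the first document, “mat” scores highest because no other document uses it, “cat” scores lower because a second document also contains it, and “the” scores zero because every document contains it.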

Pros and cons of TF-IDF

TF-IDF is important to machine learning because it provides a scalable way to measure the importance of individual words, regardless of the language the text is written in. Because it’s a statistical calculation, you can use TF-IDF no matter what language your training materials are in. The metric is effective on smaller and larger data sets, giving you flexibility in applying it. It’s also a simple calculation that gives you a starting point for more advanced calculations.

At the same time, TF-IDF has limitations. One such limitation is that, as a mathematical calculation, the technique can’t understand that a word is the same when used in different tenses or with different forms. TF-IDF would classify “create,” “created,” “creates,” and “creating” as four different words when, for most analytical purposes, they are the same. Another problem lies in compound nouns like the “White House.” TF-IDF doesn’t have a mechanism to understand that these words are related. Preprocessing techniques can help you get more accurate results in these situations: stemming or lemmatization reduces word forms to a shared base form, and n-grams treat adjacent words as a single term.
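Both limitations show up as soon as you count terms. In the sketch below, the `BASE_FORM` lookup table is a hypothetical, hand-written stand-in for a real stemmer or lemmatizer, and pairing adjacent words into bigrams is one common way to keep compound nouns together before scoring.

```python
from collections import Counter

tokens = "she creates art he created art they are creating art".split()

# Without normalization, each word form is counted as a separate term.
raw_counts = Counter(tokens)

# Hypothetical lookup table standing in for a real stemmer or lemmatizer.
BASE_FORM = {"creates": "create", "created": "create", "creating": "create"}
normalized_counts = Counter(BASE_FORM.get(t, t) for t in tokens)

print(raw_counts["creates"], raw_counts["created"], raw_counts["creating"])  # 1 1 1
print(normalized_counts["create"])  # 3

# Bigrams keep compound nouns such as "white house" together as one term.
words = "the white house issued a statement".split()
bigrams = [" ".join(pair) for pair in zip(words, words[1:])]
print(bigrams)
```

After either step, the normalized tokens or bigrams can be scored with TF-IDF in the same way as single words.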

Who uses TF-IDF?

TF-IDF is a technique that professionals like data scientists, natural language processing research scientists, and information retrieval specialists use. If you want to explore a career using TF-IDF, learn more about what these job roles do, the job outlook for each role, and what you can expect as an average salary.

Data scientists

Average salary in the US (Glassdoor): $118,694 [1]

Job outlook (projected growth from 2023 to 2033): 36 percent [2]

As a data scientist, you will use math and statistics to help companies make sense of their data. You work with data in various ways, including collecting, processing, and analyzing it. You will provide a report or visualizations of your findings to company stakeholders to help them make data-driven decisions. In this role, you can work in many different industries, including scientific research, computer systems design, business, finance, and government.

Natural language processing (NLP) researchers

Average salary in the US (Glassdoor): $135,624 [3]

Job outlook (projected growth from 2023 to 2033): 26 percent [4]

As a research scientist focusing on natural language processing, you will have the opportunity to work on different kinds of projects advancing the field of AI and NLP. You will work with other scientists and teams to create and conduct experiments that find new ways to work with NLP or design new applications for this technology. You may work on creating new NLP models. In this role, you may also share your research with the greater scientific community through published articles or conferences.

Document retrieval specialists

Average salary in the US (Glassdoor): $53,015 [5]

Job outlook (projected growth from 2023 to 2033): 16 percent [6]

A document retrieval specialist, also known as an information retrieval specialist, is a professional who creates or manages a computer information system and can provide information to other team members as needed. Document retrieval specialists in the health care industry sort and manage information like patient charts to provide medical professionals with current medical research and other information. They can also work in fields like law enforcement, helping officers locate evidence like surveillance footage.

Learning about TF-IDF with Coursera

TF-IDF is a tool you can use in machine learning to understand patterns within text and assign a numeric value to each word to signify how relevant it is for understanding the text. If you want to learn more about TF-IDF and explore machine learning skills, consider learning about them on Coursera.

You could enroll in the Machine Learning Specialization from Stanford and DeepLearning.AI to learn about machine learning algorithms, neural networks, mathematics, and other fundamentals. You could also enroll in the IBM Machine Learning Professional Certificate to build skills to help you prepare for an entry-level job in machine learning.

Article sources

1. Glassdoor. “Salary: Data Scientist in the United States,” https://www.glassdoor.com/Salaries/data-scientist-salary-SRCH_KO0,14.htm. Accessed February 20, 2025.
