{"id":112,"date":"2026-04-04T11:23:56","date_gmt":"2026-04-04T11:23:56","guid":{"rendered":"https:\/\/gigz.pk\/ml\/?post_type=lesson&#038;p=112"},"modified":"2026-04-09T08:03:08","modified_gmt":"2026-04-09T08:03:08","slug":"tf-idf","status":"publish","type":"lesson","link":"https:\/\/gigz.pk\/ml\/lesson\/tf-idf\/","title":{"rendered":"TF-IDF"},"content":{"rendered":"\n<p><strong>TF-IDF (Term Frequency \u2013 Inverse Document Frequency)<\/strong> is a technique used in Natural Language Processing to convert text into <strong>numerical features<\/strong> while giving importance to meaningful words. Unlike Bag of Words, TF-IDF reduces the weight of common words and highlights words that are more informative.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Why TF-IDF is Important<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Identifies <strong>important words<\/strong> in a document<\/li>\n\n\n\n<li>Reduces the impact of common words like \u201cthe\u201d, \u201cis\u201d<\/li>\n\n\n\n<li>Improves performance of text-based Machine Learning models<\/li>\n\n\n\n<li>Widely used in search engines and document ranking<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Key Concepts<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">1. Term Frequency (TF)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Measures how often a word appears in a document<\/li>\n\n\n\n<li>Formula:<br>TF = (Number of times a word appears in a document) \/ (Total number of words in the document)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">2. Inverse Document Frequency (IDF)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Measures how important a word is across all documents<\/li>\n\n\n\n<li>Words that appear in many documents get lower importance<\/li>\n\n\n\n<li>Formula:<br>IDF = log (Total number of documents \/ Number of documents containing the word)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">3. 
TF-IDF Score<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Combines TF and IDF to assign a weight to each word<\/li>\n\n\n\n<li>Formula:<br>TF-IDF = TF \u00d7 IDF<\/li>\n\n\n\n<li>Higher score means the word is more important in that document<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Example<\/h2>\n\n\n\n<p>Text Data:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Document 1: \u201cI love machine learning\u201d<\/li>\n\n\n\n<li>Document 2: \u201cMachine learning is powerful\u201d<\/li>\n\n\n\n<li>Words like \u201cmachine\u201d and \u201clearning\u201d appear in both documents, so they get <strong>lower weight<\/strong><\/li>\n\n\n\n<li>Words like \u201clove\u201d and \u201cpowerful\u201d each appear in only one document, so they get <strong>higher weight<\/strong><\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Advantages<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Highlights important and unique words<\/li>\n\n\n\n<li>Reduces noise from common words<\/li>\n\n\n\n<li>Often improves model performance compared to raw word counts<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Limitations<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Still ignores word order and context<\/li>\n\n\n\n<li>Does not capture semantic meaning<\/li>\n\n\n\n<li>Can create large, sparse feature spaces<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Implementation Example (Python using Scikit-learn)<\/h2>\n\n\n\n<pre class=\"wp-block-preformatted\">from sklearn.feature_extraction.text import TfidfVectorizer<br><br># Sample text data<br>documents = [<br>    \"I love machine learning\",<br>    \"Machine learning is powerful\"<br>]<br><br># Create TF-IDF model<br>vectorizer = TfidfVectorizer()<br><br># Transform text into TF-IDF features<br>X = vectorizer.fit_transform(documents)<br><br># Vocabulary (the default tokenizer lowercases text and drops one-letter tokens such as \"I\")<br>print(vectorizer.get_feature_names_out())<br><br># TF-IDF matrix (one row per document, one column per vocabulary term)<br>print(X.toarray())<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Applications<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Search engines (ranking relevant 
documents)<\/li>\n\n\n\n<li>Text classification and clustering<\/li>\n\n\n\n<li>Spam detection<\/li>\n\n\n\n<li>Sentiment analysis<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Best Practices<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Combine with text preprocessing steps like stopword removal<\/li>\n\n\n\n<li>Limit vocabulary size to reduce computation<\/li>\n\n\n\n<li>Use n-grams to capture more context<\/li>\n\n\n\n<li>Compare performance with Bag of Words for your dataset<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>TF-IDF is a powerful text representation technique that improves upon Bag of Words by assigning importance to words based on their frequency and uniqueness. It is widely used in NLP tasks to create better-performing Machine Learning models.<\/p>\n","protected":false},"menu_order":68,"template":"","class_list":["post-112","lesson","type-lesson","status-publish","hentry"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.6 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>TF-IDF - Machine Learning Mastery<\/title>\n<meta name=\"description\" content=\"Learn TF-IDF for NLP: convert text to numerical features, highlight important words, and improve ML model performance.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/gigz.pk\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" 
content=\"article\" \/>\n<meta property=\"og:title\" content=\"TF-IDF - Machine Learning Mastery\" \/>\n<meta property=\"og:description\" content=\"Learn TF-IDF for NLP: convert text to numerical features, highlight important words, and improve ML model performance.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/gigz.pk\/\" \/>\n<meta property=\"og:site_name\" content=\"Machine Learning Mastery\" \/>\n<meta property=\"article:modified_time\" content=\"2026-04-09T08:03:08+00:00\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":[\"WebPage\",\"FAQPage\"],\"@id\":\"https:\\\/\\\/gigz.pk\\\/ml\\\/lesson\\\/tf-idf\\\/\",\"url\":\"https:\\\/\\\/gigz.pk\\\/\",\"name\":\"TF-IDF - Machine Learning Mastery\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/gigz.pk\\\/ml\\\/#website\"},\"datePublished\":\"2026-04-04T11:23:56+00:00\",\"dateModified\":\"2026-04-09T08:03:08+00:00\",\"description\":\"Learn TF-IDF for NLP: convert text to numerical features, highlight important words, and improve ML model performance.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/gigz.pk\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/gigz.pk\\\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/gigz.pk\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/gigz.pk\\\/ml\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Advanced Machine Learning > NLP > TF-IDF\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/gigz.pk\\\/ml\\\/#website\",\"url\":\"https:\\\/\\\/gigz.pk\\\/ml\\\/\",\"name\":\"Machine Learning 
Mastery\",\"description\":\"\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/gigz.pk\\\/ml\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"TF-IDF - Machine Learning Mastery","description":"Learn TF-IDF for NLP: convert text to numerical features, highlight important words, and improve ML model performance.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/gigz.pk\/","og_locale":"en_US","og_type":"article","og_title":"TF-IDF - Machine Learning Mastery","og_description":"Learn TF-IDF for NLP: convert text to numerical features, highlight important words, and improve ML model performance.","og_url":"https:\/\/gigz.pk\/","og_site_name":"Machine Learning Mastery","article_modified_time":"2026-04-09T08:03:08+00:00","twitter_card":"summary_large_image","twitter_misc":{"Est. 
reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":["WebPage","FAQPage"],"@id":"https:\/\/gigz.pk\/ml\/lesson\/tf-idf\/","url":"https:\/\/gigz.pk\/","name":"TF-IDF - Machine Learning Mastery","isPartOf":{"@id":"https:\/\/gigz.pk\/ml\/#website"},"datePublished":"2026-04-04T11:23:56+00:00","dateModified":"2026-04-09T08:03:08+00:00","description":"Learn TF-IDF for NLP: convert text to numerical features, highlight important words, and improve ML model performance.","breadcrumb":{"@id":"https:\/\/gigz.pk\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/gigz.pk\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/gigz.pk\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/gigz.pk\/ml\/"},{"@type":"ListItem","position":2,"name":"Advanced Machine Learning > NLP > TF-IDF"}]},{"@type":"WebSite","@id":"https:\/\/gigz.pk\/ml\/#website","url":"https:\/\/gigz.pk\/ml\/","name":"Machine Learning Mastery","description":"","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/gigz.pk\/ml\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"}]}},"_links":{"self":[{"href":"https:\/\/gigz.pk\/ml\/wp-json\/wp\/v2\/lesson\/112","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/gigz.pk\/ml\/wp-json\/wp\/v2\/lesson"}],"about":[{"href":"https:\/\/gigz.pk\/ml\/wp-json\/wp\/v2\/types\/lesson"}],"wp:attachment":[{"href":"https:\/\/gigz.pk\/ml\/wp-json\/wp\/v2\/media?parent=112"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}