NLP related tasks including text scraping, preprocessing, topic modeling & NER.

rupeshghimire7/Text-Processing-Modeling

Text-Processing-Modeling

Welcome to the Text-Processing-Modeling repository! This series walks through a Natural Language Processing (NLP) workflow, from text extraction and cleaning to Named Entity Recognition (NER) and Topic Modeling. Each directory covers one stage and contains practical examples, so you can follow the series in order or jump straight to the task you need.

Directory Structure:

1. Extract_Clean:

This directory covers the first steps of text processing. It contains scripts and tools for text scraping, extraction, and cleaning. The goal is to turn raw text into a structured format, ending with a Document-Term Matrix (DTM): a table with one row per document and one column per term, where each cell holds the term's count in that document.
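
To make the DTM idea concrete, here is a minimal sketch using only the Python standard library. It is an illustration, not the code from this directory: the helper names `tokenize` and `build_dtm` and the two sample sentences are assumptions, and the actual scripts may rely on a vectorizer from a library instead.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_dtm(docs):
    """Return (vocab, matrix), where matrix[i][j] counts vocab[j] in docs[i]."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    # The vocabulary is the sorted union of all terms seen in any document.
    vocab = sorted(set().union(*counts))
    # A Counter returns 0 for missing terms, so absent words become zeros.
    matrix = [[count[term] for term in vocab] for count in counts]
    return vocab, matrix


# Hypothetical sample corpus, for illustration only.
docs = ["The cat sat on the mat.", "The dog chased the cat!"]
vocab, dtm = build_dtm(docs)
print(vocab)
for row in dtm:
    print(row)
```

Each row of `dtm` lines up with `vocab`, so the matrix can be passed to a DataFrame constructor with `vocab` as the column names if a tabular view is needed.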

2. Explore:

This directory covers exploratory analysis and visualization of the text data. Use it to uncover patterns, trends, and anomalies in your corpus before modeling, so that you understand its characteristics first.

3. NER:

Named Entity Recognition (NER) is a fundamental task in NLP. In this directory, you'll find scripts leveraging spaCy to perform basic NER on your text data. The result is a DataFrame containing named entities per document, providing valuable insights into the entities mentioned in your text.
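
As a rough sketch of the per-document entity table described above (not the repository's actual script), the following assumes spaCy is installed along with the small English model `en_core_web_sm`. The function names `to_records` and `extract_entities` are hypothetical.

```python
def to_records(ents_per_doc):
    """Turn per-document entity lists into DataFrame-ready rows."""
    return [
        {"doc_id": doc_id, "entities": ents}
        for doc_id, ents in enumerate(ents_per_doc)
    ]


def extract_entities(texts, model_name="en_core_web_sm"):
    """Run a spaCy pipeline over texts and collect (text, label) pairs per document.

    Assumes the named model has been downloaded beforehand.
    """
    import spacy  # imported lazily so to_records() works without spaCy installed

    nlp = spacy.load(model_name)
    ents_per_doc = [
        [(ent.text, ent.label_) for ent in doc.ents]
        for doc in nlp.pipe(texts)
    ]
    return to_records(ents_per_doc)


# Example call (requires spaCy and the model to be installed):
# records = extract_entities(["Apple opened a new office in Kathmandu."])
```

The returned list of dictionaries can be passed directly to `pandas.DataFrame(records)` to get one row per document.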

4. Topic Modeling:

Topic Modeling is a powerful technique to extract topics from a collection of documents. The scripts in this directory use Gensim's Latent Dirichlet Allocation (LDA) to identify and analyze topics within your text corpus. Gain a deeper understanding of the themes present in your data.
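
The usual Gensim LDA workflow is: tokenize, build a dictionary, convert each document to a bag of words, then fit the model. The sketch below follows those steps under stated assumptions: the stopword list is a tiny illustrative one, and the function names and the defaults for `num_topics` and `passes` are placeholders rather than the settings used in this directory.

```python
import re

# Tiny illustrative stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def fit_lda(docs, num_topics=2, passes=10, seed=0):
    """Fit a Gensim LDA model and return (model, dictionary, corpus)."""
    from gensim import corpora, models  # lazy import; requires gensim

    tokens = [tokenize(doc) for doc in docs]
    dictionary = corpora.Dictionary(tokens)          # maps each term to an integer id
    corpus = [dictionary.doc2bow(t) for t in tokens]  # (term_id, count) pairs per doc
    lda = models.LdaModel(
        corpus,
        num_topics=num_topics,
        id2word=dictionary,
        passes=passes,
        random_state=seed,  # fixed seed for repeatable topics
    )
    return lda, dictionary, corpus


# Example call (requires gensim):
# lda, dictionary, corpus = fit_lda(["first document ...", "second document ..."])
# print(lda.print_topics(num_words=5))
```

The number of topics is a modeling choice, so it is worth trying several values and reading the top words per topic before settling on one.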

5. Resume_Parser:

This directory focuses on a specific, practical application of NLP: resume parsing. Learn how to train a custom NER model to extract key information from resumes. The included resume parser tags text with labels such as skills, courses, experience, tenure, organization, education, involvements, and socials, in addition to the default NER labels.
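
For orientation, here is one way a custom NER model can be trained with the spaCy v3 API. It is a sketch under assumptions and may differ from the approach taken in this directory: the label strings `SKILLS` and `ORGANIZATION`, the sample sentence, and the helpers `make_annotation` and `train_ner` are all hypothetical.

```python
def make_annotation(text, phrases):
    """Build a (text, {"entities": [...]}) training pair.

    phrases maps a phrase to its label; only the first occurrence of each
    phrase is annotated, as character offsets (start, end, label).
    """
    entities = []
    for phrase, label in phrases.items():
        start = text.find(phrase)
        if start != -1:
            entities.append((start, start + len(phrase), label))
    return text, {"entities": sorted(entities)}


def train_ner(train_data, n_iter=20, seed=0):
    """Train a blank English pipeline with a single NER component."""
    import random

    import spacy  # lazy import; requires spaCy v3
    from spacy.training import Example

    random.seed(seed)
    nlp = spacy.blank("en")
    ner = nlp.add_pipe("ner")
    # Register every label that appears in the training data.
    for _, annotation in train_data:
        for _, _, label in annotation["entities"]:
            ner.add_label(label)
    examples = [
        Example.from_dict(nlp.make_doc(text), annotation)
        for text, annotation in train_data
    ]
    optimizer = nlp.initialize(lambda: examples)
    for _ in range(n_iter):
        random.shuffle(examples)
        nlp.update(examples, sgd=optimizer, losses={})
    return nlp


# Hypothetical training example; real training needs many annotated resumes.
sample = make_annotation(
    "Python developer at Acme Corp",
    {"Python": "SKILLS", "Acme Corp": "ORGANIZATION"},
)
# nlp = train_ner([sample])  # requires spaCy
```

A single example like this only demonstrates the data format. A usable parser needs a much larger annotated set and a held-out split for evaluation.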

Happy NLP modeling! 🚀
