🛠️ Urdu Roman Dictionary Toolkit
Process and enhance Urdu Roman text data for dictionary creation and NLP tasks
About the Toolkit
This toolkit provides modular tools for processing Urdu Roman text. You can use individual tools or run a complete pipeline to transform your text data.
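As a rough illustration of how individual tools might be chained into a pipeline, here is a minimal Python sketch. The names `run_pipeline`, `separate`, and `lowercase` are hypothetical and are not taken from the toolkit itself; the real pipeline may be organized differently.

```python
from typing import Any, Callable, Iterable


def run_pipeline(data: Any, steps: Iterable[Callable[[Any], Any]]) -> Any:
    """Feed the output of each step into the next one (hypothetical helper)."""
    for step in steps:
        data = step(data)
    return data


def separate(text: str) -> list[str]:
    # Split on any run of whitespace.
    return text.split()


def lowercase(words: list[str]) -> list[str]:
    # Fold case so later steps treat "Akbar" and "akbar" alike.
    return [w.lower() for w in words]


# Each step is a plain function, so tools can be used alone or in sequence.
result = run_pipeline("mera naam Akbar hai", [separate, lowercase, sorted])
print(result)
```

Because every step takes the previous step's output, tools can be reordered or dropped without changing the others.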
🔧 Separator
Splits compound entries or sentences into individual words.
Input: "mera naam Akbar hai"
Output: ["mera", "naam", "Akbar", "hai"]
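A minimal sketch of the separation step in Python, assuming words are delimited by whitespace only. The function name `separate` is hypothetical.

```python
def separate(text: str) -> list[str]:
    # str.split() with no arguments splits on any run of whitespace
    # and discards leading/trailing whitespace.
    return text.split()


print(separate("mera naam Akbar hai"))
```

Compound entries joined by other delimiters (hyphens, slashes) would need extra handling that this sketch does not attempt.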
🧹 Cleaner
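Removes unwanted characters (such as stray punctuation), duplicate entries, and formatting issues. In the example below, duplicates are matched case-insensitively and the first occurrence is kept.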
Input: "naam!!, Akbar, akbar, naam"
Output: ["naam", "Akbar"]
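One way the cleaning behavior in this example could be reproduced in Python is sketched here. It assumes that words consist of ASCII letters only and that duplicates are compared case-insensitively, keeping the first spelling seen. The name `clean` is hypothetical.

```python
import re


def clean(text: str) -> list[str]:
    # Keep only runs of ASCII letters; punctuation and spaces act as separators.
    words = re.findall(r"[A-Za-z]+", text)

    seen = set()      # lowercase forms already emitted
    unique = []       # first-seen spelling of each word, in input order
    for word in words:
        key = word.lower()
        if key not in seen:
            seen.add(key)
            unique.append(word)
    return unique


print(clean("naam!!, Akbar, akbar, naam"))
```

Words that legitimately contain apostrophes or digits would be split by this pattern, so the character class would need adjusting for such data.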
🔄 Translator
Converts Urdu Roman to Urdu script or vice versa.
Input: "mera naam Akbar hai"
Output: "میرا نام اکبر ہے"
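The toolkit's actual conversion method is not described here. As a simplified illustration only, the sketch below uses a tiny word-level lookup table built from the example above; the table `ROMAN_TO_URDU` and both function names are hypothetical, and unknown words are passed through unchanged.

```python
# Hypothetical lookup table containing only the words from the example.
ROMAN_TO_URDU = {
    "mera": "میرا",
    "naam": "نام",
    "akbar": "اکبر",
    "hai": "ہے",
}

# Inverse table for the opposite direction.
URDU_TO_ROMAN = {urdu: roman for roman, urdu in ROMAN_TO_URDU.items()}


def to_urdu_script(text: str) -> str:
    # Look up each word case-insensitively; keep unknown words as they are.
    return " ".join(ROMAN_TO_URDU.get(w.lower(), w) for w in text.split())


def to_roman(text: str) -> str:
    # Reverse lookup; the original capitalization is not recoverable.
    return " ".join(URDU_TO_ROMAN.get(w, w) for w in text.split())


print(to_urdu_script("mera naam Akbar hai"))
```

A word-level table cannot cover spelling variation or unseen vocabulary, so a practical converter would likely need a much larger dictionary or a character-level model.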
📊 Sorter
Sorts the word list alphabetically or by frequency.
Input: ["Akbar", "naam", "mera", "hai"]
Output: ["Akbar", "hai", "mera", "naam"]
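Both sort orders can be sketched in Python as follows. The function names are hypothetical, the alphabetical sort is assumed to ignore case, and the frequency sort is assumed to return each distinct word once, most frequent first.

```python
from collections import Counter


def sort_alphabetically(words: list[str]) -> list[str]:
    # Compare lowercase forms so capitalized words are not grouped first.
    return sorted(words, key=str.lower)


def sort_by_frequency(words: list[str]) -> list[str]:
    counts = Counter(words)
    # Highest count first; ties are broken alphabetically.
    return sorted(counts, key=lambda w: (-counts[w], w.lower()))


print(sort_alphabetically(["Akbar", "naam", "mera", "hai"]))
print(sort_by_frequency(["hai", "mera", "hai", "naam", "mera", "hai"]))
```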
⚙️ Normalizer
Standardizes spelling variations (e.g., "kya", "kia", "kiaa" → "kya").
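A minimal sketch of normalization using a variant table, assuming each known variant maps to exactly one canonical spelling. The table `VARIANTS` and the name `normalize` are hypothetical and cover only the example above.

```python
# Hypothetical variant table: non-canonical spelling -> canonical spelling.
VARIANTS = {
    "kia": "kya",
    "kiaa": "kya",
}


def normalize(words: list[str]) -> list[str]:
    # Replace known variants; leave every other word untouched.
    return [VARIANTS.get(w.lower(), w) for w in words]


print(normalize(["kya", "kia", "kiaa", "hai"]))
```

A fixed table is easy to audit but only handles variants listed in advance; rule-based or similarity-based matching would be needed to catch unseen spellings.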
📈 Frequency Counter
Counts word occurrences in a corpus.
Output: { "mera": 5, "naam": 3, "hai": 7 }
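Word counting of this kind can be sketched with Python's `collections.Counter`. The function name `count_words` is hypothetical, and the sketch assumes whitespace-separated words compared case-insensitively. The sample corpus is made up for illustration and does not reproduce the counts shown above.

```python
from collections import Counter


def count_words(corpus: str) -> dict[str, int]:
    # Lowercase first so "Hai" and "hai" are counted together.
    return dict(Counter(corpus.lower().split()))


counts = count_words("mera naam hai mera hai hai")
print(counts)
```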