Spell checking

We create an AWK program for spell checking.

$ wget nishantmunjal.com/dataset/spellcheck.awk
BEGIN {
    count = 0
    
    i = 0
    while ((getline myword < "/usr/share/dict/words") > 0) {
        dict[i] = myword
        i++
    }
}

{
    for (i=1; i<=NF; i++) {
    
        field = $i
    
        if (match(field, /[[:punct:]]$/)) {
            field = substr(field, 1, RSTART-1)
        }
    
        mywords[count] = field
        count++
    }
}

END {

    for (w_i in mywords) { 
        for (w_j in dict) { 
            if (mywords[w_i] == dict[w_j] || 
                        tolower(mywords[w_i]) == dict[w_j]) {
                delete mywords[w_i]
            }
        }
    }

    for (w_i in mywords) { 
        if (mywords[w_i] != "") {
            print mywords[w_i]        
        }
    }
}

The script compares the words of the provided text file against a dictionary. On many Unix-like systems, an English word list is available at /usr/share/dict/words, with one word per line.

BEGIN {
    count = 0
    
    i = 0
    while ((getline myword < "/usr/share/dict/words") > 0) {
        dict[i] = myword
        i++
    }
}

Inside the BEGIN block, we read the words from the dictionary into the dict array. The getline command, used in this form, reads the next record from the given file name and stores it in the myword variable; $0 and NF are left unchanged.
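The reading loop can be tried in isolation. The sketch below is only an illustration: it builds a tiny throwaway word list with mktemp instead of reading /usr/share/dict/words, and the names words_file, path and loaded are made up for the demo. It compares the getline result with 0, so a missing file (getline returns -1) ends the loop rather than repeating it forever.

```shell
# Hypothetical three-word dictionary in a temporary file.
words_file=$(mktemp)
printf 'apple\nbanana\ncherry\n' > "$words_file"

# Read the file line by line into dict[], as the BEGIN block does.
loaded=$(awk -v path="$words_file" 'BEGIN {
    i = 0
    # getline returns 1 per record read, 0 at end of file, -1 on error.
    while ((getline myword < path) > 0) {
        dict[i] = myword
        i++
    }
    close(path)
    # Print the number of words plus the first and last entries.
    print i, dict[0], dict[i-1]
}')

echo "$loaded"
rm -f "$words_file"
```

The demo prints the word count followed by the first and last words that were stored.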

{
    for (i=1; i<=NF; i++) {
    
        field = $i
    
        if (match(field, /[[:punct:]]$/)) {
            field = substr(field, 1, RSTART-1)
        }
    
        mywords[count] = field
        count++
    }
}

In the main part of the program, we place the words of the file that we are spell checking into the mywords array. We remove one trailing punctuation mark (such as a comma or a period) from the end of each word; punctuation elsewhere in the word is left alone.
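To see what the stripping step does to individual fields, here is a small sketch run on a made-up sample line (the sentence and the out, sep and stripped names are assumptions for the demo, not part of the article's test file). Because the pattern is anchored with $, only the final punctuation character of a field is cut, so "(ok)?" becomes "(ok)".

```shell
# Hypothetical one-line sample fed to awk on standard input.
stripped=$(printf 'Hello, world. Is it (ok)?\n' | awk '{
    out = ""
    sep = ""
    for (i = 1; i <= NF; i++) {
        field = $i
        # match() sets RSTART to the position of the trailing punctuation mark.
        if (match(field, /[[:punct:]]$/)) {
            # AWK strings are 1-indexed: keep characters 1 .. RSTART-1.
            field = substr(field, 1, RSTART - 1)
        }
        out = out sep field
        sep = " "
    }
    print out
}')

echo "$stripped"
```

The start position passed to substr() is 1; a start of 0 is not handled the same way by every AWK implementation and can drop the last letter of the word.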

END {

    for (w_i in mywords) { 
        for (w_j in dict) { 
            if (mywords[w_i] == dict[w_j] || 
                        tolower(mywords[w_i]) == dict[w_j]) {
                delete mywords[w_i]
            }
        }
    }
...
}    

We compare the words from the mywords array against the dict array. If a word is in the dictionary, it is removed with the delete statement. Words that begin a sentence start with an uppercase letter; therefore, we also check the lowercase form produced by the tolower() function.

for (w_i in mywords) { 
    if (mywords[w_i] != "") {
        print mywords[w_i]        
    }
}

Remaining words have not been found in the dictionary; they are printed to the console.
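The empty-string test in this loop is worth a note. In AWK, merely referencing an array element that does not exist creates it with an empty value. In the comparison loop, a word that has just been deleted is referenced again on the next pass over dict, so it most likely reappears in mywords as an empty entry. The following sketch, with a hypothetical words array, shows that behavior on its own.

```shell
recreated=$(awk 'BEGIN {
    words[0] = "cat"
    delete words[0]
    before = (0 in words)          # 0: the element is gone
    if (words[0] == "dog") { }     # merely referencing it creates it again
    after = (0 in words)           # 1: it now exists with an empty value
    print before, after, length(words[0])
}')

echo "$recreated"
```

The in operator tests for membership without creating the element, which is why it is used for the before and after checks.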

$ awk -f spellcheck.awk text
consciosness
finaly

We ran the program on a text file and found two misspelled words. Note that the program takes some time to finish, because every word of the text is compared against every entry in the dictionary.
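One possible way to cut the running time, which is not what the downloaded script does, is to store each dictionary word as an array key so that the in operator can test membership directly. The sketch uses a small temporary dictionary and a sample sentence, both invented for the demo; with the real word list, the path variable would point at /usr/share/dict/words.

```shell
# Hypothetical small dictionary and input text for the demo.
dict_file=$(mktemp)
printf 'finally\nthe\nreturned\n' > "$dict_file"

misspelled=$(printf 'Finaly, the consciosness returned.\n' | awk -v path="$dict_file" '
BEGIN {
    # Use each dictionary word as a key; the stored value is irrelevant.
    while ((getline myword < path) > 0) {
        dict[myword] = 1
    }
    close(path)
}
{
    for (i = 1; i <= NF; i++) {
        field = $i
        if (match(field, /[[:punct:]]$/)) {
            field = substr(field, 1, RSTART - 1)
        }
        # Report the word unless it, or its lowercase form, is a key in dict.
        if (field != "" && !(field in dict) && !(tolower(field) in dict)) {
            print field
        }
    }
}')

echo "$misspelled"
rm -f "$dict_file"
```

Unlike the original script, this variant prints unknown words as it meets them, in input order, and repeats a word each time it occurs.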
