Marking keywords

In the following example, we mark Java keywords in a source file.

$ wget nishantmunjal.com/dataset/mark_keywords.awk
# the program adds tags around Java keywords
# it works on keywords that are separate words

BEGIN {

    # load java keywords
    i = 0
    while (getline kwd <"javakeywords2") {
        keywords[i] = kwd
        i++
    }
}

{
    mtch = 0
    ln = ""
    space = ""
    
    # calculate the beginning space
    if (match($0, /[^[:space:]]/)) {
        if (RSTART > 1) {
            space = sprintf("%*s", RSTART, "") 
        }
    }     
    
    # add the space to the line
    ln = ln space
    
    for (i=1; i <= NF; i++) {
    
        field = $i
         
        # go through keywords   
        for (w_i in keywords) { 
        
            kwd = keywords[w_i]
            
            # check if a field is a keyword
            if (field == kwd) {
                mtch = 1     
            } 
        }
        
        # add tags to the line        
        if (mtch == 1) {
            ln = ln  "<kwd>" field  "</kwd> "   
        } else {
            ln = ln field " " 
        }
        
        mtch = 0
            
    }
    
    print ln
}

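The program adds <kwd> and </kwd> tags around each keyword that it recognizes. This is a basic example: it only works on keywords that are separate, whitespace-delimited words. A keyword attached to other characters, such as this in this.x, is not marked, and keywords inside comments or string literals are not told apart from real ones.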
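As a possible way around the separate-word limitation, the sketch below scans each line for identifier-like words with match() instead of comparing whole fields. It is not the author's program: the function name mark_words, the iskw array, and passing the keywords as a space-separated argument are assumptions made for illustration, and comments and string literals are still not handled.

```shell
# Hypothetical helper: $1 is a space-separated keyword list, input comes from stdin.
mark_words() {
    awk -v kw="$1" '
    BEGIN {
        # store each keyword as an array index so that "in" can test membership
        n = split(kw, a, " ")
        for (j = 1; j <= n; j++) iskw[a[j]] = 1
    }
    {
        rest = $0
        out = ""
        # repeatedly find the next identifier-like word in the unprocessed text
        while (match(rest, /[A-Za-z_][A-Za-z_0-9]*/)) {
            word = substr(rest, RSTART, RLENGTH)
            # copy the text before the word unchanged (keeps spacing and punctuation)
            out = out substr(rest, 1, RSTART - 1)
            out = out ((word in iskw) ? "<kwd>" word "</kwd>" : word)
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out rest
    }'
}

printf '%s\n' '    System.out.println(this.x);' | mark_words "this int"
# prints:     System.out.println(<kwd>this</kwd>.x);
```

Because the text between words is copied verbatim, the indentation and inner spacing of each line are preserved without a separate calculation.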

# load java keywords
i = 0
while (getline kwd <"javakeywords2") {
    keywords[i] = kwd
    i++
}

We load the Java keywords from the javakeywords2 file, which holds one keyword per line. The keywords are stored in the keywords array.
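One detail worth knowing about this loop: getline returns 1 on success, 0 at end of file, and -1 on error, and -1 counts as true in a bare while condition, so the loop may never end when the keyword file is missing. A minimal sketch of a more defensive loop follows; the function name load_keywords and the kwfile variable are hypothetical.

```shell
# Hypothetical helper: prints how many keywords were read from the file named in $1.
load_keywords() {
    awk -v kwfile="$1" '
    BEGIN {
        n = 0
        # "> 0" ends the loop at end of file (0) and on an open error (-1)
        while ((getline kwd < kwfile) > 0) {
            keywords[n] = kwd
            n++
        }
        close(kwfile)
        print n
    }'
}

kwfile=$(mktemp)
printf '%s\n' class int public > "$kwfile"
load_keywords "$kwfile"            # prints 3
load_keywords "$kwfile.missing"    # prints 0 instead of looping
rm -f "$kwfile"
```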

# calculate the beginning space
if (match($0, /[^[:space:]]/)) {
    if (RSTART > 1) {
        space = sprintf("%*s", RSTART, "") 
    }
}        

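Using a regular expression, we find the first non-whitespace character of the line, if any. The space variable is a string of blanks built with sprintf, and it is used to keep the indentation of the program. Note that its width is RSTART, the position of the first non-whitespace character, which is one more than the number of leading whitespace characters; use RSTART - 1 to reproduce the original indentation exactly.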
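The indentation can also be copied directly from the line rather than rebuilt with sprintf. The sketch below is an alternative, not the code from the program: leading_ws is a hypothetical helper, and it uses the bracket expression [^ \t] instead of [[:space:]] on the assumption that some older AWK implementations lack POSIX character classes.

```shell
# Hypothetical helper: prints the length of the leading whitespace and the whitespace itself.
leading_ws() {
    awk '{
        lead = ""
        # RSTART is the 1-based position of the first non-blank character,
        # so the indentation occupies the RSTART - 1 characters before it
        if (match($0, /[^ \t]/) && RSTART > 1) {
            lead = substr($0, 1, RSTART - 1)
        }
        printf "%d [%s]\n", length(lead), lead
    }'
}

printf '%s\n' '    int x = 1;' | leading_ws    # prints: 4 [    ]
printf '%s\n' 'class Test {' | leading_ws      # prints: 0 []
```

Copying with substr also keeps tab characters as tabs, whereas a blank string of a given width replaces each tab with a single space.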

# add the space to the line
ln = ln space   

The space is appended to the ln variable. AWK has no explicit concatenation operator; strings are concatenated by writing them next to each other, separated by a space.
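A small illustration of concatenation by juxtaposition; concat_demo is a hypothetical name and the values are made up for the demonstration.

```shell
# Hypothetical demo: builds a line the same way the program does.
concat_demo() {
    awk 'BEGIN {
        ln = ""
        space = "    "
        ln = ln space                    # ln now holds four blanks
        ln = ln "<kwd>" "int" "</kwd>"   # several strings can be joined at once
        print "[" ln "]"
    }'
}

concat_demo    # prints: [    <kwd>int</kwd>]
```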

for (i=1; i <= NF; i++) {

    field = $i
    ...
}

We go through the fields of the current line; the field in question is stored in the field variable.
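To see what the loop visits, the sketch below prints each field with its index; show_fields is a hypothetical helper. With the default field separator, runs of blanks between words are not part of any field, which is why a line rebuilt from its fields loses its original inner spacing.

```shell
# Hypothetical helper: prints every field of each input line with its index.
show_fields() {
    awk '{
        # NF holds the number of fields in the current line
        for (i = 1; i <= NF; i++) {
            printf "%d:%s\n", i, $i
        }
    }'
}

printf '%s\n' 'public   static void' | show_fields
# prints:
# 1:public
# 2:static
# 3:void
```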

# go through keywords   
for (w_i in keywords) { 

    kwd = keywords[w_i]
    
    # check if a field is a keyword
    if (field == kwd) {
        mtch = 1     
    } 
}

In a for loop, we go through the Java keywords and check whether the field is one of them; if it is, mtch is set to 1.
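The inner loop can be avoided by storing each keyword as an array index and testing membership with the in operator. This is a sketch of that alternative, not the author's code; classify_fields, kwfile, and iskw are hypothetical names.

```shell
# Hypothetical helper: $1 is a file with one keyword per line, input comes from stdin.
classify_fields() {
    awk -v kwfile="$1" '
    BEGIN {
        # use the keyword itself as the index; the stored value is irrelevant
        while ((getline kwd < kwfile) > 0) iskw[kwd] = 1
        close(kwfile)
    }
    {
        for (i = 1; i <= NF; i++) {
            # "in" checks whether the index exists, with no loop over the array
            print $i, (($i in iskw) ? "keyword" : "other")
        }
    }'
}

kwfile=$(mktemp)
printf '%s\n' public int > "$kwfile"
printf '%s\n' 'public int x' | classify_fields "$kwfile"
# prints:
# public keyword
# int keyword
# x other
rm -f "$kwfile"
```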

# add tags to the line        
if (mtch == 1) {
    ln = ln  "<kwd>" field  "</kwd> "   
} else {
    ln = ln field " " 
}

If the field is a keyword, we wrap it in the tags; otherwise we append the field to the line unchanged. In both cases a space follows the field, so every output line ends with a trailing space.
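If the space after the last field is unwanted, one option is to put a separator before every field except the first. The sketch below assumes a hypothetical helper tag_fields that receives the keywords as a space-separated argument; it does not restore the indentation.

```shell
# Hypothetical helper: $1 is a space-separated keyword list, input comes from stdin.
tag_fields() {
    awk -v kw="$1" '
    BEGIN {
        n = split(kw, a, " ")
        for (j = 1; j <= n; j++) iskw[a[j]] = 1
    }
    {
        ln = ""
        sep = ""    # empty before the first field, a single space afterwards
        for (i = 1; i <= NF; i++) {
            piece = ($i in iskw) ? "<kwd>" $i "</kwd>" : $i
            ln = ln sep piece
            sep = " "
        }
        print ln
    }'
}

printf '%s\n' 'public void exec1() {' | tag_fields "public void"
# prints: <kwd>public</kwd> <kwd>void</kwd> exec1() {
```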

print ln

The constructed line is printed to the console.

$ awk -f markkeywords2.awk program.java 
<kwd>package</kwd> com.zetcode; 

<kwd>class</kwd> Test { 

     <kwd>int</kwd> x = 1; 

     <kwd>public</kwd> <kwd>void</kwd> exec1() { 

         System.out.println(this.x); 
         System.out.println(x); 
     } 

     <kwd>public</kwd> <kwd>void</kwd> exec2() { 

         <kwd>int</kwd> z = 5; 

         System.out.println(x); 
         System.out.println(z); 
     } 
} 

<kwd>public</kwd> <kwd>class</kwd> MethodScope { 

     <kwd>public</kwd> <kwd>static</kwd> <kwd>void</kwd> main(String[] args) { 

         Test ts = <kwd>new</kwd> Test(); 
         ts.exec1(); 
         ts.exec2(); 
     } 
} 

A sample run on a small Java program.
