KSH coding standard – by Stephen Phil Hill

This post was originally written by Stephen Phil Hill (Stephen.p.Hill@alcatel-lucent.com) on Oct 8, 2008. As an experienced developer in SU (software upgrade), Phil has spent years writing KSH and TCL scripts. Moreover, this KSH coding standard has become a necessary set of rules in ALU for all developers who write shell scripts. It is posted here with Stephen Hill’s permission. If you have any questions, comments or suggestions, please leave a message here or email me (dave.tian@alcatel-lucent.com) or Phil.

NOTE: These suggestions, recommendations, and rules are not actual requirements of the project. Instead they are a combination of recommendations that help improve code efficiency and robustness, recommendations that help smooth the code inspection process, and even some style suggestions. Do not be concerned that you must follow these exactly – but be prepared for a slower code inspection if you disregard them totally.

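NOTE: These standards assume ksh93 is being used. Many of the recommendations also apply to ksh88, but several of the features used below (for example ${var//pattern/replacement}, ${var:offset:length} and the += operator) are ksh93 additions and will not work in ksh88. Compatibility with bash is not guaranteed but many of the rules apply equally well to bash code.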

To determine the ksh version you are using from the cmd-line, simply enter Ctrl-v. This works in either EMACS or VI line-editing mode.

You can also execute:

    print ${.sh.version}

for the same result.

If your program needs to verify this at run time, one other way is to check for specific functionality:

    if [[ $'x' == x ]] ; then
        echo "is using ksh93 parsing"
    fi

Some very minimal style recommendations


RULE : indent via tabs, not spaces or a mixture of tabs and spaces.

RECOMMENDATION: an exception – wrapping long lines. It can look very readable if you indent the wrap the same number of tabs as the first line, plus 4 spaces.

Eg:

   command -a -b -c -d --arg1 stuff --arg2 stuff \
       --arg3 stuff --arg4 stuff

   next command

So the first line is indented one tab. The continuation on the second line is by one tab then 4 spaces. The following line returns to one tab indentation.

This preserves a reasonable alignment for people that use different tabstop settings.


RULE : declare functions before ‘main body’ code in your programs. Don’t mix and match.

Exceptions – maybe loading libraries, typesets of variables, etc.


RECOMMENDATION : Label the main body of the program

At the point of the “main body” start put a ksh comment that includes the word “main”. It helps us old-timer C-programmers out. Eg:

  # Begin "main()" program body.

RULE : at least some minimal function prologs

RECOMMENDATION : It can be pretty brief. Just a sentence or two on what it does.

RULE : Don’t put too much. You don’t want to give explicit detail there and have to update it every time you change one line of code. Be realistic – it won’t be updated and it will be wrong in no time.
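
As a rough illustration of how brief a prolog can be, here is a minimal sketch. The function count_words and its behaviour are made up for this example and are not part of any project library:

```shell
#
# count_words - print the number of whitespace-separated words in ${1}.
# Returns 0 on success, 1 if no argument was given.
#
function count_words {
    typeset text
    (( $# >= 1 )) || return 1
    text=${1}
    set -- ${text}      # re-split the text into positional parameters
    printf '%s\n' "$#"
    return 0
}

count_words "one two three"
```

The prolog says what the function does and what it returns, and nothing that would go stale when a single line of the body changes.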


Where to put the then or do

   if [[ condition ]]
   then
      stuff
   fi

   while [[ condition ]]
   do
      stuff
   done

-VS-

   if [[ condition ]] ; then
      stuff
   fi

   while [[ condition ]] ; do
      stuff
   done

RECOMMENDATION: I have a personal preference for the second/shorter method, but I won’t try to make any rule on that. The separate line version doesn’t seem to add any readability or anything else I can see though…

RULE: Just don’t switch mid-stream. If you are adding 10 lines in an existing function then follow the method used in the rest of that function.

If you are adding a new function then do what you please.


Ordering functions

When creating a library of functions for use by others, order the functions in some logical way – either by functionality or alphabetically.


Coding standards and recommendations


RULE : use the ksh syntax checker

Always test your code with the built-in syntax checker and fix those warnings before you even try the code. Eg:

  ksh -n myscript.ksh

RULE : use the ksh syntax checker on bash code too

If writing bash code, try out the ksh syntax checker on it. You will have to discard some false positives but it is still better than the built-in bash syntax checking.

But also try the bash built-in syntax checker.


RULE : Use [ ] sparingly.

Use of single square brackets is somewhat deprecated, largely replaced by double square brackets [[ ]] for string ops and double parens (( )) for math ops.

It should technically still be allowed for simple tests ( eg, if [ -f myfile ] ) but you will find that many reviewers have a religious zeal for eliminating all uses of [ ]. Use at your own risk.


RULE : Use (( )) for arithmetic expressions.

Using (( )) is almost identical to using ‘let’. Use <>=! style comparison operators. Eg:

   if (( $i > 3 )) ; then

   if (( i > 3 )) ; then

     # note those two forms - using "$i" and using "i"
     # are identical except using "i" is faster

   (( i-- ))

   (( i++ ))

   cmd
   if (( $? != 0 )) ; then

Note: I don’t know of any benefit for using (( )) around simple assignments, eg:

   i=7
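
Putting those forms together, a minimal sketch of a counting loop might look like this (the variable names are only for illustration):

```shell
typeset -i i=1 total=0

while (( i < 5 )) ; do
    (( total += i ))    # no $ needed on variable names inside (( ))
    (( i++ ))
done

if (( total == 10 )) ; then
    printf 'total is %d\n' "${total}"
fi
```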

RULE : Use double square brackets for string operations.

( spelled out in words above because the [[ ]] characters wouldn’t get through the wiki formatter )

Eg:

   if [[ ${HOST_SIDE} != "A" ]]; then
   if [[ -e ${VALDATA} ]]; then
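
One practical difference, sketched below with made-up variable names: [[ ]] does not word-split an unquoted variable, while [ ] is an ordinary command and does.

```shell
name="two words"

# [[ ]] does not word-split ${name}, so this comparison is safe unquoted.
if [[ ${name} == "two words" ]]; then
    dbl_result="match"
else
    dbl_result="no match"
fi

# [ ] sees the unquoted ${name} as two separate arguments, so the test
# fails with a usage error instead of comparing the strings.
if [ ${name} = "two words" ] 2>/dev/null; then
    sgl_result="match"
else
    sgl_result="error or no match"
fi

printf '%s / %s\n' "${dbl_result}" "${sgl_result}"
```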

RULE : Do not use the old Bourne shell command substitution syntax
     x=`cmd`

Instead use the newer KSH syntax for command substitution:

     x=$(cmd)

This is much more flexible, nestable, etc.
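
A small sketch of the nesting, using a made-up path. With backticks the inner substitution would need escaping. With $( ) it does not:

```shell
path=/var/tmp/out.txt

# The inner $(dirname ...) runs first and its output feeds basename.
parent=$(basename "$(dirname "${path}")")

printf '%s\n' "${parent}"
```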


RULE : Use the {} in variable references almost all the time.
     Wrong: $x

     Right: ${x}
     Right: ${1}
Also Right: $1

It is hard to justify this rule with technical reasons – in 99.9% of usage it won’t matter at all. It actually comes down to a “Because we said so!” kind of rule, and in SU code you need to match existing style somewhat.
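
For what it is worth, here is a sketch of one of the rare cases where the braces do change the result. The variable names are invented for the example:

```shell
file="report"
file_backup="something else entirely"

with_braces="${file}_backup"    # expands ${file}, then appends _backup
without_braces="$file_backup"   # expands the different variable file_backup

printf '[%s] [%s]\n' "${with_braces}" "${without_braces}"
```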


RULE : typeset locals

Prevent collision between the global and function-local namespaces – typeset all your variables in functions.
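
A minimal sketch of the kind of collision this prevents, with hypothetical function names. Note that in ksh93 typeset only gives you a function-local variable in functions declared with the "function name" keyword form, which is the form used here:

```shell
i=5                     # used by the caller

function safe_loop {
    typeset i           # local copy; the caller's i is untouched
    for i in 1 2 3 ; do : ; done
}

function unsafe_loop {
    for i in 1 2 3 ; do : ; done    # overwrites the caller's i
}

safe_loop
after_safe=${i}
unsafe_loop
after_unsafe=${i}

printf 'after safe_loop: %s, after unsafe_loop: %s\n' "${after_safe}" "${after_unsafe}"
```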


RULE : global/local variable naming conventions

Make globals distinguishable from locals by name: put global variable names in ALL CAPS.

Locals should be all lower case, though mixed case can be used if you feel it is appropriate.


RULE : no littering the filesystem

No tmp files in random directories – put them somewhere specific, not the current working directory.

   BAD: cmd > out
   BAD: do_stuff_with out
   BAD: rm -f out

   BETTER: outfile=/var/tmp/out.$$
   BETTER: cmd > ${outfile}
   BETTER: do_stuff_with ${outfile}
   BETTER: rm -f ${outfile}

RULE : clean up tmp files

Clean up after yourself, especially in error legs. Traps are good for this.

    trap "rm -f some_tmp_file ; print -u2 'IT FAILED BADLY'" EXIT

    code
    more code

    # done, clean up and exit.  Clear trap to avoid error message.
    rm -f some_tmp_file
    trap "" EXIT
    print "ALL DONE"
    exit 0
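
A variation on the trap example above, sketched so that the error leg and the normal leg share the same cleanup. The function run_job and the file name are hypothetical. The job runs in a subshell so its EXIT trap fires as soon as the job ends:

```shell
tmp_file=/var/tmp/trap_demo.$$

function run_job {
    (
        trap 'rm -f "${tmp_file}"' EXIT
        printf 'scratch data\n' > "${tmp_file}"
        exit "${1}"     # simulate success (0) or an error leg (non-zero)
    )
}

job_rc=0
run_job 3 || job_rc=$?

printf 'job exited with %d\n' "${job_rc}"
```

Whichever way the job ends, the tmp file is gone afterwards and the caller still sees the job's exit value.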

RULE : get return codes from rsh’d commands

'rsh' doesn’t return the remote return code to you. For that reason we have the wrapper function 'remote_sh' for you to use. There is also the ksh script 'remote_shell'.

Rule of thumb: if you are using it multiple times in one script, definitely use remote_sh. If you are only calling it once you can use remote_shell.


RULE : LEARN ABOUT KSH PATTERN MATCHING!!!!!!!!!!!!

*NEVER EVER use*

   echo ${ACRNLIST} | grep [b-km-zA-Z].: > /dev/null
   if [ $? -eq 0 ]; then

instead use ksh pattern matching.

     if [[ "${ACRNLIST}" == *[b-km-zA-Z]?:* ]]; then
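
When converting a grep like the one above, remember that grep matches anywhere in the string while a ksh pattern must match the whole string, and that the regex "." becomes "?". So the grep expression (one character from b-k, m-z or A-Z, then any character, then a colon) corresponds to the pattern *[b-km-zA-Z]?:* . A small sketch, with a hypothetical helper name and sample strings:

```shell
# Returns 0 if ${1} contains <letter from b-k, m-z or A-Z><any char><colon>.
function has_acrn_entry {
    [[ "${1}" == *[b-km-zA-Z]?:* ]]
}

for sample in "xq:" "list xy: more" "12:34" ; do
    if has_acrn_entry "${sample}" ; then
        printf '%s -> match\n' "${sample}"
    else
        printf '%s -> no match\n' "${sample}"
    fi
done
```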

RULE : LEARN ABOUT KSH PATTERN SUBSTITUTION!

*Don’t do this, with an external sed process to launch*

   ACRNLIST=$(echo "${ACRNLIST}" | sed "s/,/ /g")

instead use ksh pattern substitution

     ACRNLIST=${ACRNLIST//,/ }

In this example the double-slash means the same as the ‘g’ did for sed. See the manpage.
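
A quick sketch of the difference between the single-slash and double-slash forms, using a made-up list:

```shell
list="a,b,c"

all_replaced=${list//,/ }     # double slash: replace every comma
first_replaced=${list/,/ }    # single slash: replace only the first comma

printf '[%s] [%s]\n' "${all_replaced}" "${first_replaced}"
```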


RULE : LEARN ABOUT KSH SUBSTRING SUPPORT:

*Don’t do this, with an external cut process to launch*

   x=$(echo ${blah} | cut -c3-10)
   y=$(echo ${blah} | cut -c10-)

instead use ksh substring support

   x=${blah:2:8}
   y=${blah:9}

Remember in converting these that it goes from 1-based to 0-based, and from start-to-end to start-and-length. See the manpage.
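
To make the conversion concrete, here is a sketch that runs both forms on a sample string (the string is arbitrary) so the results can be compared:

```shell
blah="abcdefghijklmnop"

x_cut=$(echo ${blah} | cut -c3-10)  # characters 3 through 10 (1-based)
x_ksh=${blah:2:8}                   # offset 2 (0-based), length 8

y_cut=$(echo ${blah} | cut -c10-)   # character 10 to the end
y_ksh=${blah:9}                     # offset 9 to the end

printf '%s %s\n' "${x_ksh}" "${y_ksh}"
```

The start moves down by one (3 becomes 2, 10 becomes 9) and the end position 10 turns into the length 10 - 3 + 1 = 8.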

Anecdote: I spotted a script with one of those ‘echo | cut’ statements in it. It was inside a nested loop and was being hit about 4400 times to parse some data at script startup. It consistently took 55 seconds to run just that loop on ihgp. Changing that ONE LINE appropriately made the nested loop complete in less than 1 second.


RULE : Check user input data types

If the user is typing an answer to a prompt and the script is using it in a way that bad data may cause a ksh syntax error, check that the data is valid.

Eg, verify that it is numeric:

   /bin/echo -e "Enter number of apples desired:\c"
   read A
   if [[ "${A}" != +([[:digit:]]) ]]; then
       error "answer was not a number"
   fi
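
The same check can be wrapped in a small helper so it is easy to reuse and to try out. This is only a sketch: is_number is a hypothetical name, and the sample answers stand in for what a user might type:

```shell
# Returns 0 if ${1} is one or more digits, non-zero otherwise.
function is_number {
    [[ "${1}" == +([[:digit:]]) ]]
}

for answer in "12" "" "3 apples" "-4" ; do
    if is_number "${answer}" ; then
        printf '"%s" is a number\n' "${answer}"
    else
        printf '"%s" is not a number\n' "${answer}"
    fi
done
```

Note that the pattern accepts unsigned whole numbers only, so an empty answer, "-4" and "3 apples" are all rejected.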

RULE : Don’t overuse ‘rm -fr’

Never use 'rm -fr' on a file; only use the 'r' on a directory.


RULE : Exit with a useful value

Scripts should always exit with a meaningful exit value.


RULE : Return with a useful return value

Functions should almost always return with a meaningful return value.
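
A minimal sketch of what "meaningful" can look like. The function check_input_file and its return codes are invented for this example:

```shell
# Returns 0 if the file is readable and non-empty, 1 if it is missing or
# unreadable, 2 if it exists but is empty.
function check_input_file {
    typeset file=${1}
    [[ -r ${file} ]] || return 1
    [[ -s ${file} ]] || return 2
    return 0
}

rc=0
check_input_file /dev/null || rc=$?
printf 'check_input_file returned %d\n' "${rc}"
```

The caller can then tell the failure cases apart instead of only knowing that something went wrong, and can pass a matching value to exit.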


RULE : Case statements should handle default (usually)

Case statements should almost always include a default case ( with * ) to catch unexpected errors.
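
For example, a sketch with a hypothetical dispatcher function and made-up action names:

```shell
function handle_action {
    case "${1}" in
    start)
        printf 'starting\n'
        ;;
    stop)
        printf 'stopping\n'
        ;;
    *)
        # default leg: catches typos and values nobody planned for
        printf 'ERROR: unexpected action: %s\n' "${1}" >&2
        return 1
        ;;
    esac
    return 0
}

handle_action start
```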


RULE : Safer string handling

When comparing strings, use quotes to prevent errors due to unusual strings. Eg:

  if [[ "${my_var}" == "done" ]]; then

NOTE: quoting changes it from a pattern match to a string comparison. If you have a pattern you must NOT quote it but you must escape special characters – eg, whitespace.

  if [[ "${my_var}" == *it\ is\ done ]]; then
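
A short sketch of the difference, using a sample value:

```shell
my_var="it is done"

# Unquoted right-hand side: * is a wildcard, the escaped blanks are literal.
if [[ "${my_var}" == *is\ done ]]; then
    pattern_result="match"
else
    pattern_result="no match"
fi

# Quoted right-hand side: * is an ordinary character, so this only
# matches the literal string "*is done".
if [[ "${my_var}" == "*is done" ]]; then
    literal_result="match"
else
    literal_result="no match"
fi

printf '%s / %s\n' "${pattern_result}" "${literal_result}"
```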

RULE : checking if a process is alive by PID

When you have the PID of a process and you need to check if it is still running, some people are tempted to construct elaborate pipelines of ps, grep, sed, and cut. Do not! This is done exactly the same way you would in C, with signal 0:

   kill -0 ${pid}
   if (( $? != 0 )) ; then
      # process is finished
      wait ${pid}
      echo "process exited with $?"
   fi
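
Here is a self-contained sketch of the same idea that can be run as-is. It starts a background sleep as a stand-in for a real child process. Keep in mind that wait can only collect the exit status of children of the current shell, and that kill -0 can also fail if you lack permission to signal the process:

```shell
sleep 30 &
pid=$!

if kill -0 ${pid} 2>/dev/null ; then
    alive_before="yes"
else
    alive_before="no"
fi

kill ${pid}                             # end the child so the demo finishes quickly
wait_rc=0
wait ${pid} 2>/dev/null || wait_rc=$?   # reap the child and keep its exit status

if kill -0 ${pid} 2>/dev/null ; then
    alive_after="yes"
else
    alive_after="no"
fi

printf 'before: %s, after: %s\n' "${alive_before}" "${alive_after}"
```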

RULE : use getopts for argument parsing

NOTE: the long-option spec and the += syntax used here are ksh93-specific.

For parsing arguments, usually use getopts. There are exceptions but most scripts that have their own parsing code end up being messy to use and messy to extend later. A simple example follows illustrating several good points:

  • Supports longname aliases for the short names ( eg, -v is the same as --verbose )
  • Supports longname options that do not have short names ( eg, --help )
  • No concentration of #feature dependencies on a single option spec line, since the spec string is built up over multiple lines

#
# getopts options spec.
#

OPT_SPEC=":"
OPT_SPEC+="[-][99:-help] "
OPT_SPEC+="[-][98:-examples] "
OPT_SPEC+="[-][v:-verbose] "
OPT_SPEC+="[-][s:-set]: "
OPT_SPEC+="[-][h:-hostname]: "

typeset -u SET_ARG
typeset -i VERBOSE

SET_ARG=""
VERBOSE=0
HOST_ARG=""

# parse command-line
while getopts "${OPT_SPEC}" arg ; do
    case "${arg}" in
    99)
        # --help
        print "${USAGE}"
        scriptexit "${CURR_CMD}" 0
        ;;
    98)
        # --examples
        print "${USAGE}"
        print "${USAGE2}"
        scriptexit "${CURR_CMD}" 0
        ;;
    v)
        ((VERBOSE++))
        ;;
    s)
        SET_ARG=${OPTARG}
        SET="Y"
        ;;
    h)
        HOST_ARG=${OPTARG}
        ;;
    *)
        print "ERROR: Invalid Argument"
        print "${USAGE}"
        scriptexit "${CURR_CMD}" 1
        ;;
    esac
done
shift $(($OPTIND-1))


2 Responses to KSH coding standard – by Stephen Phil Hill

  1. Stephen Hill / Phil says:

    Strangely, your posting of this (and Google) helped me reconnect with a friend I had lost touch with 10 years ago. There was enough specific information here including the use of my nickname ‘Phil’ so he found me easily when he searched.

    Thanks!

    • daveti says:

      Very very interesting – what a small world, right:) Anyway, it is nice that your ‘lost’ friend gets back and thank you for sharing this excellent summary again!
