SgDotNet
Singapore Professional .NET User Group -For Cool Developers

SDBoK Illustrated (1) Introduction and Text Manipulation

As I said in earlier posts, I may not have the luxury of time to elaborate on every bullet I have put down in the Body of Knowledge. I consider that acceptable for two reasons. (1) The bullets are intended merely as pointers, although they are already structured in a reasonably organized (if not ordered) way to provide a roadmap or guideline. You can use them as search keywords in Google or Wikipedia to find more relevant and detailed information. My motivation is a frustration I shared in my early days: I did not know what I did not know. I always wished that someone could lead me through the darkness before the dawn; if not a companion, a map would do. On top of that, Google and Wikipedia were not as convenient then as they are today. (2) Once you have the pointers, you should be able to explore articles and resources that go into far more depth than I ever could, written by experts in the respective fields all over the world.

In short, this is intended for breadth rather than depth: you need to know most of it to be really fluent in the business of engineering software. Fluency itself, however, requires visiting each of those topics on its own.

But friends told me that bullets alone may not be good enough, and I agree. An organized map might make sense to whoever already knows most of the topics, but it could hardly make any impact on those who are new to them. Some sort of "trailers" are needed to raise curiosity, then awareness, and then the active exploration that follows.

This sounds great. I will try to add mini-examples or mini-articles for each topic to flesh out what is otherwise a skeleton. They will not appear overnight, but I will do my best to steal time and get them out sooner.

Note that the information presented here is mostly technology and product neutral. The "trick" (someone might argue "weakness"; this largely depends on your perspective) to staying afloat in the software field is to master the foundations so well that you can leverage them in any future mission: demystify the hype that so easily infects many product and technology offerings, separate the problem part from the solution part, and keep a calm mind while observing the sometimes hefty shaking of this interesting field.

Another note: the examples presented are mostly drawn from my own experience. This is a good point (such examples are more convincing), but it may also limit their "immediate" applicability, and they may not always hit the nail on the head.

I will start with "Text Manipulation, Regular Expression, and Scripting", which appear in part (1), Joy of Cooking, or Tools of Trade.

Text Manipulation
  text editing (vi, vim, emacs etc)
  text transformation
  code generation
Regular Expression (power of search)
  grep, find
"Ubiquitous" Automation with Scripting
  Windows Shell (including VBScript)
  Unix Shell
  Portable Scripts
  JavaScript? (a regular expression engine is available)
  Perl (classic regular expression engine)
  Python, Ruby (fast prototyping)

Text Manipulation. This is a huge topic. It is huge not because it is complicated but because it is everywhere, so summarizing it may not be as simple as one thinks.

Text Editing is the first aspect. In software, this could simply mean typing and editing source code, modifying configuration files, or browsing through source files.

A good editor, used fluently, can bring an enormous productivity gain, so you can move faster and keep moving on a bigger task. The trouble with editors is that the more powerful they are, the steeper the learning curve, not to mention absorbing all the features and hopefully retaining some fluency. It is always a long haul, so be prepared for years (not days, weeks or months) of self-training. But rest assured that the learning will be fruitful, and you will be amazed by the possibilities you discover on a regular basis.

Windows users all have Notepad.exe, but finding an alternative is absolutely necessary. To start with, you may consider UltraEdit, TextPad, or an open source alternative such as NEdit (in fact ported from X Windows). They offer a superset of Notepad's features and more powerful options via the GUI. A good editor must give you access to the following options:

(1) search
(2) replace
(3) context-sensitive highlighting (eg syntax)
(4) diff

In fact, text editing can be faster in keyboard-only mode, since you do not need to reach for the mouse from time to time. But the switch could be painful and slow for those who have worked long in GUI-based editors: it takes time, so be patient.

Vim, Emacs and their derivatives are notable examples of console-style text editors. They are available on most Unix/Linux systems, so they offer very good portability (you do not have to install your favorite editor on every machine you work on). On Windows you have to install them yourself, but they are free and small!

Vim: http://www.vim.org/
GNU Emacs: http://www.gnu.org/software/emacs/
XEmacs: http://www.xemacs.org/

Every editor has its own dialect for controlling text editing, so maybe you should briefly try each of them; once you feel comfortable with one, stick with it and invest time in it for the long run.

I will use examples in Vim to illustrate some basic editing and the important options listed above that you need most frequently.

Movement. Movement is a very important concept in Vim. It is both the way to move around in the text and, when combined with another operation, the selection or target of that operation (do you see the 'composite' pattern?). E.g.

w - word to the right   // notice the first letter 'w', it is equivalent to Ctrl-Arrow Right on native Windows
b - word to the left
^ - first non-blank character of the line // this is the Home equivalent on Windows
                                        // note that ^ is a regular expression token, to be visited later
fx - move forward to the next occurrence of the character x on the current line.
Fx - move backward to the previous occurrence of the character x on the current line.
$ - last character of the line // this is the End equivalent on Windows
% - switch between the matching () or {}, #if/#endif  // Visual Studio users could do this by Ctrl-]
[( - backward to the previous unmatched (   // notice the direction of [ and (
]) - forward to the next unmatched )
[{ - backward to the previous unmatched {
]} - forward to the next unmatched }
( - backward to the previous sentence
) - forward to the next sentence
{ - backward to the previous paragraph
} - forward to the next paragraph
G - to the end of the file
gg - to the start of the file
<N>G - to line number <N>


Insertion. Vim has two main modes: INSERT and COMMAND (I cannot think of better names for these two modes; some refer to them as edit mode and normal mode, but you can in fact edit in normal mode too, so I do not find those names good enough either). Several keys take you into INSERT mode, most commonly:

i    - to insert before the current cursor
a    - to insert after the current cursor
I    - to insert at the start of the current line
A    - to insert at the end of the current line
o    - to open a new line below the current line
O    - to open a new line above the current line  // notice the small o and capital O for related operations
c<m> - to change the text covered by the movement // I use <m> to represent a movement
  cw     - change from the cursor to the end of the current word
  c$     - change all the way to the end of the current line
  c)     - change from the cursor to the end of the sentence
  c<N>_  - change N lines starting from the current line  // <N> is an integer; you then know <N>_ is also a movement meaning "N lines, counting the current one"

Once in this mode, the keys you type appear as characters in the editor; maybe we should call it CHAR mode too. Press 'Esc' to return to COMMAND mode.


Deletion. You must be in COMMAND mode, of course, otherwise your input becomes text.

x    - delete the current character
d<m> - delete the text covered by the movement
  dw   - delete from the cursor to the start of the next word
  d$   - delete all the way to the end of the current line
  dd   - delete the entire current line
  d<N>d - delete N lines starting from the current line


Copy and Paste. Vim uses registers as its clipboard (or clipboard ring). The basic registers to know are
1) unnamed register ""  -- this is like a default clipboard, storing the text last deleted (cut) or yanked (copied)
2) numbered registers "0, "1, ... "9 -- "0 holds the most recent yank; "1 to "9 act like a clipboard ring for deletions of a line or more, with "1 holding the most recent one and Vim shifting older content down automatically.
3) named registers "a, "b, ..., "z, "A, ..., "Z -- these let you name a specific clipboard target for deleted or copied content. A lowercase name replaces the content of the register; an uppercase name appends to the previous content of the register.
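
To make the lowercase/uppercase rule for named registers concrete, here is a minimal Python sketch. It is only a toy model of my own (the class `Registers` and its methods are made up for illustration), not how Vim implements registers.

```python
class Registers:
    """Toy model of Vim's named registers (illustrative only, not Vim's code)."""

    def __init__(self):
        self.named = {}  # maps 'a'..'z' to the stored text

    def store(self, name, text):
        key = name.lower()
        if name.isupper():
            # Uppercase name: append to whatever the register already holds.
            self.named[key] = self.named.get(key, "") + text
        else:
            # Lowercase name: replace the register's content.
            self.named[key] = text

    def get(self, name):
        return self.named.get(name.lower(), "")


regs = Registers()
regs.store("a", "first ")   # like yanking into "a
regs.store("A", "second")   # like yanking into "A: appends to register a
regs.store("b", "old")
regs.store("b", "new")      # lowercase again: replaces the content
print(regs.get("a"))        # first second
print(regs.get("b"))        # new
```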

We already discussed deletion (i.e. cut) above. For copy, the term is yank, hence the following commands

y<m> - copy the text covered by the movement // 'y' stands for yank
  yw    - copy from the cursor to the start of the next word
  y$    - copy all the way to the end of the current line
  yy    - copy the current line
  y<N>y - copy N lines starting from the current line // you see, it is very convenient to extrapolate commands
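
The way an operation (c, d, y) combines with a movement can be modelled in a few lines of Python. This is a simplified sketch under my own assumptions: words are separated by single spaces only, the cursor is a plain index into one line, and all function names are hypothetical.

```python
def motion_word(text, pos):
    """Like 'w': index of the start of the next word (words split on spaces only)."""
    n = len(text)
    i = pos
    while i < n and text[i] != " ":   # skip the rest of the current word
        i += 1
    while i < n and text[i] == " ":   # skip the spaces after it
        i += 1
    return i


def motion_eol(text, pos):
    """Like '$' used as a target: everything up to the end of the line."""
    return len(text)


def op_delete(text, start, end):
    """Like 'd': return the line with text[start:end] removed."""
    return text[:start] + text[end:]


def op_yank(text, start, end):
    """Like 'y': return the covered text, leaving the line alone."""
    return text[start:end]


def apply(op, motion, text, pos):
    """Run an operator over the span from the cursor to where the motion lands."""
    return op(text, pos, motion(text, pos))


line = "alpha beta gamma"
print(apply(op_delete, motion_word, line, 0))  # dw at column 0 -> "beta gamma"
print(apply(op_yank, motion_word, line, 6))    # yw on "beta"   -> "beta "
print(apply(op_delete, motion_eol, line, 6))   # d$ on "beta"   -> "alpha "
```

Any new operator works with every existing movement and vice versa, which is why the commands extrapolate so easily.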

For paste, naturally we expect the command to start with 'p'

p - put after the cursor position
P - put before the cursor position

We mentioned the registers, but how do you make use of them? You put the register name before the actual command, such as

"ad<m>  - delete the text covered by the movement into the register "a
"ay<m>  - copy the text covered by the movement into the register "a
"ap     - paste after the cursor position with text from register "a

these commands are great and compact, aren't they?

Visual Mode
Any mention of Copy and Paste would not be complete without touching on Visual Mode. In Visual mode you highlight text and then perform operations on it (you may consider it a sort of 'visual movement').

v - switch to text highlighting mode, character by character horizontally and line by line vertically (i.e. afterwards you can use h and l to move left and right, or j and k to move down and up)
V - switch to text highlighting mode entirely in line by line mode
Ctrl-V  - switch to block or rectangle highlighting mode. E.g. 4l3j extends the rectangle 4 characters to the right and 3 lines down // In Visual Studio, this is like Alt-modified selection.
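
What a rectangle selection picks up can be sketched in Python. This is only an illustration with a hypothetical helper `block_select`; it assumes every line is long enough and that the corner under the cursor is included in the selection.

```python
def block_select(lines, row, col, right, down):
    """Return the rectangle starting at (row, col), extended `right` columns
    and `down` rows; both corners are included, as in a block selection."""
    return [line[col:col + right + 1] for line in lines[row:row + down + 1]]


text = [
    "abcdefgh",
    "ijklmnop",
    "qrstuvwx",
    "yz012345",
    "6789ABCD",
]

# Cursor on 'b' (row 0, column 1), then extend 4 to the right and 3 down.
block = block_select(text, 0, 1, 4, 3)
print(block)  # ['bcdef', 'jklmn', 'rstuv', 'z0123']
```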


Undo and Redo.

u  - undo the last change   // that's the Ctrl-Z equivalent on Windows
Ctrl + R - undo the undo, i.e. redo   // that's the Ctrl-Y equivalent on Windows

Finally, the above are the basic commands that let you move around and type text easily in Vim. The best part of Vim has not yet been revealed: the common scenarios we need most frequently:

(1) search
(2) replace
(3) context-sensitive highlighting (eg syntax)


Search. To search efficiently, regular expressions are required; they will be visited later. For now, we go through the syntax of search itself. Just remember that <pattern> is a regular expression.

/<pattern>   - search for the pattern forward; all matches are highlighted (when the hlsearch option is set)
n                - go on to the next match, forward
?<pattern>  - search for the pattern backward
N                - go on to the previous match, backward

Do you want to see/highlight all the wsdl messages in a wsdl file? Type

      /wsdl:message

and you should see them all. Type n each time to go to the next one, or N for the previous one. Simple!
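
For readers who think better in code, stepping through matches with n and N behaves roughly like the Python sketch below. The function names are my own, the sample text is made up, and the wrap-around at the ends of the file assumes Vim's default wrapscan setting.

```python
import re


def find_matches(text, pattern):
    """Start offsets of every match of the pattern, in file order."""
    return [m.start() for m in re.finditer(pattern, text)]


def next_match(positions, cursor):
    """Like 'n' after a forward search: first match after the cursor, wrapping to the top."""
    if not positions:
        return None
    for p in positions:
        if p > cursor:
            return p
    return positions[0]


def prev_match(positions, cursor):
    """Like 'N' after a forward search: last match before the cursor, wrapping to the bottom."""
    if not positions:
        return None
    for p in reversed(positions):
        if p < cursor:
            return p
    return positions[-1]


doc = "<wsdl:message name='a'/>\n<wsdl:part/>\n<wsdl:message name='b'/>\n"
hits = find_matches(doc, r"wsdl:message")
print(len(hits))                                # 2
print(next_match(hits, hits[1]) == hits[0])     # True: wrapped back to the first match
```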

Do you want to see/highlight every minOccurs set to 1 in a wsdl file? Type (mind the capital O, since search is case sensitive by default)

      /minOccurs="1"

Do you want to see all the blank lines in a wsdl file? Type

      /^\s*$

Note: regular expressions will be discussed later, but for now,
   ^ means the start of the line
   \s means a white-space character (space, tab)
   * means zero or more
   $ means the end of the line
   \n matches the line break itself (for unix and windows files alike), while \r in a search pattern matches a carriage return character
Combined, they mean: from the start of the line to its end, zero or more white-space characters, which is exactly a blank line. Append \n to the pattern (/^\s*$\n) when the line break should be part of the match too.
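
The same blank-line idea can be tried outside the editor. Below is a small Python sketch (the helper name and the sample text are mine) that applies the ^\s*$ pattern line by line and reports which lines are blank.

```python
import re

# Start of line, zero or more white-space characters, end of line.
BLANK = re.compile(r"^\s*$")


def blank_line_numbers(text):
    """Return the 1-based numbers of lines that are empty or white-space only."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if BLANK.match(line)
    ]


sample = "<a>\n\n  \t\n<b/>\n</a>\n"
print(blank_line_numbers(sample))  # [2, 3]
```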

Replace. Before getting used to replace, you must know the range the replace command acts on. The following ranges are common

.               - the current line
$               - the last line of the file
<N>           - the Nth line
<N>,<M>   - from the Nth line to the Mth line
1,$            - of course, this is from the first line to the last line
%              - this is in fact the same as 1,$, just a shorthand for "the entire file"
.,$             - this means from the current line to the last line

The general syntax for replace is as follows; you start with ':' in COMMAND mode,

:<range>s/<pattern>/<replacement>/<options>

The options act like switches. You can leave them alone for the moment, but for completeness,

g - replace all occurrences on each line; by default, without g, only the first occurrence on each line is replaced
c - confirm each replacement

So do you want to replace all occurrences of the XML namespace 'www.example.org' in a wsdl file? Type

:1,$s/www\.example\.org/www.myorganization.com/g

which basically says: for the range from line 1 to the last line, replace the portions matching 'www\.example\.org' with 'www.myorganization.com', for all occurrences on each line.

Every 'www.example.org' is then replaced with 'www.myorganization.com', and the number of substitutions is reported.
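
The roles of the range and the g option can be mimicked in Python. This is a rough sketch with a hypothetical `substitute` function, not a reimplementation of Vim's command: the range is given as 1-based line numbers and the confirm option is left out.

```python
import re


def substitute(lines, first, last, pattern, replacement, global_flag=False):
    """Replace matches in lines first..last (1-based, inclusive).

    Without global_flag only the first match on each line is replaced,
    mirroring the default behaviour described above.
    Returns the new list of lines and the number of substitutions made.
    """
    regex = re.compile(pattern)
    limit = 0 if global_flag else 1   # for re, count=0 means "replace all"
    total = 0
    result = list(lines)
    for i in range(first - 1, last):
        result[i], made = regex.subn(replacement, result[i], count=limit)
        total += made
    return result, total


wsdl = [
    "ns1=www.example.org ns2=www.example.org",
    "no namespace here",
    "ns3=www.example.org",
]

# Roughly :1,$s/www\.example\.org/www.myorganization.com/g
all_lines, count_g = substitute(
    wsdl, 1, len(wsdl), r"www\.example\.org", "www.myorganization.com", True
)
# The same command without the g option
first_only, count_first = substitute(
    wsdl, 1, len(wsdl), r"www\.example\.org", "www.myorganization.com"
)
print(count_g, count_first)  # 3 2
```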

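Want to remove all blank lines? Type

:%s/^\s*$\n//

Here \n matches the line break in both unix and windows files, so a single command covers both (\r in a search pattern would look for a literal carriage return instead).

This is beautiful, isn't it!!!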
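
A whole-buffer equivalent can be sketched in Python as well. Note one assumption that differs from the Vim pattern: the sketch uses [ \t]* instead of \s*, because in Python's re module \s also matches the newline and would swallow more than one line at a time.

```python
import re


def remove_blank_lines(text):
    """Delete every line that is empty or holds only spaces and tabs,
    together with its line break."""
    # re.MULTILINE lets ^ match at the start of every line, not just the text.
    return re.sub(r"^[ \t]*\n", "", text, flags=re.MULTILINE)


sample = "<a>\n\n  \t\n<b/>\n\n</a>\n"
print(repr(remove_blank_lines(sample)))  # '<a>\n<b/>\n</a>\n'
```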


Syntax Highlighting. The syntax and highlighting commands for a language are normally stored in a syntax file, usually under the \syntax folder. Check there: most of the time the file for your favorite language is already present; if not, try Google.


Help. OK, we have not yet discussed the F1 of Vim. Whenever you need additional help, type ':help' (without the quotes) in command mode to reach the root help page, or take a guess, such as ':help case' for anything related to changing the case of characters. Did we mention that you can

gu<m>    - change the text covered by the movement to lowercase
gU<m>    - change the text covered by the movement to UPPERCASE
g~<m>    - switch the case for the text covered by the movement

eg.

guw      - change the current word to lowercase
gU$      - when at the start of a line, change the entire line to UPPERCASE, e.g. to make a title
~        - change the case of the current character and move onto the next character


Oh, yeah, you may want to search case-insensitively too (by default, search is case sensitive, owing to Vim's Unix roots). Type

:set ignorecase

To revert to case-sensitive search,

:set noignorecase

Oh, there are plenty of options to tweak under the ':set' command; you can leave them alone for now and visit them in future exploration.

I will stop here for Vim, and wish you a good journey with it! As you can see, it is very flexible and powerful, yet not so complicated if you decompose the operations into manageable pieces and learn one thing at a time. Only practice makes perfect.

Caution: don't overdo it. If you always try to find the right command for every tiny thing you need to perform, you will soon have no time left to think about the actual work. The point is to start small (as with any big project): pick out the easy commands and practice them until your mind no longer needs to pause for a second to think when using them. Then we have fluency.

No doubt this article has become longer than it should be, and it covers Text Editing only. So I will continue another time with the remaining items for Text Manipulation (text transformation, code generation), then Regular Expression and Scripting. Of course, I welcome your comments, agreement or disagreement : )


Posted May 08 2007, 05:11 PM by blackinkbottle