Counting words

I keep returning to the early solution of counting words: a state variable tracks whether the program is inside a word or outside one, and the program counts each transition from a blank character to a nonblank character. Blank characters are the whitespace characters \n, \t and space.

I wanted to approach the solution from “first principles,” which meant I had to think for a while about how to reasonably fake a path to the presented solution (naively ignoring that it might have been reached through a series of revisions).

...
if (c == BLANK || c == TAB || c == NEWLINE)
  state = OUT;
else if (state == OUT) {
  state = IN;
  wordcount++;
}
...

It’s that niggling else-if that drives me crazy! How does one intuit that? It feels like magic. There is some mysterious implication that the authors understand and I do not; but we will see that it will not remain so for long. First, we make a table:

prev -> current       transition?  start of word?  end of word?
blank -> blank        no           no              no
nonblank -> blank     yes          no              yes
nonblank -> nonblank  no           no              no
blank -> nonblank     yes          yes             no

Notice there are two transitions, and each is distinct within its context: going from a nonblank to a blank character means reaching the end of a word; going from a blank to a nonblank means the start of a word. (If our problem were counting runs of blanks, we would count the first transition instead.)

Then the pseudocode is

if the current character is a blank,
  we are at the end of a word (set state OUT)

if the current character is a nonblank AND
   the previous character is a blank,
  we are at the start of a word (set state IN)
  count it

Writing the second conditional as an else-if lets us equate “current character is a nonblank” with “current character is NOT a blank”: the second conditional is only considered when the first one is false. And if the previous character was a blank, the first conditional already set the state to OUT on the previous iteration, so “the previous character is a blank” becomes “the state is OUT.” Thus we compare with OUT, just as in the authors’ solution.
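To try the reasoning on concrete input, here is a minimal C sketch that wraps the same state machine in a function. The name count_words() is my own, and the sketch reads from a string rather than from standard input as the book's program does; blanks are exactly the three characters defined above.

```c
#define IN  1   /* inside a word */
#define OUT 0   /* outside a word, i.e. after a blank */

/* count_words: count blank-to-nonblank transitions in s.
   Blanks are ' ', '\t' and '\n'; every other character is a nonblank. */
int count_words(const char *s)
{
    int state = OUT;   /* the start of input behaves like a preceding blank */
    int nw = 0;

    for (; *s != '\0'; s++) {
        char c = *s;
        if (c == ' ' || c == '\t' || c == '\n')
            state = OUT;              /* blank: any word in progress has ended */
        else if (state == OUT) {      /* nonblank right after a blank: new word */
            state = IN;
            nw++;
        }
        /* nonblank while IN: still in the same word, nothing to do */
    }
    return nw;
}
```

Starting in state OUT is what makes a word at the very beginning of the input count: the start of input is treated as if a blank came before it, which is the blank -> nonblank row of the table.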


Ho! Productivity

C# and Visual Studio 2008 enable me to shoot for an almost line-by-line port from AutoIt. That’s both scary and pleasant. A lot of common tasks are solved on StackOverflow or in Deitel; I switch between web and book while occasionally dipping into MSDN. Compiling down to a 17 KB (!) executable looks nice, too. (By comparison, AutoIt with UPX compression yields ~300 KB files.)

The .NET and CLR technologies let me stand on the shoulders of giants, but with whom am I a giant? It is certainly not business, which holds the reins. It is not the kernel developers or the device driver writers, who inhabit the deep world. My things are the thinnest gloss over circumspection; my works live only as long as they can be replaced. Maybe if I remember that, I will stay humble.

I/O Profit

There is a market for processing and printing. It is the bread and butter of computers, to read and write from one place to another. An apathetic regard for “the same old” will result in a sigh; you must put yourself at the head of the table: this data in the right context is information gleaned, filtered, and stored. It drives decisions, marks trends, and has the potential to transform and influence people.

Making these appliances – as it were, not complete, integrated suites or small scripts – is to create context. You are creating a sort of culture: the data together, those fields designed conforming around each other, become instances contrasted; their self-reference is implicit, innate.

Those goggled, post-apocalyptic scavengers with their cracked tablets and matte screens peer at the scrolling lines and green bars of your diagnostic program. A decade old, it functions to pronounce meaning by its manufacture; few authors emerge past those tests.

Backspace versus erasure

Code driving a line printer, whaaat? No, never: letters blink alive against white in a software terminal. Backspace is understandably confusing: who needs to send an escaped character like \b when the world always moves forward?

It’s a control character, like \n and \t and others. Assuming the printer does one of two things at the start of each line, either feeding the paper out for a “new line” or only moving the carriage back to the start of the line (carriage return), the overstrike output starts to look sane:

input: in the\ne\b_ver after

1
 in the
 e
+_ver after

The “1” on the top line marks column 1, where a “+” character indicates an overstrike; regular lines have a blank there. Mapping control characters to physical mechanics, we get

  • \n : advance the paper and carriage return
  • <space> : printing lots of them “advances” the printer head to the right
  • \t : to a printer, tabs are “software-defined blanks”
  • \b : move the printer head left

On paper, the “_” lands on top of the first “e,” so the page reads (with that “e” underlined):

in the

ever after
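The conversion itself can be sketched in C. This is a hypothetical overstrike() function written in the spirit of the overstrike filter in Software Tools, not a copy of it: it works on a string instead of standard input, puts a blank in column 1 of a normal line and a “+” in column 1 of an overstrike line, and assumes backspaces never move the head left of column 1.

```c
#include <stddef.h>

#define SKIP   ' '   /* column-1 control: advance to a fresh line */
#define NOSKIP '+'   /* column-1 control: print over the previous line */

/* Append c if it fits (leaving room for '\0'); *n always advances so
   the caller learns the full length the output needs. */
static void put(char *out, size_t size, size_t *n, char c)
{
    if (*n + 1 < size)
        out[*n] = c;
    (*n)++;
}

/* overstrike: rewrite backspaces in `in` as '+' carriage-control lines.
   Returns the full output length, excluding the terminating '\0'. */
size_t overstrike(const char *in, char *out, size_t size)
{
    size_t n = 0;
    int col = 1;                 /* column the print head is at */
    const char *p = in;

    for (;;) {
        int newcol = col;
        while (*p == '\b') {     /* eat a run of backspaces */
            if (newcol > 1)
                newcol--;
            p++;
        }
        char c = *p;
        if (newcol < col) {      /* head moved left: start an overstrike line */
            put(out, size, &n, '\n');
            put(out, size, &n, NOSKIP);
            for (col = 1; col < newcol; col++)
                put(out, size, &n, ' ');   /* pad out to the target column */
        } else if (col == 1 && c != '\0') {
            put(out, size, &n, SKIP);      /* start a normal line */
        }
        if (c == '\0')
            break;
        put(out, size, &n, c);
        col = (c == '\n') ? 1 : col + 1;
        p++;
    }
    if (size > 0)
        out[n < size ? n : size - 1] = '\0';
    return n;
}
```

Fed "in the\ne\b_ver after", it produces the three lines " in the", " e" and "+_ver after", matching the output above without the column ruler.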

Between skills and chance

I’m porting a program to C# because antivirus software flags the UPX-compressed AutoIt executable as a false positive. Other, more experienced people worked on the problem for a couple of months to no avail, so my opportunity came up there.

I like C# because it is like VBA with C syntax. Visual Studio gives all the right hints and everyone’s solved my problems. Every question is literally a StackOverflow answer away. When the robot overlords come to roost, some of us can tuck in with our spades and convince ourselves we are workers honored.

Papers! That is the way: modeling straight from the research, muddling through unknowns sentence by sentence, a density familiar to grad students and exciting, if frustrating, for me. If puzzling over symbols, or sticking with it, were cheaper, things would be better.

I thought about benevolent rootkits, but decided it was better to dream on those things instead.

vi mark

The m key marks a place in the file so you can return to it later. It takes a letter a-z to name the spot, so you can store up to twenty-six different locations. Suppose I move my cursor to the middle of the following phrase:

hello there
     ^

Now I type ma to save that location as mark a. Later, either of these returns to it:

'a
`a

Both take you to the marked line, but only the second restores the exact cursor position; 'a lands on the first nonblank character of the line. (Typing '' or `` instead jumps back to where you were before your latest jump, such as a jump to a mark.)

Imposter! Imposter!

AutoIt devs came along and wrote a client-server application. The TCP/IP stuff I had been struggling with was, for them, solved. Maybe we could bond over my weaknesses, a kind of ice-breaker to see the working code. I’m used to a culture of sharing, of code “up there,” of free software; it’s a form of indoctrination, so I can’t grasp people working on proprietary code.

Part of me is bothered by not sharing as the default: not reporting work, not announcing new projects. As if these things can happen in a swirl above me, and I’m a function to be invoked when context calls. Or perhaps it is the unsettling feeling of not knowing whether we are one tribe or I am being slowly marginalized.

Well, I am not the CTO or a manager. So I am on a need-to-know basis, right? If I were really interested, wouldn’t I keep tabs on everyone regularly? But while I push my projects to others, I do not get the same in return.

Import formulas with commas into Excel

Commas separate fields in CSV-formatted files, but you can take advantage of Excel’s CSV import to embed formulas into cells. Unquoted, a formula cannot contain a comma, so usually we would be limited to files like this:

col1,col2,col3,col4
3,b,c,=TODAY()
1,d,e,=TODAY()
...

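However, you can wrap a field in double quotes to use more complex formulas that contain commas. Here is an example with Tcl:

puts "$var1,$var2,$var3,\"=HYPERLINK(\"\"http://www.google.com\"\",\"\"Click here\"\")\""

Excel treats a double-quoted field as a complete cell value, even when it contains commas. Inside the quotes, each literal double quote is written twice (""), which is why the formula’s own quotes are doubled; the backslashes only escape them for Tcl.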
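The same quoting rule can be written once as a helper instead of by hand. Below is a minimal C sketch; csv_field() is a hypothetical name, and the rule it follows (quote a field only if it contains a comma, double quote, CR or LF, and double any embedded quotes) is the common CSV convention described in RFC 4180.

```c
#include <string.h>

/* Append one character if it fits, leaving room for the final '\0';
   *n always advances so the caller learns the full length needed. */
static void put(char *out, size_t size, size_t *n, char c)
{
    if (*n + 1 < size)
        out[*n] = c;
    (*n)++;
}

/* csv_field: write `field` to `out` as a single CSV field.
   Returns the full length of the encoded field, excluding '\0'. */
size_t csv_field(const char *field, char *out, size_t size)
{
    size_t n = 0;
    int quoted = strpbrk(field, ",\"\r\n") != NULL;  /* needs quoting? */

    if (quoted)
        put(out, size, &n, '"');
    for (const char *p = field; *p != '\0'; p++) {
        if (*p == '"')
            put(out, size, &n, '"');   /* escape " as "" */
        put(out, size, &n, *p);
    }
    if (quoted)
        put(out, size, &n, '"');

    if (size > 0)
        out[n < size ? n : size - 1] = '\0';
    return n;
}
```

For the HYPERLINK formula it yields "=HYPERLINK(""http://www.google.com"",""Click here"")" with the surrounding quotes included, while a comma-free formula such as =TODAY() passes through unchanged; joining fields with commas gives a full line.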

do while vs. repeat until

The RATFOR version of the detab program in “Software Tools” uses a repeat-until loop. The program replaces each \t (tab) character with enough blanks to reach the next tab stop.

I was always leery about initializing arrays in other functions, but detab uses a function settabs() to initialize an array passed from main(). This is a kind of interface: the calling program only knows how to use the functions, not how they are implemented. The array is allocated in main() but used in its domain context as a tab-stop object.

I also tried to stick with refactoring into functions, where a long piece of code invoked more than once gets its own function; in this case, however, we have an initialization function called once and another function that returns a ruling on whether a loop should continue. That loop is the repeat-until loop.

C has a do-while, so I thought it was a direct translation. But when I kept the loop condition unchanged, detab produced only one or two blanks per tab instead of the expected output (underscores mark blanks; tab stops fall every 8 columns):

input:     some    thing   like    this
incorrect: some_thing_like_this__
expected:  some____thing___like____this

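A do-while loop repeats while its condition is true and stops when it becomes false; a repeat-until loop repeats while its condition is false and stops when it becomes true. So the RATFOR condition must be negated in C (RATFOR first, then C):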

...
} until (tabpos(col, tabs) == YES)

...
} while (tabpos(col, tabs) == NO)
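Putting the pieces together, here is a C sketch of detab with the repeat-until rewritten as a do-while whose condition is negated. It assumes tab stops every 8 columns, which matches the expected output above, and keeps the settabs()/tabpos() split; MAXLINE, TABSPACE, YES and NO are my own definitions, and the sketch works on a string rather than standard input.

```c
#include <stddef.h>

#define MAXLINE  1000   /* columns with explicit tab-stop entries */
#define TABSPACE 8      /* assumed: a tab stop every 8 columns */
#define YES 1
#define NO  0

/* settabs: fill tabs[1..MAXLINE]; columns 1, 9, 17, ... are tab stops.
   tabs[0] is unused so indexes match 1-based column numbers. */
void settabs(int tabs[])
{
    for (int i = 1; i <= MAXLINE; i++)
        tabs[i] = (i % TABSPACE == 1) ? YES : NO;
}

/* tabpos: YES if col is a tab stop; every column past MAXLINE counts as one */
int tabpos(int col, const int tabs[])
{
    return (col > MAXLINE) ? YES : tabs[col];
}

/* detab: copy in to out, replacing each tab with blanks up to the next
   tab stop. Returns the full output length, excluding the '\0'. */
size_t detab(const char *in, char *out, size_t size, const int tabs[])
{
    size_t n = 0;
    int col = 1;

    for (const char *p = in; *p != '\0'; p++) {
        if (*p == '\t') {
            do {                                /* RATFOR: repeat {       */
                if (n + 1 < size)
                    out[n] = ' ';
                n++;
                col++;
            } while (tabpos(col, tabs) == NO);  /* } until (... == YES)  */
        } else {
            if (n + 1 < size)
                out[n] = *p;
            n++;
            col = (*p == '\n') ? 1 : col + 1;
        }
    }
    if (size > 0)
        out[n < size ? n : size - 1] = '\0';
    return n;
}
```

Changing the test back to == YES in the while reproduces the bug described above: the loop emits one blank, finds the new column is not a stop, and exits (or emits a second blank when the first one happens to land on a stop).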

The continue statement

If you can reduce a workbook to CSV text, you can use any language to process it. That gives you a chance to try different idioms. In this case, I used the continue keyword to skip the remaining domain rules once a rule has handled a row.

foreach line $fileInput {
  set row [split $line ,]

  set fld1 [lindex $row 0]
  set fld2 [lindex $row 1]; # a number
  set statevar 0

  if {$fld1 eq "TYPE1"} {
    # process according to TYPE1
    set statevar 1
    # ...
  }
  if {$statevar == 1} {
    puts stdout "Derived value: [expr {$fld2 + 1}]"
    continue; # skips further rule processing for current row
  }
  if {$fld1 eq "TYPE2"} {
    # process according to TYPE2
    # ...
  }
  # ...
}
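The idiom carries over directly to C. Below is a rough analog of the loop above, assuming each line is "fld1,fld2" with a numeric second field; the tally struct and the TYPE2 handling are hypothetical stand-ins for "process according to TYPE1/TYPE2", and the statevar flag is folded into the TYPE1 branch, which calls continue directly.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical results standing in for the per-type processing. */
struct tally {
    long derived_sum;   /* sum of fld2 + 1 over TYPE1 rows */
    int type2_rows;     /* rows handled by the TYPE2 rule */
    int other_rows;     /* rows no rule claimed */
};

/* apply_rules: the first rule that claims a row ends with `continue`,
   so later rules never see that row. */
struct tally apply_rules(const char *lines[], size_t count)
{
    struct tally t = {0, 0, 0};

    for (size_t i = 0; i < count; i++) {
        const char *comma = strchr(lines[i], ',');
        if (comma == NULL)
            continue;                      /* malformed row: skip it entirely */

        char fld1[32];
        size_t len = (size_t)(comma - lines[i]);
        if (len >= sizeof fld1)
            len = sizeof fld1 - 1;         /* truncate overly long names */
        memcpy(fld1, lines[i], len);
        fld1[len] = '\0';
        long fld2 = strtol(comma + 1, NULL, 10);   /* no error checking */

        if (strcmp(fld1, "TYPE1") == 0) {
            printf("Derived value: %ld\n", fld2 + 1);
            t.derived_sum += fld2 + 1;
            continue;                      /* skip further rule processing */
        }
        if (strcmp(fld1, "TYPE2") == 0) {
            t.type2_rows++;
            continue;
        }
        t.other_rows++;                    /* fell through every rule */
    }
    return t;
}
```

Each continue keeps the later rules flat instead of nesting them in ever-deeper else branches, which is the appeal of the idiom when the rule list grows.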