Door slammed shut

I figured out why my app hangs: the server issues two TCPCloseSocket() calls. It doesn’t crash the server, but the client has to be force-quit. It’s like closing the door on someone while they’re talking: you can still hear their voice.

After the last TCPRecv(), the client should send a concluding message. The server waits for this receipt, this confirmatory goodbye, and then calls TCPCloseSocket(). Meanwhile, the client blocks on TCPSend(); returning means the data was sent. Now the client can close. (If the client’s final message isn’t the expected farewell, something went wrong.)

Without the final receipt, the server “slams the door” on the connection with the first, blind TCPCloseSocket(). The client hangs waiting for data and never reaches its own socket close. Because the client never explicitly closes its socket, the server “accepts” the same connection again. This time the message doesn’t conform to the protocol (a conditional check), so control goes straight to the bottom of the handler and its TCPCloseSocket(). That’s the second close.
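The goodbye handshake described above can be sketched in Python, used here as a runnable stand-in for AutoIt. This is a minimal sketch under a few assumptions. Replies are newline-terminated. `BYE\n` is a hypothetical farewell token. `socket.socketpair()` plays the part of an accepted connection. The server’s only `close()` sits in a `finally` block, so no code path can reach it twice.

```python
import socket
import threading

FAREWELL = b"BYE\n"  # hypothetical farewell token

def read_until(sock: socket.socket, terminator: bytes) -> bytes:
    """Accumulate bytes until the terminator arrives or the peer closes."""
    buf = b""
    while not buf.endswith(terminator):
        chunk = sock.recv(1024)
        if not chunk:  # peer closed the connection
            break
        buf += chunk
    return buf

def server_side(conn: socket.socket, log: list) -> None:
    try:
        conn.sendall(b"DATA:hello\n")
        # Wait for the client's goodbye instead of slamming the door.
        goodbye = read_until(conn, FAREWELL)
        log.append("farewell ok" if goodbye == FAREWELL else "protocol error")
    finally:
        conn.close()  # the one and only close on every path
        log.append("server closed")

def client_side(sock: socket.socket) -> bytes:
    reply = read_until(sock, b"\n")
    sock.sendall(FAREWELL)  # returns once the bytes are handed to the OS
    sock.close()            # the client closes its own end
    return reply

if __name__ == "__main__":
    server_end, client_end = socket.socketpair()
    events = []
    worker = threading.Thread(target=server_side, args=(server_end, events))
    worker.start()
    print(client_side(client_end))
    worker.join()
    print(events)
```

If the client’s last message is anything other than the farewell, the server logs a protocol error but still closes exactly once.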

This was an elusive bug because the app “never” hung for me until I released it to a group of users. They were pretty good at getting it to freeze. It was embarrassing. My first suspect was slow batch scripts: maybe cmd.exe was slow and RunWait() timed out. I changed the server to call FileOpen() and FileRead() directly instead of outsourcing work to the interpreter. But it still froze.

Next, I set up every server task to do only one of two things: send a file or send a variable. Anything that needed processing, like sorting an array, would be done client-side. The app still hung, so I stopped suspecting disk latency. It had to be the network.

It was only thanks to debug statements that I detected the double socket close. Because the server kept responding to some clients while others froze, I would likely never have discovered such a pathology until I let loose the stepwise ConsoleWrite() calls. Whenever the server closed a socket twice, the corresponding client hung. Bingo.
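The same kind of tracing can be sketched in Python rather than AutoIt. `TracedSocket` is a hypothetical wrapper, not a library class. It counts `close()` calls per connection and logs a warning on the second one, which is the pattern the ConsoleWrite() output revealed.

```python
import logging
import socket

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")
log = logging.getLogger("server")

class TracedSocket:
    """Hypothetical wrapper: logs every close() and flags a repeated one."""

    def __init__(self, sock: socket.socket, name: str):
        self._sock = sock
        self.name = name
        self.close_calls = 0

    def close(self) -> None:
        self.close_calls += 1
        log.debug("%s: close() call #%d", self.name, self.close_calls)
        if self.close_calls > 1:
            log.warning("%s: closed more than once", self.name)
        self._sock.close()  # a second close() on a Python socket is a no-op

if __name__ == "__main__":
    a, b = socket.socketpair()
    traced = TracedSocket(a, "client-1")
    traced.close()
    traced.close()  # the bug: a second close on the same connection
    b.close()
```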

If this works in the wild, I’ll pull out the useful parts into a networking library. I don’t know how long I will be using AutoIt or desktop apps, but whipping up a single-threaded client-server app is a useful prototyping technique for local solutions. Synchronization is a non-issue, because the server handles one connection at a time, so data writes are serialized.

To me, that’s a minor miracle.

Database buffet

“Send me the workbook, and I’ll look at converting it into a database.”

Those are the magic words. Utter them and enjoy your $1 vended Doritos Cool Ranch chips. Munch them in the break room before the audience of microwaves and begin a mental forum.

You can get a report with VBA macros. You can slice, dice, filter, pivot, sort and highlight all in the workbook. Excel is powerful and useful; why jump to Access?

You get multiple users working in one or more views of a single data model. Isn’t that a powerful paradigm? Plus, you get the usual benefits: atomic transactions, click-and-drag forms, limited liability.

Just a week ago I was writing client-server hybrids around a homemade protocol; couldn’t I have done the same inside Access? My romance with plaintext and scripts floweth over.

Question: how many databases until the whole thing needs to be merged into a bigger database, and the patchwork becomes the critical process? Up-front design of this sort is exactly what I am avoiding by sussing out AutoIt programs that talk to simple servers running small scripts.

It’s like I can see the maintenance down the road, but maybe I can sort of avoid it by trusting my abilities (?)

Lockups and fallbacks

The TCP app was a real failure. It locked up too often. The server never crashed, but the application would hang. A part of me wants to blame the server-side batch scripts: maybe the command interpreter isn’t an analog of Perl or CGI. Maybe I can’t trust it as much as I thought. After learning for /f, findstr, dir /B, the || operator and >> redirects, you mean to tell me it was all for naught?

I used a variety of methods to “slow the user down.” Do you know how we came by the QWERTY keyboard layout? The story goes that people were typing too fast and the machine couldn’t keep up. This time, it’s the network: there’s latency associated with every clicked refresh. Even if it’s only a hundred bytes over a tunnel that pumps the ocean, too many requests too often can cause a hang. (Specifically, the action handler for GUICtrlList.)

I added a button. Now the refresh only happens after selecting something from the list AND clicking the button. That little pause buys some breathing room. I also added a progress bar, blanket-surround GUISetState(@SW_DISABLE, $gui)/@SW_ENABLE statements, and an imperceptible Sleep(10) post-click.

Even all that didn’t fix the hang. I think I need to port the batch scripts to AutoIt. Then there will be less context switching between the server process and the cmd.exe process. At the very least, it will reduce my non-determinism to a more easily isolated case: when you can blame only one language, it may be easier to debug.

I thought about server-side caching: let’s store recent queries so that the server doesn’t have to work so hard each time. What’s the time difference between searching for data on disk, scattered across files, versus searching for it in an array? The second will be much faster, even if there are a thousand results.
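A cache of recent queries could look like this in Python. It is a minimal least-recently-used sketch, not the server’s actual code. `fetch` is a placeholder for the slow path, such as scanning files on disk. `QueryCache` and `capacity` are hypothetical names.

```python
from collections import OrderedDict
from typing import Callable

class QueryCache:
    """Minimal LRU cache of recent query results (a sketch)."""

    def __init__(self, fetch: Callable[[str], str], capacity: int = 100):
        self._fetch = fetch          # slow path, e.g. scanning files on disk
        self._capacity = capacity
        self._items: "OrderedDict[str, str]" = OrderedDict()
        self.misses = 0

    def get(self, query: str) -> str:
        if query in self._items:
            self._items.move_to_end(query)   # mark as most recently used
            return self._items[query]
        self.misses += 1
        result = self._fetch(query)
        self._items[query] = result
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # evict the least recently used
        return result

    def invalidate(self, query: str) -> None:
        """Call after a write changes the data behind a cached query."""
        self._items.pop(query, None)
```

The catch with any server-side cache is staleness. Every write that changes the underlying files has to invalidate the affected queries, or clients will see old answers.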

How about delayed writes? Transmit the acknowledgement, but write after we close. Then the app doesn’t have to wait: TCPCloseSocket() has already been called on its end. We’re one (small) step closer to handling another connection in our pool.
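Here is the delayed-write idea as a Python sketch, assuming a single-threaded server that calls a flush step between connections. `handle_request`, `flush_writes` and the `ACK` reply are hypothetical names. The handler queues the write, acknowledges, and closes. The disk I/O happens only afterward.

```python
import os
import socket
import tempfile
from collections import deque

pending_writes = deque()  # (path, text) pairs acknowledged but not yet on disk

def handle_request(conn: socket.socket, path: str, text: str) -> None:
    try:
        pending_writes.append((path, text))  # remember the write
        conn.sendall(b"ACK\n")               # acknowledge right away
    finally:
        conn.close()                         # client is released before any disk I/O

def flush_writes() -> int:
    """Run between connections: perform the deferred writes in order."""
    written = 0
    while pending_writes:
        path, text = pending_writes.popleft()
        with open(path, "a", encoding="utf-8") as f:
            f.write(text)
        written += 1
    return written

if __name__ == "__main__":
    server_end, client_end = socket.socketpair()
    target = os.path.join(tempfile.mkdtemp(), "data.txt")
    handle_request(server_end, target, "row 1\n")
    print(client_end.recv(16))      # the ACK arrives before anything is written
    client_end.close()
    print(os.path.exists(target))   # False: the write is still pending
    print(flush_writes())           # 1: the queued write is done between connections
```

The trade-off is durability. If the server dies after acknowledging but before flushing, the client believes data was saved that never reached the disk.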

The other thing is I haven’t been doing packet-level TCP. I assumed IP handled the packets, and TCP handled the connections – that all I had to do was send bytes. But others’ code seems to mention packets. I do need to work on timeouts.
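The assumption above is right as far as it goes: TCP delivers a byte stream, not packets. The catch is that a single receive call may return half a message, or two messages glued together. That is why other people’s code talks about “packets”: it frames its messages. Below is a Python sketch of one common framing scheme, a 4-byte length prefix (an assumed format, not the app’s), combined with a receive timeout so a silent peer raises an error instead of hanging forever.

```python
import socket
import struct

HEADER = struct.Struct("!I")  # 4-byte big-endian length prefix (assumed format)

def send_message(sock: socket.socket, payload: bytes) -> None:
    sock.sendall(HEADER.pack(len(payload)) + payload)

def recv_exact(sock: socket.socket, n: int) -> bytes:
    """One recv() may return only part of a message, so loop until n bytes."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed mid-message")
        buf.extend(chunk)
    return bytes(buf)

def recv_message(sock: socket.socket, timeout: float = 5.0) -> bytes:
    sock.settimeout(timeout)  # recv() raises socket.timeout instead of blocking forever
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)
```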

Truth table combinatorics

For every n propositions, you have 2-to-the-n possibilities. That adds up: given three characters with two propositions each, you get six propositions, and the count doubles with every one: 2, 4, 8, 16, 32, 64 combinations. Expand your cast to seven or eight simple, archetypal characters as per Dramatica, and even one proposition apiece gives 128 or 256 combinations!

That’s only theoretical: some combinations are too outlandish to work, and some progressions don’t make sense. At the climax, a character may flip from T to F or from F to T, but flip-flopping would be harder to track. Few readers scribble out a truth table for a one-off, I think.
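The counting above is easy to check in Python. `count_worlds` and `truth_rows` are hypothetical helpers. The first computes 2 to the power of characters times propositions per character. The second lists every T/F row in the usual truth-table order, with T first. The proposition names in the test are illustrative only.

```python
from itertools import product

def count_worlds(characters: int, props_per_character: int) -> int:
    """Each proposition is T or F, so n propositions give 2**n rows."""
    return 2 ** (characters * props_per_character)

def truth_rows(names):
    """Yield every T/F assignment, T-first, like a printed truth table."""
    for values in product("TF", repeat=len(names)):
        yield dict(zip(names, values))
```

Three characters with two propositions each give `count_worlds(3, 2)`, which is 64. Seven characters with two each already give 2 to the 14th, or 16,384 rows, which is why pruning the outlandish ones matters.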

Using the sequences of a truth table to thread your work can give you a roadmap for developing each character. How does the reader perceive this character to be “True” or “False” with regard to a proposition? For example, “The prince is honorable.” It can appear true in the beginning, but his actions may sway the reader to believe otherwise. Redemption is ripe after a steady string of False, only to turn True at the end.

You’re asking yourself, “How do I show this? How do I show that? How would this show that? Is that good enough? Would I believe it?”

Things can be further complicated by combining logic with constraints: societal constraints, traditions, law and culture. Take your reader on a journey into a rich world – a world that is not only punchy, but internally consistent.

This was also inspired by BioShock. A friend said Ken Levine’s genius lay in leading the gamer by the nose, getting him to think one thing until the very end, when everything completely shifts. Perception. Prejudice. Preservation.

All the truth in the world

From Discrete Mathematics and Its Applications, 5e, Rosen:

A detective has interviewed four witnesses to a crime. [Propositions follow.]

  • B: The butler is telling the truth.
  • C: The cook is telling the truth.
  • G: The gardener is telling the truth.
  • H: The handyman is telling the truth.

From the problem statement, I get the following compound propositions:

B -> C
~(C & G)
~(~G & ~H)
H -> ~C

Here’s the truth table. I stop filling in a row at its first F, since one false proposition rules the whole row out:

B  C  G  H   B->C  ~(C&G)  H->~C  ~(~G&~H)
-  -  -  -    -      -       -       -
T  T  T  T    T      F
T  T  T  F    T      F
T  T  F  T    T      T       F
T  T  F  F    T      T       T       F
T  F  T  T    F
T  F  T  F    F
T  F  F  T    F
T  F  F  F    F
F  T  T  T    T      F
F  T  T  F    T      F
F  T  F  T    T      T       F
F  T  F  F    T      T       T       F
F  F  T  T    T      T       T       T
F  F  T  F    T      T       T       T
F  F  F  T    T      T       T       T
F  F  F  F    T      T       T       F

There are three cases where all propositions are consistent together, and in every one of them B and C are false: G and H both true, G true alone, or H true alone. Assuming there was only one criminal, we might want to point out a culprit and close the case. But the question asks

For each of the four witnesses, can the detective determine whether that person is telling the truth or lying?

What? I looked in the back for the answer. The answer is that we can determine that the butler and cook are lying, but we cannot determine whether the gardener or the handyman is telling the truth.

That makes sense: a criminal wouldn’t implicate himself, and presumably the truth-tellers wouldn’t take the fall. Somewhere in here are the seeds of plot and human nature, and why stories can still be the taut battleground of drama.
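The book’s answer can also be checked by brute force over all 16 assignments. This Python sketch assumes the second statement means “the cook and the gardener are not both telling the truth,” written ~(C & G). Reading it as exclusive-or instead would force G to be true in every consistent row, which contradicts the book. A witness is settled only if every consistent row agrees about them.

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    return (not p) or q

def consistent(b: bool, c: bool, g: bool, h: bool) -> bool:
    """True when all four of the detective's conclusions hold at once."""
    return (implies(b, c)              # B -> C
            and not (c and g)          # ~(C & G): cook and gardener not both truthful
            and not (not g and not h)  # ~(~G & ~H): gardener and handyman not both lying
            and implies(h, not c))     # H -> ~C

# Every T/F assignment to (B, C, G, H), in truth-table order.
solutions = [row for row in product([True, False], repeat=4) if consistent(*row)]

def verdicts(rows):
    """A witness is settled only if every consistent row agrees about them."""
    result = {}
    for i, name in enumerate(["butler", "cook", "gardener", "handyman"]):
        seen = {row[i] for row in rows}
        result[name] = ("truthful" if seen == {True}
                        else "lying" if seen == {False}
                        else "unknown")
    return result

if __name__ == "__main__":
    print(len(solutions))
    print(verdicts(solutions))
```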

On the cusp of being useful

For my app demo, I launched a couple of instances of the program and set up both to send a message to the server on a button press. On the count of three, my friend and I clicked at the same time. It worked! One of us got a “First one” message, and the other got an “Already did it” message.

The major selling point was that, like biological fertilization, only one task would be handled at a time; all others would be notified otherwise. If it couldn’t do that, if it couldn’t synchronize among multiple clients, the app was worthless. The one essential requirement was coordination among multiple users.
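The first-come rule can be sketched in Python. `Coordinator`, `claim` and the task id are hypothetical names; the two reply strings are the ones from the demo. No lock is needed because a single-threaded server finishes one request before it reads the next.

```python
class Coordinator:
    """First-come rule on a single-threaded server: the first claim wins."""

    def __init__(self) -> None:
        self.owners: dict = {}  # task id -> first user to claim it

    def claim(self, task: str, user: str) -> str:
        # No lock needed: one request is processed completely before the next.
        if task not in self.owners:
            self.owners[task] = user
            return "First one"
        return "Already did it"

if __name__ == "__main__":
    server = Coordinator()
    # Two "simultaneous" clicks still reach the server one after the other.
    for user in ["me", "friend"]:
        print(user, server.claim("task-42", user))
```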

The whole thing works because the server is single-threaded. The app works because it’s not Facebook. It’s not meant to handle massive, parallel traffic. It fulfills its purpose, neatly slotted, to increase efficiency. My plan is to gather enough metrics to answer the question “Can we predict growth over time by looking at similar patterns?” I would like to use “machine learning principles” somehow. (Whatever that means.) But first, I need data.

The app was much better received this time. Because I abstracted TCP traffic as a general request-response protocol, it took less code to write new functions. The client and server handled “packaged data” that could have arbitrary key-value pairs, a poor man’s version of HTTP.

The uses for this kind of software are nearly unlimited: multiple users accessing data in a synchronized way. Of course this has been going on for a while. I just chanced upon it because of AutoIt. Looking back on a year of writing code, I’m certain I would not have been as productive without it.

AutoIt abstracts Windows library routines into a VBA-like syntax. It’s a natural progression from Excel VBA. Like Excel, it’s a killer app.

Message protocols

I wanted to generalize TCP communication because the server was getting crowded. Wouldn’t it be better to have a general data-passing strategy? What if everything were just requests and responses? I came up with a @CR-separated name:value scheme:

please:add-me
first-name:roger
last-name:smith
address:mars

Now look at this:

HTTP/1.1 200 OK
Content-type: text/html
Connection: close

<html>
...

Uh-oh, I’m reinventing protocols. Still, it is preferable to handle communication as general TCP data back and forth, because the format can be packed and unpacked in a standard way. There are still too many magic strings (arbitrary string names, like magic numbers) in the code, but I see it as a step forward.
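Packing and unpacking the name:value scheme might look like this in Python, as a stand-in for the AutoIt version. It assumes @CR is a single carriage return and that names never contain a colon. Values may contain colons because unpacking splits on the first colon only. `pack` and `unpack` are hypothetical names.

```python
CR = "\r"  # AutoIt's @CR is a carriage return, chr(13)

def pack(fields: dict) -> str:
    """Join name:value pairs with CR; the first pair acts as the request verb."""
    for name, value in fields.items():
        if ":" in name or CR in name or CR in value:
            raise ValueError(f"cannot encode field {name!r}")
    return CR.join(f"{name}:{value}" for name, value in fields.items())

def unpack(message: str) -> dict:
    fields = {}
    for line in message.split(CR):
        if line:
            name, _, value = line.partition(":")  # split on the first colon only
            fields[name] = value
    return fields
```

Defining field names such as `"please"` and `"first-name"` once, as constants shared by client and server, is one way to tame the magic strings.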

Plus, I never would have noticed the resemblance to HTTP if I hadn’t been looking for “a better way to do things.” Sometimes, rewriting code is the best learning experience.

Side Effects

I wrote my first function with side effects today. In VBA:

Function isTrue(ByRef sString As String) As Boolean

  ' Assigning to a ByRef parameter changes the caller's variable
  sString = "side effect"

  isTrue = True

End Function

I’m not sure that’s actually a “side effect.” I’ve just always avoided writing functions like these. Functions should do one thing, and initializing outside variables from inside a function still feels like a foreign concept. In AutoIt, it feels like this:

Global $prodigal = "better days"
If CruelWorld($prodigal) Then
  ConsoleWrite("You've been " & $prodigal & @CRLF)
Else
  ConsoleWrite("You've seen " & $prodigal & @CRLF)
EndIf

Func CruelWorld(ByRef $myChild)

  Local $badApple = "unduly influenced"

  ; only God knows
  ; (ByRef: this assignment overwrites the caller's $prodigal)
  $myChild = $badApple

  Return True

EndFunc

You could say it was my decision all along. But how else do you initialize a struct? The Win32 API uses this a lot: you ampersand the variable and toss it in the fire. In other words, the experts use it. Knowing me, though, I should avoid doing it too often. It is tempting to write “yet another function” that will only be used “once,” and then I channel Mozart and build “Amadeus” with a dozen of these scattered around, an edifice to conceit, and it comes back to bite me with mysterious bugs.

The reason I call it a side effect is that Lisp uses these a lot. There are some parentheses, you do something, and a string gets set “on the side.” Sometimes a function really does do some useful work, and duplicating its effect in a second function just to stay clean isn’t clean. As long as the logic is consistent between return value and side effect, I guess it’s okay to use (sparingly).
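For contrast, here is the same trade-off in Python, which has no ByRef. Mutating a passed-in one-item list is the closest analog. Returning a tuple is the side-effect-free alternative, where the caller binds both results itself. The function names are hypothetical and mirror the AutoIt example.

```python
def cruel_world(my_child: list) -> bool:
    """Side-effect version: mutate the caller's one-item list."""
    my_child[0] = "unduly influenced"  # the caller sees this change
    return True

def cruel_world_pure() -> tuple:
    """Side-effect-free version: hand back both results."""
    return True, "unduly influenced"

if __name__ == "__main__":
    prodigal = ["better days"]
    if cruel_world(prodigal):
        print("You've been " + prodigal[0])

    ok, prodigal_value = cruel_world_pure()
    print(ok, prodigal_value)
```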

Debugging learning

The very code I write becomes more mysterious to me once it runs. Sometimes I simply accept that the behavior will be indeterminate, that the output will be a crash or a freeze. Sitting inside a colorized text editor, I couldn’t begin to draw up the proof or diagram the state machine, which the transition from theory to program could mangle anyway. What happens, happens. I’m left to pick up the bits.

The first thing I do is try to reproduce it. I never have enough debug messages, so I add more and try again. (It’s not a problem if it never happens again, right?) The bug itself helps me focus: isolate the smallest scope of code in which the error happens. Bracketed console statements, or dialog message boxes, make expensive breakpoints; you can while away your time bouncing among screens: server output, app output, typing the same long input string (again), and so on.

Once I can reproduce the bug, I step back and look at the logic flow. Somewhere, something assumed something else, and the determinate chain led to a crash. This is the most interesting part of programming – and the most frustrating – because you have to mentally plot each step the computer took. Your mind is on fire with all these possibilities, and you check them one by one, mouse wheel flicking.

Technically, the computer already knows it will crash: I programmed it in a certain way, and it executes the instructions. I do not mean the computer is conscious; I just mean the machine will do The Thing as you wrote it. That got me thinking about learning.

What if we could debug learning? Somehow our brains seem to pick up what we want to learn faster than we could by grasping each sentence as we read it. Of course, when we are asked to demonstrate this knowledge by solving problems, we get stuck. We have to go back, carefully think through our mental framework, and challenge our assumptions. We’re trying to reproduce our erroneous understanding. We’re debugging.

Learning through programming is the best way for me to learn, I think. Even if I could memorize paragraphs of set theory, I still wouldn’t be able to teach it to someone. I need to internalize it in such a way that I could deliver a hundred lectures on the topic, and they would all essentially have the same content. I learn by doing.

So whatever I want to learn, I ought to implement a program that reflects my mental model of the concept. If algebra, language trees to parse symbols and equations. If biology, distributed communication graphs. If genetics – genetic algorithms! Statistics? Machine-learning.

With the “C Answer Book,” the most agonizing experience is seeing how far I am from its solutions. I see clean C code, and I flounder. By comparison, I can write many different “types” of applications in AutoIt, because buttons, ListViews and strings can accommodate many problems. But parsing a /* */ comment? I don’t have the context to write programs in that space. Yet that is exactly what I need to learn: writing more programs “of that nature.”
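Stripping /* */ comments turns out to be a small state machine. Below is a hedged Python sketch, not the C Answer Book’s solution. It tracks three states (code, comment, string literal), so a `/*` inside quotes survives, and it replaces each comment with one space, as C does. It assumes no character constants such as `'"'`, which a complete version would also have to handle.

```python
def strip_c_comments(src: str) -> str:
    """Remove /* */ comments with a three-state machine: code, comment, string."""
    out = []
    state = "code"
    i = 0
    while i < len(src):
        c = src[i]
        nxt = src[i + 1] if i + 1 < len(src) else ""
        if state == "code":
            if c == "/" and nxt == "*":
                out.append(" ")  # C treats a comment as a single space
                state = "comment"
                i += 2
                continue
            if c == '"':
                state = "string"
            out.append(c)
        elif state == "comment":
            if c == "*" and nxt == "/":
                state = "code"
                i += 2
                continue
        else:  # inside a string literal
            out.append(c)
            if c == "\\" and nxt:
                out.append(nxt)  # keep escaped characters such as \"
                i += 2
                continue
            if c == '"':
                state = "code"
        i += 1
    return "".join(out)
```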

I guess the best I can do is to write bug-free C and keep trying to re-invent the wheel. Eventually I’ll have enough background to understand glibc, and I can start using their functions. Higher-order programs should follow. I can’t imagine using strcpy() or malloc() with the sheer abandon of declaring AutoIt variables just to parse CSV values.

Somehow, it feels like I must write a shorter, worse implementation of every C standard library function as a personal trial.

Father, son and machine spirit

I’ve been writing an application that implements some basic caching. The server is single-threaded, so I need to minimize latency in every way. This is done with improvements at three levels.

The slowest is the network: not only does the request go across the line, but the server must also perform I/O to fulfill it. It would be nice if we could save the trip by caching repeated requests on the user’s hard drive.

The local hard drive is much faster to access than the network. We can store temporary values that will be useful during the run of the program, and this cache makes the application feel much more responsive. By comparison, network communication took seconds, while a disk fetch took less.

Disk caching is nice, but we can do one better: in-memory caching. For my purposes, I just use an array. The lookup tables won’t be large, so space is not a problem. The program doesn’t run long either, because users will most likely close it and relaunch it each day.

When the program is started, the disk cache is loaded up. This preserves the user’s work across sessions without forcing him to click Save. Hm, auto-save. Nice.
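Put together, the memory and disk layers might look like this Python sketch. `TieredCache` is a hypothetical name, and `fetch_remote` is a placeholder for the real server request. The JSON file is read into memory once at startup and rewritten after each network fetch, so lookups check memory first and fall back to the network only on a miss.

```python
import json
import os
import tempfile
from typing import Callable, Dict

class TieredCache:
    """In-memory lookups backed by a JSON file that survives restarts."""

    def __init__(self, path: str, fetch_remote: Callable[[str], str]):
        self.path = path
        self.fetch_remote = fetch_remote     # placeholder for the server request
        self.memory: Dict[str, str] = {}
        if os.path.exists(path):             # load the disk cache at startup
            with open(path, encoding="utf-8") as f:
                self.memory = json.load(f)
        self.remote_calls = 0

    def get(self, key: str) -> str:
        if key in self.memory:               # fast path: already in memory
            return self.memory[key]
        self.remote_calls += 1
        value = self.fetch_remote(key)       # slow path: a network round trip
        self.memory[key] = value
        self._save()                         # keep the disk copy current
        return value

    def _save(self) -> None:
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(self.memory, f)

if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "cache.json")
    first = TieredCache(path, str.upper)
    first.get("report")                      # fetched "remotely", then saved
    second = TieredCache(path, str.upper)    # a new session loads the file
    print(second.get("report"), second.remote_calls)
```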

We wouldn’t even need a server, except to control reads and writes to the shared data on disk, which must live in a central location. Each user draws data from it for his work process. I looked at semaphores and file locking, but I want to see if this will be sufficient.