A simpler system

My latest entry-making accepts certain foibles:

  • I will probably never look at previous pages again, unless I really, really need to.
  • I will probably remember the visual context of past information better than the content itself.
  • I almost never refer to old notes. Even when I suspect I have made the same observation before, I can’t be bothered to look farther back than eight or ten pages (much less a second or third notebook).

I bullet-point each new entry, whether a sentence or a full-page project brainstorm, as “note x.” This separates random quotes, bugfixes, follow-ups, etc. into discrete items. They are sufficiently atomic and granular.

Dates are placed only in the upper-right corner. The upper-left is for note ranges, like “notes 51-66.” Flipping pages left-to-right or right-to-left lets me see both sides!

I bookmark – with a felt-tip marker – any lists, but I will try to avoid making lists of lists, indices, etc. I want the notebook to capture my thoughts and to provide the bare minimum referencing (“see note 52, 73, 89”) without the obligation of page divisions or allocated page chunks. Lists take up a whole page, so consume wisely.

Software engineering

One positive about abstraction is that if you take it as far as you can, tracing bugs may improve the lower pieces. If they are properly compartmentalized, the impact propagates to the higher-level dependents without introducing subtle problems. But it takes only one novel re-use of a function to force an API split.

That could be a form of evolution: the API needs to be improved; improve the API. That does require automated testing, because every dependent will need to be checked for consistent behavior. GUI testing will require mockups for things like printing PDFs, similarly designed interfaces, and maybe even network errors.

The promise of software is that we can make up anything, but I persist in treating only the real. Only the tangible, working code matters – results are immediate and noteworthy. On the other hand, there is a form of discipline in developing simulations, a patience in improving a model to reality, that distinguishes certain classes of programmer.

Equal equipment

Monolithic projects don’t sound so bad, especially when a corrupt binary workbook (.xlsb) errors with “Visual Basic: File not found.” The fix – for me – is to copy over the working version. So far, it has happened when I am editing the code and saving it; not treating it as a read-only file in production has bitten me more often than not. Dodging a briar of scheduled tasks to put in a new copy is more exciting than a rickety roller coaster.

Splitting out the modules helps. Put all your eggs in one basket, and you’re bound by a greater number of dependent tasks: there’s more of a chance that you will have to wait till the end of the day to copy safely. All in all, more of a mess than one needs in 2015.

Why haven’t I ported the non-VBA-dependent elements yet? I was promised a new machine, so I waited and stuck to expedience. Being a specialist is nice, but I’m always the last to hear about problems and the first expected to work on them.

Double-quoted args into batch script

Spaces in paths have me quoting the arguments of tightly coupled batch scripts invoked from AutoIt’s RunWait(). If you strip the quotes first inside the batch script, you can ignore the problem of the script’s own opening double-quote pairing with the opening double-quote of the incoming argument to form an empty string:

@echo off

setlocal enabledelayedexpansion

rem Wrap the argument in quotes, then strip every quote; it no longer
rem matters whether the caller passed the value quoted or bare.
set arg1="%1"
set arg1=!arg1:"=!
if "%arg1%"=="" (
  echo error: provide an argument
  exit /b 1
)
echo !arg1!
endlocal
goto :eof

This batch script should handle no input, input hi, input “hi”, and input “hi there”.

Parsing with Command Prompt

After pdf2text outputs a good file from an OCR’d PDF, we can pull out the relevant data with for /f, findstr, and some assumptions. The first is that the data is pretty uniform and consistent. The second is a watchful user, familiar with the general distribution, who can detect anomalies.

setlocal enabledelayedexpansion

rem "next" counts how far past the "Begin Transaction:" marker we are.
set next=0
for /f "tokens=1-3" %%i in (file.txt) do (
  if !next!==2 (
    rem Two lines past the marker: grab this line's third token and stop.
    set data=%%k
    goto :out
  )
  if !next!==1 (
    set next=2
  )
  if "%%i"=="Begin" (
    if "%%j"=="Transaction:" (
      set next=1
    )
  )
)
:out

echo !data!

endlocal
goto :eof

The raw data from pdf2text could be something like

Begin Transaction:
8xz94hfa94803bjfk3i33ijfj3i3j3if948f
81 STAR LOCUM

The “next” var tracks how far past the “Begin Transaction:” marker we are, so the loop knows when the line we want has come up. That I discovered this technique in batch scripting before any other language might mean I needed its constraints to be creative.

Note: blank lines are skipped both by findstr and by a for /f loop reading a file, so they don’t count when stepping toward the line you want.

Scroll-search lookups

I inhabit the space where a key-value store can be produced in Access and dropped somewhere a workbook can reach it with SQL queries. Our users are prosumers: they craft new spreadsheets and bury process logic in calculated fields. When the rows increase and cruft accumulates, certain optimizations present themselves.

In this case, an SQL lookup plus Application.GoTo obviates the slower by-row Ctrl-F search. A midnight task churns through the day’s updates and produces a fresh database file. That’s one way to avoid the gradual, inevitable corruption of network transactions.
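A minimal sketch of the lookup, assuming the midnight task leaves behind an Access file with a hypothetical Keys table mapping each key to its worksheet row; the path, table, and field names here are placeholders, not the real ones:

Sub JumpToKey(searchKey As String)
    ' Sketch only: the database path, table, and field names are invented.
    Dim cn As Object, rs As Object
    Set cn = CreateObject("ADODB.Connection")
    cn.Open "Provider=Microsoft.ACE.OLEDB.12.0;" & _
            "Data Source=\\server\share\keys.accdb"

    Set rs = cn.Execute("SELECT RowNumber FROM Keys WHERE KeyText = '" & _
                        Replace(searchKey, "'", "''") & "'")

    If Not rs.EOF Then
        ' Jump straight to the row instead of scrolling with Ctrl-F.
        Application.Goto ThisWorkbook.Sheets("Data").Cells(rs.Fields("RowNumber").Value, 1), _
            Scroll:=True
    Else
        MsgBox "No match for " & searchKey
    End If

    rs.Close
    cn.Close
End Sub

Escaping the single quote is the bare minimum; a parameterized ADODB.Command would be the cleaner way to pass user input into the query.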

Being clever helps, but sometimes I wonder if I ought to be developing some series of applications. Something to leave my mark, as it were. These thoughts float in the ether. How far with scripting can I go; am I too simple?

Workbook refresh

The AutoIt method of treating an Excel workbook – or worksheet – as a database table and querying it with SQL is a useful way to convert it to CSV. Then cscript runs a VBScript that calls an XLSB macro to convert it back to a workbook, and the file opens quickly again.

It seems that regular network use, plus all the formatting changes and saves, “ages” a workbook: even one with only 26k rows will crawl. Not everyone’s machine is affected, but this kind of “it was working yesterday” problem could be remedied on the user end. I saw a workbook balloon to 700 MB of memory use as it was opening!

So a useful tool would be one that produces a clean workbook from an existing one, although surveying the sheet names would mean opening it first. If it’s safe to assume the relevant sheet is the one last used before resorting to the app, then we can move forward.
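A rough VBA sketch of that tool, assuming the active (last-used) sheet is the one worth keeping and that carrying over values alone, without formatting or formulas, is acceptable; paths and naming are placeholders:

Sub CloneToCleanWorkbook(srcPath As String)
    Dim srcWb As Workbook, dstWb As Workbook
    Dim srcWs As Worksheet
    Dim used As Range

    Set srcWb = Workbooks.Open(srcPath, ReadOnly:=True)
    Set srcWs = srcWb.ActiveSheet          ' the sheet last used before saving
    Set used = srcWs.UsedRange

    Set dstWb = Workbooks.Add
    With dstWb.Sheets(1)
        ' Copy values only; formatting and accumulated cruft stay behind.
        .Range("A1").Resize(used.Rows.Count, used.Columns.Count).Value = used.Value
        .Name = srcWs.Name
    End With

    dstWb.SaveAs Replace(srcPath, ".xlsb", "_clean.xlsx"), FileFormat:=xlOpenXMLWorkbook
    srcWb.Close SaveChanges:=False
End Sub

If the copy opens fast where the original crawled, that points at accumulated formatting rather than the data itself.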

PivotTable time bomb

Automating Excel is nice, but then VNC fails to launch. Too many EXCEL.EXE processes hang around; even though other tasks run normally, a single error is enough to prevent remote control. The usual culprit is code that has been in service long enough to expose the shortcomings of its early assumptions, and the fix is to handle them.

In this case, creating a PivotTable will fail when the source range holds only the header row. Oh, to have the blasé confidence of the past, when data would always be present; in our case, we ought to check from now on:

...

  ' Bail out when there is nothing below the header row; the PivotTable
  ' creation that follows would otherwise fail.
  If Range("A1").CurrentRegion.Rows.Count < 2 Then
    Exit Sub
  End If
  ...
End Sub

FileCopy block waiting

In Excel VBA, FileCopy and Shell return immediately. This is thorny for trying 7-Zip on files that haven’t been copied yet, or deleting .zip files that the 7-Zip process has locked. I ignore the second problem by clearing out any compressed files at the beginning instead of at the end. (However, you may still want to block wait to prevent the user from entering data into an InputBox() before the Command Prompt windows have disappeared.)

zipFile = Dir("C:\zips\*.zip")

' tumble over each other
While zipFile <> ""
  ' Dir returns only the file name, so prepend the folder for FileCopy.
  FileCopy Source:="C:\zips\" & zipFile, Destination:="C:\decompressed\" & _
    Format(Now, "yyyymmdd_hhmmss") & ".txt"
  zipFile = Dir()
Wend

' block waiting
numFiles = 0
While numFiles < 10 ' are you sure it is 10 files only?
  numFiles = 0 ' recount from scratch on every pass
  localFile = Dir("C:\decompressed\*.txt")
  While localFile <> ""
    numFiles = numFiles + 1
    localFile = Dir()
  Wend
  DoEvents ' keep Excel responsive while polling
Wend
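
If the real goal is just to block until a shelled-out 7-Zip call finishes, one common workaround (not from these notes) is to launch it through WScript.Shell’s Run with its wait flag instead of VBA’s Shell; the 7-Zip path and output folder below are assumptions:

Sub ExtractAndWait(zipPath As String)
    Dim wsh As Object
    Set wsh = CreateObject("WScript.Shell")

    ' Run's third argument (bWaitOnReturn) makes the call block until
    ' 7-Zip exits; window style 0 keeps the console window hidden.
    wsh.Run """C:\Program Files\7-Zip\7z.exe"" x """ & zipPath & _
            """ -oC:\decompressed", 0, True
End Sub

That also keeps the Command Prompt window out of the way before any InputBox() appears.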

Participant arrays

You can create dropdown fields in Excel by specifying a custom data validation range and capturing the selection event. In my case, a successive series of dropdown fields generated a smaller and smaller filtered set. Even with only two hundred rows, populating the filter worksheet from the data was too slow.

Since the data is never modified, the search could be improved by using arrays. It’s not even “search,” but a kind of custom AutoFilter. The only difference is that a button will transfer the data, with accompanying user input, into another worksheet. My idea of the spreadsheet as an in-memory, two-dimensional allocation array did not take into account the cost of n-squared iteration: per-row and per-column processing. (It’s slower than I thought.)

The data is populated from an external macro, so at that time I can apply a sort. That will reduce the lookup to a chunk of contiguous rows; as soon as the first is found, keep all subsequent rows until the value changes. That is less than the worst-case of traversing all n rows. In fact, I can use this strategy for every level of the filter, as long as the sort uses the same levels (for example, planet, continent, country, province/state, city, etc).

Each filter after the first is only as large as the size of the contiguous chunk. Optimistically, the third and fourth filters only require two looks: one to suss the value, and another to determine that it’s the only result.
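
A sketch of that contiguous-chunk scan, assuming the sorted source has already been pulled into a 2D Variant array with a single Range.Value read; the procedure name and arguments are mine, not the workbook’s:

' Return the first and last indices in the sorted array "data" whose value
' in column "col" equals "target", scanning only rows startRow..endRow.
Sub FindChunk(data As Variant, ByVal startRow As Long, ByVal endRow As Long, _
              ByVal col As Long, ByVal target As String, _
              ByRef firstRow As Long, ByRef lastRow As Long)
    Dim r As Long
    firstRow = 0: lastRow = 0

    For r = startRow To endRow
        If data(r, col) = target Then
            If firstRow = 0 Then firstRow = r
            lastRow = r
        ElseIf firstRow > 0 Then
            ' Sorted data: once the value changes after a match, the chunk is over.
            Exit For
        End If
    Next r
End Sub

The first level calls it with startRow = 1 and endRow = UBound(data, 1); each deeper dropdown reuses the previous firstRow and lastRow, so every filter after the first only scans the chunk left by the one before it.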