Split inputs for asynchronous tasks

By dividing your inputs among several files, you can launch one process per file and let the OS schedule each as its own task. Each process works only on its own file, so there is no shared state to synchronize.

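 rem delayed expansion is needed so that !taskno! is re-read on every pass
 setlocal enabledelayedexpansion
 rem clear out task files from an earlier run, since we append below
 del task-*.txt 2>nul
 set taskno=0
 set nprocs=10
 rem deal the lines of list.txt round-robin into task-0.txt .. task-9.txt
 for /f "delims=*" %%i in (list.txt) do (
   >>"task-!taskno!.txt" echo %%i
   set /a taskno=taskno+1
   set /a taskno=taskno %% nprocs
 )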

The script above deals your inputs out across the task files. You can then loop over those files and launch one process for each:

 for %%i in (task-*.txt) do (
   start do-script %%i
 )

If you’re fortunate enough to have a remote machine, get PsTools and learn PsExec; it is really useful.

Launch asynchronous processes from cmd

For long-running processes – even when they’re scheduled – you can take advantage of concurrency by running more of them at once. Break your script into smaller pieces and distribute the inputs among them. The main command we will use is start:

start /?

I don’t know why, but instructions after a spawning process are ignored:

@echo off

for /f "tokens=*" %%i in (list.txt) do (
  start do-script.bat %%i
)
echo doesn't get here

If you wrap this in a separate script, you can continue with the rest of the script:

@echo off

rem wrapper-script.bat
rem same code as before, but saved to this separate file
...
start do-script.bat %1

And in your original script,

cmd /c wrapper-script.bat
echo oh hai

Next I’ll talk about splitting inputs. Batch scripting is file-oriented, so asynchronous processes don’t have to step on each other’s toes.

Eye believe

This threading example demonstrates the need to synchronize a shared object. When would you ever need to synchronize a shared object? Suppose you have a World object, and you pass the reference to multiple threads. Suppose that any of them could invoke methods on the World instance at any time. In our case, the “world” is a candy shop and there are two candy thieves, or “jacks.”

class CandyShop {
    private int candy = 1;

    public boolean isAvailable() {
        return (candy > 0);
    }
    public void eat() {
        System.out.println("someone ate a candy!");
        candy--;
    }
    public void printInventory() {
        System.out.println("number of candy:" + candy);
    }

    public static void main(String[] args) {
        // expect negative number to be printed

        CandyShop shop = new CandyShop();
        // instantiation starts thread
        CandyJack j1 = new CandyJack(shop); 
        CandyJack j2 = new CandyJack(shop);

    }
}

Our infamous candy jackers:

class CandyJack implements Runnable { 
    // a CandyJack instance eats candy! 

    private CandyShop shop = null;

    public CandyJack(CandyShop shop) {
        this.shop = shop;
        Thread t = new Thread(this);
        t.start();
    }
    public void run() {
        while (shop.isAvailable()) { // expecting pre-empt here or --
            shop.eat(); // -- here
            shop.printInventory();
        }
    }
}

It is possible to end up with a negative quantity of the candy. One jack observes there is one piece of candy; the other observes it too; both move to eat. Their perception is a framed belief: there is a candy available to eat. In reality, there ought to be only 1 piece and then 0 pieces. This is a situation where we would want synchronization: threads sharing an object.
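One possible fix, sketched under the assumption that "check, then eat" is the only operation the jacks need: fold the check and the decrement into a single synchronized method so no other thread can slip in between them. The names SafeCandyShop, tryEat and raid are made up for this sketch and are not part of the example above.

```java
class SafeCandyShop {
    private int candy = 1;

    // Check and decrement happen under one lock, so two jacks
    // can never both act on the same "one candy left" observation.
    public synchronized boolean tryEat() {
        if (candy > 0) {
            candy--;
            return true;
        }
        return false;
    }

    public synchronized int getCandy() {
        return candy;
    }

    // Hypothetical helper: lets the given number of jacks loose
    // and returns the candy left once they have all finished.
    static int raid(int jacks) {
        SafeCandyShop shop = new SafeCandyShop();
        Thread[] threads = new Thread[jacks];
        for (int i = 0; i < jacks; i++) {
            threads[i] = new Thread(() -> {
                while (shop.tryEat()) {
                    System.out.println("someone ate a candy!");
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join(); // wait for this jack to give up
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return shop.getCandy();
    }

    public static void main(String[] args) {
        System.out.println("number of candy:" + raid(2));
    }
}
```

Synchronizing isAvailable() and eat() separately would not be enough, because the pre-emption can still happen between the two calls. The lock has to cover the whole check-and-eat step.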

Threading separate tasks

A straightforward example of threading: no shared objects, so no synchronization problems. Each object running in its thread maintains its own state. However, I can’t access their methods in the main thread at the end.

class MyThread {
  public static void main(String[] args) {
    Person p1 = new Person("p1");
    Person p2 = new Person("p2");

    Thread t1 = new Thread(p1);
    Thread t2 = new Thread(p2);

    t1.start();
    t2.start();
    new Thread(new Person("p3")) { // anonymous inner class
      }.start();

    if (p1.getPos() >= 10)
      System.out.println("p1 got to 10.");
    if (p2.getPos() >= 10)
      System.out.println("p2 got to 10.");
  }
}

And here is Person.java:

class Person implements Runnable {
  // a person moves

  private String name = "untitled";
  private int pos = 0;

  public Person(String name) {
    this.name = name;
  }
  public void run() {
    while (pos < 10) {
      this.move();
      if (pos % 2 == 0)
        System.out.println(
          "Person " + name + " moves " + pos);
      else
        System.out.println("\tPerson " + name +
          " moves " + pos);
    }
  }
  public void move() {
    pos++;
  }
  public int getPos() {
    return pos;
  }
}

Doesn’t main need to block waiting?
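It probably does. A minimal sketch of one way to do it, using Thread.join() so that main blocks until the worker's run() has returned before reading its state. Walker is a cut-down, hypothetical stand-in for Person so that the example compiles on its own; JoinDemo and walkAndWait are also made-up names.

```java
class Walker implements Runnable {
    private int pos = 0;

    public void run() {
        while (pos < 10) {
            pos++;
        }
    }

    public int getPos() {
        return pos;
    }
}

class JoinDemo {
    // Starts one Walker on its own thread, waits for it, then reads its position.
    static int walkAndWait() {
        Walker w = new Walker();
        Thread t = new Thread(w);
        t.start();
        try {
            t.join(); // blocks until run() has returned
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // After join() the worker's writes are visible to this thread.
        return w.getPos();
    }

    public static void main(String[] args) {
        if (walkAndWait() >= 10) {
            System.out.println("walker got to 10.");
        }
    }
}
```

Without the join() call, main may read the position while the worker is still running, or before it has started at all, so the checks at the end of main can silently print nothing.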

It’s not real though

One of the first exercises in “Introduction to Algorithms” 2e is to fill in a table of input sizes according to an algorithm’s running time. So if an algorithm takes n! microseconds to solve an input of size n, what is the largest n it can solve in 1 second? 1 minute? 1 hour? 1 day? 1 year? 1 century?

Because factorials are whole numbers with nothing after the decimal point, but they become very large, I wanted to use the long data type. However, at 13! both long and unsigned long overflow: 13! is 6,227,020,800, which does not fit in 32 bits. The state of my program is sad when I have to use a calculator to confirm the results! I was going from 4.8e8 with 12! to 1.9e9 with 13!, so something no-good was happening.
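Here is a small sketch of the same wrap-around. It is in Java rather than C, on the assumption that Java's int, which is always 32 bits, is a fair stand-in for a 32-bit C long. The class and method names are invented for the illustration.

```java
class FactorialOverflow {
    // Plain 32-bit multiplication: silently wraps around on overflow.
    static int factorialWrapping(int n) {
        int result = 1;
        for (int i = 2; i <= n; i++) {
            result *= i;
        }
        return result;
    }

    // Math.multiplyExact throws ArithmeticException instead of wrapping.
    static int factorialChecked(int n) {
        int result = 1;
        for (int i = 2; i <= n; i++) {
            result = Math.multiplyExact(result, i);
        }
        return result;
    }

    // 64-bit arithmetic: enough room for 13!, and in fact up to 20!.
    static long factorialLong(int n) {
        long result = 1;
        for (int i = 2; i <= n; i++) {
            result *= i;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(factorialWrapping(12)); // 479001600
        System.out.println(factorialWrapping(13)); // 1932053504 (wrapped)
        System.out.println(factorialLong(13));     // 6227020800
        try {
            factorialChecked(13);
        } catch (ArithmeticException e) {
            System.out.println("13! does not fit in 32 bits");
        }
    }
}
```

The wrapped value is 6,227,020,800 minus 2^32, which is the 1.9e9 that showed up in place of 13!.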

I ended up using float because it can fit such large numbers. I didn’t want to, but I needed a data type that could accommodate the size. Conceptually, I was thinking large integers in the set of integers; though integers are in the set of reals, it felt cleaner to go with a representation that could store 4924928492849 and such. But not even that could happen.
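The catch with float is precision rather than range. A single-precision float has a 24-bit significand, so it cannot hold every integer above 2^24. A quick sketch (again in Java, whose float and double are IEEE 754 single and double precision; the class and method names are made up) that round-trips the number above through both types:

```java
class FloatPrecision {
    // True if the value comes back unchanged after a trip through float.
    static boolean survivesFloat(long value) {
        return (long) (float) value == value;
    }

    // True if the value comes back unchanged after a trip through double.
    static boolean survivesDouble(long value) {
        return (long) (double) value == value;
    }

    public static void main(String[] args) {
        long big = 4924928492849L;
        System.out.println("float keeps it exactly:  " + survivesFloat(big));
        System.out.println("double keeps it exactly: " + survivesDouble(big));
    }
}
```

So float can reach the magnitude of a large factorial, but only approximately; the low digits are rounded away without any warning.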

So data types are just symbols of sizes. That made me want to ignore long and unsigned long; if I want anything larger than int, I would go straight to float. I would still like to use long and unsigned long, so that probably won’t happen. At least I am cognizant of it.

Data type interpretations are like entering a foreign land where the customs differ. It happens with the speed of an assignment. Truncation is silent; assimilation is total. If not for the compiler warning me about the output of %ld versus %.2g, I would have missed a few instances. In that sense, function prototypes and data type formatting checked at compile-time are good.

It starts to make sense to interpret data types through capitalized typedefs: ‘int’, ‘long’ and ‘float’ are symbols for sizes. Our mental model is filtered through a machine’s numerical abstraction.

Using SQL statements in Excel to query workbooks

If you’re in the interstitial space between needing an Access database and being stuck with a large workbook, try this:

Sub QueryWorkbook()
  ' Needs a reference to the Microsoft ActiveX Data Objects library
  ' (Tools > References in the VBA editor).
  Dim cn As ADODB.Connection
  Dim rs As ADODB.Recordset
  Dim conn As String ' connection string
  Dim sQuery As String
  Dim i As Long

  conn = "Provider=Microsoft.ACE.OLEDB.12.0;" & _
    "Data Source=" & ActiveWorkbook.FullName & ";" & _
    "Extended Properties=""Excel 12.0 Xml;HDR=No;IMEX=1;"";"
  Set cn = New ADODB.Connection
  cn.Open conn

  ' With HDR=No there is no header row, so the columns are named F1, F2, ...
  sQuery = "select F1,F2 from [Sheet1$] where F1='fred';"

  Set rs = New ADODB.Recordset
  rs.Open sQuery, cn, adOpenStatic, adLockReadOnly, adCmdText

  If rs.RecordCount > 0 Then
    While Not rs.EOF
      For i = 0 To rs.Fields.Count - 1
        Debug.Print "field " & i & ": " & rs.Fields(i).Value
      Next
      rs.MoveNext
    Wend
  End If

  ' Release the recordset and the connection
  rs.Close
  cn.Close
End Sub

exp2f() and exp10f()

I received a compile-time error of “incompatible implicit declaration” when using exp10f() and a linker error of undefined reference with exp2f(), although both are documented in the GNU glibc manual.

To use exp10f(), you need to define the GNU feature test macro before including any headers:

#define _GNU_SOURCE
#include <math.h>

To use exp2f(), you need to link the math library libm.so.6:

> gcc myprgm.c -lm

This knowledge is already documented on your computer:

> man exp10f
> man exp2f

Plato and the shadow app

I was presented with a user requirement of a concurrent grid view. Nothing major, right? I could do it in AutoIt+db, but I wanted to learn Java. The requirements were scant. Despite the positivity surrounding agile development, it sucks to throw away code. Then I thought about ideal shapes.

What if the user requirements reflect the implementation of an app that is a subset of some idealized super-app? That is, the features of the user app(s) are components drawn from a higher-dimensional “capability” toolbox (that could be also an app). You abstract the pieces up to a perfect plane, and import them as packages below.

This follows the natural “reuse” philosophy of OOP, the magic panacea to our software ills. How do we avoid componentization fever and Factory Factory Factories? I don’t know; I haven’t finished “Head First Design Patterns.”

But what if? Think of it as mind control: the frustration of adapting to user requirements is a call to build a more flexible, rigorous version of the componentized app. The key thing is that the interface itself is trivially adjustable. Three words: NetBeans GUI Builder.

Programming should make you happy.

This is actually JetBrains’ motto!

Java the AutoIt way

The Java trails are full of great info, but what if things were done the AutoIt way? Not so much The Way, but one way: macro programming and command-line processing.

Classes and OOP as a means of organization, but in the end code that you can compile and use immediately. That could mean simplifying the library into a single API, which is pretty foolish. But if it means abstracting objects to avoid reading docs more than once – except to use the streamlined API – then why not?

It gives me a reason to learn Java in a methodical way while building tools throughout the process. It also gives me a glimpse into the API design of AutoIt via its help docs, which are awesome. The AutoIt docs and forums are some of the best.

What if you could do this in Linux and Windows with one JRE?

package clicktest;

import autoit.*;

class ClickTest {
  public static void main(String[] args) {
    MouseMove(300, 300, 0);
    MouseClick("left", 300, 300, 1);
  }
}

Probably not worth it.
JNI with wrapper over AutoItX?
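
For the mouse part at least, JNI may not be needed: the standard library's java.awt.Robot can move and click the pointer on any platform that has a display. Below is a rough sketch of AutoIt-flavoured wrappers on top of it. The class AutoItStyle and its methods are invented for this example and are not an existing library; only Robot, InputEvent and GraphicsEnvironment are real JDK classes.

```java
import java.awt.AWTException;
import java.awt.GraphicsEnvironment;
import java.awt.Robot;
import java.awt.event.InputEvent;

class AutoItStyle {
    // Translates an AutoIt-style button name into the mask Robot expects.
    static int buttonMask(String button) {
        switch (button) {
            case "left":   return InputEvent.BUTTON1_DOWN_MASK;
            case "middle": return InputEvent.BUTTON2_DOWN_MASK;
            case "right":  return InputEvent.BUTTON3_DOWN_MASK;
            default:
                throw new IllegalArgumentException("unknown button: " + button);
        }
    }

    // Moves to (x, y) and clicks the named button the given number of times.
    static void mouseClick(Robot robot, String button, int x, int y, int clicks) {
        int mask = buttonMask(button);
        robot.mouseMove(x, y);
        for (int i = 0; i < clicks; i++) {
            robot.mousePress(mask);
            robot.mouseRelease(mask);
        }
    }

    public static void main(String[] args) throws AWTException {
        if (GraphicsEnvironment.isHeadless()) {
            System.out.println("no display available; nothing to click");
            return;
        }
        Robot robot = new Robot();
        robot.mouseMove(300, 300);
        mouseClick(robot, "left", 300, 300, 1);
    }
}
```

This covers only pointer movement and clicks. AutoIt's window and control functions have no direct counterpart in Robot, so the JNI question stays open for those.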

Hazards of a mapping table

We’re in the business of data reconstruction: whole forms of modeled pieces dissolve in the traffic of HTTP and are reconstituted on an alien machine, placed in relational divisions linked by rail-thin lines. The data we scrape, the data we harvest, is not always how we want it; new models reflect the compromise between view-spec and report spec: one person wants the GUI like so, and another the report thus. SQL is the collagen, the matrix, gluing the interface between.

Suppose you’re tracking users across systems. One holds data on their gender; the other on their work habits. They are retrieved through specific interfaces, so you can’t execute custom SQL statements. It’s safer and you don’t have to futz with understanding a foreign model. Take the data and come to your own conclusions.

You unite them in your database with a mapping table: each user has two login credentials; each user has an ID in the user table. As a twist, in one system the user may have multiple login accounts! So you need some way to map user A thus.

The hazard is when you build the report, when you build up the monument from the bricks. If you use the mapping table as glibly as the rest, you may end up with duplicate results.

mysql> select col1, sum(col2), col3, userid from user, t2, 
> mapping where 
> user.id = mapping.userid and 
> ...;

+-------+------+----------+--------+
| col1  | col2 | col3     | userid |
+-------+------+----------+--------+
| userA |    2 | blonde   |     24 |
| userB |    5 | brunette |   4928 |
+-------+------+----------+--------+

mysql> select * from user, t2, mapping where user.id = 24

+-------+------+----------+--------+
| col1  | col2 | col3     | userid |
+-------+------+----------+--------+
| userA |    1 | blonde   |     24 |
+-------+------+----------+--------+

What happened?

mysql> select * from mapping where userid = 24;
+--------+--------+------+
| userid | login  | col2 |
+--------+--------+------+
|     24 | jackal | ...  |
|     24 | freddy | ...  |
+--------+--------+------+

Your mapping came back to bite you!
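
The mechanics can be replayed outside the database. The sketch below simulates the join in plain Java with made-up rows shaped like the ones above: user 24 has two logins in the mapping table, and t2 holds one row per user. The login for user 4928 and the contents of t2 are assumptions for illustration. Joining through every mapping row counts user 24 twice; joining through distinct user ids does not.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class MappingFanOut {
    // mapping table, one entry per login: jackal -> 24, freddy -> 24,
    // and a single hypothetical login for user 4928
    static final int[] MAPPING_USERID = {24, 24, 4928};

    // t2 rows as {userid, col2} (assumed contents)
    static final int[][] T2 = {{24, 1}, {4928, 5}};

    // Joins t2 to every mapping row, like the first query: one copy per login.
    static Map<Integer, Integer> naiveSum() {
        Map<Integer, Integer> sums = new LinkedHashMap<>();
        for (int[] row : T2) {
            for (int mappedUser : MAPPING_USERID) {
                if (mappedUser == row[0]) {
                    sums.merge(row[0], row[1], Integer::sum);
                }
            }
        }
        return sums;
    }

    // Collapses the mapping to distinct user ids before joining.
    static Map<Integer, Integer> dedupedSum() {
        Set<Integer> distinctUsers = new LinkedHashSet<>();
        for (int mappedUser : MAPPING_USERID) {
            distinctUsers.add(mappedUser);
        }
        Map<Integer, Integer> sums = new LinkedHashMap<>();
        for (int[] row : T2) {
            if (distinctUsers.contains(row[0])) {
                sums.merge(row[0], row[1], Integer::sum);
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        System.out.println("naive:   " + naiveSum());
        System.out.println("deduped: " + dedupedSum());
    }
}
```

In SQL terms, the second version corresponds to joining against something like select distinct userid from mapping rather than the raw table, or to leaving the mapping table out of the report query when no login column is needed.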