Access doesn’t support executing multiple SQL statements at once, so each CSV row becomes several statements, each constructed as a custom string. First, the inserts go into the key-value tables; then @@IDENTITY is pulled out for the aggregate tables. The scrolling debug messages make it look fast, but performance actually improved by buffering the lines in an array.
Performance-wise, a buffer of a thousand lines is equivalent to one of five thousand: 10k rows took 20 minutes with either size. That’s half the time it took reading the CSV file line-by-line, so still great.
Dim buff(1000) As String ' buffer is an array of CSV lines
Dim fn As Integer
Dim csv_path As String
Dim bi As Long ' buffer index
Dim ss As String
Dim i As Long
Dim db As Database

Set db = CurrentDb ' pass an instance of CurrentDb as a Database object
fn = FreeFile
csv_path = Environ("USERPROFILE") & "\test.csv"
Open csv_path For Input As #fn
Do While Not EOF(fn)
    Line Input #fn, ss
    buff(bi) = ss
    bi = bi + 1
    If bi >= UBound(buff) Then ' buffer full: flush it
        For i = (bi - 1) To 0 Step -1
            DoSQLInsertWith db, buff(i)
        Next i
        bi = 0
    End If
Loop
If bi > 0 Then ' process any remaining lines in the buffer
    For i = (bi - 1) To 0 Step -1
        DoSQLInsertWith db, buff(i)
    Next i
End If
Close #fn

' use Split() on csv_line to get the fields,
' then build and execute the SQL statement strings
Sub DoSQLInsertWith(db As Database, csv_line As String)
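The flow described above, inserting into the simple tables and then grabbing the new identity for the aggregate table, can be sketched in Python rather than the post’s VBA, using an in-memory SQLite database. SQLite’s `lastrowid` stands in for Access’s `@@IDENTITY`, and the tiny schema here is invented for illustration:

```python
import sqlite3

# Hypothetical schema mirroring the pattern: two key-value tables,
# then an aggregate table that references the freshly inserted keys.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE tbl1 (id1 INTEGER PRIMARY KEY, s TEXT)")
cur.execute("CREATE TABLE tbl2 (id2 INTEGER PRIMARY KEY, s TEXT)")
cur.execute("CREATE TABLE tblpp (id1 INTEGER, id2 INTEGER, str1 TEXT)")

def insert_row(fields):
    """Insert one CSV row: key-value tables first, then the aggregate."""
    cur.execute("INSERT INTO tbl1 (s) VALUES (?)", (fields[0],))
    id1 = cur.lastrowid  # sqlite3's analogue of Access's @@IDENTITY
    cur.execute("INSERT INTO tbl2 (s) VALUES (?)", (fields[1],))
    id2 = cur.lastrowid
    cur.execute("INSERT INTO tblpp (id1, id2, str1) VALUES (?, ?, ?)",
                (id1, id2, fields[2]))

insert_row(["paramore", "1995/01/05", "finished"])
print(cur.execute("SELECT * FROM tblpp").fetchall())
```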
Several years under the belt make for a fast turnaround: suddenly, people are interested in you. I’m not sure if it’s early in the process, but it was nice to have some positive attention. Before, if you had no portfolio and less than a year of experience, you were a nobody. Nothing has changed except that I’ve banged out code daily for a little while.
I haven’t specialized in a programming language. My resume gets to sit in the plush chair and get a makeover. There’s talk of targeting and clients: things in which the world operates one way, and I am caught up in a tide.
She asked why I was looking. “To make more money,” I replied. She laughed.
In Excel, cleaning up each cell of every row for 50k rows takes four hours. Reading the resulting CSV file line-by-line into a database takes a similar amount of time. For something ten times the size, forty hours of compute time is enough to shock someone with over a decade of software experience. Me too, because forty computer hours is probably several human lifetimes.
Using a dictionary to read the spreadsheet into memory improved things. The 500k rows happily hashed in about a minute, and it took only 30 minutes to process the 50,000 rows. This experience made concrete the importance of processing data in structures like arrays and dictionaries rather than looping row by row.
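The dictionary idea, in a minimal Python sketch with made-up row data and a made-up cleanup rule: hash the cleaned rows in one pass through memory instead of revisiting each cell through the spreadsheet object model.

```python
# Invented sample rows standing in for spreadsheet data.
rows = [
    ["  Alice ", "engineering"],
    ["Bob\t", "sales"],
    ["  Alice ", "engineering"],  # duplicate row
]

seen = {}  # cleaned row (as a hashable tuple) -> cleaned fields
for row in rows:
    cleaned = tuple(field.strip() for field in row)
    seen.setdefault(cleaned, cleaned)  # hashing also deduplicates

print(list(seen))  # [('Alice', 'engineering'), ('Bob', 'sales')]
```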
I did not really think of time as an obstacle with automated reports, because they ran overnight: as long as they finished by morning, the time might as well have been “infinite.”
Sharing code and then watching duplicate projects slowly spread into your domain is frustrating and awkward. We have the same manager, so it seems implicit that projects are approved and overseen to prevent overlap. Yet my help lets the other person progress while I am none the wiser: no screenshot, no reciprocation of code, just a confusing muddle of weirdness.
And if I refuse? That seems disingenuous, since I freely share what I have learned here. Why the Internet but not closer? And what I learned came from others also sharing. Still, intuition makes me feel I got the short end.
Recently, I have gained some comfort by imagining some duplicate somewhere in the world, working hard and sticking to study. We will meet and be calm amid the whirlwind. Our struggles were shared without communicating, and we got here by our bootstrapped labors. Together, we will inhabit the ever after.
Watching a brief segment on libraries and literacy, I suddenly realized free software was connected to borrowing books: the information I borrowed could be – should be – free. This was a moment of inexplicable tears and blubbering explanation as my parents stared at me.
Here was some kernel of a memory, ingrained as a specific pattern of neural pathways, dormant in some dusty portion of the mental warehouse, discovered – and with it, an unleashing of emotion. Free software had become some indelible identity, unchallenged, accepted, and perhaps had driven my decision-making.
Perhaps the uncertainty of proprietary reinvention, the willful and deliberate remaking of things, its practitioners magicians referencing all works while proudly claiming ownership of a shared mental work, came from that first assumption: software as free, first.
Yet I resist frameworks and libraries, even though that same code was made the way books are: information to be drawn up and applied in other contexts, a method of transmitting order to order. My current equivalent is rewriting each book to fit the current volume, just to have primary authorship. And if not that, a poor excuse about others’ contents: that their work has not survived enough fires and wars to prove its mettle and stand on pedestal shelves.
Watching the status page, an auto-refreshing set of tables, I see each print request: each row a piece of a complete rebuild, but each component its own page. No matter how small, each requisition is inked in official format, and that includes individual screws, caps, and other small parts.
Since I’m already parsing things, the next step would be to chunk similar items together. For each print interval, combine like parts: that caches the lookup, and the allocator can naturally assign a whole list to one person. He can then think in terms of one list, one person, rather than one part, one person.
On the other hand, this up-front processing is still delivered as a batch of tasks to allocate. The allocator’s job doesn’t change; one by one, he has to put them back into the system. It’s kind of crazy, but there’s nothing in place for indicating who is available to retrieve more parts. No one is interested in being more efficient, because they would just receive more work.
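That consolidation step can be sketched in Python, with invented part names and quantities: group one interval’s requests by part so the allocator hands out a single combined list.

```python
from collections import defaultdict

# Hypothetical print requests: (part_name, quantity), one row per requisition.
requests = [
    ("screw M3", 4), ("cap", 1), ("screw M3", 2),
    ("bracket", 1), ("cap", 3),
]

# Combine like parts within one print interval so the allocator
# receives one consolidated list instead of one slip per part.
combined = defaultdict(int)
for part, qty in requests:
    combined[part] += qty

for part, qty in sorted(combined.items()):
    print(f"{part}: {qty}")
```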
It takes several hours to convert a workbook to CSV, since each field is examined and cleaned. This could be replaced with a single formatting pass, but that optimization would require more testing. After conversion, it’s still line-by-line work to construct the multiple SQL statements that populate the various Access tables. For n fields, when each belongs to a simple two-column table of index and string value, that’s n INSERTs before we even get to the nested INSERT-SELECT:
INSERT INTO tblpp (id1, id2, str1)
SELECT id1, id2, 'finished' FROM tbl1, tbl2
WHERE id1 IN (SELECT id1 FROM tbl1 WHERE s = 'paramore') AND
id2 IN (SELECT id2 FROM tbl2 WHERE s = #1995/01/05#)
Reading in the CSV file, I use “Exit Do” to try out only the first line. That’s a quick schema check: a successful insert reports a RecordsAffected of 1, while a bad one drops to 0. If there’s an error, sample code for handling it via an “Err_Execute” label is available in Access Help.
Since CSV is plaintext, we could chunk the file and spawn multiple processes. Two concurrent tasks could halve the import time, JET engine performance notwithstanding. We’re chasing the Grail: queries that give us fast results, a challenge to the supremacy of VLOOKUP.
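A rough Python sketch of the chunking step, assuming the file’s lines are already in memory; each chunk could then be handed to its own importer process. The function name is my own.

```python
def chunk_lines(lines, n_chunks):
    """Split lines into n_chunks contiguous slices of near-equal size."""
    size, extra = divmod(len(lines), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks absorb one leftover line each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(lines[start:end])
        start = end
    return chunks

lines = [f"row{i}" for i in range(10)]
for chunk in chunk_lines(lines, 2):
    print(len(chunk))  # 5 and 5
```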
I could not figure out vagrant and Homestead with Windows, and I did not want to deal with cygwin + PHP on the same. So I grabbed Ubuntu 14.04 ISO and VirtualBox. Here are the subsequent steps:
ettis@vbox-ubuntu: sudo apt-get install php5
ettis@vbox-ubuntu: curl -sS https://getcomposer.org/installer | php
ettis@vbox-ubuntu: sudo mv composer.phar /usr/local/bin/composer
ettis@vbox-ubuntu: vi ~/.profile
Add the following entry at the end of the file:
ettis@vbox-ubuntu: composer create-project laravel/laravel \
learning-laravel-5
ettis@vbox-ubuntu: cd learning-laravel-5
ettis@vbox-ubuntu: php -S localhost:8888 -t public
Continue with Laravel learning videos: https://laracasts.com/series/laravel-5-fundamentals
Proceeding with the videos, you may also need the following:
ettis@vbox-ubuntu: sudo apt-get install php5-sqlite
Another person is leaving, and detritus follows: the scattered remnants of projects, source files, and documentation tucked into a different mental tree. At least it is under source control. My handover to ensure his job security was brief and unfruitful: I’m now owner of those former projects and a handful of new ones.
Anyone who works long enough somewhere will have a maintenance debt; being “ten times” more productive means you reach that point sooner. And by that I mean the moment a completed project needs to be improved, you’re blocked until it’s resolved. There goes your metric.
Somehow, I will have to learn to be content. I am satisfied in some sense. Life is a collection of pleasant and unpleasant experience. My idealized sense of programming was watercolor on a glowing canvas. I am no closer to that than years before, because truth called for text and streams.
PivotTable is excellent at finding unique values. You can group multiple fields together and look at the counts: if you see 1s straight down, those fields are a candidate for a unique, multi-column index. Raw data coming out of a database in spreadsheet form will have many repeating values, and a couple of pivots might be all you need to get started.
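The same check, sketched in Python with made-up rows: group by a set of fields and see whether every group count is 1.

```python
from collections import Counter

# Invented sample rows standing in for the raw spreadsheet data.
rows = [
    {"dept": "sales", "emp": "alice", "week": 1},
    {"dept": "sales", "emp": "bob",   "week": 1},
    {"dept": "eng",   "emp": "alice", "week": 1},
]

def is_candidate_key(rows, fields):
    """True when no combination of the given fields repeats (all counts 1)."""
    counts = Counter(tuple(r[f] for f in fields) for r in rows)
    return all(c == 1 for c in counts.values())

print(is_candidate_key(rows, ["dept"]))         # False: "sales" repeats
print(is_candidate_key(rows, ["dept", "emp"]))  # True: every count is 1
```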
PowerPoint can be useful too. Pulling repeating values out into a separate table and changing the original field from Text to AutoNumber gets repetitive in Access; in PowerPoint, sketching the change is as simple as editing values in a text box. Bold, italic, and underline can mark primary and foreign keys, and phrases like “Only pull raw data from current week” can be colored red.
Eventually, you’ll need to insert all this data into the tables proper. In Excel, that means a module of utility subroutines for writing out CSV-formatted strings; in Access, the usual routines to read the file and build INSERT statements; and along the way, testing the SQL statements in the SQL editor.
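For the Excel-side utilities, Python’s csv module shows the shape such a helper might take (the function name is my own); proper quoting of embedded commas and quote characters is the part worth delegating.

```python
import csv
import io

def to_csv_line(fields):
    """Format one row of fields as a CSV string, quoting as needed."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="").writerow(fields)
    return buf.getvalue()

print(to_csv_line(["paramore", "1995/01/05", 'say "cheese"']))
# paramore,1995/01/05,"say ""cheese"""
```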
It feels like the open-source community is only now catching on to Windows. We see indie spreadsheets and hidden developers. There’s a shift from ideology to commercial swiftness. Everyone is enthused and alien.