My recent journey into local ML continues, but I only have one interesting post to share: Mapping the semantic void: Strange goings-on in GPT embedding spaces. I don't know the math well enough to say whether this is profound or nothing at all, but I found the inquiry thought-provoking.
Let’s walk in a different direction for the rest of this post…
I recently signed up for Kagi Search, a paid search engine. I won't try to convince you that search is broken; you'll get there on your own. When you do, come back and check this out. But wait, what about privacy?
Searches are anonymous and private to you. Kagi does not log and associate searches with an account.
I basically assume that is a lie, but it doesn't matter in practice. Search today requires trusting some third party, and I'd rather the incentives be aligned such that I'm paying them directly for the service. Kagi has to keep producing high-quality search results, and keep my trust, in order to keep me as a subscriber. We'll see how this goes.
Next, a random magazine crossed my path: Paged Out! #3. I have to admit this was the first I'd heard of this particular zine, but for me it delivered in a few key ways. First, it captures that hacker mystique. It looks cool, even though it is trying too hard. Second, the content seems solid. I didn't read it cover to cover, but the topics were interesting and sufficiently technical. It might be a bit too 2600, but that might also be why it works for me. Subscribed!
Which reminds me, I recently set up miniflux, a self-hosted feed reader. I’m finding it a bit difficult to get back into this style of consuming content. But as a former heavy user of Google Reader, I look forward to trying this again.
For our last stop we are once again going back in time, to around 2008. (Yeah, that's just a random image I found in the same folder.) The file we're looking for is schema.sql. What's special about this file? It's the only part I'm trying to salvage from an old personal project. One of my current personal projects involves scraping PDFs and producing structured data, and that data happens to fit the same old schema I designed so many years ago.
I used PostgREST to turn this database into an API. I had to brush up on my Postgres knowledge of schemas and permissions to get comfortable with this, but once I did, it was pretty much as easy as deploying the Docker container. Another big selling point was that it generates an OpenAPI v2 (formerly known as Swagger) definition. With that definition it was trivial to also deploy Swagger UI, so now I have a crude but useful web interface for exploring the API. Finally, I was able to use go-swagger to generate Go client code.
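For the curious, the basic shape of that pipeline looks something like the sketch below. The schema name (`api`), role (`web_anon`), database name, and connection string are all placeholders of mine, not what my project actually uses; the anonymous-role pattern is the one PostgREST's own tutorial suggests.

```shell
# Grant a read-only role access to the schema PostgREST will expose.
psql mydb <<'SQL'
create role web_anon nologin;
grant usage on schema api to web_anon;
grant select on all tables in schema api to web_anon;
SQL

# Run PostgREST from the official Docker image, pointed at that schema.
docker run --rm -p 3000:3000 \
  -e PGRST_DB_URI="postgres://authenticator:secret@host.docker.internal/mydb" \
  -e PGRST_DB_SCHEMAS="api" \
  -e PGRST_DB_ANON_ROLE="web_anon" \
  postgrest/postgrest

# PostgREST serves its OpenAPI (Swagger) definition at the API root,
# which is what Swagger UI and go-swagger both consume.
curl http://localhost:3000/ > swagger.json

# Generate a Go client from that definition with go-swagger.
swagger generate client -f swagger.json -A myproject
```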
Putting all this together, I was able to adapt my PDF-parsing command-line tool to import the structured data into the schema I designed some sixteen years ago. That's pretty cool for an afternoon exploration.