Here Lately

Hello World, again...

Hello everyone, thanks for stopping by. This is the first post using Pelican, a static site generator written in Python. I've tried various others, but decided I'd give Pelican a try. I'm doing a lot more Python development these days, so staying in the same language is always nice. I did not keep the three whole blog posts I had here before, as they were about a project that is also no longer here.

I debated keeping any of the old projects on here, but in the name of clean slates I just started fresh. If you ever visited this site before and happened to use any of the web-based tools I had here, feel free to reach out if you want the code so you can continue to use it. They were all single-page vanilla web apps, so it's easy to host them yourself. All of them have better alternatives; mine were mainly for me, hosted somewhere anyone could access, and I never track, so I'm not sure if they were used beyond myself. Check the Links for alternatives if you did use them. I did keep the Directory, renamed to Links, which was really the only thing of value from the old site. I added more links as well, I've been sitting on a lot of tabs...

I'm focusing this site more on writing; it's something I've wanted to do for a while now. My current focus on programming, AI, philosophy, and the intersection of those fields gives me plenty of things that it helps just to write about or write down. I've always been curious about a lot of things related to these fields, and now is a good time to seriously start talking about them.

I've been using AI since pretty early on. I was a hard skeptic in the beginning, but slowly we came to terms. As I learned more about it, experimented with it, built some pretty cool things with it, and watched it progress as a technology, I realized I was no longer a skeptic. I was kinda like Tom in this video.

I've been developing my own agent framework, T.O.M. The name is a reference to an old CS joke and a total coincidence with the video. I'd wager it "knows" more than me, and it can do some unexpected things. It's built on Apple MLX, using FastAPI, Qwen 3, and function calling. I use Qwen 3 4B Thinking 2507, or the 1.7B model if I'm testing functionality and need faster inference.
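If you want a feel for how little code the local inference side takes, here's a minimal sketch using the mlx-lm package. The model repo id is an assumption, swap in whatever quantized Qwen 3 build you have locally; T.O.M. wraps a lot more plumbing around this.

```python
# Minimal MLX inference sketch (assumed repo id, not T.O.M.'s actual code)
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Qwen3-4B-Thinking-2507-4bit")

messages = [{"role": "user", "content": "Hey Tom, what's on the list today?"}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)

reply = generate(model, tokenizer, prompt=prompt, max_tokens=512)
print(reply)
```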

I have a background in engineering; I was a full stack engineer for a decade at CBS Interactive/Paramount, focused on web development. The transition to building a framework for interacting with an LLM was surprisingly easy. It's very similar to work I did before: APIs, managing state, parsing input/output, request/response. It's really a lot of fun, and I highly recommend that other developers start experimenting with agent frameworks (LangChain, Haystack, LlamaIndex, etc.) along with Claude Code and similar tools.
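To make the parallel concrete, here's a hedged sketch of that request/response shape: a FastAPI endpoint that takes a message, appends it to conversation state, and returns the model's reply. The names here (ChatRequest, run_inference, the in-memory list) are illustrative, not T.O.M.'s actual API.

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
conversation: list[dict] = []  # in-memory state, one conversation for simplicity


class ChatRequest(BaseModel):
    message: str


def run_inference(messages: list[dict]) -> str:
    # stand-in for the MLX generate call shown earlier
    return "..."


@app.post("/chat")
def chat(req: ChatRequest) -> dict:
    conversation.append({"role": "user", "content": req.message})
    reply = run_inference(conversation)
    conversation.append({"role": "assistant", "content": reply})
    return {"reply": reply}
```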

I am using Tom for research into topics around AI and programming in general. Currently I'm experimenting with various RAG methodologies, retrieval, and context management. It's like a personal mini Claude Code. I still use frontier models, but I use my own for focused study because I keep everything local, except for web data fetches, which I use very sparingly, and it's a great learning experience.
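A bare-bones version of the retrieval step I'm playing with looks something like the sketch below: embed the query, rank stored chunks by cosine similarity, and splice the top hits into the prompt. The embed() function is a stand-in for whatever local embedding model you use; the rest is plain numpy and my own choices, not any particular RAG library.

```python
import numpy as np


def embed(text: str) -> np.ndarray:
    # stand-in: in practice this calls a local embedding model
    raise NotImplementedError


def top_k_chunks(query: str, chunks: list[str], chunk_vecs: np.ndarray, k: int = 3) -> list[str]:
    q = embed(query)
    # cosine similarity between the query and every stored chunk
    sims = chunk_vecs @ q / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q) + 1e-8)
    return [chunks[i] for i in np.argsort(sims)[::-1][:k]]


def build_context(query: str, chunks: list[str], chunk_vecs: np.ndarray) -> str:
    retrieved = "\n\n".join(top_k_chunks(query, chunks, chunk_vecs))
    return f"Use the notes below to answer.\n\n{retrieved}\n\nQuestion: {query}"
```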

Giving an LLM a "fetch web page" tool that requires just a URL has been pretty interesting. The tool call only takes a URL; when the LLM makes the call, the API catches the tool call for execution, and then it's just a function that uses Beautiful Soup to cut out the bits the LLM doesn't need from the fetched HTML and reshape the context with the info for the LLM. It works fine for 'Hey Tom fetch this url...', but it gets interesting when Tom is working on a deeper problem. I have seen Tom work on something, decide he needs more information to achieve the goal, and fetch web pages for information on the problem. The URLs are pulled from inference; the training data surely contains tons of URLs. Tom knows arXiv and quickly figured out a way to search it and GitHub for queries with no further prompting from me: it knows it has a tool that can fetch web pages, and it needs info to solve a problem.
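Stripped of the framework plumbing, the tool itself is small. Here's roughly what it boils down to; the tag list and character cap are my own choices for this sketch, not anything canonical.

```python
import requests
from bs4 import BeautifulSoup


def fetch_web_page(url: str, max_chars: int = 8000) -> str:
    resp = requests.get(url, timeout=15)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")

    # cut out bits the LLM doesn't need
    for tag in soup(["script", "style", "nav", "header", "footer", "aside"]):
        tag.decompose()

    # collapse whitespace and cap the length before it goes back into context
    text = " ".join(soup.get_text(separator=" ").split())
    return text[:max_chars]
```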

I've been really bad about keeping logs. Some of what these models can do when given tools is interesting to watch, and things like the searching for data on the web might be interesting to others too. I'll start piping output to logs so I can share relevant bits.

The orchestration layer, keeping an LLM going smoothly behind the scenes, is interesting. Developing one from the ground (LLM) up really helped me personally understand more about what the big players, OpenAI, Anthropic, and others, are doing. The recent router addition in GPT-5, routing to different models based on task, is something I understood once people started discussing it because I had read papers like MemGPT and others. In my own framework I quickly realized that model size affects time to output, which led me to debate routing "trivial" conversation to a quick, non-thinking model and non-trivial requests to a thinking model. But who/what decides "trivial"? On Bluesky someone mentioned that pro users could switch models via a dropdown, so OpenAI may in essence have had data on users switching models and an educated guess as to why. I know I have gotten the "do you prefer this or that response" prompt, which could be testing model capability as well. It's a very interesting problem.
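Reduced to a toy, the routing decision I keep circling looks like this: some cheap classifier (a heuristic here, a small model in practice) labels the request, and the label picks the model. The model names, labels, and the heuristic itself are assumptions for illustration, which is exactly the problem: something still has to decide "trivial".

```python
FAST_MODEL = "qwen3-1.7b"            # quick, non-thinking
THINKING_MODEL = "qwen3-4b-thinking"  # slower, reasons first


def classify(message: str) -> str:
    # stand-in: in practice a cheap model or better heuristic returns the label
    return "trivial" if len(message.split()) < 20 and "?" not in message else "hard"


def route(message: str) -> str:
    # pick which model handles this turn
    return FAST_MODEL if classify(message) == "trivial" else THINKING_MODEL
```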

That about wraps this up, I guess. I'll try to write more, especially now that I took the time this morning to go ahead and redo this site. As always, expect changes here.