Theo's Site

Writing about technology, self-hosting, and things I find interesting.

Tag: kimi

Experience Using Opencode on the Latest Models

Published on

I’ve been experimenting more with the latest LLMs for coding, and it’s genuinely impressive how far things have come.

I’ve mostly been using the Kimi 2.5 model with Opencode as the coding agent, and I still find that combination pretty great.

I think the whole vibe coding / AI-assisted programming workflow that Opencode and similar tools encourage might not be the best for producing high-quality code, but it’s pretty addictive seeing that kind of rapid progress.

Until you get to the highest (and most expensive) tier of Anthropic’s Claude and OpenAI models, Kimi performs basically on par with, or better than, what the biggest companies offer.

These coding agents can also take care of a lot of the boring drudgework of programming. They are good enough right now that I don’t have to spend too much time manually intervening and fixing what the LLM did. These tools are getting pretty accurate.

I’ve spent many hours working on a project, tweaking it back and forth, and the main thing stopping me from spending even more time is the fact that I have to pay for the credits to run inference for the models.

Once you spend the time working through all of the quirks, these tools have gotten pretty smooth as far as workflow goes. It’s genuinely fun to do this with the recent LLMs that have come out.

Cost right now is the only real problem — these things will burn through tokens by the millions.

It’s pretty clear that the $20/month coding agent tiers from OpenAI and others are being subsidized aggressively, even though the limits are being tightened. When you compare the amount of agent time you get from such a plan with what the same usage costs when paying per token for an open model, OpenAI and similar companies can’t really be making money from the coding agent offerings.
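To make that comparison concrete, here is a minimal back-of-the-envelope sketch. The only number taken from the post is the $20/month plan price; the per-token prices and the session size are placeholders chosen for illustration, not quotes from any provider, so substitute your own provider's rates.

```python
def session_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Cost in USD of one agent session; prices are USD per million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def tokens_for_budget(budget_usd: float, price_per_million_usd: float) -> float:
    """How many tokens a budget buys at a flat per-million-token price."""
    if price_per_million_usd <= 0:
        raise ValueError("price must be positive")
    return budget_usd / price_per_million_usd * 1_000_000


# Placeholder prices (hypothetical, NOT real provider pricing).
ASSUMED_INPUT_PRICE = 0.50   # USD per million input tokens
ASSUMED_OUTPUT_PRICE = 2.50  # USD per million output tokens

if __name__ == "__main__":
    # Hypothetical session: agents re-send a lot of context, so input dominates.
    cost = session_cost(5_000_000, 200_000,
                        ASSUMED_INPUT_PRICE, ASSUMED_OUTPUT_PRICE)
    print(f"one session: ${cost:.2f}")
    print(f"sessions covered by $20: {20 / cost:.1f}")
```

With these made-up numbers a single long session already costs a few dollars, which is the kind of gap the subsidy argument rests on.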

On the other hand, it’s probably cheaper to use a self-hosted frontend (Open WebUI, etc.) with a hosted inference API than it is to pay for a ChatGPT subscription.

I’ve also noticed that Opencode and other open-source agents/frontends are very sensitive to token output speed. Using a somewhat more expensive inference provider that generates tokens quickly improves the experience quite a bit.
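One rough way to compare providers on this is to time a streamed response. The sketch below is an illustration only: `measure_stream` is a hypothetical helper, it counts streamed chunks as a stand-in for tokens (an assumption, since chunk sizes vary by provider), and it accepts any iterable of text chunks rather than being tied to a specific client library.

```python
import time
from typing import Callable, Iterable, Optional


def measure_stream(chunks: Iterable[str],
                   clock: Callable[[], float] = time.perf_counter) -> dict:
    """Consume a stream of text chunks and report simple latency statistics.

    Chunks are used as a rough proxy for tokens (an assumption).
    """
    start = clock()
    first: Optional[float] = None
    last: Optional[float] = None
    count = 0
    for _chunk in chunks:
        now = clock()
        if first is None:
            first = now  # arrival time of the first chunk
        last = now
        count += 1

    # Rate is measured over the generation window, excluding the initial wait.
    if count > 1 and last > first:
        rate = (count - 1) / (last - first)
    else:
        rate = 0.0

    return {
        "chunks": count,
        "time_to_first_chunk": None if first is None else first - start,
        "chunks_per_second": rate,
    }


if __name__ == "__main__":
    def fake_stream():
        # Stand-in for a provider's streaming response.
        for word in ["hello", " ", "world"]:
            time.sleep(0.05)
            yield word

    print(measure_stream(fake_stream()))
```

Passing the iterator returned by a streaming API call in place of `fake_stream()` gives a like-for-like number for each provider.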

Switching API providers basically fixed some of the issues I was having with the model freezing and similar problems.

The project I’ve been working on as part of my testing is this:

https://git.selfhosted.onl/theo/marginleaf

It’s a personal blogging CMS.

It can do the typical blogging engine things, but instead of a frontend editing interface, I created an API and built some tools that allow me to fully manage it from Open WebUI, which opens some pretty neat possibilities.
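A rough sketch of what such a tool can look like. Open WebUI loads tools from a Python file that defines a `Tools` class whose documented methods the model can call. Everything specific to the blog here is an assumption made for illustration: the base URL, the `/api/posts` endpoint, the JSON field names, and the bearer-token auth are hypothetical and are not taken from the marginleaf code.

```python
import json
import urllib.request


class Tools:
    def __init__(self, base_url: str = "https://blog.example.com",
                 api_token: str = "changeme"):
        # Hypothetical settings; a real tool would likely expose these as config.
        self.base_url = base_url.rstrip("/")
        self.api_token = api_token

    def _build_request(self, title: str, body_markdown: str) -> urllib.request.Request:
        """Build (but do not send) the HTTP request for creating a post."""
        payload = json.dumps({"title": title, "body": body_markdown}).encode("utf-8")
        return urllib.request.Request(
            url=f"{self.base_url}/api/posts",  # hypothetical endpoint
            data=payload,
            method="POST",
            headers={
                "Content-Type": "application/json",
                "Authorization": f"Bearer {self.api_token}",
            },
        )

    def create_post(self, title: str, body_markdown: str) -> str:
        """
        Create a new blog post.
        :param title: The title of the post.
        :param body_markdown: The post body, written in Markdown.
        :return: A short status message for the model to relay.
        """
        request = self._build_request(title, body_markdown)
        with urllib.request.urlopen(request, timeout=30) as response:
            return f"Post created (HTTP {response.status})."
```

Keeping the request construction separate from the network call makes the tool easy to check without a running server.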

It seems possible that some of these chat tools could become good enough to serve as the main interface for an application, in place of a more traditional web UI.

In particular, the Open WebUI tools can be found here:

https://git.selfhosted.onl/theo/marginleaf/src/branch/main/openwebui_tools

But mostly I created it because it’s fun to work on this type of thing.

Kimi 2.5 and Self-Hosting Open WebUI

Published on

Been poking around with the Kimi 2.5 LLM and also started self-hosting Open WebUI (a ChatGPT-style web frontend for LLM APIs) on my server.

Kimi probably isn’t the best model on the market, but Kimi 2.5 is the first time I’ve used a truly open-source model that feels vaguely in the same category of performance as ChatGPT and similar systems. I don’t really feel much of a penalty using it versus ChatGPT.

Of course, running it directly is way beyond what any device I have can do reasonably well.

But there are already API providers offering it with very favorable privacy and data retention policies, so I’m probably going to switch to using it over ChatGPT.

I wouldn’t recommend using the chat/API offered by the model’s creator — I don’t really trust that company.

If I self-host the frontend, all of the actually sensitive data, like chat logs, is stored on my server.

Open WebUI is pretty cool. It works almost as well as ChatGPT does.

I’ve run into some issues with the model occasionally freezing during processing, but I’ve seen that kind of thing with other LLM providers too.

It has a search integration that works with the model so it can do web searches, etc. It’s pretty customizable.

I also quickly created a custom tool that the model can use which queries the OpenAlex API to find open-access academic articles.

The code for that can be found here:

https://git.selfhosted.onl/theo/openwebui-tools-skills/src/branch/main
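The repository above has the actual tool. As a separate, minimal sketch of the idea, the snippet below queries the public OpenAlex `works` endpoint for open-access results and formats them as plain text a model can read. The function names, the result format, and the choice of fields are assumptions made for this illustration and are not taken from the linked code.

```python
import json
import urllib.parse
import urllib.request

OPENALEX_WORKS_URL = "https://api.openalex.org/works"


def build_search_url(query: str, per_page: int = 5) -> str:
    """Build an OpenAlex works search URL restricted to open-access results."""
    params = {
        "search": query,
        "filter": "open_access.is_oa:true",
        "per-page": str(per_page),
    }
    return f"{OPENALEX_WORKS_URL}?{urllib.parse.urlencode(params)}"


def format_results(data: dict) -> str:
    """Turn a parsed OpenAlex response into one line per work."""
    lines = []
    for work in data.get("results", []):
        title = work.get("display_name") or "Untitled"
        year = work.get("publication_year") or "n.d."
        link = (work.get("open_access") or {}).get("oa_url") or "no link listed"
        lines.append(f"{title} ({year}): {link}")
    return "\n".join(lines) if lines else "No open-access results found."


def search_open_access(query: str, per_page: int = 5) -> str:
    """Fetch and format results; this is the part that touches the network."""
    with urllib.request.urlopen(build_search_url(query, per_page), timeout=30) as resp:
        return format_results(json.load(resp))


if __name__ == "__main__":
    print(search_open_access("self-hosted language models", per_page=3))
```

Wrapped as a method on an Open WebUI `Tools` class, `search_open_access` would be the function the model calls.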