Theo's Site

Writing about technology, self-hosting, and things I find interesting.

Posts

Switching Away from Apple

Published on

Old post (3 years old) - may be outdated

I am talking while going through my feed on Tumblr. I am going to talk about interesting posts as I see them. And then I am going to feed this recording into transcription software.

https://northshorewave.tumblr.com/post/697583950171865088/whats-the-issue-with-your-macbook-as-far-as-i

The first interesting post I see is North Shore Wave talking about switching away from macOS. I recently switched away from macOS myself. A big part of why I have been switching away from Apple products is that Apple's business practices have become a lot worse, so last year I decided to start switching over to alternatives. I bought a proper workstation desktop and put Linux on it, sold my MacBook, and switched to basically using Linux and Windows as my two primary OSs.

I think the biggest issues I ran into with the transition from macOS were device incompatibilities. I have a few specialized devices, like a sound recording DAC, that are paired either to the MacBook hardware (like the Thunderbolt port, which isn't that common on PCs) or to macOS software, and that don't work well on Linux and/or Windows. I've also run into the issue that my workflow depends a bit on proprietary software. Eventually I hit something where there's no good open source alternative, or the open source alternative is so different that I have to experiment around to find the proper replacement.

For me that came up a lot with photography, because my workflow was built around Photoshop and Lightroom. The thing with Photoshop and Lightroom is that there isn't really a single program that does everything Lightroom does. And for Photoshop there are open source alternatives like GIMP, but they're just not the same in terms of how good the UI, the user experience, and the functionality are.

For Lightroom the issue is basically that Lightroom does a lot of stuff. It does digital asset management and backup: you put your photos into it and it backs them up to remote storage. That sort of has privacy implications, since it's Adobe's remote storage, but a lot of my hobby photography isn't that sensitive from a privacy perspective. It automatically makes backups, and it automatically syncs between devices, so if you're working on a tablet it syncs there, and if I take a bunch of photos while on a computer that doesn't have Photoshop installed (or that I don't want to install Photoshop on), I can go into the web app and just upload things from there.

Fortunately, thanks to the pervasiveness of Chrome OS, Adobe now has an actually really good web app, so it's possible to use Lightroom much as I normally did. A little while ago that wasn't the case, and the web application version of Photoshop is still just garbage. It's totally terrible.

So switching away from macOS is a process that took a while, and I think I'm finally getting rid of the last Apple device I use on a regular basis. I recently bought an Android phone and moved my phone plan over to it. It's a Google Pixel, the small one, not the full-size model. I still have my iPhone because I'm moving data between the two, but pretty soon I'll get rid of the iPhone and just have Android. That will be the last big Apple device.

I've already switched away from the iPad, I've switched away from the MacBook, and I don't rely on Apple services as much. It does feel a bit weird switching to Google, since Google is also a big company that does a lot of things wrong. But where Google gets really bad is privacy, and that feels like kind of a lost cause. What Apple is doing that's new in its badness, compared to what other tech companies do, is the extent to which Apple doesn't let you treat your device as your device, and tries to block what you can do with it.

Take when Apple pressured Tumblr into blocking certain content on their site, simply by having the App Store refuse to accept the Tumblr app. That's novel. The fact that your device is locked to Apple and you can't sideload apps has been an Apple thing for a while, but what's new and pernicious is that Apple is using that to really control what users can do with their devices. It's a threat to software freedom that's new. With a classic proprietary OS like Windows, you can put whatever software you want on it.

Apple not only prevents you from putting your own software on your device, but is now using that power to dictate what you can do with it and what activities Apple finds acceptable. And that's really bad, and it's not something I want to see spread throughout the software world. I'm scrolling over to the next post now to see if I can find anything else that looks interesting.

Voice Typing Wrapper Around Whisper

Published on

Old post (3 years old) - may be outdated

I just wrote a voice typing wrapper around Whisper. It types what I say as keyboard input, and it creates a system tray icon to turn on and turn off the dictation.

https://github.com/theopjones/voice-typing

(I just created it, it might have bugs, only tested on Linux)

I'm not sure how much additional time I want to invest in this little project, because I'm not an expert in this type of technology or in AI in general, and I'm not sure I'd do a super good job at implementing it further.

I think right now I have something that's a very interesting proof of concept. But while testing it, I have encountered a few bugs and little glitches. And I definitely don't get the same exact level of accuracy while voice typing with this tool that I'd get just pre-recording my voice and feeding it in all at once. But it is IMHO a more convenient way to write documents than recording a big audio file all at once.

Internally, what this does is break the audio up into small snippets and transcribe each snippet automatically. This doesn't do wonders for the output, because it's not consistent with the assumptions being made in Whisper.
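To show the shape of the approach, here is a minimal sketch of that chunk-and-transcribe loop. This is not the actual code from the repo; it assumes the openai-whisper and sounddevice packages, uses a made-up five second snippet length, and stubs the keyboard-input step out with a print().

```python
# Sketch of the chunk-and-transcribe loop, not the real implementation.
# Assumes the openai-whisper and sounddevice packages are installed.
import sounddevice as sd
import whisper

SAMPLE_RATE = 16000   # Whisper expects 16 kHz mono audio
CHUNK_SECONDS = 5     # hypothetical snippet length

model = whisper.load_model("base")

while True:
    # Record one fixed-length snippet from the default microphone.
    audio = sd.rec(int(CHUNK_SECONDS * SAMPLE_RATE),
                   samplerate=SAMPLE_RATE, channels=1, dtype="float32")
    sd.wait()
    # transcribe() accepts a float32 NumPy array directly.
    result = model.transcribe(audio.flatten(), fp16=False)
    text = result["text"].strip()
    if text:
        print(text)  # the real tool sends this as simulated keyboard input
```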

The underlying Whisper model, and its ability to parse grammar, kind of assumes that it is dealing with really long blocks of audio that it can go through all at once.

When I dictate a lot to it at once it kind of jumbles up the grammar/punctuation. This is pretty easy to correct for, but it does look a bit weird, at least until I get done correcting it. I'm actually writing this right now with it.

The method I am using to feed it a constant stream of audio seems like a jury-rig, instead of an actual solution to the underlying problem.

In its current state, it comes pretty close to meeting my immediate need for a dictation program/voice typing program. I mostly just want some reasonably accurate way to write short to medium sized documents.

The Whisper Speech to Text Library Appears Really Powerful

Published on

Old post (3 years old) - may be outdated

There's a new speech-to-text program/library that just got released by OpenAI as open source called Whisper, and it's impressed me quite a bit so far. It's really powerful and it competes pretty well with the incumbent major speech-to-text tools in terms of accuracy.

The caveat being that it's not a full-featured tool. Currently all it does is convert an audio file to text, as a command line tool. It doesn't have anything more sophisticated like simulated keyboard input or training or the other things you'd expect from a well-established desktop speech-to-text program like Dragon NaturallySpeaking. It's intended more as a research model than anything, but the results I've gotten out of it are spectacular. It is not quite a hundred percent perfect, but the error rate is impressively small. The accuracy is better than even mature speaker-dependent systems like Dragon. It has a very strong model of grammar and gets things that are really difficult for most speech-to-text programs, like capitalization, prepositions, and small words.
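To give a sense of how bare-bones it is today, the whole workflow through the Python API is a few lines (the model size and file name below are just placeholders):

```python
# Convert one audio file to text with the open source whisper package.
import whisper

model = whisper.load_model("small")          # placeholder model size
result = model.transcribe("recording.mp3")   # placeholder file name
print(result["text"])
```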

It gets a lot of technical/specialized terms right, which is something most other speech-to-text systems I've used have a lot of difficulty with. It has the accuracy you'd expect from a speaker-dependent program that's been trained on your voice for a while, even though it's a speaker-independent program that just works off a generic model of speech.

As part of my testing, I read a few of my older blog posts to it. The audio clips and the generated text can be seen here.

It seems to work well with a wide range of microphones. I tried my SM7B (a standard broadcast dynamic microphone), but I also tried more exotic microphones. One of these is a stenomask: a microphone that goes right up against your mouth so you can speak into it in a really, really soft voice, which gives you privacy, since people nearby can't hear what you are saying. These microphones are very frequently used with speech recognition, but because stenomasks muffle the sound of the speaker's voice, a lot of speech recognition programs have trouble with them, and accuracy tends to go downhill compared to a regular microphone. I tried the stenomask with Whisper, and the same pattern of declining accuracy occurred, but the accuracy was still pretty impressive and quite usable.

There are of course some limitations. I'd say it's only kind of sort of open source. You can download the tool to convert audio into text, and you can download a pre-built model for it, but the software to actually generate that model from audio hasn't been released yet. Additionally, the model is trained on a lot of data that isn't openly licensed, so you couldn't regenerate the exact same model from public data sources even if you had the model generation code. So I would say it's not fully open source, although it's still a lot more open than basically any common and widely used speech-to-text program.

It's also just one part of the puzzle of a fully featured speech-to-text system. To be a full competitor to other tools there would have to be a whole ecosystem of software using this model, and not just what we have now: a way to convert an audio recording to text. This includes integrations with other software and with the operating system. Of course, none of that exists for this particular model yet. But it appears that the broader open source community is working on ways to make use of this tool. There is, for instance, a repo on GitHub for a program that can take in live microphone input and run it through speech to text in real time.

I have long been interested in speech-to-text systems because I have a handwriting disability that makes it hard for me to type and write quickly. Hopefully this progress means that some of the big incumbent sellers of speech-to-text software will have competition from the open source community.

Home Servers, Tunneling, etc

Published on

Old post (3 years old) - may be outdated

As a follow-up to my post earlier this week, I'll discuss some other interesting things about setting up a home server.

Unfortunately, the technology here is a little bit opaque, and I'm not aware of any good newbie-friendly documentation on how to set up servers. Most of the writing here doesn't start from first principles, and a lot of what you'll find is aimed at very knowledgeable people, like IT systems administrators.

There's a lot of stuff on Internet forums, on Reddit, and on various people's blogs. When I figure this stuff out, I do a lot of Googling and visiting Reddit and Stack Overflow threads.

I've thought about writing a bit more about how the technology works and how to set up this type of server, but that's not something I've done yet.

Setting up HTTPS has become a lot easier than it used to be. Caddy, which I use as the reverse proxy on my server, basically handles SSL without me having to do much. There's also a helper for NGINX that deals with a lot of the work of setting up the reverse proxy and SSL.

The existence of Let's Encrypt has basically eliminated the need to buy SSL certificates from designated certificate authorities, and it's what the tools I mentioned above are built on top of.

The security situation is kind of a mixed bag. There are some tools I ran into that have super insecure default configurations, but fortunately the security of the most common software programs has improved a lot compared to where it used to be. Most of the big tools you'll run into, like web servers, are pretty much secure by default; you would have to actively change the configuration in undesirable ways to make them insecure.

And I think container programs like Docker also help a lot with security. Basically every application I have running on my server has its own Docker container, and the Caddy reverse proxy works as the glue between these containers.
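As a sketch of what that glue looks like, here's a hypothetical Caddyfile (the hostnames and ports are placeholders, not my actual configuration). Each site block gets its HTTPS certificate automatically, and each reverse_proxy line points at a service running in its own container:

```
blog.example.com {
    reverse_proxy localhost:8081   # e.g. a container serving the blog
}

photos.example.com {
    reverse_proxy localhost:8082   # e.g. a photo gallery container
}
```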

Docker is a way of packaging software programs with the libraries and dependencies they need. It functions in a very VM-like way: there is a high level of isolation between the different containers by default. This isolates security issues. If one of the services running on the server gets owned, it's hard for the attacker to escalate privileges to the rest of the server, so it's possible to deal with the security issue by just nuking that one container and starting fresh.

Additionally, since there are a lot of Docker images packaged either by the developers of the software or by someone else upstream, it's pretty easy to find a container where everything is packaged in a pretty secure-by-default way.

For backups, I use the Duplicati tool, set to make daily backups of the server. With Duplicati it's possible to back up to a portable hard drive, or to another off-site server running Duplicati. I haven't taken either of those purist paths; I've taken the less self-hosted route of uploading my data to a cloud storage provider (in this case Wasabi).

Duplicati is capable of encrypting the backups before they go to the cloud storage provider, or friend's server, or whatever else you're using for your remote backup.

There are two ways to connect the server to the outside world.

The traditional way, which is what I used, is to get a static IP address from your ISP. AT&T, who I use for my Internet, sells static IP addresses in a /29 block, that is, six usable IP addresses; unfortunately, they won't sell you just one static IP address. I also still have access to one dynamic IP address from them.

My router/gateway/modem gets assigned one of the static IP addresses, the home server gets assigned another, and basically every other device on my network sits behind the dynamic IP address.

Even for standard home dynamic IP addresses, IP address geolocation, at least from publicly available data sources, isn't super accurate. The main things you'll get with almost perfect accuracy are the country and which ISP you're using.

For a standard dynamic IP address, geolocation can probably take a really good guess at what city the user of that IP address is in, and a pretty inaccurate guess at what neighborhood. Short of getting the ISP's logs of IP address allocations or customer records, you'll never be able to map an IP address one-to-one with a physical geographic location.

For static IP addresses, as far as I can tell, AT&T (and my guess would be most other major ISPs) allocates them from one big pool without giving much consideration to geographic area. I haven't seen any of the IP geolocation services accurately guess anything other than what country I am in.

I don't consider the privacy implications of a static IP address that huge. Probably the main risk is that the address is known to the public, and the network it's on becomes susceptible to denial of service attacks.

The barrier to entry for conducting a fairly crippling denial of service attack on a small server or network is pretty low. Taking down a typical home server on a fiber Internet connection is definitely something the typical unskilled script kiddie can do.

The more newfangled way of connecting your server to the internet is to use a tunneling service.

Ngrok and PageKite are two good examples of these types of services. Your server opens a connection to the tunneling service, and the tunneling service assigns an IP address to your traffic (or a subdomain that can be attached to a domain name as a CNAME record).

All of these tunneling services hide the server's IP address from the open Internet. They also have the added security benefit of adding another step between spinning a service up and exposing it to the open Internet, making it harder to accidentally expose a service that shouldn't be attached to the public Internet.

The one I've experimented with the most has been Cloudflare Tunnel. The biggest problem with this service is that it adds another ISP-like intermediary between your server and the user. This is a step back in terms of avoiding overdependence on centralized services, but since the data itself lives on a server you control, it's still an improvement over your standard content silos or proprietary services. I didn't use this type of service in my original post for this exact reason.

Cloudflare goes a bit further than many of the other tunneling services in terms of how much it integrates with your site. It not only routes the data, but also takes over the SSL certificate and does a lot of filtering and analysis on the traffic. This provides a lot of useful security features, but it does mean that Cloudflare has access to all the traffic going in and out of your server, and can view SSL'd traffic in unencrypted form.

Cloudflare Tunnel is probably the option I'd recommend to people who don't have super in-depth technical knowledge.

Concerns about this one company having control over an increasing amount of the Internet aside, it's a very powerful service with a very powerful free tier. Beyond the core tunneling service, it handles a lot of things.

It handles reverse proxying, i.e. it can do what I use the Caddy web server for: acting as the glue between the various services running on your server.

It can put private services that aren't supposed to be accessible to the whole world behind an authentication portal. This can act as a source of two-factor authentication for self-hosted web apps that don't support two-factor authentication natively. It can also provide a wrapper around SSH, allowing external access through a web app, but with additional authentication.

It provides a lot of security features. It can provide DDoS protection, which basically eliminates the threat posed by script-kiddie-style DDoS attacks and, if the rest of your server is configured correctly, can mitigate even very powerful denial of service attacks. It can rate limit bots accessing your site, which reduces some security threats. Particularly on the paid tier, it can provide a web application firewall, which attempts to block known exploits from being used on your site.

It also provides a content delivery network, meaning that Cloudflare's servers store and send out frequently accessed static content instead of sending a request to your server every time. This basically eliminates the type of scalability issues that I spent most of my original post talking about.

Self-Hosting a Text Heavy Website is a Solved Problem (Even on a Home Server)

Published on

Old post (3 years old) - may be outdated

A while ago I got a mini PC and turned it into a home web server. This turns out to be a remarkably effective way to host a website. The blog you're reading right now is hosted on this mini PC, which is connected to my standard home Internet. And it's not just my website; I host a lot of other services in order to improve the privacy of my data.

It's a good alternative to the increasing centralization of the Internet.

I decided to do some testing to figure out how much traffic my setup can handle, thereby confirming whether a small, cheap mini PC connected to a home Internet connection is enough to host someone's whole personal Internet presence.

The Server

The server I am using is a fairly inexpensive Beelink mini PC. It has 8 GB of RAM and a 256 GB mSATA SSD. The exact model I bought doesn't seem to be for sale right now, but a roughly equivalent device from the same manufacturer is going for about $170 on Amazon.

I feel like this is a good example of the performance range to expect from the type of device someone would build a home server on. It's a fairly attainable level of computing power to set aside for this purpose, particularly once you consider the expense of things like cloud services or web hosting, or the indirect costs of putting your data where it can be harvested or sold to advertisers.

The Software

My home server runs Debian Linux with the Caddy web server. Most of the other services on the server run in Docker containers. Almost everything on the server is freely downloadable open source software.

The Internet Connection

My Internet connection is a 1 Gbps (symmetrical up/down) fiber connection. I have also bought a block of static IP addresses, but this isn't strictly necessary for hosting a web server; there are many tunneling services that will give your server a good way to receive connections from the outside Internet. One such service I've experimented with in the past is Cloudflare Tunnel.

Despite those past experiments, my setup is not behind any proxying services or CDNs. It's a direct connection from the users to the server.

The main reasons to get a static IP address block are if you want more flexibility, the ability to host services that require ports other than standard HTTP or HTTPS, or an alternative to centralized services that would otherwise have to be used.

The Website

Right now, the website you're reading is built with the Hugo static site generator. This creates a fairly lightweight website, lighter weight than, say, a WordPress blog, although in the past I've successfully hosted a WordPress website on the same server.

While I haven't done the same level of stress testing that I've done with the Hugo site, I feel that WordPress is definitely usable for a personal website on this server.

How I Tested the Maximum Load the Server Can Take

I used two services to load test the server: LoadForge and Loadster. These are both paid commercial services that test how much traffic a website can take.

I configured these services to test the usage pattern where the user first opens a post on the blog and then clicks through to the blog's homepage.

I picked this usage pattern because I think it vaguely describes what a user would do in what is probably the highest level of usage a normal person would encounter on their personal blog: a post going viral and suddenly getting a large influx of traffic driven by external websites. Something like the famed "Reddit hug".
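As a rough illustration, here is what that scenario looks like written for Locust, an open source load testing tool. The commercial services I used have their own configuration interfaces, and the URL path here is a placeholder:

```python
# Sketch of the two-step usage pattern: open a post, then the homepage.
from locust import HttpUser, task, between

class BlogReader(HttpUser):
    wait_time = between(1, 3)  # think time between page views

    @task
    def read_post_then_homepage(self):
        self.client.get("/posts/example-post/")  # placeholder post URL
        self.client.get("/")                     # then the homepage
```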

Results

As tested by both services, my home server can handle about 300 HTTPS requests per second. If the load goes much beyond that, the rate of errors returned by the server increases dramatically, and response times slow down drastically.

The limiting factor is pretty clearly the server's ability to handle that many simultaneous requests. Internet bandwidth didn't seem to matter much; during the testing, bandwidth usage didn't exceed 50 megabits per second. So while I have a fairly high end Internet connection, there's a lot of leeway, and most people who want to host their own blog on a home server could do pretty well with a slower connection.

Based on watching the performance of unrelated tasks on a different computer on the network, I don't think the router or the modem was a bottleneck either. However, I haven't really been able to determine what the bottleneck is. RAM and CPU usage didn't seem to hit the server's limits.

I don't really have the resources to test the exact parameters and limits more, since the server load testing services I have found to be reliable are quite expensive to run. I don't really have the budget to throw more resources towards this experimentation than I already have.

But this is enough for basically any plausible use. It is enough to have a website that can withstand getting posted on the front page of Reddit. According to one source I found, the 99th percentile level of load that comes from being posted on the front page of Reddit is about 25 users a second, and for that particular website, each user made about 15 requests to the server. So the highest level of load that particular website was put under was around 360 requests per second.

That is still a little bit over what my server benchmarked at during the load testing.

However, based on my tweaking and experimentation, a well optimized blog can probably get substantially below that, as long as most users don't dig deep into the archive. For example, a page load on my site only causes three requests. Additionally, tools like CDNs would substantially improve performance.

So, my conclusion is that, yes, a self hosted blog that is well optimized can be hosted on a standard home Internet connection using a cheap computer as a server.

Hosting text heavy content in a decentralized way is therefore basically a solved problem. The computing power and Internet connectivity available to the typical person mean that anyone can self host a website without needing to rent server space, use a content silo, or pay someone else to host it.

However, once you include a lot of rich multimedia, the bandwidth requirements start to skyrocket pretty quickly, and depending on how the website is structured, there can be a lot more requests to the HTTP server. I think recent advances in decentralized Internet technology might come into play with higher bandwidth content. Sharing large files effectively in a distributed way seems to be the wheelhouse of technologies like IPFS, while the task that standard HTTP handles easily, hosting lots of small text files, is the Achilles' heel of IPFS and similar. I feel there is good potential for a mixed solution combining traditional technologies with some of these newer ones.

Quadratic Voting Does Not Scale

Published on

Old post (4 years old) - may be outdated

Quadratic voting is a potential voting method that has gotten a fair amount of discussion in various places; one of the most notable presentations of it is in Radical Markets. While the game theoretic justification for this voting method is sound under optimal conditions, with low information/transactional costs and perfectly rational actors, I believe that there are flaws in this idea that make it unusable in most real world circumstances where it is being proposed. It is a system that is perfect on paper, but unsuited to the real world.

A flaw of many real world voting systems is that there is not a good way to allow voters to provide information about the relative importance of issues. This means that people who only have a weak preference on an issue will be in effect over represented in political outcomes on that issue. Quadratic voting is a proposal to fix this issue.

In a QV ballot a voter has a number of points that they can allocate across issues. Allocating more points to an issue makes the vote on that issue weighted more. The value of each point declines as you add more points to an issue. Accordingly, there is an incentive to split your points across multiple issues.

I won't go into too many details about the justification and game theory here because it's been covered quite a bit by other sources. I assume that if you are reading this blog post, you probably have some knowledge here. However, I have provided a bit of a summary at the end of this post, focusing on the quadratic funding variant because it is somewhat easier to build an intuition around.

What do I see as the problems with this proposal?

In summary, it runs into issues in very large elections, and it breaks down when people don't act as Homo economicus, the purely rational self-interested actor.

For now, I will leave you with two examples of what I am talking about.

How to optimally allocate voting points

The optimal strategy in a QV system would be to allocate votes in proportion to the value of a vote, not the subjective importance of the outcome of that particular issue.

By the value of a vote, I mean the probability that the voter will be the pivotal voter who decides the outcome of the election, times the importance of the outcome.

This can make the game-theoretically optimal allocation of voting points fairly counterintuitive in cases where both large and small elections are on the same ballot.

Let's imagine that there are two elections on the ballot: 1) the election for U.S. President, where you have a one in a million chance of being the pivotal voter, and 2) the election for county dogcatcher, where you have a one in ten thousand chance of being the pivotal voter. Let's say you care about the outcome of the presidential election ten times more than the dogcatcher election.

Because the pivotal voter probability is a hundred times higher in the dogcatcher election, a dogcatcher vote is worth ten times more than a presidential vote. Therefore, the strategically optimal way to fill out your ballot is to cast about ten times as many votes in the dogcatcher election, which under quadratic pricing costs about a hundred times as many points.
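Spelling out the arithmetic with the illustrative numbers from above:

```python
# Value of a vote = probability of being pivotal * importance of outcome.
p_president, p_dogcatcher = 1e-6, 1e-4       # chance of being pivotal
i_president, i_dogcatcher = 10, 1            # relative importance to you

v_president = p_president * i_president      # 1e-05
v_dogcatcher = p_dogcatcher * i_dogcatcher   # 1e-04

# Optimal votes are proportional to vote value; n votes cost n**2 points.
vote_ratio = v_dogcatcher / v_president      # ~10x more dogcatcher votes
point_ratio = vote_ratio ** 2                # ~100x more points
print(vote_ratio, point_ratio)
```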

In a real world election using this method, most people may not be that strategic in how they vote, and there will probably be wide variance in how much people take the pivotal voter probability into account when allocating their votes. This mix of strategic and non-strategic voting will eat away at the efficiency benefits of the system.

High minimum threshold for issue importance

Consider the funding-matching version of quadratic voting (picking this variant because it is particularly easy to follow the math here).

Let's imagine that there is a public good that 1 million people donate one cent to. In this case, if quadratic funding were used to allocate matching funds, $10 billion would be allocated, or $10,000 per capita.

This scales upward with population: if there were five million donors donating one cent each, $250 billion would be allocated to the project, or $50,000 per capita.
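These figures follow from the quadratic funding rule summarized at the end of this post (total funding = the square of the sum of the square roots of the donations). A quick check of the arithmetic:

```python
# (sum of the square roots of the donations) squared, for identical donors.
for donors in (1_000_000, 5_000_000):
    donation = 0.01                          # one cent each
    total = (donors * donation ** 0.5) ** 2
    print(f"{donors:,} donors -> ${total:,.0f} total, "
          f"${total / donors:,.0f} per capita")
# 1,000,000 donors -> $10,000,000,000 total, $10,000 per capita
# 5,000,000 donors -> $250,000,000,000 total, $50,000 per capita
```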

And if someone for various reasons donates a larger sum, that will be magnified to an absurd degree — making any voting behavior other than being perfectly rational and self-interested a system breaker.

As you can see, given the practical limits of how people think, quadratic funding/voting can only really be used for very big issues or in fairly small communities. That defeats one of its main features: allowing day-to-day participation in everyday political decisions without the limitations that direct democracy has under more standard voting systems.

Additional notes / Overview of Game Theory Justification for QV

Here is a brief intuition for how QV works / where the theoretical justification comes from.

Imagine you are in a homeowners' association with 100 members, including you, that is considering building a swimming pool. Each member gets $100 of value out of the pool. This means that you would be willing to pay up to $100 to have the pool built.

If contributions were completely voluntary, the value of each dollar you contribute would be distributed equally across everyone in the association. I.e., if you contributed enough to completely build the pool, you would still only get $100 of value out of it.

In effect, for each dollar of value that you create with your personal contribution, you only get one cent of personal value.

Therefore, if you were a perfectly rational and self-interested actor, you would only want to pay for the pool if its total cost were $100 or less. This applies even if the pool's total value to everyone was much more than that.

But in a world where everyone is a perfectly rational actor, individual willingness to pay can be a very useful source of information on what people's preferences are.

Let's imagine that your HOA comes up with an idea: using fees to create a matching fund for projects submitted by members.

Is there, in effect, a way to translate individual willingness to contribute money into an estimate of the total value of the project?

Let's also say that everyone is behaving like a perfectly self-interested, perfectly rational actor. Furthermore, let's say that the good/service being funded is non-excludable; it's not walled off to only the people who contribute.

For that swimming pool example, what ratio of matching funds would it take to make it rationally self-interested for you to donate enough such that your donation plus the matching funds equal the social value of the project?

For the example of a project whose benefits are split across 100 people, the correct matching is $99 for each donated dollar.

The optimal maximum funding for a project where each of the 100 people donates a dollar is then 100 × 100, or $10,000.

You can by this logic derive a general formula for the case where everyone's preferences are identical.

Optimal project funding is equal to the individual donation amount times the square of the population size.

Which is the core idea and why quadratic voting is quadratic.

For the matching funding game, you can derive a general rule for cases where individual preferences (and therefore donation amounts) vary: the total funding = the square of the sum of the square roots of the donations.
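In symbols, if person $i$ donates $c_i$, the matched total is

$$F = \left( \sum_i \sqrt{c_i} \right)^2$$

and with $N$ identical donations of $c$ this reduces to $F = N^2 c$, matching the identical-preferences formula above.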

Similar logic can be used for voting on funding projects with abstract points instead of matched monetary contributions (the aforementioned formula can be used to calculate the relative importance of projects, and the available funding can be allocated proportionate to that).

And finally, similar logic can be applied to voting on issues/candidates. A quadratic voting ballot would allow voters to allocate points to each issue, with the weight given to that issue equal to the square root of the number of points allocated.

Thoughts on Mastodon and Avoiding Content Silos

Published on

Old post (4 years old) - may be outdated

I recently set up a Mastodon instance (username @theo@theopjones.com).

The 500 character text limit on Mastodon does seem a lot better than Twitter's shorter character limit.

500 characters amounts to about 100 words, which is in the good middle range between really short content and the longer form content that works well on WordPress and similar full-featured blogging engines. This is a use case that Tumblr is really good at, and as Tumblr gradually becomes a dying site, Mastodon may be a suitable replacement.

In my ideal world, most people on the Internet would use open source software running on commodity infrastructure. I want a world in which where you decide to host your content fundamentally doesn't matter, and there is real competition in content hosting.

I want a world where people can just drop their current hosting provider without too much difficulty.

A world where identity isn't tied to a particular host. The internet does have a way to do decentralized identity – DNS. But most people don't have a domain name that is the home for their content.

The big thing that worries me about Mastodon from a structural perspective is that Mastodon simultaneously:

- is generally structured in a way that means the vast majority of users won't run their own instance, or at least hire someone else to run an individual instance for them, and
- has a primary mode of content moderation built around instance administrators blocking other instances.

This could easily replicate the situation with email, where there are very much first-tier email hosts. With email, Google and Microsoft have by far the best deliverability. It seems possible that there could be similarly dominant Mastodon instances that eventually turn it into a de facto centralized service.

It is possible to come up with ways to design a decentralized platform that doesn't have this issue.

Spam, harassment and other similar ubiquitous problems in the social media ecosystem are fundamentally due to the fact that it's way too easy to get your content in front of someone who has not opted into interacting with you.

Sending someone a message on an Internet service is usually effectively cost free. And it is similarly easy to get onto someone's activity feed.

If I were designing things, content filtering would be user based and split into three categories:

- the people that the user has directly opted into seeing,
- the people who are trusted by someone the user trusts, and
- the rest of the world.

The first two categories would be allowed to pass through fairly effortlessly.

The rest of the world would have to do something costly, like a digital stamp or proof of work, or something else that acts as a limit on excessive posting.

This is pretty far from Mastodon.

Additionally, one of the fuzzier, harder to put into words issues with Mastodon that I have noticed is the nature of the people running things. I'm definitely not sure I have high faith in the judgment of the typical person who runs a Mastodon instance.

I'm just seeing a lot of people who revel in the idea of the tech industry throwing around its weight to reshape society.

A lot of the mindset I am seeing feels like the same part of the tech industry that got social media into this mess. It feels entirely possible that a lot of power is in the hands of people who could be a lot more fickle and arbitrary than the people who run Twitter or Facebook.

And also a lot of people whose main objection to the way Facebook and Twitter are run is that these platforms are not exerting enough control over their users.

In an ideal world, the judgment of where the lines of acceptable behavior in public discourse lie would also be decentralized and democratized, instead of decisions being handed down from above.

From a bird's eye design and ideas perspective, I like what I see in the IndieWeb project (as mentioned before): https://indieweb.org/ The emphasis on personal domains instead of shared instances is good. There is also an emphasis on trying to build software that works on commonly available infrastructure like LAMP stack hosting. And I like the idea behind the Vouch anti-spam protocol: https://indieweb.org/Vouch But all of the actually existing software here has been a pile of half-working kludges in my testing, while Mastodon actually works.

The Costs and Benefits of Urban Development are Distributed Very Unequally

Published on

Old post (4 years old) - may be outdated

Urban development in big cities is very controversial, and there are politically powerful movements that oppose almost all new construction in big cities.

There is one explanation that in my opinion just doesn't hold water. This explanation focuses on incumbent property owners who want to increase the value of their property. According to this story, incumbent property owners favor restrictions on new development in order to constrict the supply of available property, driving up prices and the value of their own property.

The reason this explanation doesn't hold water is that the urban cores with the most opposition to development, like San Francisco and New York City, also have some of the lowest rates of property ownership: compared to the United States average, some of the lowest percentages of people living in owner-occupied property rather than renting. These are markets completely dominated by renters, and renters have very little personal economic interest in driving up home prices. Meanwhile, some of the markets with more permissive regulations toward development have some of the highest rates of owner-occupied housing in the country. (footnote 1)

Additionally, opposition to development in these cities seems primarily driven by groups that claim to represent tenants, not groups that represent property owners. (footnote 2)

My explanation for where opposition to urban development comes from is based around the fact that the costs and benefits of development are distributed very unequally. Incumbent renters in these markets have very few personal gains from new development except under very long time frames and very large scales.

Externalities of Urban Development

Populist movements that oppose new construction make the claim that new development will cause gentrification which will drive out incumbent renters and displace current residents at the expense of wealthier newcomers. They state that new development will be a net negative for the people who currently live in neighborhoods where new construction occurs.

But is there actual evidence for this, or is this argument just pure economic illiteracy?

I think there is a fair amount of evidence for this.

A 2019 article discusses the economic implications of development. (footnote 3) Increased density in urban areas has quite a few benefits, in part because a lot more people will be in one place, and the activities that benefit from having a lot of people in one place become much more efficient. There are also social benefits to density: average wages go up, people find more job opportunities and have an easier time searching for a job, there is more innovation, cities become less car dependent and public transit becomes more effective, and denser urban areas produce less environmental impact than sparsely populated ones.

Property values, however, will increase. Paradoxically, land in urban areas may become more expensive if there are more uses for this land. In fact, a lot of the economic benefits will manifest in property values. This is great if you own property in one of these neighborhoods; it's not so great if you rent.

The article concludes, "the effect on rent exceeds the effect on wages. In a spatial equilibrium framework … there may be a collateral net-cost to renters and first-time buyers if residents are not perfectly mobile and housing supply is inelastic."

New development can be a windfall for property owners even beyond what economic theory would predict, because of the sausage making of urban planning and the operation of municipal governments. In practice, each new development is a trench fight between developers and those opposing development. The permits to develop a piece of land can be immensely valuable, much more valuable than the land itself in many cases. Granting a property owner these permits hands them a huge windfall: the owner gets to charge, in effect, a monopolist's price on new development, because they are the only one with the right permits.

There is also a dynamic where many of the positive impacts of development occur very far away from the new development, while the negative impacts occur close to home. The positive impacts are diffuse and subtle, while the negative impacts are immediate for those affected.

A 2015 paper concluded that the benefits of new housing development are huge. (footnote 4) Wages would increase drastically on a national scale, and the economy would've grown 50% more than it did in the period between 1964 and 2009 if zoning regulations had been more permissive. A handful of major metropolitan areas are the main sources of economically destructive restrictive housing policy.

Improving housing policy would create an economic boon at the national level, but it is also true that the costs would be felt at the local level, as the bulk of this new housing development would occur in a select few urban areas.

The economic benefits of new housing development are huge, and the United States' major metropolitan areas desperately need more housing, and therefore more permissive regulations on its construction (or at least some other mechanism to actually get this housing built). But the opposition to new housing construction doesn't come out of the blue.

The distributional impacts of new housing construction cannot be ignored and are the primary source of a lot of opposition to new development. Housing development must be done in a way that ensures the typical American benefits: in a way that benefits renters and enables more Americans to attain homeownership.

In a way, the discussion of this issue in the economics press and in mainstream news comes from the perspective of a subset of the American population: the subset that works in the fields most likely to benefit from higher density, and the subset most likely to own property.

In that sense, the economics literature and elite opinion in general come from the viewpoint of those most likely to benefit from new development given the rest of the current regulatory environment.

– – – – –

1. According to the data here, Los Angeles, New York City, and the San Francisco Bay Area are the metropolitan areas with the lowest percentage of the population owning their house. The areas of the nation with the highest rates of homeownership tend to be concentrated in the South and the Midwest. This page is archived at https://advisorsmith.com/data/states-and-cities-with-the-highest-homeownership-rates/

2. See the paper Resisting the Politics of Displacement in the San Francisco Bay Area: Anti-gentrification Activism in the Tech Boom 2.0 by Florian Opillard for a discussion of populist movements that oppose new construction.

3. The economic effects of density: A synthesis by Gabriel Ahlfeldt and Elisabetta Pietrostefani.

4. Housing Constraints and Spatial Misallocation by Chang-Tai Hsieh and Enrico Moretti.

Censorship Degrades Public Trust

Published on

Old post (4 years old) - may be outdated

I recently watched the two controversial Joe Rogan episodes. Frankly, what I heard on those episodes wasn't that far out of the norm for current political discourse. I wasn't in anything close to a hundred percent agreement with what those guests had to say, of course.

I would say the controversial guests did make good points. I would say the discussion was about 50% reasonable points (many of which haven't been discussed much elsewhere) and about 50% crackpottery.

I think censorship is counterproductive. When a large portion of the population is sympathetic to what your opponents say, you can't censor your way to public consensus and maintain public trust.

And even if you could enforce the correct viewpoint on the public, censorship is fundamentally a Faustian bargain in which society creates extremely dangerous infrastructure for mass surveillance and social control. The type of centralized authority and centralized infrastructure required to force the censorship we've seen recently on the Internet is intrinsically dangerous. It's an extreme act of hubris to think that if you give the right people that type of power, you'll get a utopia.

Another issue with widespread censorship is that suppressing discussion affects moderate speakers more than extremists. When you're an extreme critic of policy, you're going to get people's ire no matter what you do. But when you're more moderate, people will tolerate you as long as you shut up about the parts where you disagree with the party line.

And I think there is a dynamic where, when discussion of an issue has been suppressed, people often first hear the cogent points about it from the controversial speakers.

This gives a lot of credibility to the somewhat eccentric crackpots, even when they are full of shit. I think that's why a lot of people are interested in hearing these controversial podcasts: podcasts like Joe Rogan's are one of the rare places where you can hear actual discussion of some of these issues, instead of the parroting of a party line that is in many ways incoherent, arbitrary, and rapidly changing.

I don't think anyone believes it's the best possible source of commentary, but I think a lot of people do think it's one of the only places where you won't hear commentary that's in lockstep with everyone else's.

And the whole idea of building public trust through censorship, basically telling people that they're not allowed to have opinions on policies that affect their lives, is fundamentally self-defeating. Policymakers taking this approach will necessarily offend a large portion of the population and degrade public trust even more. It creates an adversarial relationship between policymakers and the public, and a world where policymakers get used to barking orders at the population instead of finding ways to build trust. It also ignores the wide variety of perspectives that are actually relevant to coming up with the best policy response. Different Americans are affected by policies in different ways, and policy elites have their own biases and interests that differ from those of the typical American. This means that tight control over policy discussions will shut out many perspectives, in a way that goes far beyond enforcing scientific objectivity or truth.

I also think that US coronavirus policy as a whole has been highly corrupted by the fact that policymakers and media outlets decided to treat China's response as the objectively ideal response, or at least as the baseline for what a response should look like, instead of looking to the policy responses of Asian democracies like Taiwan or South Korea. The result has been US policymakers and media outlets acting like leaders of a communist dictatorship, using measures that would only be sustainable in an authoritarian state, instead of coming up with measures that are reasonable for a pluralistic democracy.