Golang has an excellent net stack, but it's not perfect. In particular, it appears to suffer from the ever-recurring issue of not paying enough attention to teardown. Everything must return to 0, ideally gracefully, but even the best programmers like to gloss over the details, because it's easier to just kill everything and call it a day.
TCP is full duplex, but net.Conn (the main std API that everyone uses) doesn't have a "CloseWrite" method. That bubbles down to all downstream uses, such as tls. Another example would be how deadlines and timeouts are handled, and how they (don't) interact with context, which is the blessed way to do timeouts for all non-IO purposes.
Still, Go is one of the best languages for network programming imo, because it allows you to check most edge cases (like timeouts) without wasting like 4 threads to do so.
But that’s just like.. my opinion.
Also, TCP is more like 2 uni-directional streams bolted together than a single full-duplex connection, which I've seen trip up lots of systems.
Your analogy is better directed towards the BSD sockets API that 99% of programs use, rather than TCP itself. I still feel that the API should lean even further into the "two streams" abstraction, as described here: http://cr.yp.to/tcpip/twofd.html
It really is logically two streams indeed. In an alternative universe…
There's probably room for a socket option to automatically close the connection on FIN; I'm surprised it isn't there already.
net.Conn needs to cover non-duplex conns like UDP, so it's challenging to add half-close methods with reasonable semantics. You can always check for the presence of the CloseWrite method with a type assertion in that case.
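A minimal sketch of that check (closeWriter and halfClose are made-up names for illustration, and it assumes the usual net/fmt imports; *net.TCPConn does provide a CloseWrite method):

// closeWriter matches conns that support half-close, e.g. *net.TCPConn.
type closeWriter interface {
    CloseWrite() error
}

func halfClose(c net.Conn) error {
    if cw, ok := c.(closeWriter); ok {
        return cw.CloseWrite() // sends FIN, the read side stays open
    }
    return fmt.Errorf("%T does not support half-close", c)
}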
One trick to unify the context and io timeouts is roughly:
conn := ... // wherever you get your file or conn
ctx, cancel := context.WithCancel(ctx)
defer cancel()
go func() {
    defer conn.Close()
    <-ctx.Done()
}()
// Do reads and writes knowing that if your context is canceled or times out,
// the fd will close and your in-flight io op will stop.
The only caveat is if you need to be aware of partial data read/written when a timeout occurs - then you need to tell the OS about your timeouts :(.
That's correct! I do similar things EVERY time I deal with non-trivial operations on a conn, which is very tedious, and why I wish cancellation propagated down to IO. Although goroutines are cheap, something is wrong if you need them in sequential code, imo. They're also notoriously easy to leak.
However, once you need idle timeouts or heartbeats, you have to increase the complexity further. Setting deadlines before every op (which is the recommended approach, sketched below) is (a) super easy to forget and (b) a perf issue if you write a stream of data, due to the several syscalls involved, which can be noticeable if you transfer at bandwidth capacity.
Even std (such as net/http, tls) is full of these workarounds. If Go could get the last remaining pieces right, it would be near perfect for userspace network programming imo, and it would bring benefits in other domains as well. I want this pony.
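For concreteness, the per-op deadline dance looks roughly like this (a sketch: idleTimeout, buf and handle are made-up names; a timed-out op reports itself via os.ErrDeadlineExceeded):

// Reset the read deadline before every read so an idle peer eventually
// errors out; forgetting one of these resets is the easy mistake.
for {
    if err := conn.SetReadDeadline(time.Now().Add(idleTimeout)); err != nil {
        return err
    }
    n, err := conn.Read(buf)
    if errors.Is(err, os.ErrDeadlineExceeded) {
        return fmt.Errorf("peer idle for %v: %w", idleTimeout, err)
    }
    if err != nil {
        return err
    }
    handle(buf[:n])
}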
Similarly there's no way to shut down an HTTP client, the connection pool sticks around for the duration of the program.
> Similarly there's no way to shut down an HTTP client, the connection pool sticks around for the duration of the program.
I don't understand this - isn't the pool supposed to stick around for the duration of the program?
Did you mean that the connection sticks around for the duration of the program?
If you make a one shot request to an HTTP server, and the system creates a connection pool for that hostname:port, and you never need it ever again, should the connection pool survive until the program terminates? What if you make hundreds of requests like this? I don’t know how Go buckets their HTTP connection pools, but ADO.NET (the .NET SQL API) does it by connection string, and has a function for freeing all pools and also for one specific pool.
> If you make a one shot request to an HTTP server, and the system creates a connection pool for that hostname:port, and you never need it ever again, should the connection pool survive until the program terminates?
I did not know it did that, and scanning through the docs, they don't appear to mention it either.
Is this only on client programs? Does it do the same for server programs too (create a pool for each incoming src-ip:src-port connections)?
It does have connection pooling for clients: https://blog.cloudflare.com/whats-inside-net-http-socket-lat.... The pools do clear themselves naturally, though, because they observe the keep-alive timeouts; as long as your servers aren't setting insanely high timeouts you might never notice. Nevertheless, it does seem to be impossible to manually clear the pool. That becomes more of a problem when you realise you can't use it to temporarily mitigate a memory leak caused by a missing Close() call on a request body or something like that. So in those cases, yes, it will just leak indefinitely unless you also specify an idle timeout in the client constructor.
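For reference, that idle timeout is configured on the transport, roughly like this (a sketch; the timeout values are arbitrary):

// Client whose pooled connections are dropped after 30s of idling.
client := &http.Client{
    Timeout: 10 * time.Second,
    Transport: &http.Transport{
        MaxIdleConnsPerHost: 4,
        IdleConnTimeout:     30 * time.Second,
    },
}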
Or if you're doing http over various types of software tunnels, where automatic cleanup doesn't happen as usual, or there's issues with the upstream server (for internal apis) etc.
Relevant to the "Race condition on error from (*Rows).Close" bug: the Tour of Go notes, "The deferred call's arguments are evaluated immediately, but the function call is not executed until the surrounding function returns." Problems with deferring Close() on writable files have been known for some time, and my recommendation is not to defer just the close, but to defer a function or block that does the close and handles any error return. In this case, though, the dangling buffers in the driver introduce a problem that goes beyond just handling an error on close.
One pattern I happily use is to capture the error into a named return error, using a multi-error if the named error is already non-nil.
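A sketch of that pattern, with errors.Join (Go 1.20+) standing in for whatever multi-error helper you prefer; queryIDs is a made-up example:

func queryIDs(db *sql.DB) (err error) {
    rows, err := db.Query("SELECT id FROM users")
    if err != nil {
        return err
    }
    // The deferred close writes into the named return, joining its error
    // with whatever error is already set.
    defer func() {
        err = errors.Join(err, rows.Close())
    }()
    for rows.Next() {
        // ... scan ...
    }
    return rows.Err()
}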
I have a couple of useful variants. To avoid the function overhead, you can also defer the Close() and explicitly call Close() at the end:
1. If the explicit Close() is called first, the error from the deferred Close() is ignored.
2. If the explicit Close() is never called because of a panic, the deferred Close() will kick in and clean up the resources.
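In code, roughly (a sketch; sql.Rows.Close is documented as idempotent, so the doubled call is harmless):

rows, err := db.Query("SELECT id FROM users")
if err != nil {
    return err
}
defer rows.Close() // safety net: runs on early return or panic, error ignored

for rows.Next() {
    // ... scan ...
}
if err := rows.Close(); err != nil { // explicit close: here the error is checked
    return err
}
return rows.Err()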
Is that actually faster? You're saving the (small) overhead of an extra function, but you're adding a new function call. Certainly in the context of an SQL query (which will take milliseconds at the least) I wouldn't expect a few nanoseconds to matter.
That's true. However I also like to not mess around with any deferred functions that have to capture an error variable by name (named returns) or by pointer so that it can modify the error on the way out. Calling rows.Close() after you're done with rows is a simple 4-line addition that doesn't require rewriting the already existing defer rows.Close(), which lowers the cognitive load in my opinion.
Like you said, the performance gains (if any) are probably not going to matter when a database call is involved.
I prefer not to use defer at all in these cases. It’s too confusing. Just put the stuff that you would do between the defer and the end of the function in another function. It’s almost always clearer.
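i.e. something like this sketch, where collectNames is a made-up helper holding everything that would otherwise sit between the defer and the end of the function:

func listUsers(db *sql.DB) ([]string, error) {
    rows, err := db.Query("SELECT name FROM users")
    if err != nil {
        return nil, err
    }
    names, err := collectNames(rows) // the Next/Scan loop and rows.Err() live in the helper
    if cerr := rows.Close(); err == nil {
        err = cerr
    }
    return names, err
}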
That's less safe, though: in a request handler that catches panics, you'll leak connections but stay running if something snags in the row collection logic you write, which isn't what you want.
I also prefer not to use request handlers that recover from panics. In my codebases, a panic means fail now and fail hard. :-)
I think if there’s a panic somewhere that you’re not explicitly expecting you can say goodbye to any notion of safety you might have had.
HTTP request handlers are self-contained (or rather, should be), and more analogous to a process on a system than a function call. Bringing down the entire application is akin to bringing down the entire OS because one process segfaults.
Most panics are small stupid things localized to a specific piece of code, like nil dereference or slice out of bounds, and not indicative of some global process state being messed up.
My HTTP handlers tend to share state between invocations, such as caches and database drivers. If the cache or database driver panics then I don’t trust the cache or db to continue to be correct, so the only sensible thing is to shut the server down.
In my deployments, a server is just one in a pool of servers, so losing one is not a problem. The whole system is designed such that any server can die at any time and the overall service will continue to be available.
If you have a different architecture and design philosophy to me then I can see why you would want to recover in your handlers. It just doesn’t suit me at all.
I figured you'd have something like that. Works well enough if you have a pool of servers, but it's basically a non-starter for anything that you want to be self-hosted for example, or other scenarios where things are small enough in scale that "pool of servers" isn't really worth it.
Cases where it panics and leaves an incorrect state seem very rare; panics due to an incorrect state are slightly less rare, but still fairly rare.
> Although GitHub.com is still a Rails monolith, over the past few years we’ve begun the process of extracting critical functionality from our main application, by rewriting some of the code in Go—mostly addressing the pieces that need to run faster and more reliably than what we can accomplish with Ruby
Is there anything GitHub hasn't copied from GitLab? GitLab did this years ago, before it was cool. Same with CI/CD, Kanban project management, package management, etc.
Competition is good for users, but it must be annoying for GitLab to get copied in everything by the incumbent competitor that remained stale for years before being bought by unlimited money Microsoft.
On this topic, maybe this will finally make self-hosted GitHub Enterprise an acceptable alternative to GitLab? As far as I know it has always been extremely painful to manage.
Using Go (or more recently, Rust) for some parts is a common thing lots of companies have been doing. At my last job we did it with ColdFusion and Go.
CI was omnipresent on GitHub in the form of Travis; integrating it into GitHub was kind of a no-brainer, and Kanban boards have been around since long before GitLab. It seems more like "convergent evolution" than anything else.
> incumbent competitor that remained stale for years before being bought by unlimited money Microsoft.
GitHub was already making $200-300 million in revenue before they were bought by Microsoft, roughly equal to GitLab today. I don't think money is really the big factor here (and AFAIK GitHub's finances are still independent from Microsoft; they don't "pump money" into it).
I don't know why GitHub was relatively stale before Microsoft bought them. I suspect the reasons are more organisational than finance-related.
A lot of people believe rewriting in a different language will solve most of the bugs. In my experience this never happens in real life, not even once; people just repeat the old bugs in a different way and feel they are doing a good job fixing them.
Discussed at the time:
Three bugs in the Go MySQL driver - https://news.ycombinator.com/item?id=23254045 - May 2020 (87 comments)
I used to have a ton of issues with the Go MySQL driver - there was a specific version with specific settings I'd cargo-cult into all my projects along with comments explaining not to mess with them because I knew it was relatively stable.
These days though it's genuinely pretty solid, haven't had a problem in a couple years. They've largely worked the kinks out.
Someone should add (2020) to the title; it's a pretty old one.
Added. Thanks!
Different programming languages, same databases, same problems
The fact that the solution for everyone isn't "link against libmysql and write bindings so the interface is idiomatic to your language" will be a source of endless bugs.
The GitHub Rails app did link against libmysql originally, via the Ruby bindings. This turned out to be a source of endless bugs, especially in dev envs, so in 2015 Brian Lopez and I built a new client library for the Rails app: https://github.blog/2022-08-25-introducing-trilogy-a-new-dat...
This article however is about Go, where linking to C libraries is often not the best approach, so the situation is a little different.
1. That presumes linking against libmysql and making bindings won't be a source of endless bugs :)
2. One of the major pitfalls of linking against C libraries that do their own I/O is that they don't play nicely with the Go scheduler. That means a goroutine blocking on a C I/O operation will occupy its thread for the duration of the call.
I mean, that's golang for you. While many other languages are plagued with NIH (I love Rust, but it's very much guilty of this, often to avoid having to deal with non-native dependencies), they at least give you a feasible option of FFI, while Go is truly allergic to it.
Not really? A cgo wrapper for SQLite is by far the most popular way to use SQLite in Go and it works great.
But I think it’s a major design error to use FFI for a thin client library. You’re bringing in a whole other network stack for what should be some simple serialisation/deserialisation. Plus even if you were, say, linking against libmysql you’d still have a bunch of wrapper code to make it a nice API in whatever the host language may be. Not worth it.
Why do you think CGo is not feasible?
FFI overhead
This has a significant usability issue. Static C libs usually have a bunch of static dependencies like libc, which means you can't really update them without updating your whole OS. The Go git package was unusable on multiple Debian major versions (e.g. Jessie and Stretch) because it relied on a libgit that was missing some critical functions - which means for literally years.
Because then you cannot write a big blog post to show what you are doing.