Hi everyone, I'm the plaintiff in this lawsuit. I'm still working on my companion post for tptacek's post! I'll have it ready Soon TM, but feel free to ask me any questions in the meantime here.
Bear in mind that Matt technically lost this, even with the backing of some of the absolute best civil rights lawyers in the country, Loevy and Loevy, fighting on his behalf. This shows you the absurd difficulty in fighting city hall, especially if you're crazy enough to do it without representation.
The one thing working in our favor is what is proposed in TFA: change the law. Once the state Supreme Court has ruled, you're hosed unless you can get an amendment. Illinois has a very strong history of amending its FOIA statute, although a proportion of those changes further protect information from disclosure, so they are not always on the side of sunshine.
Another change that needs to happen is strong punishment for bodies who lose these fights. In Illinois this is limited to a "$5000 civil penalty" against the body. What is a civil penalty? It's vaguely defined. They used to throw the money to the plaintiff, but in the later cases I fought they simply awarded the money to the county. As one State's Attorney said to me "I don't care if I lose every case, I just write a check out to myself."
(one final note: be careful what you wish for when you litigate, you can end up with an appellate decision like this that solidifies in law the exact thing you were fighting. It's nobody's fault, but it happens. I ended up with one absurd decision that removed prisoners' rights rather than enhanced them.)
A losing public body is also generally on the hook for attorney's fees, which can be considerable. But the general problem here is that the public bodies are all spending someone else's money, so the real deterrent you have is how much of their time you can credibly threaten to eat up with legal actions.
That's true, as long as you are represented. I knew one lawyer in Illinois who would sit in FOIA court and take all the non-represented persons aside and offer to take their cases and split the attorney fees 50/50. I believe it isn't strictly above-board, but it is a solution to a problem.
People don't like being put under oath, so you can somewhat temper a public body's future refusals by deposing them or sticking as many of them on the stand as possible. Especially with depositions, if you aren't represented then you can't be given any attorney discipline for asking completely outrageous questions to force the deponent to admit crimes etc under oath.
I went up against my muni over their refusal to release their police General Orders (which seems real dumb in retrospect; we got the General Orders from most of Chicagoland with no protest†). I reached out to Matt Topic, who offered to sue for free, or send a nastygram for a billable hour.
I ended up doing the latter, because I gotta work in this town, but one consequence of fee recovery is that it's much easier to get representation for a FOIA suit.
I don't understand the argument that knowing the column names doesn't help an attacker? Especially in a database that doesn't allow wildcards, doesn't it make things much easier if you know you can do '); SELECT col FROM logins, as opposed to having to guess the column name?
And I don't think I disagree with the court on schema vs. file layouts either. It's not the file layout, but it's analogous: it tells you how the "files" (records) are laid out on the "file system" (database tables). For example, denormalization is very analogous to inlining of data in a file record. The notion that filesystems are effectively databases itself is a well known one too. How do you argue they aren't analogous?
Plus, generally if you have SQL injection, you have multiple tries. You're not going to be locked out after one shot. And there's only so many combinations of `SELECT {id,userid,user_id,uid} FROM {user,users,login,logins,customer,customer}` before you find something useful.
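To put a rough number on "only so many combinations", here is a minimal sketch that just enumerates the guesses from the comment above. The lists are copied from that comment; it repeats `customer`, so there are five distinct table names. Nothing here talks to a database.

```python
from itertools import product

# Candidate names copied from the comment above.
# The comment lists "customer" twice, so only five distinct tables remain.
columns = ["id", "userid", "user_id", "uid"]
tables = ["user", "users", "login", "logins", "customer"]

# Every (column, table) pairing that would have to be tried in the worst case.
guesses = [f"SELECT {col} FROM {tbl}" for col, tbl in product(columns, tables)]

print(len(guesses))
print(guesses[0])
```

That is a small search space if attempts are unlimited, which is exactly the assumption the replies below dispute.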
Guessing table names is significantly harder. Maybe they get some tables like that, maybe they don't have time to guess my table called "amt.user_ticket"
You can "always" do that? Well I just did that. My database said: no such table: information_schema.columns
And what if my database had disabled this capability entirely?
Also, is there anything implying SQL here at all? Can't other databases with injection "capability" have schemas?
> Plus, generally if you have SQL injection, you have multiple tries. You're not going to be locked out after one shot.
No, you can't say it with such certainty at all. It really depends on what else you're triggering in the process of that SQL injection. You could easily be triggering something (like a password reset, a payment transaction...) where you're severely limited in your attempts.
> And there's only so many combinations of `SELECT {id,userid,user_id,uid} FROM {user,users,login,logins,customer,customer}` before you find something useful.
Your reasoning and motivation is reductio ad absurdum.
It does not make sense to base your system security on hiding from the public that your 'Users' table is called 'Users'.
If you are vulnerable to this attack, the guilt rests on your deplorable application code, not whether or not your schema table names are known. If we should follow your logic, we would have to name our Users table U_ZER_CLEVER_S because naming it something people could guess would be a vulnerability.
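For anyone following along, a minimal sketch of what "the guilt rests on your application code" means in practice, using Python's built-in `sqlite3` and a made-up `Users` table. The table, the data, and both helper functions are illustrative assumptions, not anything from the case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO Users VALUES (?, ?)", [("alice", "a1"), ("bob", "b2")])

def lookup_unsafe(name):
    # Vulnerable: user input is pasted directly into the SQL text.
    return conn.execute(f"SELECT secret FROM Users WHERE name = '{name}'").fetchall()

def lookup_safe(name):
    # Parameterized: the value is passed separately from the SQL text,
    # so it can never change the structure of the query.
    return conn.execute("SELECT secret FROM Users WHERE name = ?", (name,)).fetchall()

payload = "nobody' OR '1'='1"
print(lookup_unsafe(payload))  # the WHERE clause is rewritten by the input
print(lookup_safe(payload))    # the input is treated as a plain string
```

With the parameterized version, the table being called `Users` gives the attacker nothing to work with, which is the point being made here.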
There is one further problem with this entire sub-discussion:
There are two mitigation strategies discussed:
- A: guaranteed SQL-injection-proof (SQL injection impossible.)
- B: Having non-obvious table-names and 'secure-defaults' (e.g. INFORMATION_SCHEMA disabled).
So, the original commenter says, he wants to _hide the schema_, so that B can protect him in case of A.
Well, failure of A is Amateur Hour. If you fail on A, I highly doubt you would have delivered correctly on B.
To write it out in plain text: If you have set up and manage an application with SQL injection errors, I have a hard time seeing you still taking care to disable/enable obscure security defaults, or take care to avoid obvious and trivial table names.
Just to put icing on the cake: As soon as you have an SQL injection attack, a simple select * from randomTable or DESC randomTable would give you the table COLUMNS, so it utterly makes no sense to want to hide those column names - you have already lost them! (in the case you are arguing you need their protection in).
..Unless you argue that the guy making sql injection applications ALSO has set up a secure default to disallow select *..
In my experience, SQL injection is evidence of work of the sloppiest and most immature nature; it was bad in 2003, and presumably still is.
> You can "always" do that? Well I just did that. My database said: no such table: information_schema.columns
Don't expect attackers to give up after one try. It depends on the database software; not everyone implements this exact ANSI standard for reflection, but every database supports reflection. That's why the first step after finding a SQLi is to fingerprint the database software and go from there.
> And what if my database had disabled this capability entirely?
You can't disable it; lots of software, database features, ORMs and clients rely on reflection. If a client can query a table they also can retrieve metadata about that table.
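A small sketch of the "fingerprint, then use that engine's reflection" point, using SQLite through Python's `sqlite3` because it needs no setup. The `logins` table is made up for the example. SQLite has no `information_schema`, which matches the error quoted above, but it exposes the same metadata through `sqlite_master` and `PRAGMA table_info`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logins (user_id INTEGER, pw_hash TEXT)")

# The ANSI-style catalog does not exist in SQLite, so this query fails.
try:
    conn.execute("SELECT column_name FROM information_schema.columns")
    ansi_error = None
except sqlite3.OperationalError as exc:
    ansi_error = str(exc)

# SQLite's own reflection: the schema table and the table_info pragma.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
columns = [row[1] for row in conn.execute("PRAGMA table_info(logins)")]

print(ansi_error)
print(tables, columns)
```

Other engines have their own equivalents. Whether a given deployment restricts access to them is a separate question, which the replies below get into.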
Absolutely, we have very strict lockdowns on the tables and views available to the users that our application uses. The permissions system in Postgres (for example) is very extensive. We even deny delete and update permissions for most tables so they become append only.
Never mind, you are right, it's possible, but I still think it breaks so much stuff that at least I've never seen anybody doing it or recommending it. All kinds of ORMs and migration tools would break for example. But I guess it would be a defense-in-depth strategy.
Yeah those tools may break if such a change is introduced suddenly, without testing etc. But that's not what normal reality looks like for most companies; such rules have been in place for 2 decades at least. DBs are very old tech without much change in the past 20 years and this is DB security 101.
Not even going into the reasonability of ORMs, most of the stuff I've seen or implemented added practically zero value, and added hard-to-debug issues down the line as software evolved. Cargo culting at its best, often done on trivial schemas that could handle either direct SQL or some sql-query-to-object mapping easily.
Ah so what you're saying is that we ought to rename our logins table to "duckwords" because nobody will ever guess that? Also we should probably store passwords in plaintext but name the column "entercod3" because nobody will think of that. Oh and we should use printf with %s to build our queries right?
That's a good point, has anyone hardened a database by locking out users who select columns that don't exist? Or run other dubious queries? This would obviously interrupt production but if someone is running queries on your db it's probably worth it?
I once did a security assessment for a product such as what you describe. Among other problems with it, the product itself had SQL injection vulnerabilities.
If you are mature enough to do that, you're mature enough to net SQL injections in the first place. There shouldn't be that many handwritten queries to review in the first place as most mundane DB access is usually through a framework that handles injection properly...
Zane Lackey (with Dan Kaminsky) gave a talk that discussed doing literally that sort of thing, back in 2013. Zane went on to found Signal Sciences (acquired by Fastly), doing this sort of stuff in the 'WAF' space.
I guess the main difference is that a WAF attempts to spot things like injection (unbalanced delimiters, SQL keywords in HTTP payloads where SQL shouldn't exist, etc.) typically without knowledge of the schema, whereas GP is talking about the DBMS spotting queries where queries must exist but disagree with the schema. Might as well do both, I suppose.
That’s not what the talk is about - it’s using dbms query error logs to spot attackers. Stuff like “table doesn’t exist” or “invalid syntax” on your production database can be extremely high signal indications that something is wrong, potentially maliciously so.
In the very early 2000’s I worked at a company building something along those lines. We could analyze SQL and SMB traffic on the fly and spot anomalous access to tables/columns/files, etc. Dynamic firewalling would have been the next progression if the company didn’t have other issues.
So if you deploy code before you run the associated db migration, or misspell a column name, you magnify the impact from whichever code paths (& application tier nodes) are running the broken SQL, to your entire production environment.
Simple variation to a hard shutoff: immediately page "significant risk a successful sql exploit was found", and then slow down attackers:
If an SQL query requests an unknown table, log the error, but have that query time out instead of responding with an error. Or, even better, the offending query appears to succeed, but returns fake table data, turning it into a honeypot built into the DB. This could be done at the application layer, or in the DB.
The goal is to buy an hour for defenders to determine how to respond, or if it's a red herring. There are a variety of ways of doing this without significant user impact.
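A rough application-layer sketch of that idea, assuming SQLite via Python's `sqlite3`. The `guarded_query` wrapper, the `alerts` list (a stand-in for real paging), and the marker strings are all hypothetical. It swallows schema and syntax errors, records an alert, and hands back an empty result instead of an error message.

```python
import sqlite3

alerts = []  # stand-in for a real paging/alerting system (hypothetical)

# Substrings of SQLite error messages that should never occur in healthy production code.
SUSPICIOUS = ("no such table", "no such column", "syntax error")

def guarded_query(conn, sql, params=()):
    """Run a query; on schema or syntax errors, record an alert and return no rows."""
    try:
        return conn.execute(sql, params).fetchall()
    except sqlite3.OperationalError as exc:
        message = str(exc)
        if any(marker in message for marker in SUSPICIOUS):
            alerts.append((sql, message))
            return []  # hide the error detail from whoever sent the query
        raise

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_ticket (id INTEGER)")
conn.execute("INSERT INTO user_ticket VALUES (1)")

print(guarded_query(conn, "SELECT id FROM user_ticket"))  # normal query
print(guarded_query(conn, "SELECT id FROM logins"))       # guessed table name
print(len(alerts))
```

As the reply about deploy ordering points out, a typo or a missed migration would trip the same alarm, so this only shows the mechanism and is not a recommendation to hard-fail on it.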
Yeah it's definitely something that could do more harm than good to a company long term. But I'm sure there are instances where this tradeoff is worth it. They would invest more heavily in runbooks or maybe even CI that runs migrations on deploy. Deleting columns would need to be done on your deploy + 1. Probably no rollback at all.
A good DBA would restrict the account so that it can't access the information schema. It's easy to imagine an environment with a vigilant DBA and less vigilant web developers.
This makes sense, but the vast majority of tooling including ORMs, autocomplete SQL IDEs, and even suspect application code relies on table descriptions and listings provided by the information schema.
That is why we have development and production environments. The production environment is expected to operate in a potentially hostile space and does not need developer conveniences beyond the ability to generate alerts and produce logs, which should be stored in a safe way, everything else should be locked down as much as possible.
If you have an injection friendly application then that is the security problem.
Say someone hacks the db, is the problem easy-to-guess table names? The column should never have been called "passwords"?
Perhaps 30 years ago that would sound good.
Obscurity should hardly ever be a line of defense. If it is the only defense the problem isn't that it wasn't obscure enough.
Edit:
I'll do you one better. If you so much as suggest that obscurity is good security you actually openly invite people to fool around with your applications. The odds that holes are to be found are much better than elsewhere.
I'd probably delete everything and pretend it never happened. It depends ofc on the worst-case scenario. What can I do/afford to deal with the greatest risk? I might use it on a machine without internet.
> I don't understand the argument that knowing the column names doesn't help an attacker?
So Kevin Mitnick supposedly did most of his hacking using "social engineering". He'd call up some person, pretend to be in some other department within their organization, and ask them for some specific bit of information he needed to further his attack (or ask them to change some specific thing that would allow him to further his attack).
Would knowing the structure of Illinois governmental organizations help someone perform social engineering attacks against them? Yes, absolutely.
Should Illinois therefore keep the internal structures of their organizations -- the department names and the officials who run them -- secret? No, absolutely not.
First of all, if an attacker doesn't know them, they'll just use other social engineering attacks to figure them out; i.e., hiding the structure doesn't stop social engineering attacks, it just slows them down. Secondly, the value to the public of being able to navigate governmental structures far outweighs the cost of potential attacks.
This seems to me to be a direct analog: The "organizational structure" is the "database schema", and the "willingness to help a random person on the phone who seems to know what they're talking about" is the "SQL injection vulnerability". If an attacker knows the schema, their job is faster; but if they don't know the schema, they'll just use attacks to figure out the schema; so keeping it private doesn't stop an attack, only slow it down. And the benefit to the public of being able to issue FOIA requests far outweighs the cost of potential attacks.
> And I don't think I disagree with the court on schema vs. file layouts either.
I disagree that the law should prohibit disclosing "file layouts" but it's pretty clear that the law does block that, and I fundamentally agree with you that schemas are directly analogous to file layouts and thus restricted.
A SQL schema literally does not indicate the locations of data inside of a file. In fact, the whole reason schemas exist is to decouple the relationships between table rows and the pages and indexes that store that data. We had relational databases before SQL, and there are non-SQL relational (and non-relational) databases today, but you program them, at the query level, with code that is aware of what tables live where.
A schema is the opposite of a file layout. A schema is to a file layout what a Google search is to an IP address.
If you tell me that you have a closet for your jackets and another closet for your shirts, you're telling me how clothes are laid out in your wardrobe. Specifically, you're telling me that you're laying those out separately, and able to deal with them independently, with little interference between the two. It's not the entirety of the layout information, but it sure is some of it.
If you tell me that you have a column for your first names and another column for your last names, you're telling me how names are laid out in your database('s files). Specifically, you're telling me that you're laying those out separately, and able to deal with them independently, with little interference between the two. It's not the entirety of the layout information, but it sure is some of it.
Sure -- in theory, you could be actually throwing everything together into a dumpster, then paying enough people to search it all in parallel when you want to retrieve that red jacket. If you're actually doing that, maybe you could legitimately claim that you haven't divulged anything about your closet's layout by telling me that shirts and jackets are separate. But chances are pretty darn good you're not actually doing that (and I would know this for a fact if I already somehow knew you were actually using closets built by Joe down the street), and thus actually are exposing layout information by telling me that you're storing them separately. One security implication of which is that, the moment that I get a glimpse of your closet and notice that it contains a shirt, I know it's not the one with the jackets, and I can skip it when trying to steal that expensive red jacket.
It's either a file layout or it is not a file layout. If you write an affidavit saying it's "sort of like a file layout", the conclusion will be that it is not one. Now, the Illinois Supreme Court found that it was a file layout (wrongly). But they didn't use any of this kind of message board logic to do it; they pulled up a definition for "file layout" from a technical dictionary (which, ironically, pretty clearly established, even more than this thread does, that schemas aren't file layouts), and then they pulled up a definition of "schema" from Merriam-Webster, and the definition of "schema" was so abstract it could have matched anything.
If anybody on the Illinois Supreme Court had known what a schema actually was, we'd have won the case. Further, if the definition of "file layout" had been more material to the Chancery case, it would have been in the trial record that it wasn't one.
> Now, the Illinois Supreme Court found that it was a file layout (wrongly). But they didn't use any of this kind of message board logic to do it; they pulled up a definition for "file layout" from a technical dictionary (which, ironically, pretty clearly established, even more than this thread does, that schemas aren't file layouts)
"Wrongly" was exactly what I just spent an hour writing a long comment disputing, with a detailed explanation. Specifically, with a real-world analogy between “a description of the arrangement of the data in a file” and “a description of the arrangement of the clothes in your closet.”
If I understand correctly, you're saying that you expect items in a column to tend to cluster near one another on disk. Notably though that doesn't give you any sort of relative or absolute offset. Neither does it have anything to say about, for example, blocks of different types which might be interleaved. Or compression. Or indexes. Or copy on write related garbage collection. Or journaling. Or any number of other things.
Now if you wanted to argue that a schema serves the same purpose as a file layout, ie that it's how a programmer interfaces with the data, and that it impacts workload performance, that would be fair enough. And given that laws are all about intent perhaps that would be relevant. (Or perhaps not. I didn't read about the case yet.)
But I think it's fairly reasonable to say that in typical usage an SQL schema is decidedly not a file layout in a literal sense.
> If I understand correctly, you're saying that you expect items in a column to tend to cluster near one another on disk.
That's one thing I'm saying would be sufficient to consider this file layout, yes. I'm not saying it's necessary. Databases can obviously be row-oriented too. Knowing that they don't cluster would also be layout information. As could any number of other things.
> Notably though that doesn't give you any sort of relative or absolute offset. Neither does it have anything to say about, for example, blocks of different types which might be interleaved. Or compression. Or indexes. Or copy on write related garbage collection. Or journaling. Or any number of other things.
It doesn't have to include offsets or any of those other things. File layout information could be as simple as "data should be aligned to a page boundary for performance" or "this field must reserve space for up to 16 characters" or even "data from different records should not be stored in an overlapping manner, to allow fast erasure"... I could go on. And notice the wardrobe layout example doesn't have offsets either, but the decision to separate jackets from shirts is absolutely one about layout nonetheless.
> But I think it's fairly reasonable to say that in typical usage an SQL schema is decidedly not a file layout in a literal sense.
It is not complete file layout information. But it certainly can be part of the file layout information.
Imagine you had a table with columns name1 VARCHAR(64) and name2 VARCHAR(64) in that order. Now imagine you modified a couple of bytes on the disk, such that you swap the 1 and the 2. You can imagine a database where that would be sufficient to confuse it into thinking the two columns had swapped contents, right? Could you really claim the schema didn't contain any file layout information in that scenario, when it certainly affected which bytes are interpreted as belonging to which columns?
Note that "some information related to the file layout" or "some information that has an impact on the file layout" is not "the file layout" in a literal sense. Thus it seems to me to follow that the answer to the question "is this a file layout" should be no.
Symbolically it isn't [ schema -> file layout ] it's [ schema, engine version -> file layout ]. Even if you had that additional information, neither item by itself nor even the pair together would be correctly considered a file layout. If I have a function f( foo, bar ) -> baz neither a foo nor a bar is a baz. I can fairly trivially fix a sandwich out of bread, peanut butter, and jam; in no way does that imply that the three ingredients sitting next to each other on the counter are a sandwich.
For that matter, even the [ schema -> file layout ] case isn't technically a file layout any more than a json blob is an xml blob. Being trivially translatable doesn't change the definition.
Compare that with the question (also commonly asked by courts) "is thing equivalent in intent (or use, or ...) to other thing" in which case the answer might feasibly be yes.
> Could you really claim the schema didn't contain any file layout information in that scenario, when it certainly affected which bytes are interpreted as belonging to which columns?
In that example you have made an educated guess about the file layout and then taken advantage of that (guessed) information. "You can imagine a database" tells you everything you need to know here, namely that this is entirely dependent on the implementation. So yes, I would claim that the schema did not on its own contain any file layout information though in conjunction with knowledge of the implementation it could be used to derive such.
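A toy sketch of the "[ schema, engine -> file layout ]" point, in Python with `struct`. The two "engines" and the sample rows are invented for illustration, and no real database stores data exactly like either of them. Both take the same two-column schema and produce different bytes on disk.

```python
import struct

# The same schema in both cases: two fixed-width 64-byte text columns.
schema = [("name1", "64s"), ("name2", "64s")]
rows = [(b"Ada", b"Lovelace"), (b"Alan", b"Turing")]

def row_store(rows):
    # Hypothetical row-oriented engine: each record's fields sit side by side.
    fmt = "".join(code for _, code in schema)
    return b"".join(struct.pack(fmt, *row) for row in rows)

def column_store(rows):
    # Hypothetical column-oriented engine: all values of one column sit together.
    out = b""
    for i, (_, code) in enumerate(schema):
        out += b"".join(struct.pack(code, row[i]) for row in rows)
    return out

a, b = row_store(rows), column_store(rows)
print(len(a), len(b))     # same amount of data
print(a.find(b"Lovelace"), b.find(b"Lovelace"))  # stored at different offsets
```

Knowing the schema alone does not tell you which of the two byte arrangements you are looking at. Whether that settles the legal question is a different matter.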
> I can fairly trivially fix a sandwich out of bread, peanut butter, and jam; in no way does that imply that the three ingredients sitting next to each other on the counter are a sandwich.
What is "sandwich" in this analogy? Nobody is claiming the schema is a "database", or a "table". I was saying it's one component of the file layout.
Using your own analogy: if you know you put the jam near the peanut butter, you know part of the ingredient layout. You can't say "it's not ingredient layout if you haven't told me where the bread is."
We can successfully interpret the two words “guinea pig” without it pertaining to either pigs or things coming from Guinea, so I’m sure this is also possible.
DBs can be files on disk though? Besides they're a bit like easy hand rolling powder mix for filesystems. Filesystem entries have properties like filenames and inode numbers and file contents. Databases have columns like emails and membership IDs and their favorite cookies. I don't think "file layout" is an absurd framing.
It is in literally no sense a layout; the whole point of a schema is that it doesn't tie you down to a layout. SQL schemas make sense even in the absence of files!
You suggest that we interpret "file layouts" as exactly this -- no more, no less. This approach is also called "textualism". The other option is to interpret "file layouts" in the context of the law that includes these words. Or: what exactly did the lawmakers have in mind when they said that (a) government needs to provide information; (b) except for several cases, of which one is (c) "file layouts". What kind of information did they think it was ok for the government not to provide?
I agree with the Court's argument that "the information about how the actual information is stored and connected one piece to another" is what the lawmakers meant in this case.
- If the actual information is stored in the files, the government does not need to disclose how these files are organized ("file layouts").
- If the actual information is stored in the database, the government does not need to disclose how the database is organized (database schema).
- If the actual information is stored in the block memory -- with structs and pointers -- the government does not need to disclose the structs and the pointers.
The "textualist" opponent would of course argue, as OP did, that the second and the third example aren't excepted by clause (c) because "when there is no file, there could be no file layout". This however is missing the point (in my opinion), as it doesn't see the forest for the trees.
>> And I don't think I disagree with the court on schema vs. file layouts either.
> I disagree that the law should prohibit disclosing "file layouts"
Note, the court wasn't ruling what the law should say, only what the law says. At least that's my understanding of it. I certainly wasn't opining on what the law should say.
Understood. I mention that distinction only because I find many people (not you) who say that "X law doesn't apply because if it did, it would be bad" instead of directing their ire at the actual laws, which are poorly written, and at the legislators who are negligent in fixing those laws.
Courts should decide based on the law, not based on what is "good".
Agree, and, I don't even understand why it's in there in the first place (it should just not be) but that's a job for the legislature to resolve, not the courts.
> Without additional context, I would interpret the term “file layout” to mean the file and directory structure of an application.
I would interpret it to mean a description of what the file contains and where. This is information you need if you have a mysterious file and you want to parse it. It's also information you need if you have some data and you want to create a readable file that expresses it. But for the concept to apply to a database schema, (a) the database would have to be a file, and (b) the schema would have to specify where the information in the database is stored. That's difficult to do, since the schema has no knowledge of how much information there is in the database or how it might be written down.
> “Attackers like me use SQL injection attacks to recover SQL schemas. The schema is the product of an attack, not one of its predicates”.
If it's the product of an attack, but not the end goal, surely it's of value to the attacker?
It seems clear to me that the statute does, as worded, in principle allow the city not to disclose the database schema - it would compromise the security of the system, or at the very least, it would for some systems, so each request needs to be litigated individually.
The proposed amendment sounds like a good way to fix this - is it likely that will pass?
Lots of things are "of value". That's not the bar the statute sets. To the extent something isn't per se exempted by the statute (as the outcome of the case established schemas are), the burden is on the public body to demonstrate that disclosure would jeopardize the security of the system.
It still seems like a massively gray area: despite the distinction between "would jeopardize" and "could jeopardize" as explained by TFA, the definition of "jeopardize" includes "danger" which means "could lead to harm" not "would lead to harm" at which point it hardly matters whether a thing "could endanger" or "would endanger" the security of the system.
"Would" versus "could" has nothing to do with why your analysis doesn't hold. If something doesn't enable people to attack a system, but is merely one of the valuable things you could get from that system, it does not jeopardize that system under Illinois law. The standard of proof for the jeopardy doesn't enter into it, because no claim of jeopardy has been made.
Again: this part of the case is settled. We didn't lose at the State Supreme Court because the court was worried there was jeopardy, but because they re-read the statute as per se exempting schemas as "file layouts".
How is it that this wording stuff isn't already decided globally? I mean, the concept of a dangling modifier has existed for centuries, do the courts really decide this kind of thing on a case-by-case basis by random dice roll?
Whereas math, science, and engineering use language as a vehicle for attaining truth, the legal profession too often regards it as truth.
The greatest legal scholars of the state of Illinois believe there is more decorum in querying Merriam-Webster than there is in reading tea leaves or consulting a Ouija board, but they are wrong. All too often, jurists make decisions based on unconscious accidents of wording by their predecessors, then compound it with their own fallible powers of interpretation and deduction, further cementing their wrongness as "precedent." Instead of addressing this core ambiguity of the FOIA exemption, or attempting to appeal this nonsense interpretation of an undefined term, or introduce better linguistic standards to the legal profession at large, the path of least resistance for victims of litigious violence is to add more complexity in the form of endless amendments. This is what Matt and friends must now pin their hopes on.
Little wonder how one can spend a lifetime specializing in the (martial) art of litigation.
Maybe for this case, but it sounds like enough hinges on the details of the system that in another database, a court could uphold that there "would" be jeopardy instead of there "could" be. So you won on the more fragile part of the ruling.
On the other hand, interpreting the law as exempting database schemas is something that can be applied to any computer system, and it presumably sets a binding precedent (I'm not familiar with Illinois jurisprudence, but that's how I'd expect something called the State Supreme Court to work) so losing on that point is worse for future cases.
Losing on what point? Everybody agrees it is bad that schemas are per se exempt from FOIA. On the security concerns of releasing schemas, we won in basically every court.
> If something doesn't enable people to attack a system, but is merely one of the valuable things you could get from that system, it does not jeopardize that system under Illinois law.
The problem I have with this is that the schema isn't something an attacker recovers for its own sake. It's something the attacker recovers in order to further their attack. This necessarily means that it does enable people to attack the system. That's the only value an attacker sees in it.
> Again: this part of the case is settled. We didn't lose at the State Supreme Court because the court was worried there was jeopardy
Doesn't matter to the discussion; the court, Supreme or trial, can be wrong as easily as it can be right.
I don't understand your argument. If I have a SQLI, I can, as you acknowledge, fetch the schema. So what does it matter if the schema is published a priori? All that matters is whether I have SQLI.
No, as other comments in the thread have pointed out, you can easily have an SQLI that doesn't send information back to you. You may find value in changing what's in the database even if you can't read from it.
If you do have the ability to retrieve information, then one of the first things you'll do is retrieve the schema.
And the reason you'll retrieve the schema, if you can, is that it facilitates the attacks you actually want to make. It has no value to you other than enabling your attacks. This observation seems sufficient to answer the question "does knowing the schema enable attacks?".
There is a whole sub-field of software security dedicated to retrieving information from SQL injections that don't directly return results. This is not a plausible objection.
> If it's the product of an attack, but not the end goal, surely it's of value to the attacker?
Well sure, but it doesn't help them attack. That's like arguing that since the bank robber wants dollar bills, dollar bills must be a useful tool for breaking into bank vaults.
If both sides agreed to the analogy of giving the bank robber the blueprints to the vault, I think any lay judge would agree that endangers the bank's security.
I'd say it's more like knowing the layout of the drawers inside the cage. If a robber is inside the cage, they've already won. And if an auditor is checking the bank has what it says it does, they've got legitimate grounds to ask which money is in which drawer, and "no, it's a security risk" is not a good answer.
>It's not the file layout, but it's analogous...How do you argue they aren't analogous?
laws don't get to be analogous
foia request: "I'd like the report the committee prepared about the costs for the new bridge"
response: "denied. the report contains costs laid out in tables with headings, which while not being schemas are analogous, with schemas not being files but being analogous"
I agree with you. Knowing the exact column names can speed up an attack and, in some cases, make it more feasible.
Why don’t they just request disclosure of what’s actually stored and allow renaming of the columns? It seems odd that knowing the exact column names would be necessary if the goal is simply to understand what data is being stored and its intended purpose.
Yeah, I think it's still useful info for an attacker. But only if the system was actually developed by amateurs who never heard of parameterized queries.
I find it a bit bizarre that the city uses "our system was developed with no consideration for security" as a valid defense.
This fails if either the UI sanitizes wildcards, or if the database prohibits them, or if it produces so much data that you can't ingest it in time, etc.
It also fails if the system was written using parameterized queries. I wouldn't expect a system to be sanitizing anything if it fails to take the most basic step for db access. This whole discussion is only relevant for systems developed by amateurs. SQL injection can only work at all if you use string concatenation to create queries, which you should never do.
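To make the string-concatenation point concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `tickets` table, its columns, and both function names are made up for illustration; nothing here reflects the city's actual system.

```python
import sqlite3

# Toy in-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, plate TEXT, paid INTEGER)")
conn.executemany(
    "INSERT INTO tickets (plate, paid) VALUES (?, ?)",
    [("ABC123", 0), ("XYZ789", 1)],
)

def find_unsafe(plate):
    # Vulnerable: user input is spliced into the SQL text itself.
    sql = f"SELECT id, plate FROM tickets WHERE plate = '{plate}'"
    return conn.execute(sql).fetchall()

def find_safe(plate):
    # Parameterized: the input is bound as a value and never parsed as SQL.
    return conn.execute(
        "SELECT id, plate FROM tickets WHERE plate = ?", (plate,)
    ).fetchall()

payload = "nope' OR '1'='1"
unsafe_rows = find_unsafe(payload)  # the OR clause matches every row
safe_rows = find_safe(payload)      # no plate literally equals the payload
```

With the concatenated query the payload rewrites the WHERE clause; with the bound parameter it is just an odd-looking plate number that matches nothing.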
If you do it wrong, yes. Sure, there is no 100% security, but honestly, it's 2025. We already know the techniques for preventing SQL injection of any kind. I wrote about this here: https://valentin.willscher.de/posts/sql-api/
> Right but the case that is being imagined here is a site that perfectly sanitises * but somehow still allows SQL injection? I don't think so.
It could literally just reject anything with asterisks.
It doesn't even need to do anything perfectly, it just needs to do it enough to produce hurdles for you. Like blowing through the number of attempts you realistically have remaining.
I don't want to take away any steam from your sails but giving bad information in regards to case law shouldn't be taken lightly. Your "expert witness" did you a disservice.
Schema is very much a critical field in terms of AuthZ privileges. Just knowing the structure is not far off from knowing the max entropy a password may hold. In regards to InfoSec, table structure is the recon phase which limits effort and minimizes time. Someone with that much time in security knows DBs will be hacked, not if but when. Time is an incredibly important tool which is why we have expirations on so many authN and authZ windows of attack.
I'm glad that you are challenging them but I believe a credible engineer would have made mincemeat of your expert and hurt the rest of us who want to see you successful.
It's possible rewriting certain statutes can help us but there is no company worth its salt that would share DB schema.
> Just knowing the structure is not far off from knowing the max entropy a password may hold
Not if the password is hashed, as it should be. Unless the schema somehow indicates that it uses a hash algorithm such as bcrypt that has a maximum password length. And even then, if they pre-hash the password, the password itself could have more entropy than that. And if there is a maximum password length, then you can probably figure that out via other means, like it being listed in the requirements when you set your password. It does tell you the size of the hash of the password, but if the maximum entropy is sufficiently high, as it should be, then it doesn't really matter; it would still be impractical to brute force.
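A quick sketch of why the column width says little about the password: a key-derivation function produces a fixed-size digest regardless of input length. This uses PBKDF2 from Python's standard library as a stand-in (not bcrypt, which is not in the stdlib); the iteration count and sample passwords are arbitrary choices for illustration.

```python
import hashlib
import os

def hash_password(password: str, salt: bytes) -> bytes:
    # PBKDF2-HMAC-SHA256 always returns 32 bytes by default,
    # no matter how long the password is.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

salt = os.urandom(16)
short_hash = hash_password("hunter2", salt)
long_hash = hash_password("correct horse battery staple " * 10, salt)

# Both digests fit the same fixed-width column.
short_len = len(short_hash)
long_len = len(long_hash)
```

So a schema showing a 32-byte (or 60-character) hash column reveals the hash format, not how long or how strong the underlying passwords are.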
> there is no company worth its salt that would share DB schema
So you are saying that every company with a self-hosted or open source product that uses a database isn't worth their salt? If your DB is running on a customer's infrastructure, that customer will by necessity have access to the schema. And likewise if the source code for a product is publicly available it is trivial to determine the schema.
This is a technical solution to a people problem. My reading is that the city doesn’t want to give up this information. If that’s the case, a technical solution wouldn’t work, no matter how easy it is. And given that this has already gone to the Illinois Supreme Court (and lost), the only solution is what is discussed at the end: updating the law.
I agree this is something of a technical solution, but the court wasn't interpreting whether you could ask for rows from a database, but whether you could ask for the schema directly. I don't think the court had the option of saying "you can't ask for the schema, but asking for a sample row is ok".
The short answer is yes, you can do this. I've seen this work for emails, where the request is basically, "Give me the most recent email of blah@gov.com".
And yeah, the plan was to eventually submit a batch of requests using the table names, similar to `SELECT * FROM {table_name_from_schema_request} LIMIT 1`, but one FOIA request per-table.
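For what it's worth, the "one request per table" idea can be sketched in a few lines. The table names and the request wording below are hypothetical placeholders, not anything from a real schema or a real filing.

```python
def per_table_sample_requests(table_names):
    """Build one sample-row request per table name."""
    requests = []
    for name in table_names:
        # Only accept plain identifiers so the generated text stays well-formed.
        if not name.isidentifier():
            raise ValueError(f"not a plain table name: {name!r}")
        query = f"SELECT * FROM {name} LIMIT 1"
        requests.append(f"Please provide the records responsive to: {query}")
    return requests

# Hypothetical table names, as if taken from a released schema.
drafts = per_table_sample_requests(["tickets", "payments"])
```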
I once wrote a script that translated SQL requests into proper Ukrainian legalese, invoking the equivalent of FOI to query citizenship statistics from the agency. It worked, but they were not very happy when I had to get to them on the phone.
No offense, but how can you be 1) insisting it's safe to give up the information to you and 2) openly planning to use the information obtained for further exploitation, at the same time? You can't have the cake and eat it too, unless the information available in 2) technically does not depend on 1) but doing it this way would only save them massive time or something.
Seems like you could ask for a verbally masked description? Like an enigma coda specific to the FOIA.
"Describe to me the columns, in simple non-programmatic english, and what the purpose of the table is for, for each table related to parking tickets"
Essentially a human-to-schema DSL that is only technically decipherable by the admin of the database. Then you're not having actual code and only the admin could decipher it.
But yah, as you said, if the humans don't want to disclose their foibles, how the request is filled is technically meaningless.
I wish it were that easy. I'll go more into this specific question in my post, but the short answer is that FOIA does not statutorily require the creation of new records in response to a request. The gov agency creating a description of the data in response to the FOIA request would be creating new records. It's silly.
Yeah I can see that, seems like masking isn't creating a new record, but obviously that's not how it's interpreted, because you're using the human filling out the form to interpret then return the data. FOIA typically allows for redactions, and that seemingly creates new records, because they have to redact things, and knowing what to redact is providing masked information, and that's a new record.
As such, they could claim all FOIAs that require redactions shouldn't be fulfilled because a redacted record is a new record.
> the only solution is what is discussed at the end: updating the law.
That, and actually penetrating the data system and subsequently "leaking" parts of it. Which is nearly always illegal, but could be considered a form of "Civil Disobedience" especially if done ethically - e.g. removing sensitive data or leaking only aggregates of the data. Either from outside, or by a whistle-blower.
I'm not saying "hack the government!". But I am arguing that the pressure of "getting hacked" is like the pressure of protests, blockades, occupying facilities etc, all of which are civil disobedience, and often simply illegal too. All are tools in the belts of civilians to keep a government in check. Extracting information that a government is not willing to give but that would benefit the governed, should IMO often be considered such a tool as well.
Have you tried looking for information from the developer about CANVAS? With any luck the developer has support documentation online that describes CANVAS and maybe you'll be able to narrow down your FOIA request.
I think the point of the lawsuit is less about CANVAS schema itself and more about the ability of the government to hide this kind of information from FOIA requests.
Damn, this is impressive. I've been fighting with a state agency since December for 17,000 emails. I don't think I've ever tried to request emails and received zero push-back, but a $33 million estimate just, chef's kiss
Very interesting case! Just one question: to what extent do changes in database schemata fall under FOIA in Illinois? That is, if they should change the database schema to conceal whatever it is they're fighting tooth and nail to hide, are they compelled to retain detailed information about that change? Or can they later present you (should the legislation pass) with a cleaned-up, nothing-to-see-here updated version?
Hard to say. One of my personal drivers for this lawsuit is a tip I received that said that Chicago has a list of vendors whose tickets are dropped in the back-end. When I requested that info, the city said they had no such list. I trust my source, so having schema information could help figure out the extent and if they were lying.
If they lose in court they have to pay court-determined attorneys' fees. That might be sufficient to get them to appeal automatically.
This is a tension you sometimes see discussed in the context of wrongful imprisonment, where one faction says that if you get tossed in jail for 30 years over something there was never any evidence that you did, the state should have to pay a penalty, and another faction says that if you penalize the state for randomly imprisoning innocent people, those people will never be allowed out of jail.
Earnest question: If you suspect them of lying on the issue, why would you trust them to release the full schema in response to the FOIA request, and not just omit any possibly incriminating columns?
It's always a possibility that some low level official not in on the scam sees the FOIA request before management tells them not to work on it. The more you ask for, the less filtering there is going to be, simply because of how people work.
If you're running the scam, you don't want to tell low level employees about it, because they have no incentive not to blow the whistle.
What is the theory then for why they do not want to release this schema? Don’t misunderstand me I appreciate how important it is that people push the boundaries of FOIA.
The statute says they're not required to. For a couple years, the statute did say that they had to, as we won multiple cases in lower courts, but Chicago appealed to the Illinois Supreme Court, and the outcome was that now the statute exempts schemas.
By that logic there's no point investigating any crime or doing any kind of audit. You increase the costs of covering up, and put them in a dilemma - remember this is exactly what brought down Nixon.
Because this is not how government works. Most of the time it's not a heavily entrenched conspiracy. Once the request is approved to go through by the legal department, some technician will happily give you everything you want and it won't be censored or tampered with in the process.
Many times the people answering the requests aren't part of the conspiracy to commit random acts of malice. Sometimes they're roped into it under threat of termination.
And often times, the denials eventually lead to significant reorg once judges and Congress can revise laws to fix the ambiguities.
Well that certainly sounds suspicious. But it could also provide more damning evidence of targeting groups, people skimming the till, bribes to make tickets go away, all sorts of fun shenanigans.
Bribes are most certainly not logged in the system under the "bribes" column or codified in any way. The data discovered through foi could show some patterns which are suggestive of bribes, but the actual thing is negotiated "off chain".
That’s what I meant. For example, people who have a suspicious number of tickets dismissed. Or perhaps certain employees that dismiss a suspicious number.
As mentioned in the post, FOIA tends to only include existing records/information; it doesn't extend to producing new work. So producing a new report would be considered too much work. (But fighting a lawsuit to not reveal the schema is fine.)
> Normally, a flustered public records officer would just reject a giant request for being for “unduly burdensome”… but this sort of estimate is practically unheard of. So much so that other FOIA nerds have told me that this is the second biggest request they've ever seen. The passive aggression is thick. Needless to say, it's not something I'm willing to pay for!
While I believe that the city should share the schema, and that the city is effectively arguing for security through obscurity, I disagree with the main premise of the article: that knowing SQL schema doesn't help the attacker.
If I understand the argument of the author here:
> Attackers like me use SQL injection attacks to recover SQL schemas. The schema is the product of an attack, not one of its predicates
The author appears to imply that once the vulnerability is found, the schema can be recovered anyway. It is not always the case. It is perfectly viable to find a SQL injection that would allow fetching some data from the table that is being queried, but not from any other table, including `information_schema` or similar. If all the signal you get from the vulnerability is also "query failed" or "query succeeded, here's the data", knowing the schema makes it much easier to exploit.
> the problem is that every computer system connected to the Internet is being attacked every minute of every day
If you specifically log failed DB queries, then for all the possible injections that such 24/7 attacks would find you have already patched them. The log would then not be deafening until someone stumbles on the actual injection (that, for example, only exists for logged in users, and thus is not found by bots), in which case you have time to see it and patch before the attacker finds a way to actually utilize it.
Knowing schema both expedites their ability to take advantage of the vulnerability, but also increases their chances of probing the injection without triggering the query failure to begin with.
> that knowing SQL schema doesn't help the attacker.
Knowing the name of the service helps the attacker, knowing the name of government officials working at city hall helps attackers, knowing the legal description of what a parking ticket is helps attackers. If you are sued and decide you want to hack the government knowing the details of the suit against you helps you in your attack.
The barrier is not “any helpful information must be censored” the barrier is “don’t disclose passwords or code that would divulge backdoors” a schema cannot be that.
I'm not an attacker, just a boring old software dev. If there's an SQL Injection I'd say all bets are off re: schema.
That said I've definitely worked on applications where knowing the schema could help you exfill data in the absence of a full injection. The most obvious being a query that's constructed based on url parameters, where the parameters aren't whitelisted.
So I actually do agree that the schema could potentially be of marginal benefit to the attacker.
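The URL-parameter case is worth spelling out, because column names cannot be bound as query parameters, so an allowlist is the usual defense. A minimal sketch with Python's `sqlite3`; the table, its columns, and the `list_tickets` helper are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tickets (id INTEGER PRIMARY KEY, issued_at TEXT, officer_note TEXT)"
)
conn.executemany(
    "INSERT INTO tickets (issued_at, officer_note) VALUES (?, ?)",
    [("2024-01-02", "internal"), ("2024-03-04", "internal")],
)

# Only these columns may be chosen by the caller (e.g. via ?sort=...).
ALLOWED_SORT_COLUMNS = {"id", "issued_at"}

def list_tickets(sort_by="id", descending=False):
    if sort_by not in ALLOWED_SORT_COLUMNS:
        raise ValueError(f"unsupported sort column: {sort_by!r}")
    direction = "DESC" if descending else "ASC"
    # Interpolation is acceptable here only because sort_by was checked
    # against a fixed set of literals above.
    sql = f"SELECT id, issued_at FROM tickets ORDER BY {sort_by} {direction}"
    return conn.execute(sql).fetchall()

newest_first = list_tickets("issued_at", descending=True)
```

Without the allowlist check, a caller who knows (or guesses) that `officer_note` exists could sort by it and infer things about a column the application never displays.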
I can't imagine how the schema would reveal SQL injection holes. Maybe other holes, though. Any poor choices for PKs, dumb use of MD5 computed fields, insecure random, misuse of NULL, weird uniqueness constraints (this also ties back to NULLs), vulnerable extensions, wrong timestamp type, too-small integer type, varchar limits, predictable index speed...
> I can't imagine how the schema would reveal SQL injection holes.
It wouldn't. I'm just assuming that the thrust of the hypothetical negligence accusation was "The schema is useless unless you have SQL injection holes. So give us the schema or admit you are negligent!" But you're correct that there are other justifications one could make to keep the schema secret.
The schema can provide an insight into what the application developer was thinking when writing the code, which in turn can direct an attacker towards tricky corners where mistakes might have been made.
This is the city government here. The people arguing the case didn't write the code and don't have time to look through all their code, but one thing they do know is that it was written by monkeys. They probably have some level of reason to believe there are SQL injections available in the code.
However his comment assumes monetisation is selling the bug; (tptacek deeply understands the market for bugs). However I would have thought monetisation could be by scanning as many YouTube users as possible for their email addresses: and then selling that limited database to a threat actor. You'd start the scan with estimated high value anonymous users. Only Google can guess how many emails would have been captured before some telemetry kicked off a successful security audit. The value of that list could possibly well exceed $10000. Kinda depends on who is doxxed and who wants to pay for the dox.
It's hard to know what the reputational cost to Google would be for doxxing popular anonymous accounts. I'm guessing video is not so often anonymous so influencers are generally not unknown?
I'm guessing trying to blackmail Google wouldn't work (once you show Google an account that is doxxed, they would look at telemetry logs or perhaps increase telemetry). I wonder if you could introduce enough noise and time delay to avoid Google reverse-engineering the vulnerability? Or how long before a security audit of code would find the vulnerability?
Certainly I can see some governments paying good money to dox anonymous videos that those governments dislike. The Saudis have money! You could likely get different government security departments to bid against each other... Thousands seems doable per dox? The value would likely decrease as you dox more.
> "query failed" or "query succeeded, here's the data"
Blind SQL injection is a type where no error is produced, but some subtle signal can indicate success or failure. The most interesting one that I know about is where the presence of a successful injection was a normal looking response that was one byte longer than an unsuccessful injection. This was used to not only figure out the schema, but to fully exfiltrate the entire database.
There is nothing in the log on the server that indicates an error.
Most of the relatively introductory SQL injection exercises that I taught proceed without any knowledge of the schema.
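For readers who haven't seen it, here is a toy illustration of the boolean-based blind technique against a deliberately vulnerable function and an in-memory SQLite database. Everything here (the `tickets` table, `plate_exists`, the probe string) is invented for the demo, and SQLite's `sqlite_master` stands in for whatever catalog a real database exposes.

```python
import sqlite3
import string

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, plate TEXT)")
conn.execute("INSERT INTO tickets (plate) VALUES ('ABC123')")

def plate_exists(plate):
    # Deliberately vulnerable toy endpoint: it reveals only True or False.
    sql = f"SELECT 1 FROM tickets WHERE plate = '{plate}'"
    return conn.execute(sql).fetchone() is not None

def recover_first_table_name(oracle, max_len=32):
    # Ask one yes/no question per candidate character.
    alphabet = string.ascii_lowercase + string.digits + "_"
    name = ""
    for pos in range(1, max_len + 1):
        for ch in alphabet:
            probe = (
                "ABC123' AND substr((SELECT name FROM sqlite_master "
                f"WHERE type='table' ORDER BY name LIMIT 1), {pos}, 1) = '{ch}"
            )
            if oracle(probe):
                name += ch
                break
        else:
            break  # no character matched, so the name has ended
    return name

recovered = recover_first_table_name(plate_exists)
```

Every probe is syntactically valid SQL, so nothing in this exchange produces a database error for a log to catch; the only signal is whether the lookup "found" the plate.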
Not just with SQLi, but I've managed to statistically prove "information" with timing attacks.
Where if you join another table (by e.g. requesting extra info in a graphql query) the response goes from ms to s or even m. Indicating the size of the joined table.
Or where I could change a "?sort[updated_at]=desc" to a "?sort[password_hash]" through trial-and-error and suddenly see the response time drop from ms to seconds (in this case finding columns that exist but aren't indexed).
Even if the response content is exactly the same, we know things exist, are big, not indexed, or simply present, by timing the attack.
A famous one is obviously the timing trick to find out that an email is in the system because "user = user.find(email) && user.password_matches(password)" short-circuits if the email does not exist but spends significant time on hashing the password for matching it. A great many backends and apps make this mistake.
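One common mitigation for that login short-circuit is to do the same hashing work whether or not the account exists. A rough sketch using only Python's standard library; the user store, iteration count, and function names are assumptions for the example, and PBKDF2 is used only because it ships with the stdlib.

```python
import hashlib
import hmac
import os

ITERATIONS = 50_000  # arbitrary for the demo

def _hash(password, salt):
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)

# Hypothetical user store: email -> (salt, password hash).
_salt = os.urandom(16)
USERS = {"alice@example.com": (_salt, _hash("s3cret", _salt))}

# Dummy credentials so unknown emails cost the same hashing work.
_DUMMY_SALT = os.urandom(16)
_DUMMY_HASH = _hash("dummy-password", _DUMMY_SALT)

def login(email, password):
    salt, stored = USERS.get(email, (_DUMMY_SALT, _DUMMY_HASH))
    candidate = _hash(password, salt)  # always runs, known user or not
    matches = hmac.compare_digest(candidate, stored)
    # Membership is checked last so a dummy match can never log anyone in.
    return matches and email in USERS
```

This doesn't remove every timing difference (the dictionary lookup itself still varies slightly), but it removes the large hash-or-no-hash gap the comment describes.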
If you specifically log failed database queries, where "failure" means "indicative of SQL injection", then nothing you can do with the schema is going to reduce the signal in that feed --- even a single SQL syntax error would be worth following up on. No, I don't think your logic holds.
I don't understand your logic. Knowledge of the schema can give an attacker an edge because they now know the exact column names to probe. Whether these probes get logged is irrelevant; even if it makes the system more vulnerable for an instant, it's still more vulnerable.
Even if logging failed queries is your metric, then knowledge of column names would make it more likely for an attacker to craft correct queries, which would not get logged, thus making your logs less useful than if the attacker had to guess at column names and, in so doing, incur failed queries.
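The logging argument can be made concrete with a small sketch: a wrapper that records every query the database rejects. The `run_logged` helper, the `failed_queries` list, and the table are hypothetical; a real deployment would feed an alerting pipeline rather than a list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, plate TEXT, paid INTEGER)")
conn.execute("INSERT INTO tickets (plate, paid) VALUES ('ABC123', 0)")

failed_queries = []  # stand-in for a monitored error feed

def run_logged(sql, params=()):
    try:
        return conn.execute(sql, params).fetchall()
    except sqlite3.Error as exc:
        # Record the rejected statement, then let the caller see the error.
        failed_queries.append((sql, str(exc)))
        raise

# A guess at a column name that does not exist is rejected and recorded.
try:
    run_logged("SELECT ticket_paid FROM tickets")
except sqlite3.OperationalError:
    pass

# A statement using the real column name succeeds and leaves no trace here.
rows = run_logged("SELECT paid FROM tickets")
```

Both sides of the argument are visible in this toy: wrong guesses show up in the feed, correct names do not, and a single recorded failure is already something worth following up on.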
To probe for what? How does knowledge of a column name make it easier for me to discern whether a SQL injection vulnerability exists? I've spent a lot of time in my career probing for SQL injection, and I can't remember an instance where my stimulus/response setup involved the table names.
SQL injection is a property of a SQL query, not of the schema itself. To have a meaningful chance of blind-one-shotting a query, getting a TRUE/FALSE answer about susceptibility without ever generating a SQL syntax error, I would need to see the queries themselves.
Knowledge of the column names doesn't give you insight into whether a vulnerability exists. It gives you insight into what you can do with a vulnerability, should it exist. For example, if you want to set your account balance to $1 million, you'd need to know the column name in order to generate a valid query. Without advance knowledge of the column name, your job becomes harder.
SQL injection will give you the entire schema anyway. It doesn't help if someone tells you the col names beforehand. I'm more wondering about non-SQL-injection vulns.
SQL injection isn't just an ssh tunnel to the database. If the line you've injected isn't a select and the backend never fetches it, how does the injection give you the column names?
> How does knowledge of a column name make it easier for me to discern whether a SQL injection vulnerability exists?
It doesn't. It just means that as soon as you find one, you can immediately begin crafting valid queries instead of randomly guessing table names and columns, therefore not setting off the "DB query failed" alert.
EDIT: I guess this is the part I missed:
> To have a meaningful chance of blind-one-shotting a query, getting a TRUE/FALSE answer about susceptibility without ever generating a SQL syntax error, I would need to see the queries themselves.
Really? I guess I have to take your word for it because I've never attempted it, but I would have thought that in some (horribly broken) systems `bobby tables' or 1=1 --` would have a very reasonable chance of detecting SQL injection without alerting anyone.
Right, and that's what you use to find the vulnerability. But imagine you've found the vulnerability and now you want to use it to update all of your parking tickets as paid. Without the schema, this is going to be quite tricky and will generate a lot of failed SQL. With the schema, you might be able to do it on your first try.
Is there not any SQLi vulnerability in practice that doesn't allow such an information recovery? That is, is the schema-recovery step so foolproof that it can always be performed on any target form? GP is suggesting that this may be difficult, depending on the kind of signal that gets returned from the form.
In my entire experience as a software security practitioner, which at the time of my testimony encompassed some hundreds of assessments of SQL-backed websites, the availability of a schema has never impacted my ability to exploit a SQL injection. It's not my job as an expert witness, nor Matt's job as a plaintiff, to invent improbable scenarios where security could hinge on schema availability. The court (all of them, in fact) found that testimony dispositive, so I'm happy to leave the issue there.
I don’t think that’s a very common setup but perhaps I’m just exposing my own ignorance. Just consider the popularity of ORMs. They explicitly load the schema into the application in many cases.
Not just that, but perhaps the app is smart enough to lock you out the second it detects an attempt to gather the schema, e.g. by logging and automatically responding to a query that displays the schema. Then you have to look for other ways in (another IP, etc.). But if you know the schema in advance, you have a better chance of a one-shot injection that accomplishes your malicious goal.
In other words, advance knowledge of the schema may make it easier to act maliciously.
> nothing you can do with the schema is going to reduce the signal in that feed --- even a single SQL syntax error would be worth following up on
Syntax errors coming from your web application mean there is a page somewhere with a bugged feature, or perhaps the whole page is broken. Of course that's worth following up on?
Edit: maybe I should add a concrete example. I semi-regularly look at the apache error logs for some of my hobby projects (mainly I check when I'm working on it anyway and notice another preexisting bug). I've found broken pages based on that and either fixed them or at least silenced the issue if it was an outdated script or page anyway. Professionals might handle this more professionally, or less because it's about money and not just making good software, idk
> Syntax errors coming from your web application mean there is a page somewhere with a bugged feature, or perhaps the whole page is broken. Of course that's worth following up on?
This is a government system, with apps probably built by lowest-bid contractors.
I imagine most of us would be horrified by the volume of everyday failed queries from deployed apps.
Can be, but I'm not sure it's worth investigating whether a particular deployment has such a specific monitoring system before being able to do a FOIA. The schema is marginally relevant for attacks at best (with heavy emphasis on just how marginal it is) and that's no barrier to releasing it
That's where the court's technical distinction between the words: "could" and "would", is important. It appears they have reduced the distinction to a risk assessment which is more objective than opining wildly!
For example: I've just re-wired a three gang light switch. I verified power on with my multimeter (test the meter), cut the power and then retested all the circuits to make sure I had got it right.
It turns out that switch three is on a separate ring main. Cool I didn't get to test my body's ability to take a whopper of a shock. In the UK it is common to have upstairs and downstairs rings for light circuits. Our kitchen has quite a few lights in it so it got a separate ring as well. Anyway there are quite a lot of wires in there because all of them are two way switches. Oh and I am allowed to work on them because of the switch location - not kitchen and not bathroom, ie a low risk location
I noted down the connections, and took them all out. I put Wagos over the flying ends to make them safe, turned the power back on and got on with the job in hand.
I then cut the power (both circuits) checked again with my Fluke. Oh bollocks ... enable power, test the Fluke and then cut power again and recheck the circuits.
Now I re-terminated all the connections. There was plenty of additional wire so I decided to cut and re-strip the conductors, to make sure that I avoided potential failures due to "work hardening" from the inevitable pushing and pulling and "gentle" forcing into position. Once all the conductors were screwed down I pulled on them fairly forcefully to make sure they won't fall out.
I screwed down the switch face plate and restored power. It's a brushed metal finish switch so I did test it was not live, because I'm careful. I tested the functionality ie all three switch circuits (three) from all the switches (six).
So, given that description is it possible that the connectors might fall out in the future and short on say, the metal back box. Of course it is possible. It could happen but would it happen?
You could postulate all sorts of scenarios. Perhaps I may be careful but I might be cack handed and forgetful and got something wrong anyway and a wire might still drop out. Now we are at the point of whataboutery! and that won't wash.
The would/could distinction is a powerful one and it is analogous to how we do risk assessments.
I'm certainly not saying you are wrong in your assessment but I think you are fiddling with details to conjure up a "could" and not a "would". I agree that knowing the schema would assist a hacking attempt but would it make a successful crack more likely - no I don't think so. It is a classic case of obscurity despite security but a rather more complicated one than putting the ssh daemon on port 2222.
Kurt posted this to troll me. Just know my audience here was, mostly, non-technical people involved in politics in my local Chicagoland municipality.
Permit me a PSA about local politics: engaging in national politics is bleak and dispiriting, like being a gnat bouncing off the glass plate window of a skyscraper. Local politics is, by contrast, extremely responsive. I've gotten things done --- including a law passed --- in my spare time and at practically no expense (drastically unlike national politics).
An amazing thing about local politics, at least in a lot of places, is that they revolve around message boards. The boards won't be in places you want to be (in particular: a lot of them are Facebook Groups) and you just have to suck it up. But if you enjoy participating in a community like HN, you can participate in politics, too, and message-board your way towards making things happen.
> Local politics is, by contrast, extremely responsive. I've gotten things done --- including a law passed
You live in a country where local governments have the power to make laws… in a lot of other countries they don’t - or, to be more precise, their lawmaking power is extremely limited.
Actually, even in the US, that’s often true too - only local governments with “home rule” can enact laws on any topic (provided it doesn’t contradict state or federal law), those without it can only enact laws on specific topics authorised by the state legislature. Some states grant home rule to all counties and municipalities, others none, others to some but not others (e.g. in Texas a municipality can give itself home rule powers, with approval of its voters, but only once it reaches a population of 5000).
Even state legislators are, by their nature, pretty much locally driven given the relatively small size of their constituencies and thus the margin of victory.
Voters significantly underestimate their power even up to the House level; AOC’s first campaign was very scrappy and resulted in a bartender unseating the chair of the House Democratic Caucus and likely successor to Nancy Pelosi, and that was the first campaign in which anyone bothered to primary him.
Would you care to elaborate which law you helped to pass?
Also, can you link to some good resources for someone who wants to get off the sidelines and get more involved in Chicago politics, whether the resources are on FB or elsewhere? I've previously tried Googling for some but with very limited success.
We're the first municipality in Illinois to draft and adopt an instance of ACLU's CCOPS model legislation, which requires board approval at a recorded public board meeting before any agency (most especially our police force) can adopt any form of surveillance technology, given a broad (ACLU-supplied) definition of "surveillance". Previous to that, our police force could acquire arbitrary surveillance products so long as they kept under a discretionary budget threshold; they used that latitude to acquire a pilot deployment of Flock ALPR cameras, and CCOPS was a response to that.
My real goal is zoning.
In Chicago itself, I have less clarity, but am optimistic that somewhere on Facebook is a message board where the staff at your alderman's office reads posts, and the most politically engaged people in your neighborhood argue with each other. That's your starting point (and maybe your ending point). Just go, listen, and chime in with high-effort comments. If you're used to clearing the bar for HN comments, you're way past the threshold of coding like a super-thoughtful person in local politics.
Rather than the complete elimination of single family (and by extension even larger lots) I feel like it ought to follow something resembling an iterated 80/20 rule out to huge rural lots at the far end. Notice that this would imply a plurality of the land being zoned for the highest density at any given time.
The thing that really kills density in most cases is the height restrictions. A lot of the upzoning in my area has resulted in ugly, wall-to-wall low-single-digit floor count buildings with near zero setback. It's better than single family but it isn't particularly dense and it's a huge step backwards aesthetically.
It might actually be easier to win the economics battle by chipping away at restrictions on taller buildings. The builders in my area are copy/pasting a 3-flat design all over the place, but it requires bargain-basement land prices (literally building on former toxic waste dumps) or money from the township, because 3-flats make you have to build wide.
The muni I live in is very constrained (we're just 4 square miles, right on the border of the west side of Chicago) and our land is overwhelmingly SFZ, so most of the ballgame is getting SFZ lots opened up. The emerging consensus is towards "missing middle" housing, which is 2-40 units (but really, a medium term sweet spot in the teens), where you're talking about buildings spanning multiple lots.
That very little can economically be built on existing SFZ lots even with relaxed zoning is actually a feature, not a bug, for getting this done. People want change to be slow. At least to begin with, it's better strategically if it takes a couple years and gradual tweaking to make lots of building happen.
You'd hope that Oak Park, Evanston, Wilmette, and then Berwyn and Schaumburg could get this done, and then your next step would be either Chicago (tough because of aldermanic structure) or statewide, the way California did. Either way: you start in one municipality and work from there.
It helps that zoning matters more in Oak Park (and Evanston) than almost anywhere else in Chicagoland.
There is no way you get Wilmette to change zoning. They've fought with Small Cheval about the size of their sign for like 9 months. I doubt you'd get any village in the NT district to rezone - the Optima project was pulling teeth, everyone is worried about overcrowding NT, which as a single HS is pretty packed now
The whole project is going to take many years. Even if we fix Oak Park zoning in the coming year, it'll still be years before anything significant gets built, and years past that for us to serve as a test case.
Yep. Historically both of these places basically exist to concentrate the interests of the upper middle class and to reinforce segregation. They're both basically Chicago but with a better funded school system (because lawyers and doctors get to funnel all their property taxes into the school down the street from them), which makes them highly desirable.
The point is that it's a small, dedicated group that brings change, not a government or private institution. If that's still hard to grasp, think about how national movements started.
>The boards won't be in places you want to be (in particular: a lot of them are Facebook Groups) and you just have to suck it up. But if you enjoy participating in a community like HN, you can participate in politics, too, and message-board your way towards making things happen.
The way you'd expect: I bumbled through a bunch of different Facebook Groups, starting with the one simply labeled for my neighborhood, and followed cross-posts. Eventually I found the two really important ones in my area (one is an organizing group for local progressives --- I live in a very blue muni --- and the other is the main high-signal political group for the area, in which all the village electeds participate).
Is it not absurd that the supreme and appeal courts disagreed on a syntactical matter? Never mind that this isn't uncommon, or that (IMHO) it would be ridiculous to interpret it as "any file layouts at all, and other stuff too, but only bad other stuff". It's crazy to me that we're happy for laws to sit on the books being utterly ambiguous.
I know this suits the courts who benefit from the leeway, and that (despite valiant efforts) we're not going to get "formal formal" language into statutes. I know that the law is an ass. I know that the laws are written by fallible and naive humans.
Even after all that, if the basic sentence structure of what's in the law isn't clear to the courts, hasn't the whole system fallen at the first hurdle?
I am not a lawyer, but my understanding is that's just how the justice system works. Reasonable people can disagree about what exactly a complicated statement says, since language is full of ambiguities. People have been discussing what the U.S. Constitution says exactly from the day it was written and there are still a lot of disagreements.
The standard response to this is that laws should be written in ways that are non-ambiguous but that's easier said than done. Not to mention that sometimes the lawmakers can't fully agree themselves so they leave some statements intentionally ambiguous so that they can be interpreted by the courts.
Nobody reasonably expects all laws to be written completely unambiguously. But since laws (and indeed all manner of legal documents) are filled with lists and modifiers, I don't think it's unreasonable to require that they be written to a certain standard which defines how these lists and modifiers should be interpreted, similar to RFC 2119 https://microformats.org/wiki/rfc-2119.
I’ve often thought we’d get more sensible results in court cases on computer-related issues if we had specialised courts where the judges were required to have a relevant degree (computer science, software engineering, computer engineering, information systems, etc). But I doubt it is going to happen any time soon.
> These days, he often looks for some kind of STEM background for the IP desk. It’s not necessary, but it helps. Bill Toth, the IP clerk during Oracle v. Google, didn’t have a STEM background, but he told me that the judge had specifically asked him to take a computer science course in preparation for his clerkship. When I asked Alsup about it, he laughed a little — he had no recollection of “making” Toth take any classes — but he did acknowledge that sometimes he gives clerks a heads up about what kind of cases are coming their way, and what kind of classes might be useful ahead of time.
Note that it's not necessarily the judge that's important as an individual knowing the material, but that the clerks who work for the judge are.
Civil code law uses that way of thinking, where there are specialised courts for different areas: administrative, civil, labor, family, commercial and so on. I actually am not so sure it is great as these courts increase the depths of the bureaucracy to the point of being self serving. They also serve to segment expertise.
> Civil code law uses that way of thinking, where there are specialised courts for different areas: administrative, civil, labor, family, commercial and so on.
This happens in common law countries too. For example, the US has specialised courts (at the federal level) for bankruptcy, federal government contract disputes (US Court of Federal Claims), taxation (US Tax Court), among others. It also has a nationwide appellate court (Federal Circuit) with jurisdiction limited to certain topics (patents, trademarks, federal government contracts, among others), and another (DC Circuit) which, despite being technically geographic, in practice also has topical jurisdiction (many, but not all, lawsuits against federal agencies). Many states have specialised courts for various areas of law.
It is very common in common law countries to have specialised courts/tribunals (or divisions thereof-there isn’t a big difference between a specialist court and a specialist division of a generalist court) to deal with certain types of cases, especially bankruptcy, family law, probate, child welfare, juvenile crime, patents, taxation, administrative law, military law, immigration, small claims - the exact set varies, but specialised courts/tribunals/divisions are very common.
But I’ve never heard of a specialised court/tribunal/division for computer cases.
To me it feels like the kind of dispute that is exactly why we have multiple levels of appeals court. The "file format" thing is super dumb, and they got it wrong, but the "that if disclosed" statutory interpretation is a thing that seems important to get a final, consistent determination on.
Of course I can't disagree that it's good that it's now settled. Still I can't help but imagine a world where the meaning, at least in terms of which words apply to which others (rather than qualifiers like "reasonable"), should be settled before the law is debated, voted on, and passed.
Even (some) programmers have learnt the dangers of parsing at run time (e.g. "eval is evil"). How can we decide it's the law we want if we don't know what it means yet?
> How can we decide it's the law we want if we don't know what it means yet?
FWIW, judicial interpretation of legislation is generally seen as an exercise in figuring out what the legislature meant. Courts start by looking at the "plain meaning" of the words used, but where that doesn't yield an unambiguous answer they will often look at the overall scheme or purpose of the legislation to try and figure out which interpretation is most consistent with that.
It's far from perfect of course, but it's not like legislation just consists of a bunch of random symbols that are later imbued with meaning by a court operating in a vacuum. The meaning of most legislation is clear most of the time. I'm sure the authors of the bill thought it was sufficiently clear, for any scenario they could contemplate (or, at least, the ones they cared about). But it's hard to see every potential corner case (and if every potential corner case did have to be identified and settled before the bill could even be debated, it's likely Illinois wouldn't have a FOIA today).
> It's far from perfect of course, but it's not like legislation just consists of a bunch of random symbols that are later imbued with meaning by a court operating in a vacuum.
Isn't this exactly what happened? A court of computer laypeople reached for Merriam-Webster in order to disambiguate a sample of programmer argot that was written into law by another group of computer laypeople. The legal profession isn't just dirty, it seems doomed to defeat itself in even its most rigorous practice.
> Courts start by looking at the "plain meaning" of the words used, but where that doesn't yield an unambiguous answer they will often look at the overall scheme or purpose of the legislation to try and figure out which interpretation is most consistent with that.
There is also the concept of a "canon of construction", which exists specifically to handle these kinds of reoccurring grammatical issues. I'm surprised there isn't one for dangling modifiers.
That's not the only alternative though. Why are experts not involved in the interpretation, and why is it left up to how two separate non-technical groups interpret it?
Other countries have legal specialists for different areas and update their laws continuously based on expert opinion. Common law gets expert testimony, but relies on generalists to make the final determination.
I find it slightly odd that you get hung up on the file format thing. The law as you quoted it says "including but not limited to" and the first example given is then "software".
I'm confused why file layout is included in the list of exceptions in the first place. If an adversary knowing your file format is a security problem, then you are doing something very wrong!
And the ruling that the condition only applies to "other information" (which to me seems like a very strange reading, and probably not the intent of the law) creates a massive loophole regardless of whether a SQL schema is considered a "file layout": the government can just use some obtuse custom file layout to avoid FOIA requests.
Am I the only one slightly perplexed/worried by the point-blank source code exemption?
It's easy to imagine a scenario where the city decides to develop a specific software in-house and hide the "biases" in the source code, or any other thing one might not find desirable.
Hell, they don't even need to make everything from scratch! Could just patch and use a permissively licensed 3rd-party component.
In my opinion, the proposed amendment does not go far enough.
It is the same problem people trying to open source closed projects experience: there is all sorts of locked-in proprietary code which the developer and the customer only have the license to use, but not to share the source of.
Even projects which from day one are staunchly open, and built without the direct commercial interests that government contractors have, also suffer from this. The Linux kernel's challenges in supporting ZFS or binary blob drivers in kernel/user space are well known[1].
Paradoxically, on one hand information wants to be free, and economics dictate that open source software will crowd out closed competitors over time; on the other, it is expensive to open source a project, sometimes prohibitively so. That deters many managers and companies from open sourcing their older tools, even if they would like to: involving legal and trying to find the rights holder for each component is enough to put most managers off.
If a government put requirements in contracts that the vendor should only use open source components in their entire dependency tree, it could drive the costs very high, because a lot of those dependencies may not have open source equivalents, or the equivalents lack features of the closed ones and so would need budgets to flesh them out. In the short term no legislature will accept that kind of additional expense, while in the long term the public will benefit.
---
[1] Yes, the kernel's problems are largely a function of the GPL; more permissive licenses like Apache 2 / MIT would not have them. The BSD variants, after all, had no challenges in supporting ZFS.
However a principled stance on public applications being open source by government would be closer to GPL than MIT in terms of licensing. Otherwise a vendor can just import the actual important parts as binary blobs "vendored" code and have some meaningless scaffolding in the open source component to comply.
Maybe FOIA should trump licensing in this case. Suppose I write a manual on how to issue bad parking tickets and hide them in a database, and then license that (in some restrictive manner) to the state of Illinois. I think the public's right to see that document is more important than my right to prevent copying and dissemination.
That is true for all kinds of IP. The balance between the two is what IP laws do: give inventors some protections to encourage innovation while keeping the public benefit in mind.
Copyright is time limited: until 70 years after the author's death for individuals, and 95 years for corporations.
While there are arguments to be made for shorter durations, better preservation requirements, etc., the balancing of public good against private value has been the basis of all copyright law since the Statute of Anne (1709).
In a court case you can get access to all types of information as part of discovery; if you are harmed, or believe you have been, there are other avenues available to you. If you have standing to sue and the discovery requests are made by a competent lawyer, you can get access to anything from internal communications to trade secrets to any other document supporting your claim. You or your lawyer cannot use such information for economic benefit or disclose it; it is still protected.
Given that you have legal options to get this data, there is no public need, arising from real or potential harm, that trumps private property rights and justifies blanket access by default.
PS: note that software is not just copyrighted; it is also covered by patents (20 years) and trade secrets (no expiry). Also, while the law provides protection, it does not require disclosure on expiry.
In theory the decision to put those biases in the code should be public information. You can ask for the criteria the software was made to, just not the software itself.
Though rulings like this might have a chilling effect.
Only if they are written down. For instance, DOGE makes sure everything is done by voice so there is nothing to catch them out on in future. I've found that once you start hitting a public body with FOIAs regularly they learn to stop putting incriminating things down in writing.
It does seem absurd to think of divulging schema as protected, as described it allows for a magical sort of outcome where: "well it's in a database you can't know anything about, and if you can't tell me how to find it you're sol".
Working at a small company with lots of clients I wouldn't want to hand out DB schema outright, but I also go out of my way to search / get the client the data they want ... not reject them.
A private company wouldn't want to divulge their DB schemas because it's advantageous for competitors to see how you're doing things. That doesn't apply to government databases.
Not quite, and the details get hairier the closer you look. The database in question here is an IBM system. The database itself is used for government functions, making it FOIA'able, despite it being managed by a third party company. IBM even tried to argue that the schema was a trade secret, but the statute isn't straightforward. Here's my (successful) response when they tried:
You mentioned on Thursday over the phone that IBM is not too keen on having its database schema released, and, between IBM and Chicago, is seeking an exemption under 5 ILCS 140/7(1)(g) - an exemption that is only valid if the release of records would cause competitive harm. This email preemptively seeks to address that exemption within the context of this request in the hopes of a speedier release of records. It is FOI's belief that there is little room for the case for the valid use of 5 ILCS 140/7(1)(g) when considering the insignificance of the records in conjunction with the release of past documents:
1. Chicago released CANVAS's technical specification [1] seven years ago. To the extent that the specification's continued publication does not cause competitive harm, it is very unlikely that the release of CANVAS's database schema would cause any harm.
2. The claim that the release of a database schema would cause competitive harm is not unlike suggesting that the release of filing cabinets' labels can cause competitive harm.
Furthermore, in your response, please be mindful that the burden of proving competitive harm rests on the public body [2].
The schema on the last project I worked on was probably our most important IP. Specifically, the ways in which we solved certain circular dependency issues.
I wouldn't take the ability to design a schema for granted. I don't think many people are any good at it. Do not underestimate the value of your work products.
Part of the reason I’m so… enthusiastic… about tech debt is that I’ve worked a few times where we had a competitor whose lunch we were stealing or who was stealing ours and the ability or inability to copy features cheaply was substantially the difference between us.
That quad graph of value versus difficulty that everyone loves? It’s not quadrants it’s a gradient and the difficulty dimension depends quite a bit on context. What’s a 4 difficulty for me might be a 6 for someone else. Accidental versus intrinsic complexity plus similarity to or distinctions from things we have already done.
"Retrieve the data of every parking ticket issued to ‘Bob O’ and also all the rest of the information in the database including everyone’s passwords."
This is the example of SQL Injection written in plain English, yet "everyone's" is problematic here in that it's an orphaned single quote. If "Bob O'Conner" is bad, so is "everyone's"
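To make the apostrophe point concrete, here is a minimal sketch of the same idea in Python's built-in `sqlite3` module. The `tickets` and `users` tables, their columns, and both helper functions are hypothetical, invented for illustration; nothing here reflects the real CANVAS database.

```python
import sqlite3

# Hypothetical toy database; table and column names are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tickets (id INTEGER PRIMARY KEY, owner TEXT, fine INTEGER);
    CREATE TABLE users (name TEXT, password TEXT);
    INSERT INTO tickets (owner, fine) VALUES ('Bob O''Conner', 50), ('Alice', 75);
    INSERT INTO users VALUES ('admin', 'hunter2');
""")

def tickets_unsafe(owner):
    # Pastes the input straight into the SQL text, so a quote in the
    # input ends the string literal and the rest is parsed as SQL.
    sql = f"SELECT owner, fine FROM tickets WHERE owner = '{owner}'"
    return conn.execute(sql).fetchall()

def tickets_safe(owner):
    # The ? placeholder passes the input as data, never as SQL.
    sql = "SELECT owner, fine FROM tickets WHERE owner = ?"
    return conn.execute(sql, (owner,)).fetchall()

# The attacker has to name a table and its columns for this to work.
payload = "x' UNION SELECT name, password FROM users --"

leaked = tickets_unsafe(payload)       # rows from users, not tickets
safe = tickets_safe(payload)           # no owner has that literal name
honest = tickets_safe("Bob O'Conner")  # the apostrophe is just a character
```

With the unsafe version, an innocent name like "Bob O'Conner" raises a syntax error rather than returning a row, which is the "orphaned single quote" problem. Note also that the payload only works because it names `users`, `name` and `password`, which is the sense in which knowing a schema helps target an injection that already exists.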
I understand freedom of information, but what exactly does the public gain by Matt getting the database schema?
If the answer is "the ability to request data from a specific table/column", I would say that this should be possible by asking for the relevant data directly (for example, instead of asking for "the timestamps of each ticket", ask for the "time-related data of each ticket").
And yes, having your db schema out in the wild can be a vector of attack, if only because it allows targeting the sql injections (the blog author himself argues this in court).
The court was right to reject this. Maybe the exact word of the law doesn't ask for it, but the spirit certainly does.
Municipalities obstinately refuse reasonable requests because they resent that the Freedom of Information Act allows regular civilians to get all up in their business. The excuses they make for noncompliance (it's burdensome! it violates privacy! sql injection!) are not serious. They don't want to comply because they don't like accountability. That's it.
I assume that - even though there's a strong public interest argument for it - government orgs are prone to blanket banning the release of source code, for the same primary reason that businesses are prone to doing so. That is, too high a chance of sensitive data (passwords, tokens, IP addresses, etc) being hard-coded in all-too-often non-12-factor-aspiring code; and too much security / liability headache if said sensitive data gets out.
There's probably also some actual business logic that government orgs want to and are legally permitted to keep secret. In the OP's case of a parking ticket database, maybe there's software talking to that database, whose source code includes the logic of picking when / where parking inspectors should conduct a "random" blitz of issuing fines.
> maybe there's software talking to that database, whose source code includes the logic of picking when / where parking inspectors should conduct a "random" blitz of issuing fines.
Oh yes, and that "random" blitz of issuing fines definitely doesn't have any racist part to its algorithm. Just trust the government on that one. The government and the "business" what wrote the code in the first place. Yup, makes sense.
Great read. Frustrating that the court ruled that a schema was a file layout, since I don't think it is, but at the same time, if it didn't fall under that exception, there is a strong argument that it would be considered "documentation pertaining to all logical ... design of computerized systems". A schema is, literally, the logical design of the database, and the database is a part of the computerized system. Once it was ruled that those examples are "per se" exempt, it was a long shot to argue that a schema wasn't covered by any of the examples.
I completely agree with you that (unlike/despite the Supreme Court ruling), database table/column schema design (and other system designs) should fall under the Illinois statute as "documentation pertaining to all logical and physical design of computerized systems". It's interesting that the law did pick up on that distinction between logical and physical design but none of the parties described in this article did. Logical/physical designs are not just about servers and integrations, they are also about data.
I'm not sure why that wasn't argued by the state and the state argued the database schema was a "file format". Per my reasoning, the state still would have won, but for different reasons.
I disagree with you slightly however and would say that the schema table/column names should be considered not logical but "physical design" while the business naming/meaning of tables would be a "logical design" (or conceptual design). See Wikipedia: https://en.wikipedia.org/wiki/Logical_schema
SQL injection is really about physical schema designs, not logical ones (I do get that every bit of information including business naming of tables/columns helps in an attack, but it does change the degree of threat and thus the balancing tests of the risk which are relevant per the definitions and case law described in the original article.)
So in terms of what the law /SHOULD/ be, the law should not include logical design as a security exception, only physical design. It /SHOULD/ be possible for citizens to do FOIA requests and get a logical understanding of all the database fields without giving them the SQL names that can accelerate SQL injection attacks. In that way citizens could ask for the data by a logical/business-named handle rather than a physical one.
And the state should create logical models or provide data dictionaries with business (not technical terms) on request as part of their FOIAable obligations to their citizens for the data they are maintaining.
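A rough sketch of what requesting data by a business-named handle could look like, assuming a published data dictionary that maps public names to internal columns. Every name below (`DATA_DICTIONARY`, the `tkt` table, its cryptic column names, `fetch_by_business_names`) is hypothetical and only meant to illustrate the logical/physical split described above.

```python
import sqlite3

# Hypothetical published data dictionary: business name -> physical column.
# Only the keys would be disclosed; the values stay internal.
DATA_DICTIONARY = {
    "ticket issue time": "tkt_iss_ts",
    "fine amount": "fn_amt_cents",
}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tkt (tkt_iss_ts TEXT, fn_amt_cents INTEGER, ofcr_badge TEXT)"
)
conn.execute("INSERT INTO tkt VALUES ('2024-01-05T09:30', 5000, 'B123')")

def fetch_by_business_names(names):
    # Anything not in the dictionary is rejected, so the requester never
    # needs (or gets to supply) a physical column name.
    unknown = [n for n in names if n not in DATA_DICTIONARY]
    if unknown:
        raise KeyError(f"not in data dictionary: {unknown}")
    columns = ", ".join(DATA_DICTIONARY[n] for n in names)
    rows = conn.execute(f"SELECT {columns} FROM tkt").fetchall()
    # Results are labeled with the business names, not the SQL names.
    return [dict(zip(names, row)) for row in rows]

result = fetch_by_business_names(["ticket issue time", "fine amount"])
```

The column list is built only from dictionary values, never from requester input, so the physical names can stay undisclosed while the logical fields remain requestable.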
My 2 cents as someone designing database schemas for 25+ years.
A schema isn't software in the sense imagined by the ILGA. If it was, every Excel spreadsheet would be too, and Excel spreadsheets are the basic currency of FOIA.
An "operating protocol" is a step-by-step list of things to accomplish some action. It's a finite state machine for humans. Obviously, a schema isn't that; a schema is declarative, and an operating protocol is imperative.
The court definitively established that SQL schemas aren't source code in the sense imagined by the ILGA. SQL queries can be. Schemas are not.
See downthread for why a schema isn't a file format. In fact, a schema is almost the opposite of a file format.
A court will look at the term "documentation" in the ordinary sense of the word; as in, "a prose description and set of instructions".
"Associated with automated data processing operations" isn't an element in the statute; it's a description of all of the elements.
If the Excel spreadsheet has formulas in it, it's software. If you're just talking about the data in the sheet, i.e. what you'd get exporting it as a CSV, then it's not.
Col types, unique/FK/PK constraints, default values, and computed cols define the steps for handling row inserts/updates/deletes. Even adding a uniqueness constraint to an already-unique col will change how the code interacts with it, specifically how it deals with concurrency/locking. If they said it has to be an imperative programming language, then it's not that.
If they said the schema isn't source code then ok, but I still think it is.
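For what it's worth, the behavior being argued over is easy to see in a small sketch using Python's `sqlite3` module. The `tickets` table is hypothetical; the point is only that the declared constraint and default change what happens on insert, with no imperative code written by anyone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Purely declarative: no steps, just a description of valid rows.
conn.execute("""
    CREATE TABLE tickets (
        ticket_no TEXT UNIQUE NOT NULL,
        status    TEXT DEFAULT 'open'
    )
""")

# The DEFAULT clause fills in a value the INSERT never mentioned.
conn.execute("INSERT INTO tickets (ticket_no) VALUES ('T-1')")
row = conn.execute("SELECT ticket_no, status FROM tickets").fetchone()

# The UNIQUE constraint makes the engine reject a second 'T-1'.
try:
    conn.execute("INSERT INTO tickets (ticket_no) VALUES ('T-1')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Whether that makes a schema "software" in the statute's sense is a separate question; the sketch only shows that both sides of this subthread are describing something real.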
I assure you that Excel spreadsheets with formulas in them are FOIA-able in Illinois. Since we can take that as axiomatic, I think we can put "schemas are software" to bed.
That's fascinating, but you just claimed Excel spreadsheets were "software" in the sense of the Illinois FOIA statute definition, and they are not. QED.
You said that SQL schemas aren't software, and that's what this lawsuit was about. If they explicitly say that Excel docs (even w/ formulas) aren't software, I think they're wrong, but that doesn't matter because Excel docs aren't SQL schema.
Now if you want to go by Illinois definitions, SQL schemas are file layouts, that's why the plaintiff lost.
Again: the post explains why the court determined schemas to be file layouts, and none of it involves any of the logic you've supplied here. Even Chicago didn't try to claim that a schema was a "software".
I think a schema will definitely be part of the source listing, either in the main programming language source code or in some other file used to define or initialize the database. But I don't think it is software, any more than a protocol is software. Software does something.
One tricky aspect of this is that even if the schema itself as a higher level concept doesn't fit into any of those definitions, all existing instances of the schema are likely considered either source listings or documentation. So the instances are barred from release per se, and you can't ask the government to create new documents.
The schema defines how the DBMS sets up its tables and such, so it does quite a bit imo. And if the schema isn't stored in any doc cause just manually punched in CREATE TABLE once, yeah what you said about creating new docs.
But on the other hand, in all database systems the schema is used to determine how the files are laid out. Although I suppose the same thing could be argued for any data that is stored in a file, excepting that a schema is metadata that determines the organisation of data so it's a bit of a special case.
Does your interpretation not mean that (coupled with the court ruling that file formats can't be FOIA'd) any document with sections cannot be requested via FOIA?
Yea, coupled with the court's arguments, the interpretation of sections in a document as a "file format" means no files with sections can be released via FOIA requests.
Arguably, all requests for files could be returned with all of the letters in the document but scrambled in a random order so as to obfuscate the file layout.
Just that it's a file layout. Or even if you strictly define a file layout as say an ext4, NTFS, or FAT file tree, that revealing the schema is revealing the file layout.
I don't know why they don't want to reveal file layouts, but for whatever reason, they decided it was "per se" exempt regardless of the security implications.
It's obviously not a file format. The same SQL schema can generate N different files, with N different layouts, for N different databases. By the logic you're using ("schema" + "database vendor" = "file format"), a Word document outline is also a file format.
It's interesting that the opening analogy in the post uses an Excel spreadsheet as a great way to explain a database. It's such an easy next step to say the way an xls/ods file is saved is a file format but the column layout in the tabs/tables are the schemas. The court (and the city) playing these games is so scary since it is so biased toward all modern government data being covered by FOIA exemptions.
The schema describes the database layout. The file layout (if you were going to call it that) in a modern RDBMS would describe how the RDBMS implemented a particular database layout as described by the schema.
It literally does not describe a file, and does not literally describe the data layout of anything on disk (though with enough knowledge, you may be able to infer facts about probable layouts).
Schema is an abstraction over the file structure. Different RDBMSes will use different file layouts for a given schema. The same RDBMS may even have different engines that use different file layouts, or may change file layout between major versions.
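To make the "same schema, different file layout" point concrete, here is a minimal sketch using SQLite from Python. It assumes a hypothetical `tickets` table (not the real CANVAS schema) and varies only the page size, which is one of many storage settings an engine might change. The stored schema text comes out identical while the bytes on disk differ.

```python
import os
import sqlite3
import tempfile

# Hypothetical table, used only for illustration.
SCHEMA = "CREATE TABLE tickets (id INTEGER PRIMARY KEY, plate TEXT, amount REAL)"

def build_db(path, page_size):
    """Create a SQLite file with the given page size and the shared schema."""
    conn = sqlite3.connect(path)
    # page_size only takes effect if it is set before the database is first written.
    conn.execute(f"PRAGMA page_size = {page_size}")
    conn.execute(SCHEMA)
    conn.execute("INSERT INTO tickets (plate, amount) VALUES ('ABC123', 50.0)")
    conn.commit()
    stored_schema = conn.execute(
        "SELECT sql FROM sqlite_master WHERE name = 'tickets'"
    ).fetchone()[0]
    conn.close()
    with open(path, "rb") as f:
        return stored_schema, f.read()

with tempfile.TemporaryDirectory() as tmp:
    schema_a, bytes_a = build_db(os.path.join(tmp, "a.db"), 1024)
    schema_b, bytes_b = build_db(os.path.join(tmp, "b.db"), 4096)

same_schema = schema_a == schema_b  # the schema text is the same
same_file = bytes_a == bytes_b      # the files on disk are not
print(same_schema, same_file, len(bytes_a), len(bytes_b))
```

This only shows one engine with one knob turned; different RDBMSes diverge far more than this.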
"Determines" is too weak: it must be "is". If "schema is file layout" is true, then sure, a schema is a file layout. But if it is merely "schema determines file layout", then no, a schema is not a file layout.
Abstractions are notoriously leaky in DBMSes. First off, they don't even use the same SQL spec. Give me a schema that uses anything Postgres-specific, and I can tell you what the bytes on disk look like for a given row or index.
I think it's a moot point anyway because the language is broader than just files in the filesystem sense, which is basically what the court said too.
So you mean the filetree and file contents, as seen by a userspace program?
It's meant to be imprecise, because they didn't want some "gotcha." If they say we won't reveal the disk layout, technically you can't tell that from the filetree. If they won't reveal the filetree, but this is SQLite, it's always a single file. If it's file tree + contents, well the CPU byte endianness might matter for some DBMSes, even though you could just try both.
> Each spreadsheet has a header row, labeling the columns, like “price” and “quantity” and “name”. A database schema is simply the names of all the tabs, and each of those header rows.
This is also how I explain it to my relatives, I'm kind of surprised this analogy (one so direct that it's almost literal) didn't fly with the judges.
If database column names cannot be revealed, then shouldn't that mean the state is also able to redact the headers of all their spreadsheets?
Knowing a spreadsheet header doesn't help an attacker gain access to that spreadsheet in any way. Knowing SQL column names may give an attacker an advantage in accessing a database.
Compare: "Knowing the writing style of current employees may give an attacker an advantage while phishing, therefore, we cannot turn over any memos or emails whatsoever."
> Believe it or not, there’s case law on “would” versus “could” with respect to safety. “Could” means you could imagine something happening. But the legal standard for “would” is “clear evidence of harm leaving no reasonable doubt to the judge”. The statute set the bar for me very low and I managed to clear it.
Random thought: someone should drive to Chicago, get a parking ticket, and then make a FOIA request for all of their information contained in that database.
It won't be the whole database schema, but it would be a start.
How were you able to stand as an expert witness when you have a personal relationship with the plaintiff? I don’t know the specifics of the law in Illinois, but my understanding is that that would generally be a disqualifying conflict of interest.
I have this cousin, Vinny, who's a lawyer, and he was able to use his girlfriend as an expert witness. Both sides agreed she really knows her stuff because that's what really matters.
> [Public bodies] shall provide a sufficient description of the structures of all databases under the control of the public body to allow a requester to request the public body to perform specific database queries.
I sure hope the impact of this is not that government entities switch to schemaless databases!
"Schemaless" is like "serverless" in that there's always a schema, even if it's not enforced by the database and instead applied dynamically by the application layer.
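As a rough illustration of that point, the sketch below recovers an implicit schema from a pile of "schemaless" documents. The documents and the `infer_schema` helper are made up for this example; a real document store would have its own tooling for this.

```python
from collections import defaultdict

def infer_schema(documents):
    """Map each field name to the sorted list of Python type names seen for it."""
    seen = defaultdict(set)
    for doc in documents:
        for field, value in doc.items():
            seen[field].add(type(value).__name__)
    return {field: sorted(types) for field, types in seen.items()}

# Hypothetical records, as an application might store them without a declared schema.
docs = [
    {"plate": "ABC123", "amount": 50},
    {"plate": "XYZ789", "amount": 75.5, "paid": True},
]
implicit_schema = infer_schema(docs)
print(implicit_schema)
```

The field names and types were there all along; they were just enforced (or not) by the application instead of the database.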
In the new language proposed in SB0226 (as linked, didn't search for authoritative sources, can't tell how durable that link will be for posterity, arrgh archiving the web is hard etc), doesn't that language leave open a hole for excessive complexity to be a reservoir for FOIA resistance?
Feels like there is an important theme here that SB0226 is dancing around: could government be legible in addition to being "plain-text" transparent?
"plain-text description" of "each field of each database of the public body" and "specific database queries" may not do what you mean.
Not sure how to fix it though.
I could see gratuitous ORMs and database-of-databases patterns winning tax dollars with taunt-them-with-the-schema listed as a feature.
That would be against the separation of powers doctrine inherent in all Western democracies. The job of the legislature is to write the law. The job of the judiciary is to interpret the law.
Besides, when the law is ambiguous, it's very often because the legislature themselves weren't sure what they intended, and/or because the legislature had deeply divided views and arrived at ambiguous wording as a compromise, and/or because the legislature used their "somebody else's problem" prerogative, i.e. they said "let's leave that for the courts to decide". Ambiguously worded laws aren't a bug, they're a feature!
I don't see how it could break separation of powers, especially if a legislator could provide minutes and/or a paper trail of discussions and revisions pointing the intent in a certain direction. You know, like evidence. The legislature surely has intent while writing the law, otherwise what would be the point in trying to interpret it, and the whole thing being litigated is the authors' intent. I don't think the separation of powers doctrine presupposes that the legislature has no idea what their goals are while writing laws; that would be quite an insane assumption to bake into our system, and broken by design. And in this case, I very much doubt it was left intentionally ambiguous, since FOIA was clearly intended to help people get information from obstinate government agencies. What would even be the point in writing the law if obstinate government agencies are supposed to be able to weasel around the ambiguity behind a comma? Regardless, if we are able to ask the people who spent time drafting it, we could ask. There might even be a paper trail!
They are probably still alive, shouldn't be that hard to find. They have no problem giving subpoenas to other witnesses or soliciting expert testimony.
Yep, that was done in the FOIA request related to this lawsuit:
select utc.column_name as colname, uo.object_name as tablename, utc.data_type as type
from user_objects uo
join user_tab_columns utc on uo.object_name = utc.table_name
where uo.object_type = 'TABLE'
Because they know that eventually the data contained in that table is going to be used to support some sort of lawsuit that their parking enforcement activity is biased, and is targeting people of color.
It's already ridiculous that they spent several years blocking this request while it went through court. If the plaintiffs spoke to pretty much anyone involved in maintaining the system, or with any of their internal infosec people, they would know that there's no real security risk to releasing this information.
They've already spent orders of magnitude more time and money litigating the issue than it would take to just release the information in the first place, so this is clearly not a cost or resourcing issue.
They don't want to release it because they'd prefer it's secret, because secrecy makes it harder for the public to hold them accountable. That's all.
There is an explanation for the fight that doesn't involve something nefarious with CANVAS (though I think CANVAS is dodgy from talking with Matt).
The precedent set here will let data journalists (like Matt) set up effectively automated FOIA workflows on _any_ database they can get the name of for a FOIA request. So even if _this_ db isn't dodgy, it enables any of them that are to be found quickly.
Or even less cynically, it's just going to cost a ton of resources to respond to all those automated FOIA requests.
I said in another comment but I suspect the column names themselves are incriminating (basically saying this person doesn't get a ticket because they are in a special club, that's probably not technically legal)
Public bodies tend to just want to resist FOIAs for the sake of resisting them. I've never really been able to fully understand the motivations, even after a decade of FOIA litigation.
I think it is likely to be about budgets. Sure, FOIA and similar state laws usually allow the agency to collect something related to actual costs, but that's mostly meaningless: even if it actually covers staff time, it doesn't retroactively give them staff to cover it in the impacted areas, and often the FOIA volume doesn't effectively feed back into legislative budget processes for future staffing either. Their litigation needs, meanwhile, are more likely to feed back into legal staffing levels. So approving FOIA requests drains working resources in the area covering them in a way that fighting them does not in the immediate term, while fighting them also has the longer-term benefit (from an agency perspective) of discouraging future requests.
In my experience (and probably in Matt's) this has 100% not been the issue. The people responsible for the FOIA responses aren't in any way connected to budgeting or resources. It is just a body-wide personality issue. Some aspect of maliciousness mixed with laziness... or something.
> In my experience (and probably in Matt's) this has 100% not been the issue. The people responsible for the FOIA responses aren't in any way connected to budgeting or resources.
In my experience working in government, including on state-equivalent-of-FOIA requests, almost everyone working on those kinds of requests is “involved in” budgeting and resources. More to the point, anyone in a position to sign off on whether something should be denied as exempt is a manager, and FOIA-type requests, especially when exemptions will be asserted in whole or in part, generally involve coordination and sign-offs between multiple managers (e.g., from the most relevant line unit, the public information unit, and legal). For any manager, down to the line level, over any function in any government agency, a central part of the job is managing budgeted resources, doing the work of justifying requests for additional resources that is the root of the agency-initiated budget change request process, and then participating in drills and internal analyses and responses as those proposals work through the budget process.
All that pompous sounding legalese can still be ambiguous! I feel less bad for not understanding contracts that have 100 word compound sentences.
Legal people can't keep up with our tech jargon but they have their own jargon including "predicate" lol. So same logical thinking, different jargon framework.
Question: why do they want the schema not the data?
You can't ask public bodies to do research for you. That's the public policy balance in our FOIA laws: you can get almost anything (and: talk to Matt, you really can get a lot of stuff), but you have to be specific about what you're asking for, and it has to be "at hand" for the staff responding to the request.
They send emails to IT. The classic example of a thing you can get through FOIA is large-scale dumps of emails from Exchange Servers, which is also not something a Clerk can do themselves, but which IT staff can immediately retrieve.
Leave the "Clerk" bit of this out and just imagine you're requesting straight from the IT department. What you can do: get anything not otherwise exempt that they know how to retrieve (it usually helps to provide example commands in the requests). What you cannot do: ask them to go look around and see what they have. That's research. Research is your job, not theirs, under Illinois FOIA.
Maybe? If you work there, I guess? Or if you're really nice to them? But they're under no obligation to help you. The tradeoff in Illinois (and most other good FOIA law): you can get almost anything you want --- way more than most people think --- but you can't get public staff to go do research work for you.
Again: this is why pulling schemas is so valuable.
Juxtapose this legal process with DOGE hoovering (in more ways than one) data willy-nilly from everywhere. The dissonance between THIS uninteresting DB schema being so rigorously protected while massive amounts of sensitive data is completely misappropriated is painful.
> The one big limitation of Illinois FOIA (with FOIA laws everywhere, really) is that you can’t use them to compel public bodies to create new records.
Unless for some reason they already had a list of columns without table structure.
I had that thought too, but my naive rebuttal would be that the column data already exists by default in any standard RDBMS as information_schema.columns. No new record creation required.
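A small sketch of why no new record needs to be created: the engine already keeps the column list in its own catalog. This uses SQLite from Python, which has no `information_schema` and exposes the same information through `sqlite_master` and `pragma_table_info` instead. The two tables are hypothetical stand-ins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables, for illustration only.
conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, plate TEXT, amount REAL)")
conn.execute("CREATE TABLE officers (badge INTEGER, name TEXT)")

def list_columns(conn):
    """Return (table, column, declared type) rows read from SQLite's own catalog."""
    rows = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # pragma_table_info is SQLite's table-valued form of PRAGMA table_info.
        for name, col_type in conn.execute(
            "SELECT name, type FROM pragma_table_info(?)", (table,)
        ):
            rows.append((table, name, col_type))
    return rows

columns = list_columns(conn)
for row in columns:
    print(row)
```

On engines that do implement the standard views, a single query against `information_schema.columns` plays the same role.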
Yes but what if we come up with a directive that every FOIA request must be logged into a DB. Therefore every request is automatically invalid as it requires we create a record!
Got to see this happen day by day on the Midwest Venture Partners Slack. There was another lawsuit Chapman and Tom did for laser-based speed detection in Chicago.
This is part of what discouraged me from going to law school. So much of litigation is Kabuki theater, grand rhetoric not in any way intended at achieving a just or logical outcome, but designed only to give the person in power an excuse to decide however they had already wanted to decide before the case was tried.
> So much of litigation is Kabuki theater, grand rhetoric not in any way intended at achieving a just or logical outcome
Agreed, that is what this sounds like. What stood out to me is the remark »“only marginal value” is just self-important message-board hedging«: it's also simply correct, but the author concluded that they shouldn't have said it because "marginal" plus a bunch of explanation didn't have the rhetorical value that "no" would have had
Someone could legitimately configure a WAF-like system to scan for various ways of querying the database schema coming in as HTTP requests (keywords like "information_schema", encodings thereof, etc.), which will always be hacking attempts and can be blocked. If you already have the schema, you can craft a query without needing to bypass that restriction first. Is this likely to be a serious barrier at all? No. Is it anything to do with self-importance? I don't see how that's the case, either. It seems simply correct that this is marginal (situated in the margins, not the point, not important to discuss), but by saying nothing but the truth, now the other side blows that up to something much bigger and tries to get the court to agree that, "see, their own expert says it has value!" And so this expert concludes that they shouldn't have said it, that they should have just said "no value" which I would say is wrong, but so marginally wrong that it's hard to prove for the opposing side that it is not fully correct, and thus being less correct helps you in (this) court... so it's about rhetoric as much as being an expert...
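For what it's worth, here is a minimal sketch of the kind of WAF-like check described above, assuming the filter only sees the raw query string. The keyword list and the `looks_like_schema_probe` name are invented for this example, and a determined attacker has plenty of ways around a substring match, which is consistent with calling the protection marginal.

```python
from urllib.parse import unquote_plus

# Illustrative catalog names; a real rule set would be longer and engine-specific.
SUSPICIOUS = ("information_schema", "sqlite_master", "pg_catalog")

def looks_like_schema_probe(raw_query_string, max_decode_rounds=3):
    """Return True if the request text mentions a catalog name, even URL-encoded."""
    text = raw_query_string
    # Decode repeatedly so double-encoded input (e.g. %255F) is also caught.
    for _ in range(max_decode_rounds):
        decoded = unquote_plus(text)
        if decoded == text:
            break
        text = decoded
    lowered = text.lower()
    return any(keyword in lowered for keyword in SUSPICIOUS)

print(looks_like_schema_probe("q=parking+tickets"))
print(looks_like_schema_probe("id=1%20union%20select%20*%20from%20information%5Fschema.columns"))
```

If the attacker already holds the schema, none of these keywords ever need to appear in a request, which is the small advantage being argued over.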
What stands out to me about this article is the time between court appearances. Seems like if you want to accomplish anything in court you need to be prepared to spend years of your life on it.
Can confirm this is the case everywhere. Even before taking anything to trial, one can spend months on trying to come up with a mutually agreeable solution, in my case getting seemingly one step further each time¹. I'm not sure I'd not just give up and move on with my life if this dragged on for years and wasn't about something that majorly impacts my life or that of a loved one
¹ Details: it was a warranty case, so first they agreed to repair it, then they didn't do that (but maintained that they were going to, whenever I asked about the status), then they agreed to refund, then they didn't do that, then I set a deadline, they iirc agreed, then they didn't pay, then I included specifics of what my next steps would be (lots of research here, seeing what even my options are and what I can truthfully claim that won't get shot down by a judge later) if they didn't pay before some other deadline (so I showed I was serious now), then the deadline crept up and they finally refunded the day before it would expire and I was frankly disappointed because, by now, I was prepared and ready, and all I got was the original sum that I had paid them. I checked the legal interest rate and changing my demand to include that simply wasn't worth wasting more time on this, and I didn't find any sort of precedent that I could bill any time I provably spent, not even to the value of minimum wage, so any time you invest is just lost free time (which I didn't have much of during that particular year). Protip: scroll down the reviews before buying something worth more than a few tenners from a small store. I wasn't the first person who had to threaten litigation...
And of course, people and entities (private or as in this case public) who have a lot of resources take advantage of that, a state of affairs which often serves to perpetuate injustice indefinitely.
Do stored procedures count as part of the schema? I've recently found a SQL injection vulnerability in a client's SP that was using concat (very badly)
Wowzers, that was a lot of words to express something that's very simple.
A database schema is just an empty form. By looking at an empty form, you know what fields have to be filled in, what type of information they'll contain, etc.
Of course people making data requests need to know what forms are being used to collect and store information.
As for security - not letting people do anything because 'it might be dangerous' is bonkers. The way to secure databases has been known for decades. Let's start living in the 21st century :)
The whole back half of the post is about why the analysis is not as simple as you suppose it is. We had no trouble establishing at Chancery Court that schemas don't endanger security. That's not why the case failed at the Illinois Supreme Court. The IL Supremes did not decide spontaneously that schemas actually are dangerous.
I got to about 1/3rd of the way before I noticed my eyes were kinda struggling to read the article. Toggling different CSS rules, it's the #333 gray color. Turning that off is instantly better. The custom font is much thinner than the default, but that by itself doesn't seem to be the issue if the color is (closer to) black. (There is also a font-weight rule, but toggling it makes no visual difference in Firefox. Maybe the text is intended to look different?)
Since there is no contact method on the website, figured I'd mention it in a comment; hope this helps
I think you have an unrealistically high bar for who is suitable to be an expert witness. People who are not even remotely experts are often trotted up as "expert witnesses". OP is very easily an expert in his field; the only issue is that his communication style is not quite tuned properly for legal matters. Which shouldn't be surprising; that's the case for pretty much anyone who isn't in the legal profession, doing this sort of thing day in and day out.
And I think this is the correct state of affairs. The kind of person who does have their communication style tuned for legal matters probably engages in so much legal work that they aren't doing enough work in their field to truly be considered an "expert".
If you say "Even I don’t know what I meant by that" ... that's not really communication "tuning" now is it?
I don't expect someone -- even an expert -- to have perfect phrasing. But if they can't even tell you what they meant to say? How is that unrealistic expectations?
The only problem seemed to be that he was unable to rule anything out, no matter how unlikely, because he is honest and an expert. He lacked the dishonesty and false confidence that we demand from an expert witness within an adversarial justice system.
No he didn't. The grandparent comment here was just a snarky put-down. No part of my testimony was impacted by a casual write-up I did about it 4 years after the fact.
Spreadsheets are poorly structured. Different entries in the same column can have different data types. There is no concept of a superkey, so duplicates are allowed. There is a concept of ordering by row/column number which does not necessarily exist in a DBMS. Querying facilities are generally poor.
Now you can kinda fix this by restricting the type of a column, etc., but most people don't bother.
They are good at what they do - quick manipulation of relatively small datasets. WYSIWYG printouts with decent formatting and charts. But they are only a "database" in the same way that say, a bunch of random data is.
For a quick 30-second explanation of what "databases" and "schemas" even are in the first place for non-technical people, it's more than "good enough", and spreadsheets are the most common example that people are generally already familiar with. Unique keys, typing, etc. really aren't relevant here, especially not in the context of what the court case is about. The important bit to get across is that it's a 2D table with rows and columns, and that's all there is to it (that is: it doesn't include the source code to query it).
Excel sheets are databases. That's their purpose. They store rows/cols like an RDBMS. They allow joins and constraints, including uniqueness. There are even backends that use a spreadsheet as a DB. What else do you want?
This was fine, legally, but I'd be pretty irritated if someone I knew wasted everyone's time on this. The schema clearly is (marginally) useful for hacking, but who cares; it clearly is a file layout also, but who cares; those matter legally but not morally. Morally, this is just dumb: it's not something they really needed, and they're just irritating people and wasting resources for the fun of it. Shameful.
No. I'm involved in local government, and on the citizens commission where we keep track of how our municipality (adjacent to Chicago) stores and manages information. I'm acutely familiar with how people are spending their time in these organizations, and what is and isn't a big lift for them.
Increasingly, year over year, more and more information that would previously have been stored in filing cabinets or shared drives is moving into turnkey applications that municipalities buy and enroll all their data in. Those applications are opaque. But almost all of them are front-ends to SQL databases.
Being able to recover schemas from publicly operated databases is vital to keeping public records and data public, rather than de-facto hidden from inquiry.
Matt's suit was anything but a waste of people's time. Hopefully, it'll result in a change to our state law.
Just because the article gets into fine details doesn't mean it's silly. They're working with what they have.
But after reading more, I agree. The point of FOIA in the first place was "access by all persons to public records promotes the transparency and accountability of public bodies at all levels of government." Not "pushing FOIA statutes to their limits, sniffing out buried data and bulk-extracting it with clever requests."
If he's just asking for his own parking ticket records, ok. This isn't in the spirit of that. Separately, I agree that the SQL schema is software, a type of file layout, marginal attacker benefit, and other things in that exemption, and I'd say that again as an expert witness.
FOIA requester responded in comments saying they received a tip indicating illegal practices, and noted in his article that he had previously uncovered evidence of over-policing in black neighborhoods.
I think a file layout describes the exact arrangement of bytes in a file. A schema is higher level. It describes what is stored, not how it is stored. A database could be one file, or a file per table, or a file per column. Data could be stored across multiple drives.
Hi everyone, I'm the plaintiff in this lawsuit. I'm still working on my companion post for tptacek's post! I'll have it ready Soon TM, but feel free to ask me any questions in the meantime here.
While you're waiting, check out this older post: https://mchap.io/that-time-the-city-of-seattle-accidentally-...
Matt, you do the Lord's work.
Bear in mind that Matt technically lost this, even with the backing of some of the absolute best civil rights lawyers in the country, Loevy and Loevy, fighting on his behalf. This shows you the absurd difficulty in fighting city hall, especially if you're crazy enough to do it without representation.
The one thing working in our favor is what is proposed in TFA: change the law. Once the state Supreme Court has ruled you're hosed unless you can get an amendment. Illinois has a very strong history of amending its FOIA statute, although a proportion of those changes are to further protect information from disclosure, not always on the side of sunshine.
Another change that needs to happen is strong punishment for bodies who lose these fights. In Illinois this is limited to a "$5000 civil penalty" against the body. What is a civil penalty? It's vaguely defined. They used to throw the money to the plaintiff, but in the later cases I fought they simply awarded the money to the county. As one State's Attorney said to me "I don't care if I lose every case, I just write a check out to myself."
(One final note: be careful what you wish for when you litigate; you can end up with an appellate decision like this that solidifies in law the exact thing you were fighting. It's nobody's fault, but it happens. I ended up with one absurd decision that removed prisoners' rights rather than enhanced them.)
A losing public body is also generally on the hook for attorney's fees, which can be considerable. But the general problem here is that the public bodies are all spending someone else's money, so the real deterrent you have is how much of their time you can credibly threaten to eat up with legal actions.
That's true, as long as you are represented. I knew one lawyer in Illinois who would sit in FOIA court and take all the non-represented persons aside and offer to take their cases and split the attorney fees 50/50. I believe it isn't strictly above-board, but it is a solution to a problem.
People don't like being put under oath, so you can somewhat temper a public body's future refusals by deposing them or sticking as many of them on the stand as you can. Especially with depositions, if you aren't represented then you can't be given any attorney discipline for asking completely outrageous questions to force the deponent to admit crimes etc. under oath.
I went up against my muni over their refusal to release their police General Orders (which seems real dumb in retrospect; we got the General Orders from most of Chicagoland with no protest†). I reached out to Matt Topic, who offered to sue for free, or send a nastygram for a billable hour.
I ended up doing the latter, because I gotta work in this town, but one consequence of fee recovery is that it's much easier to get representation for a FOIA suit.
† https://github.com/jjarmoc/chicago-area-general-orders/
So the attorney gets half of what the attorney gets? Zeno's Paradox.
I don't understand the argument that knowing the column names doesn't help an attacker? Especially in a database that doesn't allow wildcards, doesn't it make things much easier if you know you can do '); SELECT col FROM logins, as opposed to having to guess the column name?
And I don't think I disagree with the court on schema vs. file layouts either. It's not the file layout, but it's analogous: it tells you how the "files" (records) are laid out on the "file system" (database tables). For example, denormalization is very analogous to inlining of data in a file record. The notion that filesystems are effectively databases itself is a well known one too. How do you argue they aren't analogous?
You can always `SELECT table_name, column_name, data_type FROM information_schema.columns`, which is part of the SQL standard. https://www.postgresql.org/docs/current/infoschema-columns.h...
Plus, generally if you have SQL injection, you have multiple tries. You're not going to be locked out after one shot. And there's only so many combinations of `SELECT {id,userid,user_id,uid} FROM {user,users,login,logins,customer,customer}` before you find something useful.
You can't always do this, because you don't always have a way to read the results back. Credit where it's due for pointing it out: https://news.ycombinator.com/item?id=43180954
Guessing table names is significantly harder. Maybe they get some tables like that, maybe they don't have time to guess my table called "amt.user_ticket"
> You can always `SELECT table_name, column_name, data_type FROM information_schema.columns`, which is part of the SQL standard. https://www.postgresql.org/docs/current/infoschema-columns.h.
You can "always" do that? Well I just did that. My database said: no such table: information_schema.columns
And what if my database had disabled this capability entirely?
Also, is there anything implying SQL here at all? Can't other databases with injection "capability" have schemas?
> Plus, generally if you have SQL injection, you have multiple tries. You're not going to be locked out after one shot.
No, you can't say it with such certainty at all. It really depends on what else you're triggering in the process of that SQL injection. You could easily be triggering something (like a password reset, a payment transaction...) where you're severely limited in your attempts.
> And there's only so many combinations of `SELECT {id,userid,user_id,uid} FROM {user,users,login,logins,customer,customer}` before you find something useful.
account, accounts, password, passwords, profile, profiles, credential, credentials, auth, auths, authentication, authentications, authentication_info, authentication_infos, authorization, authorizations, passwd, passwds, user_info, user_infos, login_info, login_infos, account_info, account_infos... should I keep going?
And these are just the logins/passwords; what if the information of interest was something else, like parking tickets?
Your reasoning and motivation is reductio ad absurdum. It does not make sense to base your system security on hiding from the public that your 'Users' table is called 'Users'. If you are vulnerable to this attack, the guilt rests on your deplorable application code, not whether or not your schema table names are known. If we should follow your logic, we would have to name our Users table U_ZER_CLEVER_S because naming it something people could guess would be a vulnerability.
There is one further problem with this entire sub-discussion: There are two mitigation strategies discussed:
- A: guaranteed SQL-injection-proof (SQL injection impossible).
- B: having non-obvious table names and 'secure defaults' (e.g. INFORMATION_SCHEMA disabled).
So, the original commenter says, he wants to _hide the schema_, so that B can protect him in case of A. Well, failure of A is Amateur Hour. If you fail on A, I highly doubt you would have delivered correctly on B. To write it out in plain text: If you have set up and manage an application with SQL injection errors, I have a hard time seeing you still taking care to disable /enable obscure security defaults, or take care to avoid obvious and trivial table names.
Just to put icing on the cake: As soon as you have an SQL injection attack, a simple select * from randomTable or DESC randomTable would give you the table COLUMNS, so it utterly makes no sense to want to hide those column names - you have already lost them! (in the case you are arguing you need their protection in). ..Unless you argue that the guy making sql injection applications ALSO has set up a secure default to disallow select *..
In my experience, SQL injection is evidence of work of the sloppiest and immature nature; it was bad in 2003, and presumably still is.
> You can "always" do that? Well I just did that. My database said: no such table: information_schema.columns
Don't expect attackers to give up after one try. It depends on the database software: not everyone implements this exact ANSI standard for reflection, but every database supports reflection. That's why the first step after finding a SQLi is to fingerprint the database software and go from there.
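To make the fingerprinting step concrete, here is a minimal sketch, assuming Python's built-in `sqlite3` module as a stand-in target. The `PROBES` list and the `fingerprint` helper are made up for illustration; real tooling tries many more dialect-specific catalogs.

```python
import sqlite3

# Hypothetical probe list: each entry pairs a guessed engine family with a
# metadata query that only succeeds on engines exposing that catalog.
PROBES = [
    ("ANSI information_schema", "SELECT table_name FROM information_schema.tables"),
    ("SQLite catalog", "SELECT name FROM sqlite_master WHERE type = 'table'"),
]

def fingerprint(conn):
    """Return (label, table_names) for the first probe that succeeds, else None."""
    for label, query in PROBES:
        try:
            rows = conn.execute(query).fetchall()
        except sqlite3.Error:
            continue  # this catalog does not exist here; try the next dialect
        return label, sorted(row[0] for row in rows)
    return None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parking_tickets (id INTEGER PRIMARY KEY, plate TEXT)")
label, tables = fingerprint(conn)
print(label, tables)
```

The first probe fails with "no such table", exactly as in the quoted reply, and the second one still hands back the table list.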
> And what if my database had disabled this capability entirely?
You can't disable it; lots of software, database features, ORMs and clients rely on reflection. If a client can query a table, they can also retrieve metadata about that table.
You can definitely disable it, in a variety of ways, for whatever role, user, etc. you wish to.
Absolutely, we have very strict lockdowns on the tables and views available to the users that our application uses. The permissions system in Postgres (for example) is very extensive. We even deny delete and update permissions for most tables so they become append-only.
Never mind, you are right, it's possible, but I still think it breaks so much stuff that I've never seen anybody doing it or recommending it. All kinds of ORMs and migration tools would break, for example. But I guess it would be a defense-in-depth strategy.
Yeah, those tools may break if such a change is introduced suddenly, without testing etc. But that's not what normal reality looks like for most companies; such rules have been in place for two decades at least. DBs are very old tech without much change in the past 20 years, and this is DB security 101.
Not even going into the reasonability of ORMs: most of the ones I've seen or implemented added practically zero value, and added hard-to-debug issues down the line as the software evolved. Cargo culting at its best, often done on trivial schemas that could easily be handled with either direct SQL or some SQL-query-to-object mapping.
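For anyone who wants to see the idea without a Postgres install, here is a rough sketch using SQLite's authorizer hook from Python's `sqlite3` module. This is a stand-in, not the Postgres `GRANT`/`REVOKE` mechanism described above; the `lockdown` callback, the `is_denied` helper and the `audit_log` table are invented for the example.

```python
import sqlite3

def lockdown(action, arg1, arg2, db_name, source):
    """Authorizer callback: deny reflection and make every table append-only."""
    if action == sqlite3.SQLITE_READ and (arg1 or "").startswith("sqlite_"):
        return sqlite3.SQLITE_DENY   # no reading the schema catalog
    if action == sqlite3.SQLITE_PRAGMA:
        return sqlite3.SQLITE_DENY   # no PRAGMA table_info(...) and friends
    if action in (sqlite3.SQLITE_UPDATE, sqlite3.SQLITE_DELETE):
        return sqlite3.SQLITE_DENY   # rows can be added but not changed
    return sqlite3.SQLITE_OK

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit_log (id INTEGER PRIMARY KEY, event TEXT)")
conn.set_authorizer(lockdown)  # installed after setup, like a restricted role

def is_denied(sql):
    """Return True if the locked-down connection refuses to run sql."""
    try:
        conn.execute(sql).fetchall()
    except sqlite3.DatabaseError:
        return True
    return False

conn.execute("INSERT INTO audit_log (event) VALUES ('login')")  # still allowed
for sql in ("SELECT name FROM sqlite_master",
            "PRAGMA table_info(audit_log)",
            "DELETE FROM audit_log",
            "SELECT event FROM audit_log"):
    print(sql, "->", "denied" if is_denied(sql) else "ok")
```

The application's normal reads and inserts keep working; only the catalog queries and the destructive statements are refused.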
Ah so what you're saying is that we ought to rename our logins table to "duckwords" because nobody will ever guess that? Also we should probably store passwords in plaintext but name the column "entercod3" because nobody will think of that. Oh and we should use printf with %s to build our queries right?
That's a good point, has anyone hardened a database by locking out users who select columns that don't exist? Or run other dubious queries? This would obviously interrupt production but if someone is running queries on your db it's probably worth it?
I once did a security assessment for a product such as what you describe. Among other problems with it, the product itself had SQL injection vulnerabilities.
For another example of what defenders are up against, see https://users.ece.cmu.edu/~adrian/731-sp04/readings/Ptacek-N.... This paper all but caused an upheaval in the WAF industry.
If you are mature enough to do that, you're mature enough to prevent SQL injections in the first place. There shouldn't be that many handwritten queries to review anyway, as most mundane DB access usually goes through a framework that handles injection properly...
I disagree, if all it took was maturity then we wouldn't see giant data breaches of the largest companies in the world weekly.
Zane Lackey (with Dan Kaminsky) gave a talk that discussed doing literally that sort of thing, back in 2013. Zane went on to found Signal Sciences (acquired by Fastly), doing this sort of stuff in the 'WAF' space.
https://youtu.be/jQblKuMuS0Y?t=866 (timestamp is when Zane starts talking about it)
I guess the main difference is that a WAF attempts to spot things like injection (unbalanced delimiters, SQL keywords in HTTP payloads where SQL shouldn't exist, etc.) typically without knowledge of the schema, whereas GP is talking about the DBMS spotting queries where queries must exist but disagree with the schema. Might as well do both, I suppose.
That’s not what the talk is about - it’s using dbms query error logs to spot attackers. Stuff like “table doesn’t exist” or “invalid syntax” on your production database can be extremely high signal indications that something is wrong, potentially maliciously so.
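A minimal sketch of that idea, assuming Python's `sqlite3` as the database and a plain list standing in for the alerting system. The `MonitoredDB` wrapper and the `SUSPICIOUS` marker strings are hypothetical; other engines word these errors differently.

```python
import sqlite3

# Hypothetical marker strings; real engines word these errors differently.
SUSPICIOUS = ("no such table", "no such column", "syntax error")

class MonitoredDB:
    """Wraps a connection and records schema/syntax errors as alerts."""

    def __init__(self, conn):
        self.conn = conn
        self.alerts = []

    def execute(self, sql, params=()):
        try:
            return self.conn.execute(sql, params).fetchall()
        except sqlite3.OperationalError as exc:
            message = str(exc)
            if any(marker in message for marker in SUSPICIOUS):
                # In production this would page someone rather than append.
                self.alerts.append((sql, message))
            raise

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db = MonitoredDB(conn)
db.execute("SELECT name FROM users WHERE id = ?", (1,))  # normal traffic, no alert

for probe in ("SELECT * FROM logins", "SELECT passwd FROM users"):
    try:
        db.execute(probe)  # the kind of guess an attacker makes
    except sqlite3.OperationalError:
        pass

print(len(db.alerts))
```

On a healthy production system these errors should be close to nonexistent, which is what makes each one worth looking at.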
In the very early 2000’s I worked at a company building something along those lines. We could analyze SQL and SMB traffic on the fly and spot anomalous access to tables/columns/files, etc. Dynamic firewalling would have been the next progression if the company didn’t have other issues.
WAFs help with this, but at the HTTP level. By putting “information_schema”, “sys.tables” in the filters.
Not the real solution, IMO, but WAFs are useful for more than SQLi, and are the kind of tech you can ask money for.
On the surface that’s a very attractive idea.
A sort of “you shouldn’t be in here, even if we left the door unlocked.”
So if you deploy code before you run the associated db migration, or misspell a column name, you magnify the impact from whichever code paths (& application tier nodes) are running the broken SQL, to your entire production environment.
Simple variation to a hard shutoff: immediately page "significant risk a successful sql exploit was found", and then slow down attackers:
If an SQL query requests an unknown table, log the error, but have that query time out instead of responding with an error. Or, even better, the offending query appears to succeed, but returns fake table data, turning it into a honeypot built-in to the DB. This could be done at the application layer, or in the DB.
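Roughly what the application-layer version could look like, as a sketch only. It assumes Python's `sqlite3`; the `guarded_query` helper, the `FAKE_ROWS` bait data and the delay value are all invented for illustration.

```python
import sqlite3
import time

FAKE_ROWS = [(1, "decoy@example.invalid")]  # made-up bait data

def guarded_query(conn, sql, alerts, delay_seconds=0.0):
    """Run sql; on an unknown-table error, alert, stall, and return decoy rows."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError as exc:
        if "no such table" not in str(exc):
            raise                   # unrelated errors still surface normally
        alerts.append(sql)          # page the defenders
        time.sleep(delay_seconds)   # slow the attacker down
        return FAKE_ROWS            # pretend the query succeeded

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
alerts = []
real = guarded_query(conn, "SELECT id, email FROM users", alerts)
bait = guarded_query(conn, "SELECT id, email FROM logins", alerts)
print(real, bait, alerts)
```

A legitimate query against `users` is untouched, while the guess at `logins` gets decoy rows and leaves a trail for whoever is on call.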
The goal is to buy an hour for defenders to determine how to respond, or if it's a red herring. There are a variety of ways of doing this without significant user impact.
Yeah it's definitely something that could do more harm than good to a company long term. But I'm sure there are instances where this tradeoff is worth it. They would invest more heavily in runbooks or maybe even CI that runs migrations on deploy. Deleting columns would need to be done on your deploy + 1. Probably no rollback at all.
A good DBA would restrict the account so that it can't access the information schema. It's easy to imagine an environment with a vigilant DBA and less vigilant web developers.
This makes sense, but the vast majority of tooling, including ORMs, autocomplete SQL IDEs, and even suspect application code, relies on table descriptions and listings provided by the information schema.
My IDE logging into my local dev copy of the DB and my public-facing prod application should not be using the same SQL login.
That is why we have development and production environments. The production environment is expected to operate in a potentially hostile space and does not need developer conveniences beyond the ability to generate alerts and produce logs, which should be stored in a safe way, everything else should be locked down as much as possible.
Being able to inject doesn't mean you get the output of a select. The injection can be in non-select statements.
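A small sketch of that situation, using Python's `sqlite3` and a deliberately vulnerable, made-up `rename_user_vulnerable` function. Nothing is ever returned to the caller, yet the injected input rewrites the statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, is_admin INTEGER)")
conn.execute("INSERT INTO users (name, is_admin) VALUES ('alice', 0), ('bob', 0)")

def rename_user_vulnerable(user_id, new_name):
    # BUG (deliberate): string concatenation lets new_name rewrite the statement.
    conn.execute(
        "UPDATE users SET name = '" + new_name + "' WHERE id = " + str(user_id)
    )

# The attacker sees no query output, but the trailing "--" comments out the
# WHERE clause, so every row is renamed and flagged as admin.
rename_user_vulnerable(1, "mallory', is_admin = 1 --")
rows = conn.execute("SELECT name, is_admin FROM users ORDER BY id").fetchall()
print(rows)
```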
If you have an injection friendly application then that is the security problem.
Say someone hacks the db: is the problem easy-to-guess table names? Should the column never have been called "passwords"?
Perhaps 30 years ago that would sound good.
Obscurity should hardly ever be a line of defense. If it is the only defense the problem isn't that it wasn't obscure enough.
Edit:
I'll do you one better. If you so much as suggest that obscurity is good security you actually openly invite people to fool around with your applications. The odds that holes are to be found are much better than elsewhere.
What do you do when you know you've got a pile of poorly written insecure software and no money to improve it?
I'd probably delete everything and pretend it never happened. It depends, of course, on the worst-case scenario. What can I do or afford to deal with the greatest risk? I might use it on a machine without internet.
> I don't understand the argument that knowing the column names doesn't help an attacker?
So Kevin Mitnick supposedly did most of his hacking using "social engineering". He'd call up some person, pretend to be in some other department within their organization, and ask them for some specific bit of information he needed to further his attack (or ask them to change some specific thing that would allow him to further his attack).
Would knowing the structure of Illinois governmental organizations help someone perform social engineering attacks against them? Yes, absolutely.
Should Illinois therefore keep the internal structures of their organizations -- the department names and the officials who run them -- secret? No, absolutely not.
First of all, if an attacker doesn't know them, they'll just use other social engineering attacks to figure them out; i.e., hiding the structure doesn't stop social engineering attacks, it just slows them down. Secondly, the value to the public of being able to navigate governmental structures far outweighs the cost of potential attacks.
This seems to me to be a direct analog: The "organizational structure" is the "database schema", and the "willingness to help a random person on the phone who seems to know what they're talking about" is the "SQL injection vulnerability". If an attacker knows the schema, their job is faster; but if they don't know the schema, they'll just use attacks to figure out the schema; so keeping it private doesn't stop an attack, only slow it down. And the benefit to the public of being able to issue FOIA requests far outweighs the cost of potential attacks.
The Department of Justice disagrees and voluntarily releases column and table names: https://www.justice.gov/afp/media/1186431/dl?inline=
> And I don't think I disagree with the court on schema vs. file layouts either.
I disagree that the law should prohibit disclosing "file layouts" but it's pretty clear that the law does block that, and I fundamentally agree with you that schemas are directly analogous to file layouts and thus restricted.
A SQL schema literally does not indicate the locations of data inside of a file. In fact, the whole reason schemas exist is to decouple the relationships between table rows and the pages and indexes that store that data. We had relational databases before SQL, and there are non-SQL relational (and non-relational) databases today, but you program them, at the query level, with code that is aware of what tables live where.
A schema is the opposite of a file layout. A schema is to a file layout what a Google search is to an IP address.
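One way to see the decoupling for yourself, as a sketch: build two SQLite files with an identical schema and identical rows but a different `PRAGMA page_size`, then compare the bytes on disk. SQLite is just a convenient stand-in here, not the system from the case, and the `build` helper and `people` table are made up.

```python
import os
import sqlite3
import tempfile

SCHEMA = "CREATE TABLE people (first_name TEXT, last_name TEXT)"

def build(path, page_size):
    """Create a database file with the given page size, same schema and rows."""
    conn = sqlite3.connect(path)
    conn.execute(f"PRAGMA page_size = {page_size}")  # must precede the first write
    conn.execute(SCHEMA)
    conn.execute("INSERT INTO people VALUES ('Ada', 'Lovelace')")
    conn.commit()
    stored_schema = conn.execute("SELECT sql FROM sqlite_master").fetchone()[0]
    conn.close()
    with open(path, "rb") as fh:
        return stored_schema, fh.read()

with tempfile.TemporaryDirectory() as tmp:
    schema_a, bytes_a = build(os.path.join(tmp, "a.db"), 1024)
    schema_b, bytes_b = build(os.path.join(tmp, "b.db"), 8192)

print("same schema:", schema_a == schema_b)
print("same file size:", len(bytes_a) == len(bytes_b))
```

Both files answer the same queries with the same results, but where the row physically lives in the file depends on engine settings the schema never mentions.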
Let me put this differently.
If you tell me that you have a closet for your jackets and another closet for your shirts, you're telling me how clothes are laid out in your wardrobe. Specifically, you're telling me that you're laying those out separately, and able to deal with them independently, with little interference between the two. It's not the entirety of the layout information, but it sure is some of it.
If you tell me that you have a column for your first names and another column for your last names, you're telling me how names are laid out in your database('s files). Specifically, you're telling me that you're laying those out separately, and able to deal with them independently, with little interference between the two. It's not the entirety of the layout information, but it sure is some of it.
Sure -- in theory, you could be actually throwing everything together into a dumpster, then paying enough people to search it all in parallel when you want to retrieve that red jacket. If you're actually doing that, maybe you could legitimately claim that you haven't divulged anything about your closet's layout by telling me that shirts and jackets are separate. But chances are pretty darn good you're not actually doing that (and I would know this for a fact if I already somehow knew you were actually using closets built by Joe down the street), and thus actually are exposing layout information by telling me that you're storing them separately. One security implication of which is that, the moment that I get a glimpse of your closet and notice that it contains a shirt, I know it's not the one with the jackets, and I can skip it when trying to steal that expensive red jacket.
It's either a file layout or it is not a file layout. If you write an affidavit saying it's "sort of like a file layout", the conclusion will be that it is not one. Now, the Illinois Supreme Court found that it was a file layout (wrongly). But they didn't use any of this kind of message board logic to do it; they pulled up a definition for "file layout" from a technical dictionary (which, ironically, pretty clearly established, even more than this thread does, that schemas aren't file layouts), and then they pulled up a definition of "schema" from Merriam-Webster, and the definition of "schema" was so abstract it could have matched anything.
If anybody on the Illinois Supreme Court had known what a schema actually was, we'd have won the case. Further, if the definition of "file layout" had been more material to the Chancery case, it would have been in the trial record that it wasn't one.
> Now, the Illinois Supreme Court found that it was a file layout (wrongly). But they didn't use any of this kind of message board logic to do it; they pulled up a definition for "file layout" from a technical dictionary (which, ironically, pretty clearly established, even more than this thread does, that schemas aren't file layouts)
"Wrongly" was exactly what I just spent an hour writing a long comment disputing, with a detailed explanation. Specifically, with a real-world analogy between “a description of the arrangement of the data in a file” and “a description of the arrangement of the clothes in your closet.”
If I understand correctly, you're saying that you expect items in a column to tend to cluster near one another on disk. Notably though that doesn't give you any sort of relative or absolute offset. Neither does it have anything to say about, for example, blocks of different types which might be interleaved. Or compression. Or indexes. Or copy on write related garbage collection. Or journaling. Or any number of other things.
Now if you wanted to argue that a schema serves the same purpose as a file layout, ie that it's how a programmer interfaces with the data, and that it impacts workload performance, that would be fair enough. And given that laws are all about intent perhaps that would be relevant. (Or perhaps not. I didn't read about the case yet.)
But I think it's fairly reasonable to say that in typical usage an SQL schema is decidedly not a file layout in a literal sense.
> If I understand correctly, you're saying that you expect items in a column to tend to cluster near one another on disk.
That's one thing I'm saying would be sufficient to consider this file layout, yes. I'm not saying it's necessary. Databases can obviously be row-oriented too. Knowing that they don't cluster would also be layout information. As could any number of other things.
> Notably though that doesn't give you any sort of relative or absolute offset. Neither does it have anything to say about, for example, blocks of different types which might be interleaved. Or compression. Or indexes. Or copy on write related garbage collection. Or journaling. Or any number of other things.
It doesn't have to include offsets or any of those other things. File layout information could be as simple as "data should be aligned to a page boundary for performance" or "this field must reserve space for up to 16 characters" or even "data from different records should not be stored in an overlapping manner, to allow fast erasure"... I could go on. And notice the wardrobe layout example doesn't have offsets either, but the decision to separate jackets from shirts is absolutely one about layout nonetheless.
> But I think it's fairly reasonable to say that in typical usage an SQL schema is decidedly not a file layout in a literal sense.
It is not complete file layout information. But it certainly can be part of the file layout information.
Imagine you had a table with columns name1 VARCHAR(64) and name2 VARCHAR(64) in that order. Now imagine you modified a couple of bytes on the disk, such that you swap the 1 and the 2. You can imagine a database where that would be sufficient to confuse it into thinking the two columns had swapped contents, right? Could you really claim the schema didn't contain any file layout information in that scenario, when it certainly affected which bytes are interpreted as belonging to which columns?
Note that "some information related to the file layout" or "some information that has an impact on the file layout" is not "the file layout" in a literal sense. Thus it seems to me to follow that the answer to the question "is this a file layout" should be no.
Symbolically it isn't [ schema -> file layout ] it's [ schema, engine version -> file layout ]. Even if you had that additional information, neither item by itself nor even the pair together would be correctly considered a file layout. If I have a function f( foo, bar ) -> baz neither a foo nor a bar is a baz. I can fairly trivially fix a sandwich out of bread, peanut butter, and jam; in no way does that imply that the three ingredients sitting next to each other on the counter are a sandwich.
For that matter, even the [ schema -> file layout ] case isn't technically a file layout any more than a json blob is an xml blob. Being trivially translatable doesn't change the definition.
Compare that with the question (also commonly asked by courts) "is thing equivalent in intent (or use, or ...) to other thing" in which case the answer might feasibly be yes.
> Could you really claim the schema didn't contain any file layout information in that scenario, when it certainly affected which bytes are interpreted as belonging to which columns?
In that example you have made an educated guess about the file layout and then taken advantage of that (guessed) information. "You can imagine a database" tells you everything you need to know here, namely that this is entirely dependent on the implementation. So yes, I would claim that the schema did not on its own contain any file layout information though in conjunction with knowledge of the implementation it could be used to derive such.
> I can fairly trivially fix a sandwich out of bread, peanut butter, and jam; in no way does that imply that the three ingredients sitting next to each other on the counter are a sandwich.
What is "sandwich" in this analogy? Nobody is claiming the schema is a "database", or a "table". I was saying it's one component of the file layout.
Using your own analogy: if you know you put the jam near the peanut butter, you know part of the ingredient layout. You can't say "it's not ingredient layout if you haven't told me where the bread is."
I don't think "file layout" has to mean the exact location of every byte. An abstract file layout is still a file layout.
How can you literally interpret the two words "file layout" without it pertaining to the layout of a file?
We can successfully interpret the two words “guinea pig” without it pertaining to either pigs or things coming from Guinea, so I’m sure this is also possible.
DBs can be files on disk though? Besides they're a bit like easy hand rolling powder mix for filesystems. Filesystem entries have properties like filenames and inode numbers and file contents. Databases have columns like emails and membership IDs and their favorite cookies. I don't think "file layout" is an absurd framing.
It is in literally no sense a layout; the whole point of a schema is that it doesn't tie you down to a layout. SQL schemas make sense even in the absence of files!
You suggest that we interpret "file formats" as exactly this -- no more, no less. This approach is also called "textualism". The other option is to interpret "file formats" in the context of the law that includes these words. Or: what exactly did the lawmakers have in mind when they said that (a) government needs to provide information; (b) except for several cases, of which one is (c) "file formats". What kind of information did they think it was ok for the government not to provide?
I agree with the Court's argument that "the information about how the actual information is stored and connected one piece to another" is what the lawmakers meant in this case.
- If the actual information is stored in the files, the government does not need to disclose how these files are organized ("file formats").
- If the actual information is stored in the database, the government does not need to disclose how the database is organized (database schema).
- If the actual information is stored in the block memory -- with structs and pointers -- the government does not need to disclose the structs and the pointers.
The "textualist" opponent would of course argue, as OP did, that the second and the third example aren't excepted by clause (c) because "when there is no file, there could be no file format". This however is missing the point (in my opinion), as it doesn't see the forest for the trees.
>> And I don't think I disagree with the court on schema vs. file layouts either.
> I disagree that the law should prohibit disclosing "file layouts"
Note, the court wasn't ruling what the law should say, only what the law says. At least that's my understanding of it. I certainly wasn't opining on what the law should say.
Understood. I mention that distinction only because I find many people (not you) say that "X law doesn't apply because if it did, it would be bad" instead of directing their ire at the actual laws, which are poorly written, and at the legislators who are negligent in fixing those laws.
Courts should decide based on the law, not based on what is "good".
It seems like an unnecessarily ambiguous term.
Without additional context, I would interpret the term “file layout” to mean the file and directory structure of an application.
Such an application could potentially store data as plain files, the names of those files may contain personal or sensitive information.
> It seems like an unnecessarily ambiguous term.
Agree, and, I don't even understand why it's in there in the first place (it should just not be) but that's a job for the legislature to resolve, not the courts.
> Without additional context, I would interpret the term “file layout” to mean the file and directory structure of an application.
I would interpret it to mean a description of what the file contains and where. This is information you need if you have a mysterious file and you want to parse it. It's also information you need if you have some data and you want to create a readable file that expresses it. But for the concept to apply to a database schema, (a) the database would have to be a file, and (b) the schema would have to specify where the information in the database is stored. That's difficult to do, since the schema has no knowledge of how much information there is in the database or how it might be written down.
And this part seems self-defeating:
> Attackers like me use SQL injection attacks to recover SQL schemas. The schema is the product of an attack, not one of its predicates.
If it's the product of an attack, but not the end goal, surely it's of value to the attacker?
It seems clear to me that the statute does, as worded, in principle allow the city not to disclose the database schema - it would compromise the security of the system, or at the very least, it would for some systems, so each request needs to be litigated individually.
The proposed amendment sounds like a good way to fix this - is it likely that will pass?
Lots of things are "of value". That's not the bar the statute sets. To the extent something isn't per se exempted by the statute (as the outcome of the case established schemas are), the burden is on the public body to demonstrate that disclosure _would_ jeopardize the security of the system.
It still seems like a massively gray area: despite the distinction between "would jeopardize" and "could jeopardize" as explained by TFA, the definition of "jeopardize" includes "danger" which means "could lead to harm" not "would lead to harm" at which point it hardly matters whether a thing "could endanger" or "would endanger" the security of the system.
"Would" versus "could" has nothing to do with why your analysis doesn't hold. If something doesn't enable people to attack a system, but is merely one of the valuable things you could get from that system, it does not jeopardize that system under Illinois law. The standard of proof for the jeopardy doesn't enter into it, because no claim of jeopardy has been made.
Again: this part of the case is settled. We didn't lose at the State Supreme Court because the court was worried there was jeopardy, but because they re-read the statute as per se exempting schemas as "file layouts".
How is it that this wording stuff isn't already decided globally? I mean, the concept of a dangling modifier has existed for centuries; do the courts really decide this kind of thing on a case-by-case basis by random dice roll?
Whereas math, science, and engineering use language as a vehicle for attaining truth, the legal profession too often regards it as truth.
The greatest legal scholars of the state of Illinois believe there is more decorum in querying Merriam-Webster than there is in reading tea leaves or consulting a Ouija board, but they are wrong. All too often, jurists make decisions based on unconscious accidents of wording by their predecessors, then compound it with their own fallible powers of interpretation and deduction, further cementing their wrongness as "precedent." Instead of addressing this core ambiguity of the FOIA exemption, or attempting to appeal this nonsense interpretation of an undefined term, or introduce better linguistic standards to the legal profession at large, the path of least resistance for victims of litigious violence is to add more complexity in the form of endless amendments. This is what Matt and friends must now pin their hopes on.
Little wonder how one can spend a lifetime specializing in the (martial) art of litigation.
> this part of the case is settled.
Maybe for this case, but it sounds like enough hinges on the details of the system that in another database, a court could uphold that there "would" be jeopardy instead of there "could" be. So you won on the more fragile part of the ruling.
On the other hand, interpreting the law as exempting database schemas is something that can be applied to any computer system, and it presumably sets a binding precedent (I'm not familiar with Illinois jurisprudence, but that's how I'd expect something called the State Supreme Court to work) so losing on that point is worse for future cases.
Losing on what point? Everybody agrees it is bad that schemas are per se exempt from FOIA. On the security concerns of releasing schemas, we won in basically every court.
> If something doesn't enable people to attack a system, but is merely one of the valuable things you could get from that system, it does not jeopardize that system under Illinois law.
The problem I have with this is that the schema isn't something an attacker recovers for its own sake. It's something the attacker recovers in order to further their attack. This necessarily means that it does enable people to attack the system. That's the only value an attacker sees in it.
> Again: this part of the case is settled. We didn't lose at the State Supreme Court because the court was worried there was jeopardy
Doesn't matter to the discussion; the court, Supreme or trial, can be wrong as easily as it can be right.
I don't understand your argument. If I have a SQLI, I can, as you acknowledge, fetch the schema. So what does it matter if the schema is published a priori? All that matters is whether I have SQLI.
No, as other comments in the thread have pointed out, you can easily have an SQLI that doesn't send information back to you. You may find value in changing what's in the database even if you can't read from it.
If you do have the ability to retrieve information, then one of the first things you'll do is retrieve the schema.
And the reason you'll retrieve the schema, if you can, is that it facilitates the attacks you actually want to make. It has no value to you other than enabling your attacks. This observation seems sufficient to answer the question "does knowing the schema enable attacks?".
There is a whole sub-field of software security dedicated to retrieving information from SQL injections that don't directly return results. This is not a plausible objection.
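For anyone who hasn't seen this, here is a minimal sketch of the boolean-based variant, assuming Python's `sqlite3` as the target. The vulnerable `plate_exists_vulnerable` function and the `leak_first_table_name` helper are invented for the example; the attacker only ever learns True or False, and still recovers a table name one character at a time.

```python
import sqlite3
import string

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parking_tickets (id INTEGER PRIMARY KEY, plate TEXT)")
conn.execute("INSERT INTO parking_tickets (plate) VALUES ('ABC123')")

def plate_exists_vulnerable(plate):
    # BUG (deliberate): concatenated input; the caller only learns True/False.
    sql = "SELECT 1 FROM parking_tickets WHERE plate = '" + plate + "'"
    return conn.execute(sql).fetchone() is not None

def leak_first_table_name(oracle, max_len=32):
    """Recover a table name one character at a time from yes/no answers."""
    alphabet = string.ascii_lowercase + string.digits + "_"
    name = ""
    for position in range(1, max_len + 1):
        for ch in alphabet:
            # The application's own closing quote terminates the final literal.
            payload = (
                "ABC123' AND substr((SELECT name FROM sqlite_master "
                "WHERE type = 'table' LIMIT 1), " + str(position) + ", 1) = '" + ch
            )
            if oracle(payload):
                name += ch
                break
        else:
            return name  # no character matched: end of the name
    return name

leaked = leak_first_table_name(plate_exists_vulnerable)
print(leaked)
```

Time-based and error-based variants work on the same principle when not even a yes/no answer is visible.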
> If it's the product of an attack, but not the end goal, surely it's of value to the attacker?
Well sure, but it doesn't help them attack. That's like arguing that since the bank robber wants dollar bills, dollar bills must be a useful tool for breaking into bank vaults.
If both sides agreed to the analogy of giving the bank robber the blueprints to the vault, I think any lay judge would agree that endangers the bank's security.
I'd say it's more like knowing the layout of the drawers inside the cage. If a robber is inside the cage, they've already won. And if an auditor is checking the bank has what it says it does, they've got legitimate grounds to ask which money is in which drawer, and "no, it's a security risk" is not a good answer.
>It's not the file layout, but it's analogous...How do you argue they aren't analogous?
laws don't get to be analogous
foia request: "I'd like the report the committee prepared about the costs for the new bridge"
response: "denied. the report contains costs laid out in tables with headings, which while not being schemas are analogous, with schemas not being files but being analogous"
I agree with you. Knowing the exact column names can speed up an attack and, in some cases, make it more feasible.
Why don’t they just request disclosure of what’s actually stored and allow renaming of the columns? It seems odd that knowing the exact column names would be necessary if the goal is simply to understand what data is being stored and its intended purpose.
I wonder if that would be considered a "new report", which they don't have to provide.
They can either have their cake or eat it. If they don't want to obfuscate the column names, they have to provide the data with the original ones.
> Knowing the exact column names can speed up an attack and, in some cases, make it more feasible.
If I'm looking at a database, I like knowing column names, but I like knowing table names more.
Yeah, I think it's still useful info for an attacker. But only if the system was actually developed by amateurs who never heard of parameterized queries.
I find it a bit bizarre that the city uses "our system was developed with no consideration for security" as a valid defense.
'); SELECT * FROM logins --
Look everyone, it's Little Bobby Tables.
`Especially in a database that doesn't allow wildcards`
Such as...
This fails if either the UI sanitizes wildcards, or if the database prohibits them, or if it produces so much data that you can't ingest it in time, etc.
It also fails if the system was written using parameterized queries. I wouldn't expect a system to be sanitizing anything if it fails to take the most basic step for db access. This whole discussion is only relevant for systems developed by amateurs. SQL injection can only work at all if you use string concatenation to create queries, which you should never do.
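The difference in a few lines, as a sketch with Python's `sqlite3`. The `logins` table and both lookup functions are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logins (username TEXT, secret TEXT)")
conn.execute("INSERT INTO logins VALUES ('alice', 's3cret')")

def find_concat(username):
    # Vulnerable: the input becomes part of the SQL text.
    return conn.execute(
        "SELECT username FROM logins WHERE username = '" + username + "'"
    ).fetchall()

def find_param(username):
    # Safe: the driver passes the value separately from the SQL text.
    return conn.execute(
        "SELECT username FROM logins WHERE username = ?", (username,)
    ).fetchall()

payload = "x' OR '1'='1"
print(find_concat(payload))  # the OR clause matches every row
print(find_param(payload))   # treated as a literal, odd-looking username
```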
Injections don't always need ''. The statements
and if injected into a query will give different answers if SQLi exists. There are MANY other tricks that don't involve ''.
Besides, consider the number of valid queries done by the application that involve '*'. You are not going to turn that off.
Sanitization almost always fails. This becomes an arms race.
If you do it wrong, yes. Sure, there is no 100% security, but honestly, it's 2025. We already know the techniques for preventing SQL injection of any kind. I wrote about this here: https://valentin.willscher.de/posts/sql-api/
The parser isn't shown there, so it isn't clear what would happen with weird input.
Have you had anyone do a penetration test on it?
Right but the case that is being imagined here is a site that perfectly sanitises * but somehow still allows SQL injection? I don't think so.
> Right but the case that is being imagined here is a site that perfectly sanitises * but somehow still allows SQL injection? I don't think so.
It could literally just reject anything with asterisks.
It doesn't even need to do anything perfectly, it just needs to do it enough to produce hurdles for you. Like blowing through the number of attempts you realistically have remaining.
There are trivial ways around all of those. `LIMIT 1`, `SELECT .. FROM information_schema...`, etc.
> There are trivial ways around all of those. `LIMIT 1`
LIMIT 1 limits row count. The issue here was columns. Like a giant blob someone might've stored in there.
> `SELECT .. FROM information_schema...`
no such table: information_schema.columns
> etc.
https://news.ycombinator.com/item?id=43181799
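For context on the `no such table` error quoted above: that is what SQLite reports, because it has no `information_schema`. It keeps its schema in `sqlite_master` and exposes column details through `PRAGMA table_info`. A small sketch with Python's built-in `sqlite3` (the `tickets` table is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, plate TEXT)")

# SQLite has no information_schema, so this query fails.
try:
    conn.execute("SELECT * FROM information_schema.columns")
    info_schema_error = None
except sqlite3.OperationalError as exc:
    info_schema_error = str(exc)

# The SQLite equivalents: sqlite_master for tables, PRAGMA table_info for columns.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
)]
columns = [row[1] for row in conn.execute("PRAGMA table_info(tickets)")]

print(info_schema_error)
print(tables, columns)
```

So the catalog name differs by engine, which is one reason a canned schema-dumping payload doesn't work everywhere.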
I don't want to take away any steam from your sails but giving bad information in regards to case law shouldn't be taken lightly. Your "expert witness" did you a disservice.
Schema is very much a critical field in terms of AuthZ privileges. Just knowing the structure is not far off from knowing the max entropy a password may hold. In regards to InfoSec, table structure is the recon phase which limits effort and minimizes time. Someone with that much time in security knows DBs will be hacked, not if but when. Time is an incredibly important tool which is why we have expirations on so many authN and authZ windows of attack.
I'm glad that you are challenging them but I believe a credible engineer would have made mincemeat of your expert and hurt the rest of us who want to see you successful.
It's possible rewriting certain statutes can help us but there is no company worth its salt that would share DB schema.
> Just knowing the structure is not far off from knowing the max entropy a password may hold
Not if the password is hashed, as it should be. Unless the schema somehow indicates that it uses a hash algorithm such as bcrypt that has a maximum password length. And even then, if they pre-hash the password, the password itself could have more entropy than that. And if there is a maximum password length, then you can probably figure that out via other means, like it being listed in the requirements when you set your password. It does tell you the size of the hash of the password, but if the maximum entropy is sufficiently high, as it should be, then it doesn't really matter; it would still be impractical to brute force.
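To make the "size of the hash" point concrete, here is a rough sketch using PBKDF2 from Python's standard library as a stand-in (an assumption on my part; the comment mentions bcrypt, which is not in the stdlib). The stored value has the same length whether the password is short or long, so a column width tells you about the hash, not the password.

```python
import hashlib
import os


def hash_password(password: str, salt: bytes) -> bytes:
    # PBKDF2-HMAC-SHA256 always yields a 32-byte digest by default,
    # regardless of how long the input password is.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


salt = os.urandom(16)
short_hash = hash_password("hunter2", salt)
long_hash = hash_password("correct horse battery staple " * 10, salt)

print(len(short_hash), len(long_hash))
```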
> there is no company worth its salt that would share DB schema
So you are saying that every company with a self-hosted or open source product that uses a database isn't worth their salt? If your DB is running on a customer's infrastructure, that customer will by necessity have access to the schema. And likewise if the source code for a product is publicly available it is trivial to determine the schema.
Out of curiosity, could you ask for something like "one row of data from every table in the CANVAS database"?
This is a technical solution to a people problem. My reading is that the city doesn’t want to give up this information. If that’s the case, a technical solution wouldn’t work, no matter how easy it is. And given that this has already gone to the Illinois Supreme Court (and lost), the only solution is what is discussed at the end: updating the law.
I agree this is something of a technical solution, but the court wasn't interpreting whether you could ask for rows from a database, but whether you could ask for the schema directly. I don't think the court had the option of saying "you can't ask for the schema, but asking for a sample row is ok".
The short answer is yes, you can do this. I've seen this work for emails, where the request is basically, "Give me the most recent email of blah@gov.com".
And yeah, the plan was to eventually submit a batch of requests using the table names, similar to `SELECT * FROM {table_name_from_schema_request} LIMIT 1`, but one FOIA request per-table.
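A rough sketch of what that batch could look like, one request per table. The request wording and the table names below are hypothetical placeholders, not anything from an actual filing or from CANVAS.

```python
# Hypothetical request wording; a real request would follow the agency's format.
REQUEST_TEMPLATE = (
    "Under the Illinois Freedom of Information Act, I request the first "
    "record stored in the table named {table} in the CANVAS database, "
    "including all of its fields."
)


def build_requests(table_names):
    # One request per table, skipping blanks and duplicates while keeping order.
    seen = set()
    requests = []
    for name in table_names:
        name = name.strip()
        if name and name not in seen:
            seen.add(name)
            requests.append(REQUEST_TEMPLATE.format(table=name))
    return requests


# Placeholder names standing in for whatever a schema request would return.
example_tables = ["tickets", "vendors", "tickets", " "]
for text in build_requests(example_tables):
    print(text)
```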
I once wrote a script that translated SQL requests into proper Ukrainian legalese invoking the equivalent of FOI to query citizenship statistics from the agency. It worked, but they were not very happy when I had to get them on the phone.
No offense, but how can you be 1) insisting it's safe to give up the information to you and 2) openly planning to use the information obtained for further exploitation, at the same time? You can't have the cake and eat it too, unless the information available in 2) technically does not depend on 1) but doing it this way would only save them massive time or something.
Seems like you could ask for a verbally masked description? Like an enigma coda specific to the FOIA.
"Describe to me the columns, in simple non-programmatic English, and what the purpose of the table is, for each table related to parking tickets"
Essentially a human-to-schema DSL that is only technically decipherable by the admin of the database. Then you're not getting actual code, and only the admin could decipher it.
But yah, as you said, if the humans don't want to disclose their foibles, how the request is filled is technically meaningless.
I wish it were that easy. I'll go more into this specific question in my post, but the short answer is that FOIA does not statutorily require the creation of new records in response to a request. The gov agency creating a description of the data in response to the FOIA request would be creating new records. It's silly.
Yeah I can see that, seems like masking isn't creating a new record, but obviously that's not how it's interpreted, because you're using the human filling out the form to interpret then return the data. FOIA typically allows for redactions, and that seemingly creates new records because they have to redact things, and knowing what to redact is providing masked information, and that's a new record.
As such, they could claim all FOIAs that require redactions shouldn't be fulfilled because a redacted record is a new record.
They won't describe it, since that creates a new document, which is a blind spot of FOI.
> the only solution is what is discussed at the end: updating the law.
That, and actually penetrating the data system and subsequently "leaking" parts of it. Which is nearly always illegal, but could be considered a form of "Civil Disobedience" especially if done ethically - e.g. removing sensitive data or leaking only aggregates of the data. Either from outside, or by a whistle-blower.
I'm not saying "hack the government!". But I am arguing that the pressure of "getting hacked" is like the pressure of protests, blockades, occupying facilities etc, all of which are civil disobedience, and often simply illegal too. All are tools in the belts of civilians to keep a government in check. Extracting information that a government is not willing to give but that would benefit the governed should IMO often be considered such a tool as well.
Kudos to you for enduring through this fight! We can only achieve transparency when people choose not to be complacent. Thank you.
What do you think are the next steps?
My first step is to actually finish my post :)
But after that, getting a reasonable law passed to fix this now-broken nonsense.
Have you tried looking for information from the developer about CANVAS? With any luck the developer has support documentation online that describes CANVAS and maybe you'll be able to narrow down your FOIA request.
I think the point of the lawsuit is less about CANVAS schema itself and more about the ability of the government to hide this kind of information from FOIA requests.
Damn, this is impressive. I've been fighting with a state agency since December for 17,000 emails. I don't think I've ever tried to request emails and received zero push-back, but a $33 million estimate just, chef's kiss
Very interesting case! Just one question: to what extent do changes in database schemata fall under FOIA in Illinois? That is, if they should change the database schema to conceal whatever it is they're fighting tooth and nail to hide, are they compelled to retain detailed information about that change? Or can they later present you (should the legislation pass) with a cleaned-up, nothing-to-see-here updated version?
What are the administrators of CANVAS hiding?
Hard to say. One of my personal drivers for this lawsuit is a tip I received that said that Chicago has a list of vendors whose tickets are dropped in the back-end. When I requested that info, the city said they had no such list. I trust my source, so having schema information could help figure out the extent and if they were lying.
Considering how much they fought to not release the schema, there's probably a column named "exempt_from_penalty" or something equally obvious.
If they lose in court they have to pay court-determined attorneys' fees. That might be sufficient to get them to appeal automatically.
This is a tension you sometimes see discussed in the context of wrongful imprisonment, where one faction says that if you get tossed in jail for 30 years over something there was never any evidence that you did, the state should have to pay a penalty, and another faction says that if you penalize the state for randomly imprisoning innocent people, those people will never be allowed out of jail.
Earnest question: If you suspect them of lying on the issue, why would you trust them to release the full schema in response to the FOIA request, and not just omit any possibly incriminating columns?
It's always a possibility that some low level official not in on the scam sees the FOIA request before management tells them not to work on it. The more you ask for, the less filtering there is going to be, simply because of how people work.
If you're running the scam, you don't want to tell low level employees about it, because they have no incentive not to blow the whistle.
How is this different from literally any other FOIA transaction, computer-y or otherwise?
What is the theory then for why they do not want to release this schema? Don’t misunderstand me, I appreciate how important it is that people push the boundaries of FOIA.
The statute says they're not required to. For a couple years, the statute did say that they had to, as we won multiple cases in lower courts, but Chicago appealed to the Illinois Supreme Court, and the outcome was that now the statute exempts schemas.
By that logic there's no point investigating any crime or doing any kind of audit. You increase the costs of covering up, and put them in a dilemma - remember this is exactly what brought down Nixon.
Because this is not how government works. Most of the time it's not a heavily entrenched conspiracy. Once the request is approved to go through by the legal department, some technician will happily give you everything you want and it won't be censored or tampered with in the process.
Many times the people answering the requests aren't part of the conspiracy to commit random acts of malice. Sometimes they're roped into it under threat of termination.
And often times, the denials eventually lead to significant reorg once judges and Congress can revise laws to fix the ambiguities.
Well that certainly sounds suspicious. But it could also provide more damning evidence of targeting groups, people skimming the till, bribes to make tickets go away, all sorts of fun shenanigans.
And boy they’re fighting suspiciously hard.
Good luck.
Bribes are most certainly not logged in the system under the "bribes" column or codified in any way. The data discovered through foi could show some patterns which are suggestive of bribes, but the actual thing is negotiated "off chain".
That’s what I meant. For example, people who have a suspicious number of tickets dismissed. Or perhaps certain employees that dismiss a suspicious number.
'ethnicity' header, 'net_income' header... wouldn't doubt chicago could be cave man enough to do this
When can we submit witness slips? Is there a mailing list for updates we can join? Good luck!
They can produce a report using English-language labels instead of the DB column names. Their argument isn't fact; it's vexatious obstinance.
As mentioned in the post, FOIA tends to only include existing records/information; it doesn't extend to producing new work. So producing a new report would be considered too much work. (But fighting a lawsuit to not reveal the schema is fine.)
What I want to know: How much Malört does the city expense a year?
This older post was such a fantastic read, thanks for sharing your story!
It's dated from ~2 weeks ago... is there other date information I am missing?
The HN post [0] is from February 9th, 2025, but the post the person you replied to was referencing [1] is from October 19th, 2018.
[0] https://sockpuppet.org/blog/2025/02/09/fixing-illinois-foia/ [1] https://mchap.io/that-time-the-city-of-seattle-accidentally-...
ah no, I just said "older" since OP said it was older and I wanted to distinguish from the SQL post that this post is about
> Normally, a flustered public records officer would just reject a giant request for being for “unduly burdensome”… but this sort of estimate is practically unheard of. So much so that other FOIA nerds have told me that this is the second biggest request they've ever seen. The passive aggression is thick. Needless to say, it's not something I'm willing to pay for!
Welcome to Seattle :-)
> that's the second biggest FOIA request I've ever seen!
-Guybrush, from The Secret of Monkey Island
Thanks for fighting the good fight for us all!
The footer links to a dead X account.
While I believe that the city should share the schema, and that the city is effectively arguing for security through obscurity, I disagree with the main premise of the article: that knowing the SQL schema doesn't help the attacker.
If I understand the argument of the author here:
> Attackers like me use SQL injection attacks to recover SQL schemas. The schema is the product of an attack, not one of its predicates
The author appears to imply that once the vulnerability is found, the schema can be recovered anyway. That is not always the case. It is perfectly viable to find a SQL injection that would allow fetching some data from the table that is being queried, but not from any other table, including `information_schema` or similar. If all the signal you get from the vulnerability is also "query failed" or "query succeeded, here's the data", knowing the schema makes it much easier to exploit.
> the problem is that every computer system connected to the Internet is being attacked every minute of every day
If you specifically log failed DB queries, then you have already patched all the possible injections that such 24/7 attacks would find. The log would then not be deafening until someone stumbles on the actual injection (one that, for example, only exists for logged-in users, and thus is not found by bots), in which case you have time to see it and patch it before the attacker finds a way to actually utilize it.
Knowing the schema both expedites their ability to take advantage of the vulnerability and increases their chances of probing the injection without triggering the query failure to begin with.
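A minimal sketch of the kind of failed-query logging being discussed, using Python's `sqlite3` and `logging`. The `run_query` wrapper, the logger name, and the `tickets` table are all hypothetical; a real deployment would hook this in wherever its database layer raises errors.

```python
import logging
import sqlite3

logger = logging.getLogger("db.failed_queries")


def run_query(conn: sqlite3.Connection, sql: str, params=()):
    # Run the query; if the database rejects it, log it and re-raise so the
    # failure still surfaces to the caller.
    try:
        return conn.execute(sql, params).fetchall()
    except sqlite3.Error as exc:
        logger.warning("query failed: %s | sql=%r", exc, sql)
        raise


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (plate TEXT)")
conn.execute("INSERT INTO tickets VALUES ('ABC123')")

# A probe that guesses a wrong column name fails and lands in the log.
try:
    run_query(conn, "SELECT no_such_column FROM tickets")
except sqlite3.Error:
    pass
```

A probe that guesses names wrong shows up in this feed, while a query that happens to be valid does not, which is the crux of the disagreement here.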
> that knowing SQL schema doesn't help the attacker.
Knowing the name of the service helps the attacker, knowing the name of government officials working at city hall helps attackers, knowing the legal description of what a parking ticket is helps attackers. If you are sued and decide you want to hack the government knowing the details of the suit against you helps you in your attack.
The barrier is not “any helpful information must be censored” the barrier is “don’t disclose passwords or code that would divulge backdoors” a schema cannot be that.
I'm not an attacker, just a boring old software dev. If there's an SQL Injection I'd say all bets are off re: schema.
That said, I've definitely worked on applications where knowing the schema could help you exfil data in the absence of a full injection. The most obvious being a query that's constructed based on URL parameters, where the parameters aren't whitelisted.
So I actually do agree that the schema could potentially be of marginal benefit to the attacker.
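For what it's worth, the whitelisting fix for that URL-parameter case is small. A sketch, assuming a hypothetical `tickets` table and made-up column names: identifiers can't be bound as query parameters, so the usual approach is to check them against a fixed allowlist before they go anywhere near the SQL text.

```python
# Hypothetical set of columns the application is willing to sort by.
ALLOWED_SORT_COLUMNS = {"issued_at", "amount"}
ALLOWED_DIRECTIONS = {"asc", "desc"}


def build_ticket_query(sort_column: str, direction: str = "asc") -> str:
    # Reject anything that is not an explicitly allowed identifier or direction.
    if sort_column not in ALLOWED_SORT_COLUMNS:
        raise ValueError(f"unsupported sort column: {sort_column!r}")
    direction = direction.lower()
    if direction not in ALLOWED_DIRECTIONS:
        raise ValueError(f"unsupported sort direction: {direction!r}")
    # Safe to interpolate: both values come from the fixed sets above.
    return (
        "SELECT id, amount, issued_at FROM tickets "
        f"ORDER BY {sort_column} {direction.upper()}"
    )


print(build_ticket_query("amount", "desc"))
```

With this in place, a request for a column that isn't on the list is refused before any query is built, whether or not the caller knows the real column names.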
Wouldn't admitting this in court pin you with some sort of negligence? (if you knew having a schema revealed would compromise your app in some way).
"Defense in depth" is an easy argument to make. I sure hope I don't have any SQL injection holes, but I can't prove it with 100% certainty.
I can't imagine how the schema would reveal SQL injection holes. Maybe other holes, though. Any poor choices for PKs, dumb use of MD5 computed fields, insecure random, misuse of NULL, weird uniqueness constraints (this also ties back to NULLs), vulnerable extensions, wrong timestamp type, too-small integer type, varchar limits, predictable index speed...
Edit: More NULL, or maybe lack thereof cause they use the string "NULL" instead? https://news.ycombinator.com/item?id=20676904
> I can't imagine how the schema would reveal SQL injection holes.
It wouldn't. I'm just assuming that the thrust of the hypothetical negligence accusation was "The schema is useless unless you have SQL injection holes. So give us the schema or admit you are negligent!" But you're correct that there are other justifications one could make to keep the schema secret.
The schema can provide an insight into what the application developer was thinking when writing the code, which in turn can direct an attacker towards tricky corners where mistakes might have been made.
That's true.
This is the city government here. The people arguing the case didn't write the code and don't have time to look through all their code, but one thing they do know is that it was written by monkeys. They probably have some level of reason to believe there are SQL injections available in the code.
Reminds me that the recently discovered “leak emails using YouTube” exploit kicked off from reading what is essentially, a schema.
https://brutecat.com/articles/leaking-youtube-emails
> kicked off from reading what is essentially, a schema.
I wouldn't call json a schema.
In the HN discussion tptacek replied that "$10,000 feels extraordinarily high for a server-side web bug": https://news.ycombinator.com/item?id=43025038
However his comment assumes monetisation is selling the bug; (tptacek deeply understands the market for bugs). However I would have thought monetisation could be by scanning as many YouTube users as possible for their email addresses: and then selling that limited database to a threat actor. You'd start the scan with estimated high value anonymous users. Only Google can guess how many emails would have been captured before some telemetry kicked off a successful security audit. The value of that list could possibly well exceed $10000. Kinda depends on who is doxxed and who wants to pay for the dox.
It's hard to know what the reputational cost to Google would be for doxxing popular anonymous accounts. I'm guessing video is not so often anonymous so influencers are generally not unknown?
I'm guessing trying to blackmail Google wouldn't work (once you show Google an account that is doxxed, they would look at telemetry logs or perhaps increase telemetry). I wonder if you could introduce enough noise and time delay to avoid Google reverse-engineering the vulnerability? Or how long before a security audit of code would find the vulnerability?
Certainly I can see some governments paying good money to dox anonymous videos that those governments dislike. The Saudis have money! You could likely get different government security departments to bid against each other... Thousands seems doable per dox? The value would likely decrease as you dox more.
> I wouldn't call json a schema.
What you see there is a protobuf, serialized as JSON. If a protobuf definition isn’t a schema, I don’t know what is.
Right, thank you for the correction
> "query failed" or "query succeeded, here's the data"
Blind SQL injection is a type where no error is produced, but some subtle signal can indicate success or failure. The most interesting one that I know about is where the presence of a successful injection was a normal looking response that was one byte longer than an unsuccessful injection. This was used to not only figure out the schema, but to fully exfiltrate the entire database.
There is nothing in the log on the server that indicates an error.
Most of the relatively introductory SQL injection exercises that I taught proceed without any knowledge of the schema.
This is why SQL injection is so insidious.
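A toy illustration of the one-byte signal described above, run entirely against an in-memory SQLite database. Everything here (the `ticket_page` function, the table, the response strings) is invented for the sketch. The vulnerable page never raises an error; its response is one byte longer when the injected condition is true, and that alone is enough to recover a table name.

```python
import sqlite3
import string

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (plate TEXT, amount INTEGER)")
conn.execute("INSERT INTO tickets VALUES ('ABC123', 50)")


def ticket_page(plate: str) -> str:
    # Deliberately vulnerable: input is concatenated into the SQL text.
    sql = "SELECT COUNT(*) FROM tickets WHERE plate = '" + plate + "'"
    count = conn.execute(sql).fetchone()[0]
    return "outstanding: " + ("yes" if count else "no")


def oracle(condition: str) -> bool:
    # The only signal used is the response length (one byte of difference).
    payload = "ABC123' AND (" + condition + ") AND '1'='1"
    return len(ticket_page(payload)) > len(ticket_page("no-such-plate"))


def recover_first_table_name() -> str:
    # Guess the name one character at a time using only true/false answers.
    alphabet = string.ascii_lowercase + string.digits + "_"
    name = ""
    while True:
        for ch in alphabet:
            condition = (
                "(SELECT substr(name, {pos}, 1) FROM sqlite_master "
                "WHERE type = 'table' LIMIT 1) = '{ch}'"
            ).format(pos=len(name) + 1, ch=ch)
            if oracle(condition):
                name += ch
                break
        else:
            return name  # no character matched, so the name is complete


print(recover_first_table_name())
```

Every query sent is syntactically valid, so a log of failed queries would show nothing.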
Not just with SQLi, but I've managed to statistically prove "information" with timing attacks.
Where if you join another table (by e.g. requesting extra info in a graphql query) the response goes from ms to s or even m. Indicating the size of the joined table.
Or where I could change a "?sort[updated_at]=desc" to a "?sort[password_hash]" through trial-and-error and suddenly see the response time drop from ms to seconds (in this case finding columns that exist but aren't indexed).
Even if the response content is exactly the same, we know things exist, are big, not indexed, or simply present, by timing the attack.
A famous one is obviously the timing trick to find out that an email is in the system because "user = user.find(email) && user.password_matches(password)" short-circuits if the email does not exist but spends significant time on hashing the password for matching it. A lot of backends and apps make this mistake.
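The usual fix for that short-circuit is to do the same hashing work whether or not the email exists. A sketch with the standard library only; the in-memory `USERS` dict, the dummy record, and PBKDF2 as the hash are assumptions made for illustration.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000


def hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def make_record(password: str) -> dict:
    salt = os.urandom(16)
    return {"salt": salt, "hash": hash_password(password, salt)}


# Hypothetical user store standing in for a real database lookup.
USERS = {"alice@example.com": make_record("s3cret")}
# Dummy record, used only so unknown emails cost the same hashing time.
DUMMY = make_record("placeholder")


def login(email: str, password: str) -> bool:
    user = USERS.get(email)
    record = user if user is not None else DUMMY
    # Always hash, then compare in constant time.
    candidate = hash_password(password, record["salt"])
    matches = hmac.compare_digest(candidate, record["hash"])
    # An unknown email never succeeds, even if the dummy hash matched.
    return user is not None and matches


print(login("alice@example.com", "s3cret"), login("nobody@example.com", "s3cret"))
```

This evens out the hashing cost; it does not remove every timing difference (the user lookup itself can still vary).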
If you specifically log failed database queries, where "failure" means "indicative of SQL injection", then nothing you can do with the schema is going to reduce the signal in that feed --- even a single SQL syntax error would be worth following up on. No, I don't think your logic holds.
I don't understand your logic. Knowledge of the schema can give an attacker an edge because they now know the exact column names to probe. Whether these probes get logged is irrelevant; even if it makes the system more vulnerable for an instant, it's still more vulnerable.
Even if logging failed queries is your metric, then knowledge of column names would make it more likely for an attacker to craft correct queries, which would not get logged, thus making your logs less useful than if the attacker had to guess at column names and, in so doing, incur failed queries.
To probe for what? How does knowledge of a column name make it easier for me to discern whether a SQL injection vulnerability exists? I've spent a lot of time in my career probing for SQL injection, and I can't remember an instance where my stimulus/response setup involved the table names.
SQL injection is a property of a SQL query, not of the schema itself. To have a meaningful chance of blind-one-shotting a query, getting a TRUE/FALSE answer about susceptibility without ever generating a SQL syntax error, I would need to see the queries themselves.
Knowledge of the column names doesn't give you insight into whether a vulnerability exists. It gives you insight into what you can do with a vulnerability, should it exist. For example, if you want to set your account balance to $1 million, you'd need to know the column name in order to generate a valid query. Without advance knowledge of the column name, your job becomes harder.
SQL injection will give you the entire schema anyway. It doesn't help if someone tells you the col names beforehand. I'm more wondering about non-SQL-injection vulns.
SQL injection isn't just an ssh tunnel to the database. If the line you've injected isn't a select and the backend never fetches it, how does the injection give you the column names?
I've seen this done by enumerating possible table names.
That's a typical way, but the errors might alert them, and of course maybe the names aren't so easily guessed.
Oops you're right, it's possible that you have no way to read things back.
> How does knowledge of a column name make it easier for me to discern whether a SQL injection vulnerability exists?
It doesn't. It just means that as soon as you find one, you can immediately begin crafting valid queries instead of randomly guessing table names and columns, therefore not setting off the "DB query failed" alert.
EDIT: I guess this is the part I missed:
> To have a meaningful chance of blind-one-shotting a query, getting a TRUE/FALSE answer about susceptibility without ever generating a SQL syntax error, I would need to see the queries themselves.
Really? I guess I have to take your word for it because I've never attempted it, but I would have thought that in some (horribly broken) systems `bobby tables' or 1=1 --` would have a very reasonable chance of detecting SQL injection without alerting anyone.
You can craft valid queries that don't reference any table or column name.
Right, and that's what you use to find the vulnerability. But imagine you've found the vulnerability and now you want to use it to update all of your parking tickets as paid. Without the schema, this is going to be quite tricky and will generate a lot of failed SQL. With the schema, you might be able to do it on your first try.
Which is why in the ordinary course of a pentest you'd use the SQL injection vulnerability to recover the information in the schema.
Is there not any SQLi vulnerability in practice that doesn't allow such an information recovery? That is, is the schema-recovery step so foolproof that it can always be performed on any target form? GP is suggesting that this may be difficult, depending on the kind of signal that gets returned from the form.
In my entire experience as a software security practitioner, which at the time of my testimony encompassed some hundreds of assessments of SQL-backed websites, the availability of a schema has never impacted my ability to exploit a SQL injection. It's not my job as an expert witness, nor Matt's job as a plaintiff, to invent improbable scenarios where security could hinge on schema availability. The court (all of them, in fact) found that testimony dispositive, so I'm happy to leave the issue there.
Maybe I'm ignorant, but if the account the app is using doesn't have access to the information_schema how do you do this?
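As a rough stand-in for that kind of restricted account, here is a sketch using SQLite's authorizer hook through Python's `sqlite3.Connection.set_authorizer`. This is an analogy, not how server databases manage privileges, and the `tickets` table is made up. The connection can still read application data, but reads of the schema table and PRAGMA statements are refused.

```python
import sqlite3

SCHEMA_TABLES = {
    "sqlite_master", "sqlite_schema",
    "sqlite_temp_master", "sqlite_temp_schema",
}


def deny_schema_reads(action, arg1, arg2, db_name, source):
    # For SQLITE_READ, arg1 is the table name being read.
    if action == sqlite3.SQLITE_READ and arg1 in SCHEMA_TABLES:
        return sqlite3.SQLITE_DENY
    # Also block PRAGMA statements such as table_info.
    if action == sqlite3.SQLITE_PRAGMA:
        return sqlite3.SQLITE_DENY
    return sqlite3.SQLITE_OK


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (plate TEXT)")
conn.execute("INSERT INTO tickets VALUES ('ABC123')")
conn.commit()
conn.set_authorizer(deny_schema_reads)

# Ordinary application queries still work.
rows = conn.execute("SELECT plate FROM tickets").fetchall()

# Attempts to read the schema are rejected by the authorizer.
try:
    conn.execute("SELECT name FROM sqlite_master")
    schema_blocked = False
except sqlite3.DatabaseError:
    schema_blocked = True

print(rows, schema_blocked)
```

In a setup like this, an injection can still touch `tickets`, but only if the attacker already knows (or guesses) that name.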
I don’t think that’s a very common setup but perhaps I’m just exposing my own ignorance. Just consider the popularity of ORMs. They explicitly load the schema into the application in many cases.
Not just that, but perhaps the app is smart enough to lock you out the second it detects an attempt to gather the schema, e.g. by logging and automatically responding to a query that displays the schema. Then you have to look for other ways in (another IP, etc.). But if you know the schema in advance, you have a better chance of a one-shot injection that accomplishes your malicious goal.
In other words, advance knowledge of the schema may make it easier to act maliciously.
> nothing you can do with the schema is going to reduce the signal in that feed --- even a single SQL syntax error would be worth following up on
Syntax errors coming from your web application mean there is a page somewhere with a bugged feature, or perhaps the whole page is broken. Of course that's worth following up on?
Edit: maybe I should add a concrete example. I semi-regularly look at the apache error logs for some of my hobby projects (mainly I check when I'm working on it anyway and notice another preexisting bug). I've found broken pages based on that and either fixed them or at least silenced the issue if it was an outdated script or page anyway. Professionals might handle this more professionally, or less because it's about money and not just making good software, idk
> Syntax errors coming from your web application mean there is a page somewhere with a bugged feature, or perhaps the whole page is broken. Of course that's worth following up on?
This is a government system, with apps probably built by lowest-bid contractors.
I imagine most of us would be horrified by the volume of everyday failed queries from deployed apps.
Can be, but I'm not sure it's worth investigating whether a particular deployment has such a specific monitoring system before being able to do a FOIA. The schema is marginally relevant for attacks at best (with heavy emphasis on just how marginal it is) and that's no barrier to releasing it
That's where the court's technical distinction between the words: "could" and "would", is important. It appears they have reduced the distinction to a risk assessment which is more objective than opining wildly!
For example: I've just re-wired a three gang light switch. I verified power on with my multimeter (test the meter), cut the power and then retested all the circuits to make sure I had got it right.
It turns out that switch three is on a separate ring main. Cool I didn't get to test my body's ability to take a whopper of a shock. In the UK it is common to have upstairs and downstairs rings for light circuits. Our kitchen has quite a few lights in it so it got a separate ring as well. Anyway there are quite a lot of wires in there because all of them are two way switches. Oh and I am allowed to work on them because of the switch location - not kitchen and not bathroom, ie a low risk location
I noted down the connections, and took them all out. I put Wagos over the flying ends to make them safe, turned the power back on and got on with the job in hand.
I then cut the power (both circuits) checked again with my Fluke. Oh bollocks ... enable power, test the Fluke and then cut power again and recheck the circuits.
Now I re-terminated all the connections. There was plenty of additional wire so I decided to cut and re-strip the conductors, to make sure that I avoided potential failures due to "work hardening" from the inevitable pushing and pulling and "gentle" forcing into position. Once all the conductors were screwed down I pulled on them fairly forcefully to make sure they won't fall out.
I screwed down the switch face plate and restored power. It's a brushed metal finish switch so I did test it was not live, because I'm careful. I tested the functionality ie all three switch circuits (three) from all the switches (six).
So, given that description is it possible that the connectors might fall out in the future and short on say, the metal back box. Of course it is possible. It could happen but would it happen?
You could postulate all sorts of scenarios. Perhaps I may be careful but I might be cack handed and forgetful and got something wrong anyway and a wire might still drop out. Now we are at the point of whataboutery! and that won't wash.
The would/could distinction is a powerful one and it is analogous to how we do risk assessments.
I'm certainly not saying you are wrong in your assessment but I think you are fiddling with details to conjure up a "could" and not a "would". I agree that knowing the schema would assist a hacking attempt but would it make a successful crack more likely - no I don't think so. It is a classic case of obscurity despite security but a rather more complicated one than putting the ssh daemon on port 2222.
Cripes - I need to get out more!
Kurt posted this to troll me. Just know my audience here was, mostly, non-technical people involved in politics in my local Chicagoland municipality.
Permit me a PSA about local politics: engaging in national politics is bleak and dispiriting, like being a gnat bouncing off the glass plate window of a skyscraper. Local politics is, by contrast, extremely responsive. I've gotten things done --- including a law passed --- in my spare time and at practically no expense (drastically unlike national politics).
An amazing thing about local politics, at least in a lot of places, is that they revolve around message boards. The boards won't be in places you want to be (in particular: a lot of them are Facebook Groups) and you just have to suck it up. But if you enjoy participating in a community like HN, you can participate in politics, too, and message-board your way towards making things happen.
> Local politics is, by contrast, extremely responsive. I've gotten things done --- including a law passed
You live in a country where local governments have the power to make laws… in a lot of other countries they don’t - or, to be more precise, their lawmaking power is extremely limited.
Actually, even in the US, that’s often true too - only local governments with “home rule” can enact laws on any topic (provided it doesn’t contradict state or federal law), those without it can only enact laws on specific topics authorised by the state legislature. Some states grant home rule to all counties and municipalities, others none, others to some but not others (e.g. in Texas a municipality can give itself home rule powers, with approval of its voters, but only once it reaches a population of 5000).
Even state legislators are, by their nature, pretty much locally driven given the relatively small size of their constituencies and thus the margin of victory.
Voters significantly underestimate their power even up to the House level; AOC’s first campaign was very scrappy and resulted in a bartender unseating the chair of the House Democratic Caucus and likely successor to Nancy Pelosi, and that was the first campaign in which anyone bothered to primary him.
Would you care to elaborate which law you helped to pass?
Also, can you link to some good resources for someone who wants to get off the sidelines and get more involved in Chicago politics, whether the resources are on FB or elsewhere? I've previously tried Googling for some but with very limited success.
Thanks.
We're the first municipality in Illinois to draft and adopt an instance of ACLU's CCOPS model legislation, which requires board approval at a recorded public board meeting before any agency (most especially our police force) can adopt any form of surveillance technology, given a broad (ACLU-supplied) definition of "surveillance". Previous to that, our police force could acquire arbitrary surveillance products so long as they kept under a discretionary budget threshold; they used that latitude to acquire a pilot deployment of Flock ALPR cameras, and CCOPS was a response to that.
My real goal is zoning.
In Chicago itself, I have less clarity, but am optimistic that somewhere on Facebook is a message board where the staff at your alderman's office reads posts, and the most politically engaged people in your neighborhood argue with each other. That's your starting point (and maybe your ending point). Just go, listen, and chime in with high-effort comments. If you're used to clearing the bar for HN comments, you're way past the threshold of coding like a super-thoughtful person in local politics.
The categorical elimination of single-family zoning along with any building envelope restrictions that would make as-of-right 3-flats uneconomical.
Rather than the complete elimination of single family (and by extension even larger lots) I feel like it ought to follow something resembling an iterated 80/20 rule out to huge rural lots at the far end. Notice that this would imply a plurality of the land being zoned for the highest density at any given time.
The thing that really kills density in most cases is the height restrictions. A lot of the upzoning in my area has resulted in ugly, wall-to-wall low-single-digit floor count buildings with near zero setback. It's better than single family but it isn't particularly dense and it's a huge step backwards aesthetically.
A step in the right direction last week for the largest upzoning effort in the city! https://archive.is/QuOcJ
Of course a vocal minority is fuming about higher density.
It might actually be easier to win the economics battle by chipping away at restrictions on taller buildings. The builders in my area are copy/pasting a 3-flat design all over the place but it requires bargain-basement land prices (literally building on former toxic waste dumps) or money from the township because 3-flats make you have to build wide.
The muni I live in is very constrained (we're just 4 square miles, right on the border of the west side of Chicago) and our land is overwhelmingly SFZ, so most of the ballgame is getting SFZ lots opened up. The emerging consensus is towards "missing middle" housing, which is 2-40 units (but really, a medium term sweet spot in the teens), where you're talking about buildings spanning multiple lots.
That very little can economically be built on existing SFZ lots even with relaxed zoning is actually a feature, not a bug, for getting this done. People want change to be slow. At least to begin with, it's better strategically if it takes a couple years and gradual tweaking to make lots of building happen.
Kam Buckner is trying to get something passed at the state level (but it wouldn't apply to Oak Park): https://ilga.gov/legislation/BillStatus.asp?DocNum=3288&GAID...
That would be an outstanding outcome! Is this just for Oak Park, or beyond?
You'd hope that Oak Park, Evanston, Wilmette, and then Berwyn and Schaumburg could get this done, and then your next step would be either Chicago (tough because of aldermanic structure) or statewide, the way California did. Either way: you start in one municipality and work from there.
It helps that zoning matters more in Oak Park (and Evanston) than almost anywhere else in Chicagoland.
There is no way you get Wilmette to change zoning. They've fought with Small Cheval about the size of their sign for like 9 months. I doubt you'd get any village in the NT district to rezone - the Optima project was like pulling teeth, and everyone is worried about overcrowding NT, which as a single HS is pretty packed now.
The whole project is going to take many years. Even if we fix Oak Park zoning in the coming year, it'll still be years before anything significant gets built, and years past that for us to serve as a test case.
New Trier can just build another campus like they did for freshmen.
Why does zoning matter more in Oak Park and Evanston? High demand from being on the El and close to Chicago?
Yep. Historically both of these places basically exist to concentrate the interests of the upper middle class and to reinforce segregation. They're both basically Chicago but with a better funded school system (because lawyers and doctors get to funnel all their property taxes into the school down the street from them), which makes them highly desirable.
“Never doubt that a small group of thoughtful, committed citizens can change the world: indeed, it's the only thing that ever has.” - Margaret Mead
Like a hedge fund? Or are we including those committed to violence?
Probably not the intent of the attributed author [0] but literally speaking the statement doesn't specify "ethical" or "peaceful", no.
[0] https://quoteinvestigator.com/2017/11/12/change-world/
The point is that it's a small, dedicated group that brings change, not a government or private institution. If it's still hard to grasp, then think about how national movements started.
Why would you ever exclude ones committed to violence? Violence consistently works.
Snipers, patient 0's, drunk drivers...
>The boards won't be in places you want to be (in particular: a lot of them are Facebook Groups) and you just have to suck it up. But if you enjoy participating in a community like HN, you can participate in politics, too, and message-board your way towards making things happen.
How do you figure out where to go?
The way you'd expect: I bumbled through a bunch of different Facebook Groups, starting with the one simply labeled for my neighborhood, and followed cross-posts. Eventually I found the two really important ones in my area (one is an organizing group for local progressives --- I live in a very blue muni --- and the other is the main high-signal political group for the area, in which all the village electeds participate).
Aaaaaaa! I need to finish my post! :(
Is it not absurd that the supreme and appeal courts disagreed on a syntactical matter? Never mind that this isn't uncommon, or that (IMHO) it would be ridiculous to interpret it as "any file layouts at all, and other stuff too, but only bad other stuff". It's crazy to me that we're happy for laws to sit on the books being utterly ambiguous.
I know this suits the courts who benefit from the leeway, and that (despite valiant efforts) we're not going to get "formal formal" language into statutes. I know that the law is an ass. I know that the laws are written by fallible and naive humans.
Even after all that, if the basic sentence structure of what's in the law isn't clear to the courts, hasn't the whole system fallen at the first hurdle?
I am not a lawyer, but my understanding is that's just how the justice system works. Reasonable people can disagree about what exactly a complicated statement says, since language is full of ambiguities. People have been discussing what the U.S. Constitution says exactly from the day it was written and there are still a lot of disagreements.
The standard response to this is that laws should be written in ways that are non-ambiguous but that's easier said than done. Not to mention that sometimes the lawmakers can't fully agree themselves so they leave some statements intentionally ambiguous so that they can be interpreted by the courts.
Nobody reasonably expects all laws to be written completely unambiguously. But since laws (and indeed all manner of legal documents) are filled with lists and modifiers, I don't think it's unreasonable to require that they be written to a certain standard which defines how these lists and modifiers should be interpreted, similar to RFC 2119 https://microformats.org/wiki/rfc-2119.
I’ve often thought we’d get more sensible results in court cases on computer-related issues if we had specialised courts where the judges were required to have a relevant degree (computer science, software engineering, computer engineering, information systems, etc). But I doubt it is going to happen any time soon.
It happens from time to time. https://www.theverge.com/2017/10/19/16503076/oracle-vs-googl... ( https://news.ycombinator.com/item?id=15834800 42 comments)
> These days, he often looks for some kind of STEM background for the IP desk. It’s not necessary, but it helps. Bill Toth, the IP clerk during Oracle v. Google, didn’t have a STEM background, but he told me that the judge had specifically asked him to take a computer science course in preparation for his clerkship. When I asked Alsup about it, he laughed a little — he had no recollection of “making” Toth take any classes — but he did acknowledge that sometimes he gives clerks a heads up about what kind of cases are coming their way, and what kind of classes might be useful ahead of time.
Note that it's not necessarily the judge that's important as an individual knowing the material, but that the clerks who work for the judge are.
Civil code law uses that way of thinking, where there are specialised courts for different areas: administrative, civil, labor, family, commercial and so on. I actually am not so sure it is great as these courts increase the depths of the bureaucracy to the point of being self serving. They also serve to segment expertise.
> Civil code law uses that way of thinking, where there are specialised courts for different areas: administrative, civil, labor, family, commercial and so on.
This happens in common law countries too. For example, the US has specialised courts (at the federal level) for bankruptcy, federal government contract disputes (US Court of Federal Claims), taxation (US Tax Court), among others. It also has a nationwide appellate court (Federal Circuit) with jurisdiction limited to certain topics (patents, trademarks, federal government contracts, among others), and another (DC Circuit) which despite being technically geographic in practice also has topical jurisdiction (many-but not all-lawsuits against federal agencies). Many states have specialised courts for various areas of law
It is very common in common law countries to have specialised courts/tribunals (or divisions thereof-there isn’t a big difference between a specialist court and a specialist division of a generalist court) to deal with certain types of cases, especially bankruptcy, family law, probate, child welfare, juvenile crime, patents, taxation, administrative law, military law, immigration, small claims - the exact set varies, but specialised courts/tribunals/divisions are very common.
But I’ve never heard of a specialised court/tribunal/division for computer cases
Correction, that is how the common law legal system works.
Alternatives like codified law exist and are practiced, just not in the US or Canada.
To me it feels like the kind of dispute that is exactly why we have multiple levels of appeals court. The "file format" thing is super dumb, and they got it wrong, but the "that if disclosed" statutory interpretation is a thing that seems important to get a final, consistent determination on.
Of course I can't disagree that it's good that it's now settled. Still I can't help but imagine a world where the meaning, at least in terms of which words apply to which others (rather than qualifiers like "reasonable"), should be settled before the law is debated, voted on, and passed.
Even (some) programmers have learnt the dangers of parsing at run time (e.g. "eval is evil"). How can we decide it's the law we want if we don't know what it means yet?
> How can we decide it's the law we want if we don't know what it means yet?
FWIW, judicial interpretation of legislation is generally seen as an exercise in figuring out what the legislature meant. Courts start by looking at the "plain meaning" of the words used, but where that doesn't yield an unambiguous answer they will often look at the overall scheme or purpose of the legislation to try and figure out which interpretation is most consistent with that.
It's far from perfect of course, but it's not like legislation just consists of a bunch of random symbols that are later imbued with meaning by a court operating in a vacuum. The meaning of most legislation is clear most of the time. I'm sure the authors of the bill thought it was sufficiently clear, for any scenario they could contemplate (or, at least, the ones they cared about). But it's hard to see every potential corner case (and if every potential corner case did have to be identified and settled before the bill could even be debated, it's likely Illinois wouldn't have a FOIA today).
> It's far from perfect of course, but it's not like legislation just consists of a bunch of random symbols that are later imbued with meaning by a court operating in a vacuum.
Isn't this exactly what happened? A court of computer laypeople reached for Merriam-Webster in order to disambiguate a sample of programmer argot that was written into law by another group of computer laypeople. The legal profession isn't just dirty, it seems doomed to defeat itself in even its most rigorous practice.
> Courts start by looking at the "plain meaning" of the words used, but where that doesn't yield an unambiguous answer they will often look at the overall scheme or purpose of the legislation to try and figure out which interpretation is most consistent with that.
There is also the concept of a "canon of construction", which exists specifically to handle these kinds of reoccurring grammatical issues. I'm surprised there isn't one for dangling modifiers.
That's not the only alternative though. Why are experts not involved in the interpretation, rather than it being left up to how two separate non-technical groups interpret it?
Other countries have legal specialists for different areas and update their laws continuously based on expert opinion. Common law gets expert testimony but relies on generalists to make the final determination.
Something something something Article III of the US Constitution something.
I find it slightly odd that you get hung up on the file format thing. The law as you quoted it says "including but not limited to" and the first example given is then "software".
I'm confused why file layout is included in the list of exceptions in the first place. If an adversary knowing your file format is a security problem, then you are doing something very wrong!
And the ruling that the condition only applies to "other information" (which to me seems like a very strange reading, and probably not the intent of the law), regardless of whether a SQL schema is considered a "file layout", creates a massive loophole, where the government can just use some obtuse custom file layout to avoid FOIA requests.
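To make the two competing readings concrete, here is a rough sketch of each as a predicate. The category list is a shortened, illustrative stand-in rather than the statutory text, and the function names are my own invention.

```python
# Illustrative only: a shortened stand-in for the statute's list of examples.
LISTED = {"software", "operating protocols", "file layouts"}


def exempt_per_se_reading(category: str, jeopardizes_security: bool) -> bool:
    """Reading where the 'if disclosed' condition attaches only to 'other information'."""
    if category in LISTED:
        return True  # listed items are exempt regardless of any security impact
    return jeopardizes_security  # everything else must actually pose a risk


def exempt_qualified_reading(category: str, jeopardizes_security: bool) -> bool:
    """Reading where the condition applies to every item, listed or not."""
    return jeopardizes_security


# A harmless record that a public body labels a "file layout":
print(exempt_per_se_reading("file layouts", False))     # True: withheld
print(exempt_qualified_reading("file layouts", False))  # False: released
```

Under the first reading the label alone decides the outcome, which is the loophole described above.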
Am I the only one slightly perplexed/worried by the point-blank source code exemption?
It's easy to imagine a scenario where the city decides to develop a specific software in-house and hide the "biases" in the source code, or any other thing one might not find desirable.
Hell, they don't even need to make everything from scratch! Could just patch and use a permissively licensed 3rd-party component.
In my opinion, the proposed amendment does not go far enough.
It shouldn't be surprising?
It is the same problem people trying to open source closed projects experience: there is all sorts of locked-in proprietary code which the developer and the customer only have the license to use but not to share the source.
Even projects which are staunchly open from day one, and built without the direct commercial interests that government contractors have, suffer from this. The Linux kernel's challenges in supporting ZFS or binary blob drivers in kernel/user space and so on are well known[1]
Paradoxically, on one hand information wants to be free, and economics dictate that open source software will crowd out closed competitors over time. On the other hand it is expensive to open source a project, sometimes prohibitively so, and that deters many managers and companies from open sourcing their older tools, even if they would like to do so. Involving legal and trying to find the rights holder for each component can deter most managers.
If a government put requirements in contracts that the vendor should only use open source components in their entire dependency tree, it could drive the costs very high, because a lot of those dependencies may not have open source equivalents, or the equivalents lack features of the closed ones and so would need budgets to flesh them out. In the short term no legislature will accept that kind of additional expense, even though in the long term the public will benefit.
---
[1] yes kernel problems are largely a function of GPL, more permissive licenses like Apache 2 /MIT would not have, BSD variants after all had no challenges in supporting ZFS.
However a principled stance on public applications being open source by government would be closer to GPL than MIT in terms of licensing. Otherwise a vendor can just import the actual important parts as binary blobs "vendored" code and have some meaningless scaffolding in the open source component to comply.
Maybe FOIA should trump licensing in this case. Suppose I write a manual on how to issue bad parking tickets and hide them in a database, and then license that (in some restrictive manner) to the state of Illinois. I think the public's right to see that document is more important than my right to prevent copying and dissemination.
That is true for all kinds of IP. The balance between the two is what IP laws do. Give inventors some protections to encourage innovations while keeping the public benefits in mind.
Copyright is time limited: the author's death plus 70 years for individuals and 95 years for corporations.
While there are arguments to be made for lesser duration, better preservation requirements etc., the balancing of public good against private value is the basis of all copyright laws since the Statute of Anne 1709.
In a court case you can get access to all types of information as part of discovery. If you are harmed, or believe you have been, there are other avenues available to you. If you have standing to sue and the discovery requests are made by a competent lawyer, you can get access to anything from internal communications to trade secrets to any other document supporting your claim. You or your lawyer cannot use such information for economic benefit or disclose it; it is still protected.
Given that you have legal options to get this data, there is no public need that trumps private property rights because of real or potential harm and justifies blanket access by default.
PS: note software is not just copyrighted, it is also covered by patents (20 years) and trade secrets (no expiry). Also, while the law provides protection, it does not require disclosure on expiry.
If it were enough that government data were available via discovery then we wouldn't need FOIA laws in the first place.
Patents aren't relevant here since they are disclosed upon granting and cover the design rather than the implementation, for trade secrets the situation is more complicated ( https://www.americanbar.org/groups/litigation/resources/news... ).
In theory the decision to put those biases in the code should be public information. You can ask for the criteria the software was made to, just not the software itself.
Though rulings like this might have a chilling effect.
Only if they are written down. For instance, DOGE makes sure everything is done by voice so there is nothing to catch them out on in future. I've found that once you start hitting a public body with FOIAs regularly they learn to stop putting incriminating things down in writing.
That's why it's important to push for "public money - open source" initiatives like some countries in the EU are trying to implement.
Off the top of my head, I think the last (now failed) German coalition had this in their programme but didn't deliver. Maybe the new government will.
Very interesting read.
It does seem absurd to think of divulging schema as protected, as described it allows for a magical sort of outcome where: "well it's in a database you can't know anything about, and if you can't tell me how to find it you're sol".
Working at a small company with lots of clients I wouldn't want to hand out DB schema outright, but I also go out of my way to search / get the client the data they want ... not reject them.
A private company wouldn't want to divulge their DB schemas because it's advantageous for competitors to see how you're doing things. That doesn't apply to government databases.
Not quite, and the details get hairier the closer you look. The database in question here is an IBM system. The database itself is used for government functions, making it FOIA'able, despite it being managed by a third party company. IBM even tried to argue that the schema was trade secret, but the statute isn't straightforward. Here's my (successful) response when they tried:
You mentioned on Thursday over the phone that IBM is not too keen on having its database schema released, and, between IBM and Chicago, is seeking an exemption under 5 ILCS 140/7(1)(g) - an exemption that is only valid if the release of records would cause competitive harm. This email preemptively seeks to address that exemption within the context of this request in the hopes of a speedier release of records. It is FOI's belief that there is little room for the case for the valid use of 5 ILCS 140/7(1)(g) when considering the insignificance of the records in conjunction with the release of past documents:
1. Chicago released CANVAS's technical specification [1] seven years ago. To the extent that the specification's continued publication does not cause competitive harm, it is very unlikely that the release of CANVAS's database schema would cause any harm.
2. The claim that the release of a database schema would cause competitive harm is not unlike suggesting that the release of filing cabinets' labels can cause competitive harm.
Furthermore, in your response, please be mindful that the burden of proving competitive harm rests on the public body [2].
[1] https://www.cityofchicago.org/content/dam/city/depts/dps/Con...
[2] http://foia.ilattorneygeneral.net/pdf/opinions/2018/18-004.p...
The schema on the last project I worked on was probably our most important IP. Specifically, the ways in which we solved certain circular dependency issues.
I wouldn't take the ability to design a schema for granted. I don't think many people are any good at it. Do not underestimate the value of your work products.
Is that not exactly what the person you're replying to is saying?
Private companies don't divulge schema because it's valuable IP.
Public entities' IP belongs to the public, so there is nothing to protect.
Part of the reason I’m so… enthusiastic… about tech debt is that I’ve worked a few times where we had a competitor whose lunch we were stealing or who was stealing ours and the ability or inability to copy features cheaply was substantially the difference between us.
That quad graph of value versus difficulty that everyone loves? It’s not quadrants it’s a gradient and the difficulty dimension depends quite a bit on context. What’s a 4 difficulty for me might be a 6 for someone else. Accidental versus intrinsic complexity plus similarity to or distinctions from things we have already done.
Maybe. But now I'm really curious how bad that schema must be for them to hide it so viciously.
I think it's just an excuse to avoid making it feasible for the public to get the data.
Your imagination can't cover how bad you might think it is (and yet it isn't that bad).
Or at least I don't want to explain to "20 years later Monday Morning Quarterback".
Maybe their schema has triggers and stuff
It used to be that relevant data was in a document, but much is now stored in specialized web apps whose data in turn is stored in a db.
It's Matt Chapman! https://mchap.io/
I helped him process and visualize the original batch of parking ticket data waaaay back in 2016.
I can't believe he's still on this in 2025. We need more junkyard dogs like him fighting for what's right.
"Retrieve the data of every parking ticket issued to ‘Bob O’ and also all the rest of the information in the database including everyone’s passwords."
This is the example of SQL Injection written in plain English, yet "everyone's" is problematic here in that it's an orphaned single quote. If "Bob O'Conner" is bad, so is "everyone's"
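For anyone who wants to see the quote problem in running code, here is a minimal sketch using Python's built-in sqlite3. The table, column, and function names are made up for illustration and have nothing to do with the real ticket system.

```python
import sqlite3

# Hypothetical in-memory table, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, owner TEXT)")
conn.executemany(
    "INSERT INTO tickets (owner) VALUES (?)",
    [("Bob O'Conner",), ("Alice",)],
)


def find_unsafe(owner):
    # Vulnerable: the input is pasted straight into the SQL text, so any
    # single quote in it changes where the string literal ends.
    query = "SELECT owner FROM tickets WHERE owner = '" + owner + "'"
    return conn.execute(query).fetchall()


def find_safe(owner):
    # Parameterized: the value travels separately from the SQL text,
    # so quotes in it are just data.
    return conn.execute(
        "SELECT owner FROM tickets WHERE owner = ?", (owner,)
    ).fetchall()


malicious = "x' OR '1'='1"
print(len(find_unsafe(malicious)))  # 2: every row comes back
print(find_safe(malicious))         # []: no owner has that literal name
print(find_safe("Bob O'Conner"))    # the apostrophe is handled as data
```

Note that passing "Bob O'Conner" to find_unsafe does not even run: the stray apostrophe leaves the SQL malformed and sqlite3 rejects it. That is the orphaned-quote issue in miniature.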
I understand freedom of information, but what exactly does the public gain by Matt getting the database schema ?
If the answer is "the ability to request data from a specific table/column", I would say that this should be possible to do by asking for the relevant data directly (instead of asking for "the timestamps of each ticket" ask for the "time-related data of each ticket" for example)?
And yes, having your db schema out in the wild can be a vector of attack, if only because it allows targeting the sql injections (the blog author himself argues this in court).
The court was right to reject this. Maybe the exact word of the law doesn't ask for it, but the spirit certainly does.
Municipalities obstinately refuse reasonable requests because they resent that the Freedom of Information Act allows regular civilians to get all up in their business. The excuses they make for noncompliance (it's burdensome! it violates privacy! sql injection!) are not serious. They don't want to comply because they don't like accountability. That's it.
The blog author argued no such thing, because that is not true.
I FOIA'ed >1M pages of docs for my project cleartap.com, a DB of water quality of the USA.
Most states would charge a small amount to gather the documents.
Michigan wanted $50K for the FOIA request. I think because of the Flint lead crisis. They wanted me to go away.
I noticed that you do have data for Flint. Did you have to pay it, or is there some appeals process if you're quoted an unreasonable amount?
Great project by the way!
Ended up finding the majority of Michigan through scraping.
For example, https://www.cityofflint.com/wp-content/uploads/2023/06/Annua...
Given the Illinois Supremes' decision, seems like an opportune time to say "Everything is a file".
1. https://en.m.wikipedia.org/wiki/Everything_is_a_file
> You also generally can't FOIA the source code of programs they run.
Alas, that part should be illegal under FOIA.
Source code should be open source and verifiable. Being exempt from FOIA circumvents public confidence in the government's use of software.
I'd be curious to learn if/where courts have decided such things already.
I assume that - even though there's a strong public interest argument for it - government orgs are prone to blanket banning the release of source code, for the same primary reason that businesses are prone to doing so. That is, too high a chance of sensitive data (passwords, tokens, IP addresses, etc) being hard-coded in all-too-often non-12-factor-aspiring code; and too much security / liability headache if said sensitive data gets out.
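As a rough illustration of the 12-factor point, here is what keeping a secret out of the source might look like. The variable name DB_PASSWORD and the function are hypothetical; this is a sketch of the general pattern, not anything from a real government codebase.

```python
import os


def load_db_password(env=os.environ):
    """Read the password from the environment instead of hard-coding it.

    DB_PASSWORD is a hypothetical variable name chosen for this example.
    """
    password = env.get("DB_PASSWORD")
    if password is None:
        # Fail loudly rather than falling back to a default baked into the code.
        raise RuntimeError("DB_PASSWORD is not set")
    return password


# The source file itself now contains no secret, so releasing it
# would not leak the credential.
print(load_db_password({"DB_PASSWORD": "example-only"}))
```

Code written this way could in principle be disclosed without exposing credentials, which weakens the "sensitive data is hard-coded" justification for a blanket ban.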
There's probably also some actual business logic that government orgs want to and are legally permitted to keep secret. In the OP's case of a parking ticket database, maybe there's software talking to that database, whose source code includes the logic of picking when / where parking inspectors should conduct a "random" blitz of issuing fines.
> maybe there's software talking to that database, whose source code includes the logic of picking when / where parking inspectors should conduct a "random" blitz of issuing fines.
Oh yes, and that "random" blitz of issuing fines definitely doesn't have any racist part to its algorithm. Just trust the government on that one. The government and the "business" that wrote the code in the first place. Yup, makes sense.
Great read. Frustrating that the court ruled that a schema was a file layout, since I don't think it is, but at the same time if it didn't fall under that exception, there is a strong argument that it would be considered "documentation pertaining to all logical ... design of computerized systems". A schema is, literally, the logical design of the database, and the database is a part of the computerized system. Once it was ruled that those examples are "per se" exempt it was a long shot to argue that schema wasn't covered by any of the examples.
I completely agree with you that (unlike/despite the Supreme Court ruling), database table/column schema design (and other system designs) should fall under the Illinois statute as "documentation pertaining to all logical and physical design of computerized systems". It's interesting that the law did pick up on that distinction between logical and physical design but none of the parties described in this article did. Logical/physical designs are not just about servers and integrations, they are also about data.
I'm not sure why that wasn't argued by the state and the state argued the database schema was a "file format". Per my reasoning, the state still would have won, but for different reasons.
I disagree with you slightly however and would say that the schema table/column names should be considered not logical but "physical design" while the business naming/meaning of tables would be a "logical design" (or conceptual design). See Wikipedia: https://en.wikipedia.org/wiki/Logical_schema
SQL injection is really about physical schema designs, not logical ones (I do get that every bit of information including business naming of tables/columns helps in an attack, but it does change the degree of threat and thus the balancing tests of the risk which are relevant per the definitions and case law described in the original article.)
So in terms of what the law /SHOULD/ be, the law should not include logical design as a security exception, only physical design. It /SHOULD/ be possible for citizens to do FOIA requests and get a logical understanding of all the database fields without giving them the SQL names that can accelerate SQL injection attacks. In that way citizens could ask for the data by a logical/business-named handle rather than a physical one.
And the state should create logical models or provide data dictionaries with business (not technical terms) on request as part of their FOIAable obligations to their citizens for the data they are maintaining.
My 2 cents as someone designing database schemas for 25+ years.
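To make the logical-vs-physical split above concrete, here is a minimal sketch of what a published data dictionary could look like. Everything in it is hypothetical: the table and column names are invented for illustration and have nothing to do with CANVAS, and `public_view` / `resolve` are just names I made up. The idea is that requesters only ever see and use the business names, and the translation to physical names happens inside the public body.

```python
# Hypothetical data dictionary: business-facing (logical) names mapped to
# physical (table, column) pairs. All names here are invented examples.
DATA_DICTIONARY = {
    "Ticket issue date": ("tkt_hdr", "iss_dt"),
    "License plate": ("tkt_hdr", "plt_no"),
    "Fine amount": ("tkt_fin", "amt_due"),
}


def public_view(dictionary):
    """Return only the logical names, i.e. what a requester would be shown."""
    return sorted(dictionary)


def resolve(dictionary, logical_names):
    """Translate a request phrased in logical names into physical columns.

    This step would run inside the public body, so the physical names
    never need to be disclosed to the requester.
    """
    unknown = [name for name in logical_names if name not in dictionary]
    if unknown:
        raise KeyError(f"not in data dictionary: {unknown}")
    return [dictionary[name] for name in logical_names]


if __name__ == "__main__":
    print(public_view(DATA_DICTIONARY))
    print(resolve(DATA_DICTIONARY, ["License plate", "Fine amount"]))
```

Whether a list like this would itself count as a "new record" under FOIA is a separate legal question; the sketch only shows that the mapping is technically trivial.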
Schema is definitely software, an operating protocol, source code, and file layout. Maybe also documentation.
A schema isn't software in the sense imagined by the ILGA. If it was, every Excel spreadsheet would be too, and Excel spreadsheets are the basic currency of FOIA.
An "operating protocol" is a step-by-step list of things to accomplish some action. It's a finite state machine for humans. Obviously, a schema isn't that; a schema is declarative, and an operating protocol is imperative.
The court definitively established that SQL schemas aren't source code in the sense imagined by the ILGA. SQL queries can be. Schemas are not.
See downthread for why a schema isn't a file format. In fact, a schema is almost the opposite of a file format.
A court will look at the term "documentation" in the ordinary sense of the word; as in, "a prose description and set of instructions".
"Associated with automated data processing operations" isn't an element in the statute; it's a description of all of the elements.
If the Excel spreadsheet has formulas in it, it's software. If you're just talking about the data in the sheet, i.e. what you'd get exporting it as a CSV, then it's not.
Col types, unique/FK/PK constraints, default values, and computed cols define the steps for handling row inserts/updates/deletes. Even adding a uniqueness constraint to an already-unique col will change how the code interacts with it, specifically how it deals with concurrency/locking. If they said it has to be an imperative programming language, then it's not that.
If they said the schema isn't source code then ok, but I still think it is.
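For anyone following along, here is a small sketch of the behavior being argued about, using Python's built-in `sqlite3` as a stand-in (an assumption on my part; the thread is not about SQLite, and the `tickets` table is invented). The schema is purely declarative, yet the `DEFAULT` and `UNIQUE` clauses change what happens on insert. Whether that makes a schema "software" in the statute's sense is exactly the disagreement; the code takes no side.

```python
import sqlite3


def demo_constraints():
    """Show a DEFAULT filling in a value and a UNIQUE constraint rejecting a row."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table; the constraints are declared, not coded as steps.
    conn.execute(
        """
        CREATE TABLE tickets (
            id    INTEGER PRIMARY KEY,
            plate TEXT    NOT NULL UNIQUE,
            paid  INTEGER NOT NULL DEFAULT 0
        )
        """
    )
    # No value given for `paid`, so the declared default is applied.
    conn.execute("INSERT INTO tickets (plate) VALUES ('ABC123')")
    paid = conn.execute(
        "SELECT paid FROM tickets WHERE plate = 'ABC123'"
    ).fetchone()[0]

    # A second row with the same plate violates the UNIQUE constraint.
    try:
        conn.execute("INSERT INTO tickets (plate) VALUES ('ABC123')")
        rejected = False
    except sqlite3.IntegrityError:
        rejected = True

    conn.close()
    return paid, rejected


if __name__ == "__main__":
    print(demo_constraints())
```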
I assure you that Excel spreadsheets with formulas in them are FOIA-able in Illinois. Since we can take that as axiomatic, I think we can put "schemas are software" to bed.
SQL schemas aren't Excel spreadsheets.
That's fascinating, but you just claimed Excel spreadsheets were "software" in the sense of the Illinois FOIA statute definition, and they are not. QED.
You said that SQL schemas aren't software, and that's what this lawsuit was about. If they explicitly say that Excel docs (even w/ formulas) aren't software, I think they're wrong, but that doesn't matter because Excel docs aren't SQL schema.
Now if you want to go by Illinois definitions, SQL schemas are file layouts, that's why the plaintiff lost.
Again: the post explains why the court determined schemas to be file layouts, and none of it involves any of the logic you've supplied here. Even Chicago didn't try to claim that a schema was a "software".
They didn't need to. In the first appeal, it didn't matter because it didn't jeopardize security. In the second appeal, they said it's a file layout.
You also said SQL schemas are declarative, not imperative. Those are types of programming languages, so software.
An Excel formula should be considered a kind of software, because you can do code golf in it.
I think a schema will definitely be part of the source listing, either in the main programming language source code or in some other file used to define or initialize the database. But I don't think it is software, any more than a protocol is software. Software does something.
One tricky aspect of this is that even if the schema itself as a higher level concept doesn't fit into any of those definitions, all existing instances of the schema are likely considered either source listings or documentation. So the instances are barred from release per se, and you can't ask the government to create new documents.
The schema defines how the DBMS sets up its tables and such, so it does quite a bit imo. And if the schema isn't stored in any doc because someone just manually punched in CREATE TABLE once, yeah, what you said about creating new docs.
How is a database schema not a file layout?
The article describes why. 2 different db engines (or even instances) can use different file layouts for the same schema.
In many ways SQL is all about divorcing the schema from the files.
But on the other hand, in all database systems the schema is used to determine how the files are laid out. Although I suppose the same thing could be argued for any data that is stored in a file, excepting that a schema is metadata that determines the organisation of data so it's a bit of a special case.
In a Microsoft Word document, the section headings also tell Word how to lay out the Word document file.
Do you mean that section headings aren't a file layout? That's their entire purpose.
Edit: If you're talking about the byte representation only, I don't think section headings indicate the placement of the body's bytes.
Does your interpretation not mean that (coupled with the court ruling that file formats can't be FOIA'd) any document with sections cannot be requested via FOIA?
If this format is reused across many files, they might want to give the contents of those docs in a different format from the original.
You have found an argument that proves too much.
Yeah, coupled with the court's arguments, the interpretation of sections in a document as a "file format" means no files with sections can be released via FOIA requests.
Arguably, all requests for files could be returned with all of the letters in the document but scrambled in a random order so as to obfuscate the file layout.
There's a solid chance that the schema gives away what DBMS is being used. But even if it didn't, I'd still call it a file layout in this context.
The gov't releasing the hardware and software licencing used in CANVAS already gives that away.
The DBMS is almost definitely going to be mentioned in RFP or specification documentation. As it was in this lawsuit.
So?
So if you have the schema and the DBMS, you probably know how data is arranged in the files ("files" in the filesystem sense).
Is your argument that government agencies should also withhold the names of filing cabinet manufacturers? :)
Just that it's a file layout. Or even if you strictly define a file layout as say an ext4, NTFS, or FAT file tree, that revealing the schema is revealing the file layout.
I don't know why they don't want to reveal file layouts, but for whatever reason, they decided it was "per se" exempt regardless of the security implications.
It's obviously not a file format. The same SQL schema can generate N different files, with N different layouts, for N different databases. By the logic you're using ("schema" + "database vendor" = "file format"), a Word document outline is also a file format.
The parent asks "how is it not a file layout" not "can you guess the file layout?" given it.
I am a human, you know I have a kidney, but I am not a kidney.
If you send a copy of the code, is that sending the code? If it is, what about sending a copy of the code with a Caesar Shift?
Another way to think about it is that if a SQL schema is a file, so is an Excel spreadsheet template.
It's interesting that the opening analogy in the post uses an Excel spreadsheet as a great way to explain a database. It's such an easy next step to say the way an xls/ods file is saved is a file format but the column layout in the tabs/tables are the schemas. The court (and the city) playing these games is so scary since it is so biased toward all modern government data being covered by FOIA exemptions.
File or file layout? Cause both of these are probably stored as files, .sql and .xltx respectively.
An Excel spreadsheet template is an arrangement of rows/columns/cells which is encoded in a XML document which is encoded in a ZIP file archive.
I don't follow your point.
Yes, it's a file format.
(Kinda a file format inside a file format inside a file format.)
"Excel" is a file format, but my point is that if a schema is a file format, so are the contents of an Excel spreadsheet.
The schema describes the database layout. The file layout (if you were going to call it that) in a modern RDBMS would describe how the RDBMS implemented a particular database layout as described by the schema.
It literally does not describe a file, and does not literally describe the data layout of anything on disk (though with enough knowledge, you may be able to infer facts about probable layouts).
> does not literally describe the data layout of anything on disk
Huh? Depends on the DBMS, but each InnoDB table is a file.
And the schema determines the file structure.
Schema is an abstraction over the file structure. Different RDBMSes will use different file layouts for a given schema. The same RDBMS may even have different engines that use different file layouts, or may change file layout between major versions.
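A quick way to see this for yourself, sketched with Python's built-in `sqlite3` (my choice of engine, not anything from the case; the `tickets` table is made up). The same `CREATE TABLE` text is used twice, with only the page size changed, and the two database files come out different on disk while the stored schema is identical. It only demonstrates "same schema, different file layout" within one engine; different vendors diverge far more.

```python
import os
import sqlite3
import tempfile

# Hypothetical schema, reused verbatim for both databases.
SCHEMA = "CREATE TABLE tickets (id INTEGER PRIMARY KEY, plate TEXT, fine INTEGER)"


def build(path, page_size):
    """Create a database at `path` with the given page size and SCHEMA.

    Returns (schema text as stored by SQLite, file size in bytes).
    """
    conn = sqlite3.connect(path)
    # page_size has to be set before the database file is first written.
    conn.execute(f"PRAGMA page_size = {int(page_size)}")
    conn.execute(SCHEMA)
    conn.commit()
    stored = conn.execute(
        "SELECT sql FROM sqlite_master WHERE name = 'tickets'"
    ).fetchone()[0]
    conn.close()
    return stored, os.path.getsize(path)


def compare():
    """Build two databases from the same schema with different page sizes."""
    with tempfile.TemporaryDirectory() as tmp:
        small = build(os.path.join(tmp, "small.db"), 512)
        large = build(os.path.join(tmp, "large.db"), 8192)
    return small, large


if __name__ == "__main__":
    (schema_a, size_a), (schema_b, size_b) = compare()
    print("same schema:", schema_a == schema_b)
    print("file sizes:", size_a, size_b)
```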
"Determines" is too weak: it must be "is". If "schema is file layout" is true, then sure, a schema is a file layout. But if it is merely "schema determines file layout", then no, a schema is not a file layout.
Abstractions are notoriously leaky in DBMSes. First off, they don't even use the same SQL spec. Give me a schema that uses anything Postgres-specific, and I can tell you what the bytes on disk look like for a given row or index.
I think it's a moot point anyway because the language is broader than just files in the filesystem sense, which is basically what the court said too.
> but each InnoDB table is a file.
A table isn't a schema, it is a component of a schema, and most databases don't use InnoDB.
> it is a component of a schema
So if you have the schema, you have the tables.
Because it doesn't describe how data is laid out on disk.
Neither does a file layout. FS will decide that... even then, not physically.
We're talking about "file layout" at the application level, not the filesystem level.
But your comment illustrates just how difficult it is to nail these things down, based on inherently imprecise language.
So you mean the filetree and file contents, as seen by userspace program?
It's meant to be imprecise, because they didn't want some "gotcha." If they say we won't reveal the disk layout, technically you can't tell that from the filetree. If they won't reveal the filetree, but this is SQLite, it's always a single file. If it's file tree + contents, well the CPU byte endianness might matter for some DBMSes, even though you could just try both.
We can't FOIA details about how an xls file is laid out internally, despite that xls file being FOIA'ble itself. That's the file format we're talking about.
> Each spreadsheet has a header row, labeling the columns, like “price” and “quantity” and “name”. A database schema is simply the names of all the tabs, and each of those header rows.
This is also how I explain it to my relatives, I'm kind of surprised this analogy (one so direct that it's almost literal) didn't fly with the judges.
If database column names cannot be revealed, then shouldn't that mean the state is also able to redact the headers of all their spreadsheets?
Knowing a spreadsheet header doesn't help an attacker gain access to that spreadsheet in any way. Knowing SQL column names may give an attacker an advantage in accessing a database.
Compare: "Knowing the writing style of current employees may give an attacker an advantage while phishing, therefore, we cannot turn over any memos or emails whatsoever."
Ditto for the org-chart.
Per the post, this also wouldn't fly.
> Believe it or not, there’s case law on “would” versus “could” with respect to safety. “Could” means you could imagine something happening. But the legal standard for “would” is “clear evidence of harm leaving no reasonable doubt to the judge”. The statute set the bar for me very low and I managed to clear it.
Reminds me of Shall versus May in RFCs. (Though those are, of course, statements of obligation rather than natural consequence.)
It's a reverse vlookup
> Congratulations! You now understand databases.
Data engineering: doing a lot of fancy work to make a very simple product
Random thought: someone should drive to Chicago, get a parking ticket, and then make a FOIA request for all of their information contained in that database.
It won't be the whole database schema, but it would be a start.
Short answer -- already been done.
This (spoiler) visualization's going into my eventual post about the lawsuit: https://observablehq.com/d/026992341cc47ff0
How were you able to stand as an expert witness when you have a personal relationship with the plaintiff? I don’t know the specifics of the law in Illinois, but my understanding is that that would generally be a disqualifying conflict of interest.
I have this cousin, Vinny, who's a lawyer, and he was able to use his girlfriend as an expert witness. Both sides agreed she really knows her stuff because that's what really matters.
I suppose I need to change all my column names to random 16-character strings so I don't leave my database insecure!
There is no freedom of information if the public is not allowed to know what data the government has.
> [Public bodies] shall provide a sufficient description of the structures of all databases under the control of the public body to allow a requester to request the public body to perform specific database queries.
I sure hope the impact of this is not that government entities switch to schemaless databases!
"Schemaless" is like "serverless" in that there's always a schema, even if it's not enforced by the database and instead applied dynamically by the application layer.
In the new language proposed in SB0226 (as linked, didn't search for authoritative sources, can't tell how durable that link will be for posterity, arrgh archiving the web is hard etc), doesn't that language leave open a hole for excessive complexity to be a reservoir for FOIA resistance?
Feels like there is an important theme here that SB0226 is dancing around --could government be legible in addition to being "plain-text" transparent?
"plain-text description" of "each field of each database of the public body" and "specific database queries" may not do what you mean.
Not sure how to fix it though.
I could see gratuitous ORMs and database-of-databases patterns winning tax dollars with taunt-them-with-the-schema listed as a feature.
>just self-important message-board hedging
I can confidently say it does not stop at message boards for many people, self included
It's a real issue when writing an affidavit or testifying. Lots of ingrained bad habits.
When a law is ambiguous by wording, why do they never ask the people who drafted the law what was intended?
That would be against the separation of powers doctrine inherent in all Western democracies. The job of the legislature is to write the law. The job of the judiciary is to interpret the law.
Besides, when the law is ambiguous, it's very often because the legislature themselves weren't sure what they intended, and/or because the legislature had deeply divided views and arrived at ambiguous wording as a compromise, and/or because the legislature used their "somebody else's problem" prerogative, i.e. they said "let's leave that for the courts to decide". Ambiguously worded laws aren't a bug, they're a feature!
I don't see how it could break separation of powers, especially if a legislator could provide minutes and/or a paper trail of discussions and revisions pointing the intent in a certain direction. You know, like evidence. The legislature surely has intent while writing the law, otherwise what would be the point in trying to interpret it, and the whole thing being litigated is the authors' intent. I don't think the separation of powers doctrine presupposes that the legislature has no idea what their goals are while writing laws; that would be quite an insane assumption to bake into our system, and broken by design. And in this case, I very much doubt it was left intentionally ambiguous, since FOIA was clearly intended to help people get information from obstinate government agencies. What would even be the point in writing the law if obstinate government agencies are supposed to be able to weasel around the ambiguity behind a comma? Regardless, if we are able to ask the people who spent time drafting it, we could ask. There might even be a paper trail!
The current sitting ILGA is not the ILGA that passed the statute.
They are probably still alive, shouldn't be that hard to find. They have no problem giving subpoenas to other witnesses or soliciting expert testimony.
> where the only way to get at the underlying data is to FOIA a database query
Was this ever attempted?
Yep, that was done in the FOIA request related to this lawsuit:
https://www.muckrock.com/foi/chicago-169/canvas-database-sch...
Yeah, the double standard here is obvious, then. Curious indeed why they are so adamant to keep the schema/data secret.
Because they know that eventually the data contained in that table is going to be used to support some sort of lawsuit that their parking enforcement activity is biased, and is targeting people of color.
It's already ridiculous that they spent several years blocking this request while it went through court. If the plaintiffs spoke to pretty much anyone involved in maintaining the system, or with any of their internal infosec people, they would know that there's no real security risk to releasing this information.
They've already spent orders of magnitude more time and money litigating the issue than it would take to just release the information in the first place, so this is clearly not a cost or resourcing issue.
They don't want to release it because they'd prefer it's secret, because secrecy makes it harder for the public to hold them accountable. That's all.
There is an explanation for the fight that doesn't involve something nefarious with CANVAS (though I think CANVAS is dodgy from talking with Matt).
The precedent set here will let data journalists (like Matt) set up effectively automated FOIA workflows on _any_ database they can get the name of for a FOIA request. So even if _this_ db isn't dodgy it enables any of them that are to be found quickly.
Or even less cynically, it's just going to cost a ton of resources to respond to all those automated FOIA requests.
I said in another comment but I suspect the column names themselves are incriminating (basically saying this person doesn't get a ticket because they are in a special club, that's probably not technically legal)
is_cop bool not null default false
Public bodies tend to just want to resist FOIAs for the sake of resisting them. I've never really been able to fully understand the motivations, even after a decade of FOIA litigation.
I think it is likely to be about budgets. That is, sure, FOIA and similar state laws usually allow the agency to collect something related to actual costs, but that's mostly meaningless since even if it actually covers staff time it doesn't retroactively give them staff to cover it in the impacted areas, and often the FOIA volume doesn't effectively feed back into legislative budget processes for future staffing either, while their litigation needs are more likely to feed back into the legal staffing levels, so approving FOIA requests drains working resources in the area covering them in a way that fighting them does not in the immediate term, while fighting them also has the longer term benefit (from an agency perspective) of discouraging future requests.
In my experience (and probably in Matt's) this has 100% not been the issue. The people responsible for the FOIA responses aren't in any way connected to budgeting or resources. It is just a body-wide personality issue. Some aspect of maliciousness mixed with laziness... or something.
> In my experience (and probably in Matt's) this has 100% not been the issue. The people responsible for the FOIA responses aren't in any way connected to budgeting or resources.
In my experience working in government, including on state-equivalent-of-FOIA requests, almost everyone working on those kinds of requests is “involved in” budgeting and resources, and more to the point anyone in a position to sign off on a decision of whether something should or should not be denied as exempt is a manager, for whom (that is, for any manager, down to the line level, over any function in any government agency, but FOIA-type requests, especially if there are going to be assertions of exemptions in total or in part, generally involve coordination and signoffs between multiple managers, e.g., from the most relevant line unit, the public information unit, and legal) managing budgeted resources and doing the work of justifying requests for additional resources that is the root of the agency-initiated budget change request process, and then participating in drills and internal analyses and responses as those proposals work through the budget process is a central part of their job.
Interesting takeaways from me:
All that pompous sounding legalese can still be ambiguous! I feel less bad for not understanding contracts that have 100 word compound sentences.
Legal people can't keep up with our tech jargon but they have their own jargon including "predicate" lol. So same logical thinking, different jargon framework.
Question: why do they want the schema not the data?
Because once you have the schema you can issue FOIA requests that include queries for them to run.
Could you ask them to run an introspection query? Something like SELECT * FROM information_schema.tables?
What if you guess common table names? Wonder if they send back the error message.
Oh wow! If that is necessary, that is so kafkaesque!
"I want your data"
"What data?"
"What do you have?"
"Ha ha. No. Tell me what you want"
"Your data that is the metadata of your data"
"Well actually..."
...
You can't ask public bodies to do research for you. That's the public policy balance in our FOIA laws: you can get almost anything (and: talk to Matt, you really can get a lot of stuff), but you have to be specific about what you're asking for, and it has to be "at hand" for the staff responding to the request.
Clerks fielding FOIA requests have SQL consoles "at hand"?
They send emails to IT. The classic example of a thing you can get through FOIA is large-scale dumps of emails from Exchange Servers, which is also not something a Clerk can do themselves, but which IT staff can immediately retrieve.
Leave the "Clerk" bit of this out and just imagine you're requesting straight from the IT department. What you can do: get anything not otherwise exempt that they know how to retrieve (it usually helps to provide example commands in the requests). What you cannot do: ask them to go look around and see what they have. That's research. Research is your job, not theirs, under Illinois FOIA.
If research is my job, and looking around is research, then couldn't I look around and see what they have instead of asking them to do so?
Maybe? If you work there, I guess? Or if you're really nice to them? But they're under no obligation to help you. The tradeoff in Illinois (and most other good FOIA law): you can get almost anything you want --- way more than most people think --- but you can't get public staff to go do research work for you.
Again: this is why pulling schemas is so valuable.
> [...] where the only way to get at the underlying data is to FOIA a database query.
Can you request the desired information using natural language, based on your guesses of what information they store?
Probably not, because then you'd be asking them to go do research. You FOIA for specific documents and records.
Juxtapose this legal process with DOGE hoovering (in more ways than one) data willy-nilly from everywhere. The dissonance between THIS uninteresting DB schema being so rigorously protected while massive amounts of sensitive data is completely misappropriated is painful.
> I’ll conclude this long piece by saying (1) obviously the bill should pass, and (2) it should be called “The Chapman Act”.
(3) I imagine Chicago greatly regrets towing Matt Chapman "over a facially bogus ticket".
Anyone with a legal background willing to opine about potential workarounds to this ruling?
Specifically, would a request for “data field labels” (i.e. a column list without any table structure info) likely circumvent the exemption?
I think that would run afoul of
> The one big limitation of Illinois FOIA (with FOIA laws everywhere, really) is that you can’t use them to compel public bodies to create new records.
Unless for some reason they already had a list of columns without table structure.
(Not that I claim to have a legal background)
I had that thought too, but my naive rebuttal would be that the column data already exists by default in any standard RDBMS as information_schema.columns. No new record creation required.
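For reference, this is roughly what "the column list already exists in the catalog" looks like in practice. It's a sketch only, using Python's built-in `sqlite3` because it runs anywhere; note that SQLite has no `information_schema`, so the code uses its equivalents (`sqlite_master` and `PRAGMA table_info`), and the demo tables are invented. On a DBMS that does implement `information_schema`, the analogous lookup would be a `SELECT` against `information_schema.columns`.

```python
import sqlite3


def make_demo_db():
    """Build a throwaway in-memory database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, plate TEXT)")
    conn.execute("CREATE TABLE officers (id INTEGER PRIMARY KEY, badge TEXT)")
    return conn


def list_columns(conn):
    """Return {table_name: [column_name, ...]} read from SQLite's own catalog."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    columns = {}
    for table in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        columns[table] = [row[1] for row in rows]
    return columns


if __name__ == "__main__":
    print(list_columns(make_demo_db()))
```

Whether running something like this counts as retrieving an existing record or creating a new one is the legal question; technically, it is a read of metadata the database already keeps.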
Yes, but that requires someone to execute a query on a database and package it as a report?
Yes but what if we come up with a directive that every FOIA request must be logged into a DB. Therefore every request is automatically invalid as it requires we create a record!
/s
Not a lawyer, but why not use opensource as an example? Many successful public e-commerce websites have public schemas and aren't all hacked.
Got to see this happen day by day on the Midwest Venture Partners Slack. There was another lawsuit Chapman and Tom did for laser-based speed detection in Chicago.
This is part of what discouraged me from going to law school. So much of litigation is Kabuki theater, grand rhetoric not in any way intended at achieving a just or logical outcome, but designed only to give the person in power an excuse to decide however they had already wanted to decide before the case was tried.
> So much of litigation is Kabuki theater, grand rhetoric not in any way intended at achieving a just or logical outcome
Agreed, that is what this sounds like. What stood out to me is the remark »“only marginal value” is just self-important message-board hedging«: it's also simply correct, but the author concluded that they shouldn't have said it because "marginal" plus a bunch of explanation didn't have the rhetorical value that "no" would have had
Someone could legitimately configure a WAF-like system to scan for various ways of querying the database schema coming in as HTTP requests (keywords like "information_schema", encodings thereof, etc.), which will always be hacking attempts and can be blocked. If you already have the schema, you can craft a query without needing to bypass that restriction first. Is this likely to be a serious barrier at all? No. Is it anything to do with self-importance? I don't see how that's the case, either. It seems simply correct that this is marginal (situated in the margins, not the point, not important to discuss), but by saying nothing but the truth, now the other side blows that up to something much bigger and tries to get the court to agree that, "see, their own expert says it has value!" And so this expert concludes that they shouldn't have said it, that they should have just said "no value" which I would say is wrong, but so marginally wrong that it's hard to prove for the opposing side that it is not fully correct, and thus being less correct helps you in (this) court... so it's about rhetoric as much as being an expert...
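A rough sketch of the kind of WAF-like check described above, to show why the value is marginal. The keyword list and the function name are my own assumptions, not taken from any real WAF product. The filter catches attempts to read the catalog (including percent-encoded ones) but has nothing to say about a query written by someone who already knows the table names.

```python
import re
from urllib.parse import unquote_plus

# Assumed keyword list: catalog names an attacker would query to learn a schema.
SCHEMA_PROBE = re.compile(
    r"information_schema|sqlite_master|pg_catalog", re.IGNORECASE
)


def looks_like_schema_probe(raw_query_string, max_decodes=3):
    """Return True if the request text appears to be probing for the schema.

    The text is percent-decoded up to `max_decodes` times so that simple
    double-encoding tricks do not slip past the keyword match.
    """
    text = raw_query_string
    for _ in range(max_decodes):
        if SCHEMA_PROBE.search(text):
            return True
        decoded = unquote_plus(text)
        if decoded == text:
            break
        text = decoded
    return bool(SCHEMA_PROBE.search(text))


if __name__ == "__main__":
    samples = [
        "q=1 UNION SELECT table_name FROM information_schema.tables",
        "q=information%255Fschema",          # double-encoded underscore
        "q=1 UNION SELECT plate FROM tickets",  # attacker already knows the schema
    ]
    for sample in samples:
        print(looks_like_schema_probe(sample), sample)
```

The third sample sails through, which is the whole point: knowing the schema skips this one hurdle and nothing else.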
What stands out to me about this article is the time between court appearances. Seems like if you want to accomplish anything in court you need to be prepared to spend years of your life on it.
Can confirm this is the case everywhere. Even before taking anything to trial, one can spend months on trying to come up with a mutually agreeable solution, in my case getting seemingly one step further each time¹. I'm not sure I'd not just give up and move on with my life if this dragged on for years and wasn't about something that majorly impacts my life or that of a loved one
¹ Details: it was a warranty case, so first they agreed to repair it, then they didn't do that (but maintained that they were going to, whenever I asked about the status), then they agreed to refund, then they didn't do that, then I set a deadline, they iirc agreed, then they didn't pay, then I included specifics of what my next steps would be (lots of research here, seeing what even my options are and what I can truthfully claim that won't get shot down by a judge later) if they didn't pay before some other deadline (so I showed I was serious now), then the deadline crept up and they finally refunded the day before it would expire and I was frankly disappointed because, by now, I was prepared and ready, and all I got was the original sum that I had paid them. I checked the legal interest rate and changing my demand to include that simply wasn't worth wasting more time on this, and I didn't find any sort of precedent that I could bill any time I provably spent, not even to the value of minimum wage, so any time you invest is just lost free time (which I didn't have much of during that particular year). Protip: scroll down the reviews before buying something worth more than a few tenners from a small store. I wasn't the first person who had to threaten litigation...
And of course, people and entities (private or as in this case public) who have a lot of resources take advantage of that, a state of affairs which often serves to perpetuate injustice indefinitely.
I thought the same thing. Sure it's async but still you have to keep this in your mind for a very long time.
Do stored procedures count as part of the schema? I've recently found a SQL injection vulnerability in a client's SP that was using concat (very badly)
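For readers who haven't seen the concat problem up close, here is a minimal sketch of it. SQLite has no stored procedures, so this swaps in plain Python functions with `sqlite3` to show the same mistake (building SQL by string concatenation) next to the usual fix (a parameterized query). The table and function names are invented for the example.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical tickets table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tickets (plate TEXT, fine INTEGER)")
    conn.executemany(
        "INSERT INTO tickets VALUES (?, ?)",
        [("ABC123", 50), ("XYZ789", 75)],
    )
    return conn


def lookup_concat(conn, plate):
    """VULNERABLE: user input is spliced directly into the SQL text."""
    sql = "SELECT plate, fine FROM tickets WHERE plate = '" + plate + "'"
    return conn.execute(sql).fetchall()


def lookup_param(conn, plate):
    """Safe: the input is bound as a value and never parsed as SQL."""
    return conn.execute(
        "SELECT plate, fine FROM tickets WHERE plate = ?", (plate,)
    ).fetchall()


if __name__ == "__main__":
    conn = make_db()
    payload = "' OR '1'='1"
    print("concat:", lookup_concat(conn, payload))  # every row comes back
    print("param: ", lookup_param(conn, payload))   # no plate matches the literal
    conn.close()
```

Note that the payload works without knowing any column names, which is relevant to how much a secret schema really protects.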
Should have used mongodb in the first place.
lol'd so hard at this
> Does the “would jeopardize” language in the statute apply to everything in the exemption, or just to the nearest noun “any other information”?
I think law and lawmaking would be vastly improved if only lawyers learned the miracle of parentheses.
Commas can be expensive, too.
Wowzers, that was a lot of words to express something that's very simple.
A database schema is just an empty form. By looking at an empty form, you know what fields have to be filled in, what type of information they'll contain, etc.
Of course people making data requests need to know what forms are being used to collect and store information.
As for security - not letting people do anything because 'it might be dangerous' is bonkers. The way to secure databases has been known for decades. Let's start living in the 21st century :)
The whole back half of the post is about why the analysis is not as simple as you suppose it is. We had no trouble establishing at Chancery Court that schemas don't endanger security. That's not why the case failed at the Illinois Supreme Court. The IL Supremes did not decide spontaneously that schemas actually are dangerous.
Does disclosure of a database schema really jeopardize the security of the system? Yes
How plausible or likely does that jeopardy need to be? Very
Does a database schema constitute “source code”? Yes
Is a SQL schema a “file format”? No & yes. In that order.
And, finally, does the “would jeopardize” language apply to everything in the exemption, or just to the nearest noun “any other information”? Yes
Am I the only one disappointed there's no mention of little Bobby Tables?
sql injection court seems more fun than slave court where they tell you spending anything above 5 is a crime lmaooooo
I got to about 1/3rd of the way before I noticed my eyes were kinda struggling to read the article. Toggling different CSS rules, it's the #333 gray color. Turning that off is instantly better. The custom font is much thinner than the default, but that by itself doesn't seem to be the issue if the color is (closer to) black. (There is also a font-weight rule, but toggling it makes no visual difference in Firefox. Maybe the text is intended to look different?)
Since there is no contact method on the website, I figured I'd mention it in a comment; hope this helps.
[dead]
[dead]
[flagged]
[flagged]
I think you have an unrealistically high bar for who is suitable to be an expert witness. People who are not even remotely experts are often trotted up as "expert witnesses". OP is very easily an expert in his field; the only issue is that his communication style is not quite tuned properly for legal matters. Which shouldn't be surprising; that's the case for pretty much anyone who isn't in the legal profession, doing this sort of thing day in and day out.
And I think this is the correct state of affairs. The kind of person who does have their communication style tuned for legal matters probably engages in so much legal work that they aren't doing enough work in their field to truly be considered an "expert".
If you say "Even I don’t know what I meant by that" ... that's not really communication "tuning" now is it?
I don't expect someone -- even an expert -- to have perfect phrasing. But if they can't even tell you what they meant to say? How is that an unrealistic expectation?
The only problem seemed to be that he was unable to rule anything out, no matter how unlikely, because he is honest and an expert. He lacked the dishonesty and false confidence that we demand from an expert witness within an adversarial justice system.
No he didn't. The grandparent comment here was just a snarky put-down. No part of my testimony was impacted by a casual write-up I did about it 4 years after the fact.
The author already said he messed up. What are you adding by saying this?
Occasionally on HN, you will see comments that call out a portion of the article as being particularly important or insightful.
[flagged]
Why not? Conceptually, that's literally what it is. Rows of values labeled by columns.
Spreadsheets are poorly structured. Different entries in the same column can have different data types. There is no concept of a superkey, so duplicates are allowed. There is a concept of ordering by row/column number which does not necessarily exist in a DBMS. Querying facilities are generally poor.
Now you can kinda fix this by restricting the type of column, etc., but most people don't bother.
They are good at what they do - quick manipulation of relatively small datasets. WYSIWYG printouts with decent formatting and charts. But they are only a "database" in the same way that say, a bunch of random data is.
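A rough sketch of the difference, using Python's `sqlite3`: the `tickets` table and the `try_insert` helper are made up for illustration. One caveat is that SQLite is itself loosely typed by default, so the type rule here is spelled out as a `CHECK` constraint; most other DBMSs enforce declared column types on their own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: a key that must be unique and a column that must hold integers.
conn.execute("""
    CREATE TABLE tickets (
        ticket_id  INTEGER PRIMARY KEY,
        fine_cents INTEGER NOT NULL CHECK (typeof(fine_cents) = 'integer')
    )
""")
conn.execute("INSERT INTO tickets VALUES (1, 5000)")


def try_insert(row):
    # Returns "accepted" or "rejected" depending on whether the
    # declared constraints allow the row.
    try:
        conn.execute("INSERT INTO tickets VALUES (?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


duplicate_key = try_insert((1, 2500))           # same key as an existing row
wrong_type = try_insert((2, "fifty dollars"))   # text in an integer column
valid_row = try_insert((3, 2500))               # satisfies both rules

print(duplicate_key, wrong_type, valid_row)
```

A spreadsheet would happily take all three rows unless someone had set up validation by hand, which is the "most people don't bother" part.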
For a quick 30-second explanation of what "databases" and "schemas" even are in the first place for non-technical people, it's more than "good enough", and spreadsheets are the most common example that people are generally already familiar with. Unique keys, typing, etc. really aren't relevant here, especially not in the context of what the court case is about. The important bit to get across is that it's a 2D table with rows and columns, and that's all there is to it (that is: it doesn't include the source code to query it).
Excel sheets are databases. That's their purpose. They store rows/cols like an RDBMS. They allow joins and constraints, including uniqueness. There are even backends that use a spreadsheet as a DB. What else do you want?
This was fine, legally, but I'd be pretty irritated if someone I knew wasted everyone's time on this. The schema clearly is (marginally) useful for hacking, but who cares; it clearly is a file layout also, but who cares; those matter legally but not morally. Morally, this is just dumb: it's not something they really needed, and they're just irritating people and wasting resources for the fun of it. Shameful.
No. I'm involved in local government, and on the citizens commission where we keep track of how our municipality (adjacent to Chicago) stores and manages information. I'm acutely familiar with how people are spending their time in these organizations, and what is and isn't a big lift for them.
Increasingly, year over year, more and more information that would previously have been stored in filing cabinets or shared drives is moving into turnkey applications that municipalities buy and enroll all their data in. Those applications are opaque. But almost all of them are front-ends to SQL databases.
Being able to recover schemas from publicly operated databases is vital to keeping public records and data public, rather than de-facto hidden from inquiry.
Matt's suit was anything but a waste of people's time. Hopefully, it'll result in a change to our state law.
Just because the article gets into fine details doesn't mean it's silly. They're working with what they have.
But after reading more, I agree. The point of FOIA in the first place was "access by all persons to public records promotes the transparency and accountability of public bodies at all levels of government." Not "pushing FOIA statutes to their limits, sniffing out buried data and bulk-extracting it with clever requests."
If he's just asking for his own parking ticket records, ok. This isn't in the spirit of that. Separately, I agree that the SQL schema is software, a type of file layout, marginal attacker benefit, and other things in that exemption, and I'd say that again as an expert witness.
See here: https://news.ycombinator.com/item?id=43176625
The FOIA requester responded in the comments, saying they had received a tip indicating illegal practices, and noted in his article that he had previously uncovered evidence of over-policing in black neighborhoods.
I think a file layout describes the exact arrangement of bytes in a file. A schema is higher level. It describes what is stored, not how it is stored. A database could be one file, or a file per table, or a file per column. Data could be stored across multiple drives.
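Here is a toy illustration of that distinction in Python. Everything in it is made up for the example: the two-field record, the fixed-width binary layout, and the JSON-lines layout. The logical schema (an integer `ticket_id` and a string `plate`) stays the same while the bytes on disk differ completely.

```python
import json
import struct

# Logical content: (ticket_id, plate) rows. Hypothetical data.
rows = [(1, "ABC123"), (2, "XYZ789")]

# Layout A: fixed-width binary records, little-endian uint32 + 6 ASCII bytes.
LAYOUT_A = struct.Struct("<I6s")


def encode_a(rows):
    return b"".join(LAYOUT_A.pack(tid, plate.encode("ascii")) for tid, plate in rows)


def decode_a(blob):
    return [(tid, plate.decode("ascii")) for tid, plate in LAYOUT_A.iter_unpack(blob)]


# Layout B: newline-delimited JSON text.
def encode_b(rows):
    lines = (json.dumps({"ticket_id": tid, "plate": plate}) for tid, plate in rows)
    return "\n".join(lines).encode("utf-8")


def decode_b(blob):
    records = map(json.loads, blob.decode("utf-8").splitlines())
    return [(rec["ticket_id"], rec["plate"]) for rec in records]


blob_a = encode_a(rows)
blob_b = encode_b(rows)

# Different bytes on disk, same rows once decoded.
print(blob_a != blob_b, decode_a(blob_a) == decode_b(blob_b))
```

Knowing the schema tells you there is a `ticket_id` and a `plate`; it does not tell you which of these byte arrangements (or some other one) the system uses.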