Why SQL is beating NoSQL, and what this means for the future of data

blog.timescale.com

405 points by nreece 9 years ago

This was a nice read for the history. Honestly, being in a big legacy industry (Insurance), it's as though NoSQL never happened. We're too big, at least my organization, to have made the wholesale change and we've been plugging along mostly in Teradata and DB2 for a long time. Teradata in particular has performed well across a variety of use cases, its only large downside being the cost.

Anyhow, only point I'd add is that I'm not sure that SQL-like languages added on top of NoSQL solutions was really that bad of a thing. It's not as though there's one flavor of SQL to rule them all on RDBMSs. I get that the bones are largely the same - SELECT, FROM, WHERE, most join types, aggregates, etc. But take a really common analytic use case - persisted transactional data. Something like this -

SELECT A, B, C, FROM SomeTbl WHERE SomeDate BETWEEN SomeBegin AND SomeEnd QUALIFY RANK() OVER (PARTITION BY A, B ORDER BY SomeOtherDate DESC) = 1

Pretty sure that SELECT, FROM, WHERE will work just about anywhere. That QUALIFY clause is written for Teradata, and once you get into the ordered analytic functions (as one example), you need to know the implementation relevant to the RDBMS you're dealing with. It's not uncommon for a large enterprise to have MS SQL, DB2, Teradata, Oracle, etc. all under one roof. As a data engineer, you still need to know the differences - first to get the data out, and then to optimize.

Dunno, I could be way off on this, but that's been my experience at least.

I think the bigger issue is that you can use an RDBMS to do NoSQL-like things, such as key-value stores, with the flexibility to structure your data if you need that later. So why not start with a relational database?

Clubber 9 years ago

>So why not start with a relational database?
I asked the same question. The biggest reason I've heard is RDBMS's don't horizontally scale well, meaning you can't easily have 50 replicated nodes across the globe and expect it to perform well, or setup easily, because it's fairly complicated with an RDBMS. There's things like Oracle's grid or SQL's high availability clusters, but they get complicated fast, particularly when you need a bunch of nodes.
Obviously it was a solution for a very large amount of unstructured or semi-structured data that needed to be redundant across networks. With NoSQL, the key/values are a lot simpler to replicate apparantly.
It's a Google/Twitter size problem that most industries wouldn't have, but since it's the new shiny thing, you know how that goes.
- SmellTheGlove 9 years ago
  
  > It's a Google/Twitter size problem that most industries wouldn't have, but since it's the new shiny thing, you know how that goes.
  You nailed it on both points. Very few organizations are going to need that kind of scale, but many want to think that they will!
  Even when you get to horizontal scaling, you can simply replicate what needs to be replicated. Key-value is really just a subset of relational data, right? If so, I bet we can query and push only that data out across the nodes.
  It's not to say non-relational models don't have a place, but they're the exception as far as I'm concerned. I prefer to start with relational and only move off of that if I can justify it. As I said upthread, though, I work in insurance and you might imagine relational works well for us.
  
  marrs 9 years ago
  
  Very few organizations are going to need that kind of scale, but many want to think that they will!
  Further to that point, many software engineers know that NoSQL is a bad fit for their organisation but use it anyway because they wish to gain the experience required to work with Big Data in the future.
  
  flamtap 9 years ago
  
  Nothing helps you learn new tech quite like fighting to the death with it!
  
  Clubber 9 years ago
  
  Also, the unstructured data is another thing. For example, lets say you need to store a bunch of contracts (as in your industry) in PDF/TIFF or whatever. The filesystem isn't a really good place to store it because you have a loose coupling between the index in the table and the file on the filesystem. That can get messy and unless security is tight, some developer or admin could muck something up on the filesystem. This is particularly dangerous when you need to keep contracts over a period of time (say 7 years). If you have a really large amount of files, you'll either need a huge array, or figure out a way to distribute them across several NAS's or whatever, then that gets complicated.
  The RDBMS is a little better at storing this type of data, but it has it's own problems. Backups quickly become huge and a massive undertaking. Development databases now have a bunch of missing data (because who wants to replicate that down to lower levels), you still have the same problem when you have a large amount of data that you need to shard across NAS/arrays, etc.
  The NoSQL seems like a third option that may well be worth looking into. Again this is for unstructured data that you just need to reference and not necessarily query against. (like stored contracts).
  
  oldandtired 9 years ago
  
  Contractual information can quite often be quite relational in nature. I have worked for organisations that had custom contracts and custom sales documentation (different industries) that were based on templates that could be filled in from a database.
  In some cases, the companies were successful in storing the relevant information in a database and use that to generate the documents required based on document templates. This is not hard, but it can be tedious to set up. The companies also have to have a will to walk down that path and many do not see a benefit to this kind of process.
  It takes a very insightful designer to manage and lead this process as well as a client who has an incentive to move in this direction. One of the considerations is that the approved contract must be in some unalterable form that can be tested against the contract generation process from the information within the database. So that there is always a means of ensuring that any information in the database cannot be corrupted (in a legal sense). That is a matter of safeguards not database information storage capabilities and there are many different ways to do this kind of security.
  The problem of sharing of data and replication is not often thought through properly. Oft times, I have seen replication being performed wholesale instead of piecemeal. Similarly for data sharing and the consequences have been less than efficient. Oft times, the actual requirements for sharing and repliation are very small and it is a matter of understanding what is actually required and what is desired for the business.
  Finally, very little data is unstructured if you look at it correctly. Many times, I have seen people talk about unstructured data (including documents of all kinds) and when you look at the data, it is highly structured. It becomes a matter of seeing what that structure is and working out what is variable and what is constant data within the structure.
  
  dhd415 9 years ago
  
  It's proprietary and therefore not an option for everyone, but SQL Server's "Filestream" feature [1] was designed for the "large blob of unstructured data" use case (small blobs of unstructured data generally work fine in regular tables). It stores large blob data efficiently in individual files directly on disk that are still managed by the database engine and written/read in the same transaction as table data. It's well-integrated with their other standard features such as backups, HA clusters, etc. It's a pretty impressive feature.
  [1] https://technet.microsoft.com/en-us/library/bb933993(v=sql.1...
  
  nwatson 9 years ago
  
  Postgres has the BLOB functionality, but also has the JSONB data type that not only stores structured JSON data efficiently, but also allows indexing on the contents of that data. So you can use BLOBs and explicitly store in the table what you need to search on, or else you can avoid that and take the pseudo-NoSQL route and store indexed JSON.
  
  sk5t 9 years ago
  
  > That can get messy and unless security is tight, some developer or admin could muck something up on the filesystem
  This isn't a great reason not to use a filesystem--it's a reason not to let staff have at the filesystem like cowboys, and not to avoid replicating and backing up the keys and values on the filesystem as if they are really important.
  That said, there isn't really a solid replacement for a database transaction involving both fields + blob, but most situations are okay with, say, stowing a blob in S3, and later making a half-hearted attempt to delete it if the associated database transaction fails.
  
  koolba 9 years ago
  
  This is a perfect example of not having to use the same hammer for everything. The "standard" solution to this problem is to store the files themselves in cheap yet durable object storage (S3, NFS, etc) with a reference to the unique ID or URL in your primary database. The record locator is small (which is good for the RDBMS) and can be transactionally updated in the DB with any changes that involve that object (ex: update contract associated with a client).
  There's no need to store the full details as a BLOB in your DB. That has nothing to do with SQL, NoSQL, distributed, or centralized. It's just a vanilla separation of immutable object data (the raw bytes of the PDFs) and application state (the record locator).
  
  SmellTheGlove 9 years ago
  
  > For example, lets say you need to store a bunch of contracts (as in your industry) in PDF/TIFF or whatever.
  Contracts are not as unstructured as you would think. At least in my part of the insurance world, most of them are produced by a contract writing system of some kind. Think of it as assembling as a series of contract provisions, each having a valid set of choices to select. Use your auto policy as an example -
  Liability - $100k Comprehensive - $10k / $1000 Deductible Collision - $50k / $1000 Deductible
  Yes there are other attributes captured, but you see the pattern. It's provisions and boilerplate. The use cases needing the imaged executed contract where the attribute/provision soup is not sufficient is uncommon.
  The operational terms of the contract exist in some other system (and database) where it's consumable in some structured way.
- nathan_long 9 years ago
  
  > With NoSQL, the key/values are a lot simpler to replicate apparently.
  If all you need is a key/value store, I can see how that simplifies replication. Eg, if you have the key "foobar" and a 12-node cluster, you convert it to a number using a hash function, do "module 12", and that tells you which node to store the value on. If you want to store it on 2 nodes, do this again with a second hash function.
  That's lovely - as long as you don't need to join records, and never will. It's a risky prediction.
- tarsinge 9 years ago
  
  > The biggest reason I've heard is RDBMS's don't horizontally scale well
  Not true anymore with "NewSQL" databases (Google Spanner, CockroachDB, ...)
  
  tyingq 9 years ago
  
  Both are laudable. But, Spanner is closed source and hosted on a high egress cost cloud. So use cases are limited. And Cockroachdb is promising, but in it's 1.0 release. Follow the current bug list and performance numbers...you wouldn't use it just yet for something important.
  
  vgt 9 years ago
  
  While I agree that there are situations where "network egress" can become a large part of one's bill, I encourage you to think of the technical impact of having something like Spanner and something like the Google network.
  A large part of what Spanner does to sustain its high cross-regional availability at strong consistency relies on Google's powerful network [1]. Spanner isn't just source code - it's vertically integrated custom hardware + software + Google network + Google SRE - you would have very limited use from Spanner if you had just source code.
  Example: Recently Google introduced "hot potato" network tiers that closer match AWS and Azure offerings (aka degraded Google network)[0]
  (work at G)
  [0] https://cloudplatform.googleblog.com/2017/08/introducing-Net...
  [1] https://cloudplatform.googleblog.com/2017/02/inside-Cloud-Sp...
  
  tyingq 9 years ago
  
  Sure. I wasn't trying to make it sound like a conspiracy. Just that the constraints of running it only on the Google cloud invalidate some typical use cases. For many companies, you would have to migrate quite a high percentage of your business data out there to avoid huge amounts of egress charges. Spanner also doesn't support INSERT, UPDATE, etc, which may also be limiting.
  
  vgt 9 years ago
  
  I think your point is a microcosm of the popular "cloud TCO" debate - benefits of higher levels of abstraction. Can you get Google's network elsewhere? No. Does it cost more? Sometimes. Is it useful? For a lot of folks running products and services very much so.
  Higher levels of automation and abstraction (Spanner) may cost more on paper, but when you start calculating the cost of dealing with shoddy network, cost of maintaining your own service, cost of equivalent to what Google SRE gives you, cost of upgrades/maintenance, sharding/replicating your MySQL database and so on, your TCO math may look a little bit different.
  I'll opine that network egress is just one of the many factors that you should think about when evaluating a technology like Spanner. Kind of like price-performance. I encourage you to look at cost-value, not just cost.
  (work at G)
  
  tyingq 9 years ago
  
  None of that is my particular issue.
  This is specific to moving a largish companies data to the cloud. Typically, nobody wants to move it all at once. They want to move one thing first. Think like a trucking company. They have databases of customers, drivers, equipment, orders, invoices, quotes, etc, etc. With the current model where egress traffic is priced so high, it's difficult to move one of those subject areas to the cloud. The reason is because none of them act like an island. They are used together. So unless you move it all at once, huge amounts of data go back and forth all day long.
  Not every customer with concerns is just someone that needs to be educated about how awesome Google is :) I do like Spanner. I'm just pointing out a barrier to adoption.
- autokad 9 years ago
  
  RDBMS are such a production kill.
  imagine you just received two TB data sets, and you'd like to join them and run a process. In hive, you create the schema (pretty simple) and join them. queries may take a few minutes but you get to doing the work you intended all along.
  In RDBMS, you have to fus with its schema, "you said 125 characters for that string field and you just received 130, FAIL". Its a STRING JUST DEAL WITH IT. you have to create indexes and fus with all sorts of relational nonsense when it has little to do with the work you are trying to accomplish.
  take your billion row database, and now you need to add a new field. its going to read the entire table into memory and re-write it back to disk, meanwhile you aren't doing the work you were trying to accomplish.
  relational databases are great for small static data, but when you deal with new large messy data every day, they sap all your time.
  
  taco_emoji 9 years ago
  
  > "you said 125 characters for that string field and you just received 130, FAIL"
  Most (all?) RDBMS's have something like a varchar(max) which accepts reasonably large strings.
  > you have to create indexes and fus with all sorts of relational nonsense
  No, you don't.
  > its going to read the entire table into memory and re-write it back to disk
  Probably not, if you're adding a nullable column.
  
  collyw 9 years ago
  
  Creating an index in one command is a bonus of using SQL databases. Messing around creating yet another database seems to be the equivalent on NoSQL.
  
  autokad 9 years ago
  
  you obviously have not used databases very much if you think you can join 2 billion row tables on a key that is not indexed in a reasonable amount of time.
  
  testvox 9 years ago
  
  So? It's not like a nosql db without indexes will be any faster at it.
  
  autokad 9 years ago
  
  yes, its absolutely faster. try it. Granted a nosql db like hive is 'cheating' by taking advantage of lots of disks/hardware on lots of machines, but i routinely join billion row tables with million row tables and wait only a minute.
  
  taco_emoji 9 years ago
  
  > Granted a nosql db like hive is 'cheating' by taking advantage of lots of disks/hardware on lots of machines
  That's not "cheating", it's literally the only relevant factor.
  Who would win in a fight: a superintelligent ninja cat with laser nunchucks and body armor, or a dog?
  
  Clubber 9 years ago
  
  My cat beats my dog in fights all the time, and he doesn't have any of that training or gear.
  
  dragonwriter 9 years ago
  
  > In RDBMS, you have to fus with its schema, "you said 125 characters for that string field and you just received 130, FAIL". Its a STRING JUST DEAL WITH IT.
  While standard SQL may have this problem, most concrete RDBMS do not, because they support a simple length-unspecified string type (or, in the case of SQLite, treat column types as documentation and mild hints rather than actual rules.)
  > you have to create indexes
  Absolutely not. You can, for efficiency, but (ironically, unlike some noSQL systems that require explicit indexes to support queries) you can query and join tables of relational databases without defining explicit indexes. This may, for some queries, be extraordinarily inefficient (it may not, if you've defined the schema well, because many RDBMSs implicitly create indexes based on schema features like PKs and unique constraints), but you can do it.
ojosilva 9 years ago

>I think the bigger issue is that you can use an RDBMS to do NoSQL-like things, such as key-value stores, with the flexibility to structure your data if you need that later. So why not start with a relational database?
I've tried that in the past and failed miserably.
1) values in a key-value table will endup needing to hold nested data structures, such as a JS object/hash. Ie. mykey={...}
2) turning values to JSON (or some other serialization) makes it impossible to concurrently update or search subkeys.
3) so you convert your complex (sets, hashes) key-value data into several rows that hold one subkey per row, so now you have updatable rows but still no indexable solution and a serious performance problem.
4) so you create a multi column type-segregated table (one column for each DB type) for indexing your key-values and making them searcheable. That also requires a metadata table to go with it so that you know in which column your key or subkey is indexed.
5) say you successfully implemented your key-value store with your RDBMS. You still don't have usable joins (you don't have relational data) or a real migration path out of your key-values.
Trust me, don't just put keys and values in a relational DB. Start with the right tool for the job, either make your schema relational from the beginning or use a proper KV or document store.
- kazagistar 9 years ago
  
  1. Postgres has a json and jsonb datatype specifically to be your values.
  2. It has built in query and modification tools that can manage the nested objects and array and whatnot.
  3. It let's you build indexes to help with all this, including reverse indexes for asking something like "is this string in the array inside this object".
  4. You don't have to specify your schema in sone horrid way, you can just throw what you want at it and specify later, and get performance numbers that are competitive and even exceed some popular NoSql stores.
  5. You can mix and match freely with all the niceties of a proper SQL database with transactions, joins, constraints, etc, as you see fit to evolve.
  So yeah, don't just throw your keys and values into some text table somewhere; but use instead one of the best proper kv stores, that just happens to also be one of the best sql databases, and figure out how you want your schema you need it.
  
  ojosilva 9 years ago
  
  Postgres JSON(B) falls nearly in the same category. Do not use Postgres as a key value store, use a KV store for that.
  Postgres JSON is great for storing the original JSON data coming from external APIs and Webhooks, such as Stripe payments. But it's not a silver bullet and really not a first class citizen. SQL query syntax for JSON data is awful, non-standard and requires casting types, which defeats the purpose of schemaless. I believe Postgres JSON types solve a different use case: an app based on a relational schema that needs to store JSON structures eventually.
  I don't recommend starting off your app basing it on Postgres JSON. Use the full relational Postgres instead and do it right from the start.
  
  Mikushi 9 years ago
  
  Same can be said of MySQL. Not comparing to Postgres, but I've been amazed at how many people do not know that either of these as a powerful JSON type.
  
  ivanhoe 9 years ago
  
  Yes, too many devs I know haven't even heard about generated columns and virtual indexes in mysql.
- memsom 9 years ago
  
  I did this too. The issue with using an RDMS for a KV Store is that the data you are storing often has to drop to lowest common form, or the database structure gets wildly complicated. So we initially had:
  subject (varchar), key (varchar), value (varchar)
  But that gets very messy very quickly. So we added a field to specify the value's type. That worked, but then there's an overhead reading and writing the data - because there's a constant conversion. So we then added the most common types as extra value fields. That was the best option. But then the problem is that it gets fiendishly complicated to extract the data. I mean, this was the simplest version that never made it to production. In the end, reporting killed the idea. Because using SQL Server Reporting Services and needing to pivot the data to create any reports became a massive performance issue. Around that time EF Code first was coming in the .Net world, so we went that route instead. And really, I feel like it was the right solution.
- primeblue 9 years ago
  
  Modern SQL servers implement this
tomnipotent 9 years ago

>> Pretty sure that SELECT, FROM, WHERE will work just about anywhere
Cassandra & CQL is a prime example. CQL become the de facto interface for getting data out of Cassandra, even without the benefits of JOIN's & sets.
- pixelmonkey 9 years ago
  
  CQL is not SQL. CQL does not even have SQL SELECT. There is no support for arbitrary WHERE clauses, for example. You need to either slice an ordered column, or select a row by primary key -- and nothing else! They have an ALLOW FILTERING option which lets you "do it anyway", but this reduces down to paging the whole dataset without the WHERE applied, and then doing client-side filtering.
  I always felt that CQL did more harm than good for Cassandra. It's not like Redshift -- which implemented a full mostly-Postgres compatibility layer. Cassandra's CQL "looks like" SQL in the same way JavaScript "looks like" Java. Similarity by superficial appearance only; under the hood, totally different.
  
  stubish 9 years ago
  
  I think the opposite, and that CQL is the only reason people still adopt Cassandra today. Remember Thrift, which started by using terms like columns and rows for completely different concepts and went downhill from there? I felt it was a very interesting choice to have CQL only capable of expressing queries that Cassandra could perform quickly. It forced people to adjust their data models immediately, rather than defer the problem until the performance issues got too deep to ignore. Its a different mindset to relational databases, for different sorts of problems. If only they had used the same sort of restraint on other features attempting to make Cassandra do things it is unsuitable for, such as 'transactions' and counters, which are just an embarrassment of problems.
  
  tomnipotent 9 years ago
  
  You missed the point - both SQL & CQL are declarative languages.
  >> CQL does not even have SQL SELECT.
  https://docs.datastax.com/en/cql/3.1/cql/cql_reference/selec...
  This sure looks a lot like SQL SELECT. It returns a projection of data. Beyond that it's just pedantry.
  
  pixelmonkey 9 years ago
  
  Not supporting WHERE, GROUP BY, all aggregate functions, almost all other "standard" functions, most forms of DISTINCT, ORDER BY (except one order defined at table create time), all forms of sub-select/CTEs, etc. is pedantry?
  OK. :)
  
  ma2rten 9 years ago
  
  It's not just a subset, but there are also incompatibilities like limit limits the number of rows scanned not the number of rows returned.
INTPenis 9 years ago

The thing about the NoSQL trend is that it's still very valid in some areas. Even big and slow companies have some project or supplier that uses a software with a NoSQL service in it.
You just rarely see those bundled redis or elasticsearch servers that are crucial for some app to do its session management and search engine.
The most disturbing part of the NoSQL trend is that people are treating it like a battle between two competing systems.
I have at least one major system under my own belt that uses relational SQL for backend data, ES for search engine and cassandra for TSDB.
I need and trust all those services to work as one unit.
- SmellTheGlove 9 years ago
  
  > The thing about the NoSQL trend is that it's still very valid in some areas. Even big and slow companies have some project or supplier that uses a software with a NoSQL service in it.
  This is a good point and I didn't do a good job of acknowledging it in my prior comment. There are use cases for NoSQL, but to me it's a tool for a narrower set of circumstances. My default is relational, unless I have a clear idea of why that isn't the right fit.
jjirsa 9 years ago

The reason people don’t start with a relational DB has nothing to do with the data model or query language - people choose databases like Cassandra because scaling to 500 Postgres or MySQL instances holding a combined petabyte of data is horrific, but it’s dirt simple in things like Cassandra
The query language is a side effect of the underlying storage engine - you don’t choose it because you want a key value store, you choose it because you want horizontal scalability and cross wan HA
- 6nf 9 years ago
  
  You make it sound like many people are using petabyte scale databases. This is not even close to true. ESPECIALLY when you're just starting out. Most new projects will start with almost no data at all and the vast majority of projects will never ever get to the petabyte scales.
  
  jjirsa 9 years ago
  
  Many people ARE using petabyte scale databases.
  They don't start out that way, but they will be if they're successful, and if they're not they probably shouldn't be running their own databases anyway (go RDS it up or something).
  
  happymellon 9 years ago
  
  What? No they aren't, I work with satellite imagery processing which is quite large in it raw data format, and after a decade we are not dealing with petabytes of active data, hundreds of gigs for a full earth coverage.
  Before that I have held positions in finance, dealing with realtime transaction processing. We did not work in petabytes.
  If you are working in petabytes you are storing crap in your production database, and 99% of that data is wasted.
  
  altstar 9 years ago
  
  Yeah right. We are using nosql in manufacturing. One second data for thousands of instruments create a lot of data very quickly. And that is per plant. Having a lot of plants at various sites in the world adds up.
  
  jjirsa 9 years ago
  
  Exactly this. Whenever you collect millions of things, even at low resolution, it adds up quickly
  Collecting high resolution and filtering down to low resolution for lower importance is still difficult and can require petabytes for real world use cases
  
  GrayShade 9 years ago
  
  > What? No they aren't, I work with satellite imagery processing which is quite large in it raw data format, and after a decade we are not dealing with petabytes of active data, hundreds of gigs for a full earth coverage.
  This get large rather fast when processing time series from multiple sensors.
  But I agree with you otherwise, and it's not like you're storing satellite products in a database.
  Now I'm curious what NSA is using :).
  
  ddorian43 9 years ago
  
  https://accumulo.apache.org/
  See (History): https://en.wikipedia.org/wiki/Apache_Accumulo
  
  GrayShade 9 years ago
  
  Thanks, I wasn't aware of it.
  
  EvilGrin 9 years ago
  
  You could just check their Github? :)
  https://github.com/nationalsecurityagency
  
  eberkund 9 years ago
  
  Unless the amount of data you are collecting is increasing at a vastly unsustainable rate, you will be making megabucks and be able to afford a team of engineers to undertake a major redesign for version 2. But picking technologies based on scale at the onset is going to make getting to that point all the more unlikely.
- bildung 9 years ago
  
  Postgres (as an example) has no upper limit for database size, and the upper limit for a single table was 64TB a while ago, perhaps higher now. And you can buy stock hardware with 6 TB RAM and 76TB SSD e.g. from Dell.
  If you max out that single server, you could easily hire one database engineer to care about horizontal scaling.
  
  jbergens 9 years ago
  
  It doesn't help when the nbr of queries and updates increases a lot. For many systems having TB of data also means having thousands or millions of users.
  You also probably want backups and possibly slaves that are updated inside transactions to avoid data loss. These things can lower your throughput even more.
  
  jjirsa 9 years ago
  
  Show me a single person who runs 64TB on a single table on a single node, and I'll show you someone who will cry because of a prod failure sooner than later.
  
  throwme211345 9 years ago
  
  Great. Now my 10G prod network can start replicating some fraction of 64TB at rate while you masturbate on the hardware. Utter nonsense.
- collyw 9 years ago
  
  The vast majority of hipster NoSQL developers I have debated with are using a single machine.
  
  jjirsa 9 years ago
  
  Those people are objectively wrong.
  
  collyw 9 years ago
  
  You try and explain that to them, but whatever database they are using "is fast". Not that I have ever had a problem with relational databases being slow. Certain queries, yes, but a bit of rewriting them and proper indexing usually solves that.
eximius 9 years ago

Not familiar with teradata, but that looks like the equivalent of
select a, b, c from table where somedate between somebegin and someend and someotherdate = (select max(someotherdate) from table group by a, b)
which will work in all sqls. My point being, the advanced features make life nice, but are by no means necessary.
(would be more efficient as a join, but not sure if joining on subtables is always supported...)

craigkerstiens 9 years ago

I don't think the article could have said it much better.

SQL is super powerful and makes much sense in so many ways. Nearly all apps have a relational structure to them and SQL is a reasonable way to interact with them.

Some of my favorite conversations from the Postgres community 5-6 years back were when they were talking about a time when Postgres was being disrupted. The gray bearded DBAs (Hi Berkus and others) were talking about JSON like it's fad and how it's going to come to pass. They were saying so because they'd heard about this disruption before... there was this new thing XML and these document databases were so much more powerful. They waited a few years and added an XML datatype [1]... And then XML databases came and passed.

At first they scoffed a bit on JSON as the new hip thing, but came around a little in Postgres 9.2, and then in 9.4 [2] we all really got what we wanted.

SQL has never been a particularly elegant language, but it's always been a powerful one. The lone knock against has either been usability or scalability [3]. On the usability side that's a bit of a hard one as well it's not super pretty. But once you understand it's foundations around relational algebra or relational calculus you're so far ahead. On the scalability side there are definitely some options such as Citus, but also core Postgres is massively improving here.

[1] https://www.postgresql.org/docs/9.3/static/datatype-xml.html

[2] https://blog.codeship.com/unleash-the-power-of-storing-json-...

[3] https://www.citusdata.com/blog/2017/02/16/citus61-released/

unclebucknasty 9 years ago

I think I'm one of those graybeards.
I see it in so many things tech. It's a pattern, and once you've seen it repeat a half-dozen times and also gain a depth of experience over that time, you can actually recognize when something represents genuine progress vs yet another passing fad. Spoiler alert: those that are most rabidly promoted are often the latter.
But, if you try to raise the point in the midst of the latest fad, you generally get shouted down. So, you wait until the less-jaded figure it out...again. It was plainly obvious for NoSQl, just as it now is for SPAs (or at least our current approach).
Don't believe me? Wait 5 years.
And, the thing with SQL that's so much like other tech is that we largely figured it out years ago. Improvements have been incremental at best and, frequently, not improvements at all.
- matwood 9 years ago
  
  My beard does in fact have some gray ;) , and I've never stopped beating the SQL drum. One other thing you learn is that data always outlive whatever app is using it today. I want my data to be stored in a safe way, and self contained apart from whatever flavor of the month app is generating/using it today. RDBMSs are perfect for this.
- craigkerstiens 9 years ago
  
  Oh, I generally agree.
  I didn't mean the graybeard in a bad way at all.
  I feel a bit 50-50 on some of the reactions to JSON as a new thing at the time. But, also the practicality of how JSONB finally came to exist, the foundation it was built on and the new index types[1] in Postgres to leverage it felt like it came in the right time and in the right way.
  [1] http://www.craigkerstiens.com/2017/03/12/getting-started-wit...
  
  unclebucknasty 9 years ago
  
  >I didn't mean the graybeard in a bad way at all.
  Oh, no offense taken. What you read in my comment was genuine recognition on my part that I've likely crossed over into the graybeard camp--both in experience and disposition.
  >I feel a bit 50-50 on some of the reactions to JSON as a new thing at the time.
  Yeah, there have certainly been real improvements in tech that are perfectly reasonable. And, actually, I think it's the incremental improvements that add up over time which actually comprise the bulk of our real progress.
  I think where things get off track is when we make sudden pronouncements that we must now cease doing everything we've done heretofore with tech Y in favor of new and shiny tech X. Every use case. Period. Whether it's killing XML because, JSON or killing SQL because, NoSQL or adopting J2EE for every "enterprise" app, or never writing Vanilla JS again because, Framework Fill-in-The-Blank; the list goes on.
  There's just a tendency in the tech community to lurch in different directions with this extreme vigor. But, when you really think about it, once the foundations are laid there just aren't many quantum leaps that truly merit such strong reactions.
  And, what's really strange is that we somehow manage to overlook glaring weaknesses in the new tech, and likewise disregard (or even impugn) the strengths of the tech we're displacing in ways that are obviously unfavorable. For instance, suddenly deciding we don't need ACID transactions. What?
  You see that stuff and it's an automatic, "yeah, this will all blow over soon". We then generally a.) forget it ever happened or b.) settle into assigning the proper use-cases, adopting/integrating the best properties of the tech, and moving on. Incremental.
  
  wolco 9 years ago
  
  It is greybeard or graybeard?
  
  Wildgoose 9 years ago
  
  American spelling is gray, English/UK spelling is grey.
  Easy way to remember is use "a" for "American" in the American spelling, and "e" (for English) in UK (and presumably the rest of the Commonwealth).
  
  dwringer 9 years ago
  
  In American English, both "grey" and "gray" are considered acceptable and it's not uncommon to see both used here in my part of the US. Personally I usually write "grey" but if it's a particularly dark shade I might say "gray" - but that's personal preference rather than any convention I've seen elsewhere.
  EDIT: There do seem to be a couple of blogs that come up in Google pushing the idea of a difference between US/UK spellings, but that's definitely not something I or most people I know grew up with. Other search results indicate that there is only a modest correlation with location. Perhaps it goes more one-way, with both being in common use in the US but "grey" exclusively preferred in the UK?
hvidgaard 9 years ago

> SQL has never been a particularly elegant language
I hear that often. But SQL is close to a 1-to-1 mapping with sets of data. It's refreshing to use something as math like when programming. I find that elegant.
- kazagistar 9 years ago
  
  You might find the criticisms in this article interesting (on the jooq blog, a java query builder library).
  https://blog.jooq.org/2016/12/09/a-beginners-guide-to-the-tr...
  
  hvidgaard 9 years ago
  
  Indeed that is valid criticism. I suppose after using SQL for a number of years, the logical order is just ingrained. I would be nice for it to be required to be written in the logical order though.
- xeonoex 9 years ago
  
  There's a lot of simple improvements that could be made to the syntax. Some DBMSs (mostly Postgres) do make syntax improvements, but I'll never understand why Oracle still doesn't support "is <not> distinct from". If I want to do a join where null == null is true, I have to write
  (NOT (a <> b OR a IS NULL OR b IS NULL) OR (a IS NULL AND b IS NULL))
  instead of "a is not distinct from b".
  The query structure is odd as well. I think it should more like
  from <tables> [declare] <calculations, scalar variables> [group by] [where] select
  Where declare is a section to declare variables (or virtual variables), so that you do not have to rewrite a calculation every time it is used. You could also declare scalar variables there. There's also no reason to have both a where and having clause.
- matwood 9 years ago
  
  I agree. Of course any language has its warts, but when dealing with data I don't know of a better language than SQL.
- pjmlp 9 years ago
  
  I like that it feels very declarative and that the imperative extensions are all kind of Ada like.
x0x0 9 years ago

Some of the resurgence of sql has to be the size of data you could fit on a single machine / fake with a distributed system increased.
But postgres I think continues to be at the forefront: first, the article misstated slightly; jsonb gives you not only full text search (who needs that?) but indexing all the way down. So you can eg say I want records where "field1 -> field2 has an element named md5 with a given value...". It turns out that a lot of plain normalized tabular data plus a medium about of schema-free jsonb really is the perfect combination for most purposes.
- danudey 9 years ago
  
  The biggest change for us was SSDs coming down in price.
  Whereas before I might need four read slaves to ensure that at peak load I'm handling all my transactions within X ms, now I can guarantee it on one server.
  More importantly, in our industry where we're vastly more write-constrained than read-constrained and we're faced with e.g. MySQL not being able to easily spread writes over multiple servers simply, the appeal of something like MongoDB or Cassandra with built-in sharding and rebalancing to spread out both reads and writes sounds very appealing.
  And again, I can move from a giant complicated, expensive, heat-producing multi-disk raid10 to a pair of SSDs in RAID1 (or better) and easily meet my iops requirements.
  Without being able to upgrade to SSDs I think we would have been looking into other systems like Cassandra a lot sooner, but right now we can pretty easily throw some money at the problem and it goes away.
pjungwir 9 years ago

Hi Craig! Postgres refers to itself as an "object-relational database", but do you think the "object" part is meaningful today, or just a label left over from the 90s? As best as I can tell, it just means that Postgres has table inheritance, which is great for partitioning but has never been all that feature-ful. (I guess it is improving a lot with pg 10 though!)
Is there any theory or academic research on what an "object-relational database" might mean? Or just a buzzword (like Snodgrass titling his book "Time-Oriented Database Applications", a la object-oriented)? (Edit to add: In other words, is "object-relational" something that still gives some kind of principled direction to the project, something that informs decisions made by the core contributors?)
I sort of wonder if nowadays with JSON Postgres has become a "document-relational database". :-)
- ww520 9 years ago
  
  Mike Stonebraker started Ingres to build a relational database. After Ingres, he and Larry Rowe wanted to see what could be done to add the OO aspect and other complex data types in addition to the relational model; OO was starting to take off back then. They started the Postgres project (Post-Ingres) in UCB to research on object-relational database. It turned out the object-relational part was better done on the client side, evidenced with the legions of client side object-relational libraries. The relational part sticks around and becomes the better part of the PostgreSQL database.
  Edit: the object part was to build complex data types from simple type in the relational model, and had the database to manage the storage and querying of the complex object. E.g. build Location object from (x: int, y: int), and array of Location[].
  
  e12e 9 years ago
  
  I'll note that postgres is (sort-of) a storage option for one of the (rather) early object databases, the much maligned, yet perhaps under-used zodb:
  http://www.zodb.org/
  Still evolving in its old age, along with postgres:
  http://www.newtdb.org/en/latest/how-it-works.html
  Other than that I'm aware of gemstone/s:
  https://gemtalksystems.com/products/gs64/
  Which can be paired with the maglev ruby implementation for similar (but presumably higher performing, more featurefull) experience with distributed objects in ruby:
  http://maglev.github.io/
  I suppose that with jsonb, and a node/js server, one could indeed argue that postgres is (again) an object oriented db - but I don't think anyone would push that argument too strongly.
  
  snuxoll 9 years ago
  
  I suppose you could say Postgres did end up as an object-relational database still, composite UDT’s, array’s and domains are all features that match the original vision.
  
  pjungwir 9 years ago
  
  I think that is a great way of thinking about it! Composite types and user-defined types are very useful and it makes sense to see them as inspired by object-oriented ideas.
- craigkerstiens 9 years ago
  
  The ability to partition/distribute an object is far detached from Postgres or an object-relational database. I'll be the first to admit I'm rusty on some of my college days of what a true object/relational database means, and while Postgres breeds itself as that more at its core is ACID[1] which focuses on both some of the relationship of the data and consistency between it.
  As things have evolved it hasn't actually given up on ACID or the relational nature as even with JSON you can have complex constraints and guarantees about your data. The Postgres community has simply applied a lot of the underlying principles to what has been build on top.
  [1] https://brandur.org/acid
- anarazel 9 years ago
  
  > Postgres refers to itself as an "object-relational database", but do you think the "object" part is meaningful today, or just a label left over from the 90s?
  I think it's largely a leftover - the things that made it "object" back then are partially not used much, partially deprecated, partially removed, partially repurposed (inheritance as a partitioning mechanism).
  You could perhaps just do a s/object/multi-modal with a focus on/. But that's a bit awkward ;). There's a bit document store (json, xml), arguably some graph (recursive queries), plenty people just use it for KV, ...
  
  Twisell 9 years ago
  
  Except that maybe it's precisely because it was meant as an OO database that you can now enjoy jsonb or geography datatype that are pretty complex.
  Also a lot of OO mechanisms become really useful without being clearly identified. For instance every table automatically generate an object definition (aka composite type aka rowtype) that you can use anywhere to ensure that a return type conforms to this definition. You can also check and validate single row well ahead of an insert attempts filter out and store unmatching data structure in an generic human readable rowtype while accepting correctly formated entries.
  PS: I don't now if my examples were good or understandable but heck I tried, feel free to correct me if needed ;)
misterbowfinger 9 years ago

SQL is easily the best option at the moment, but unlike the experimentation in programming languages that we've seen in the last 10 years or so, we've yet to see much experimentation in query languages (as far as I know!)
ReQL, what RethinkDB uses, was an interesting direction because it allows people to start reasoning about a particular query. This, IMO, is the biggest thing that SQL lacks - the ability to reason about a query. I'd love to see a way to extend SQL that solves this.
The other thing that really, really sucks is that SQL has some terrible documentation and tutorials. Even for PG, for example, it just mentally takes a long time to parse through the documentation to understand how to use a function that solves your use case. This might just be a lack of Stack Overflow posts out there, but it's definitely not as friendly as, say, finding answers for Ruby code.
jbergens 9 years ago

You do have the impedance mismatch to OO languages, which are probably used in 90% of all applications.
Farter more, I don't really think that all systems have relational data, it's just possible to store it in that format. Sometimes 3 document types are easier to handle from code than 30-60 tables.
primeblue 9 years ago

XML databases lol
- sctb 9 years ago
  
  We've already asked you to please post substantively, so we've banned the account. We're happy to unban accounts if you email us at hn@ycombinator.com and we believe you'll post within the guidelines in the future.
  https://news.ycombinator.com/newsguidelines.html

manigandham 9 years ago

There's really no such thing as NoSQL first of all. Relational, graph, key/value (including wide column), document-store, etc. There are lots of database types but it turns out relational works 95% of the time and we're getting better at recognizing the use-cases for the others.

SQL is also just a query language, that's literally the name. Any database can implement it, not just relational. Is SQL a great interface for both OLTP and OLAP? Yes, it's proven itself over decades. Is it the only valid interface? No. Does it work with other newer/different data systems? Yes, Spark, Hadoop, Kafka, and even these new distributed relational databases are all examples of such.

It would be far better for the industry if we can get past these ideological and faulty "debates" and move on to more important things.

jevgeni 9 years ago

Thank you!

randomdrake 9 years ago

I don't buy it. The makers of the software that this appears to be a carefully written advertisement for, came to the same conclusion as the rest of the IT world (that some of us saw a mile away): NoSQL was, and still is, only good for very specific things in very specific cases; it's generally dreadful for anything that SQL engines could already do well.

The subtitle for this contains: "After years of being left for dead" and the author throws phrases around like: "And boy did the software developer community eat up NoSQL, embracing it arguably much more broadly than the original Google/Amazon authors intended."

Who? Where's the data? This blog post has a lot of links, references, and studies, but where's the data to back up the premise?

A quick search on Google Trends comparing SQL databases to NoSQL as an entire term, or any of the popular flavors of NoSQL, reveals that it is not even a blip in comparison.

But don't take my word for it, the author and their company had the "DUH" moment too (emphasis mine):

> ...we soon realized that we’d have to do a lot more work: e.g., deciding syntax, building various connectors, educating users, etc. We also found ourselves constantly looking up the proper syntax to queries that we could already express in SQL, for a query language we had written ourselves! One day we realized that building our own query language made no sense. That the key was to embrace SQL.

You might have had hubris stemming from discarding or not knowing all of the history that you decided to share with us in this blog post. A great deal of the NoSQL community was completely unpalatable for this reason. The folks who had been doing data for decades, and built stable and powerful systems on top of many prior decades of mistakes, built them for a reason.

And here we see the author trying to proclaim that suddenly SQL is back?

SQL never went anywhere. NoSQL is a neat tool that was developed and continues to be developed and probably isn't going anywhere. But the idea that NoSQL suddenly overtook SQL, and now SQL is seeing some huge resurgence, feels like it comes from the perspective of someone who only saw a window into the last 6-8 years of development.

The king is dead. Long live the king.

coldtea 9 years ago

>Who? Where's the data? This blog post has a lot of links, references, and studies, but where's the data to back up the premise?
If you had been following startup blogs and HN, then you don't need any more data to back their premise.
It's not like total data matters anyway -- what's important is what use cases people regularly encounter in their periphery and the part of the industry they work on, which might not be what some overall data will show.
I don't care for example if NoSQL only caught on with 1% of developers while 99% of Fortune 500 enterprises and companies in rural Idaho and southern Mongolia used trusty old MS-SQL Server.
For most of us here around HN, judging from posts, comments, and discussions, the NoSQL era was very real, in the kind of companies and environments we knew.
- heedlessly2 9 years ago
  
  >For most of us here around HN, judging from posts, comments, and discussions, the NoSQL era was very real
  so you're favoring buzzword hype over actual production environments?
  Who cares if people are talking about it but not really using it?
  
  coldtea 9 years ago
  
  No, I'm favoring actual buzzword-oriented production environments -- which is most of us work.
paulddraper 9 years ago

I think if there were data to support the authors perception, it would be number of blog posts written.
tn_ 9 years ago

There have been a ton of start-ups that I've talked to / been a part of who used MongoDb thinking their company is going to exponentially explode in MAU and they think they'll save themselves the scaling troubles by using NoSQL. What ends up happening is the codebase gets too gnarly when they try to start doing complex analysis. SQL is appropriate for like 95% of companies. A lot of these places I'm referencing end up porting their codebase to Postgres, mysql or oracle.
- jbergens 9 years ago
  
  Complex analytics should probably be done in a db separate from the production db anyway. The analytical db could be an RDBMS, a datacube or something else.
  
  tn_ 9 years ago
  
  Absolutely agree that if you're going to be manipulating the data w/ heavy complex analysis as well as a ton of data-points, it should occur not on the prod db.
  I was referring to queries such as: User needs to get a list of pharmacies they've gone to in the past 2 months that carries x-medication. Not overly complex, but annoyingly more code that would have to be written + architected in NoSQL vs a relational db.
  
  degyves 9 years ago
  
  That's when having mongodb+RDBMS with the added complexity is really justified. On common cases, just drop mongodb and use the RDBMS for everything.
ianamartin 9 years ago

Thanks for writing this. I was about to write almost that exact thing. It's a weird interpretation of database history with a ton of stuff left out. My best guess is that the author of this article had all the data that would be needed to back this up, but it was stored in Mongo and he couldn't find it when he started writing.
Clubber 9 years ago

It's a marketing piece, no doubt, but he is explicitly talking about SQL the language on top of NoSQL, not SQL as in RDBMS.
But to your point SQL the language or RDBMS was never dead, it just wasn't the prettiest girl in the room for all the new GI's that just shipped in.

memsom 9 years ago

About 15 years ago, I worked on a team with a product that used an object based database. No SQL didn't exist, but it was kind of in a similar vein. The database was a pure object storage and there was an index b-tree to make sense of it. We could store anything at all in the database. As long as it inherited from a specific class. It used the object serialization features of the language (Object Pascal) to read and write the data chunks. It had a query layer that allowed collection interrogation of data with specific values set. But, it was also very problematic. The structure was fairly chaotic. Ignoring the specific implementation, it taught me that databases - despite the rigidity of the storage mechanism, were a bloody good way to keep data organised. And SQL, despite the warts and inconsistencies of the major dialects, is actually a very powerful way to query data. When you strip away the integrity and the ability to easily pull data out in a uniform way, it makes things a lot less fluid.

le-mark 9 years ago

It's almost as if "everyone" forgot that before SQL (relational database) there was only NoSQL databases of various sorts (hierarchical, document, etc) and SQL and the relational model arose to address their shortcomings.
Personally, I knew stuff had gotten stupid when I sat through a presentation by a gemfire evangalist who advised everyone present to just "do your joins in code". If you need to join data you should be using SQL.
- thomascgalvin 9 years ago
  
  > It's almost as if "everyone" forgot that before SQL (relational database) there was only NoSQL databases of various sorts (hierarchical, document, etc) and SQL and the relational model arose to address their shortcomings.
  I think at least 25% of modern software development involves slowly, painfully re-learning lessons we intentionally forgot ten years ago.
- IgorPartola 9 years ago
  
  To be fair, I have seen plenty of cases where doing a join in code is actually better for performance reasons. And I am apparently not alone since Django’s prefetch_related() does exactly this. Reading a two million rows from a database is faster than one trillion.
  Having said that, I’d your database doesn’t support joins, you have no option and are stuck having to do it in application code and that sucks.
- Too 9 years ago
  
  There are cases where doing a join in code can make sense. E.g. when you already have a whole table cached in application memory, and the primary purpose of the join is more ORM-like and not any kind of aggregation or grouping on a joined column.
- DonnyV 9 years ago
  
  If your going to join data that usually means the application needs a representation of the data that is not naturally stored in 1 table. To me thats a red flag which tells me this "logic" should be stored with the app, not with the database.
  
  astine 9 years ago
  
  Joins are the thing that make relational databases worth using. It's not about creating the correct representation of the data; it's about providing the correct data with the correct relationships. The power of the relational database is that it can accommodate data of almost any shape and allows for just about any kind of access. Embedding the 'shape' of your data in your application makes it more fragile and gives you less control, not more.
  
  slfnflctd 9 years ago
  
  It's hairsplitting at that point, though-- somewhat reminds me of similar issues in biology with classifying organisms.
  Most SQL database packages are designed to have a wide range of queries stored with the database (many will always only exist in front ends or developer tools, of course), for several reasons. One reason is that it sheds light on data structure and business logic. Another is avoiding needless duplication, since a particular snapshot which seems only to be needed today may often be needed again.
  Perhaps it would be better if there was more tidy separation of data vs. logic, it's hard to say until someone finds a way to do it that's a clear improvement for everyone. We have to work with the tools we've got right now, though, and straying too far from established practices just makes your work harder for the next person to decipher.
  
  marktangotango 9 years ago
  
  It's most certainly not hair splitting. If the data isn't natural to one table, as the GP says, then it's strong indicator of two things:
  1. NoSQL of any kind is not the solution
  2. Joining at the location of the data is always preferable to joining in application code due to the Principle of Locality[1]. The amount of data sent over a wire should always be minimized.
  [1] https://en.wikipedia.org/wiki/Locality_of_reference
  
  tynpeddler 9 years ago
  
  I'm not sure that point 2 is always clearcut. Because of the cartesian product issue, searching for large amounts of data with multiple joins can produce a huge number of returned rows containing a lot of redundant data. I've had a few small optimization issues that I fixed by breaking up one large query with several joins with 2 or 3 queries.
  
  oldandtired 9 years ago
  
  Something we discovered years ago in relation to joins in all of the SQL DBMS systems we used (from the lowly MS-Access to ORACLE and SQL-Server). The order in which you create the joins in the SQL determines how bad the cartesian product cardinality is. The DBMS's seemed incapable of rearranging the joins internally to get the smallest number of records.
  We had quite a few instances where by a judicious change in the join structure of the SQL meant a many order of magnitude change is speed due to the decrease of the cardinality.
  Our process was to test different orders of joins in problematic SQL and see what effects would be produced by the DBMS. It usually required we knew how many tuples were in each of the tables and how they would be restricted. The upshot was that it should have been possible for the DBMS to do its own analysis to produce the best result. We found none of them capable of this task in any meaningful way.
  
  tynpeddler 9 years ago
  
  That's strange. While the cartesian product is not commutative (because of the order in which items appear in the final product), the cardinality of the result result set should be independent of the order in which records are joined because the cardinality is obtained by multiplying the cardinality of each constituent set.
  https://en.wikipedia.org/wiki/Cartesian_product#Cardinality
  
  1as11ads 9 years ago
  
  wut?
reboog711 9 years ago

What makes you think SQL didn't exist 15 years ago?
SQL has been around since 1974 according to Wikipedia. There was a ton of SQL Work during the first dot com era of the mid-90s.
On another note, I did a lot of Lotus Notes working during that time, which was clearly a no-SQL database.
- sb8244 9 years ago
  
  Author means that NoSQL as a term / fad didn't exist
  
  reboog711 9 years ago
  
  I see that now. His space between no and SQL confused me.

pjungwir 9 years ago

I've always found it a little funny that SQL was originally designed for non-programmers, sort of like AppleScript. I used to think neither of those panned out, but in fact there really are a lot of smart not-programmers who can use it. At a company I work with many of the support staff have been learning SQL to help customers pull reports from our data warehousey reporting database. So maybe the article is onto something about the value of the language.

On the other hand, SQL-the-language isn't essential to relational databases. I have often wondered where C.J. Date has been the last few years. I actually love SQL, but it does have its limitations. I wouldn't mind a solid relational database with an alternative query language. It's such a missed opportunity for a great VC pitch: Tutorial D, NoSQL before it was cool. :-)

davidw 9 years ago
> SQL was originally designed for non-programmers
SQL sounds like something from the Star Trek original series era to me. Read it with a Shatner voice:
```
    Computer, SELECT course WHERE klingons = 0;
```
- majewsky 9 years ago
  
  Now that you mentioned Star Trek, I'll probably read every SQL query ever with the voice of Commander Data.
- bni 9 years ago
  
  And this is a bad thing? Sounds awesome to me, make the computer do stuff for me without specifying how.
  
  smt88 9 years ago
  
  It's very tedious to write and change a big query. It's really hard to read a big block of SQL.
  And it's bizarrely rigid about certain rules, meaning "English-like" is more of a hindrance because you're tempted to think, "This makes sense in English," even though SQL doesn't allow it.
  
  davidw 9 years ago
  
  Good? Bad? It's just... a thing from that era. Another go-round of "I know, if it reads like English, it'll be easy to use", which in the present day manifests itself in stuff like Cucumber.
elvinyung 9 years ago

(Disclaimer: I work at Databricks.)
I really believe that Spark's "more-than-SQL" query interface is how things should be. SQL is of course the gold standard and probably represents north of 90% of analytic workloads, but there a lot of queries that (especially for us programmers) are much easier to express procedurally/functionally, rather than purely declaratively.
- oldandtired 9 years ago
  
  Many years ago, I worked on a project that required changes made to a reporting system that the original author had used procedural means to do data selection and updates. It was a highly convoluted "mess". The section I had to update was going to take some time to analyse the code to ensure that the specific updates would work.
  I looked at what the code was doing and restructured into a set of queries that would do the same task in a simple sequence. It turned out that this simplification also sped up the entire update process by at least 1 if not 2 orders of magnitude.
  There are few cases where processing the data procedurally is better than using SQL queries. Most time that it has been done procedurally, I have found that there are better ways to do what you require by judicious thinking and using SQL (the bronze standard).
  I should say that my definition of procedural processing involves opening queries and processing tuples one by one. If you have a different definition then please let me know.
  
  elvinyung 9 years ago
  
  I mostly agree, but I think it fundamentally relies on a good query plan optimizer to figure out the proper order to execute joins, rewriting predicates, etc.
  So I think I take the position that the machine is not perfect [1], and doesn't always provide a perfect abstraction of a fast declarative answering interface. Sometimes you really do need to tell it how to access the data. This is why, for example, some SQL query engines let the user provide join hints.
  That said, I do agree that procedural queries are mostly a quick fix, and not very future-proof (against future improvements to the query engine).
  And FWIW, Spark's DataFrame API [2] is not actually that procedural; it lets you specify something that feels like a direct query execution plan, but actually still gets optimized underneath.
  [1] http://www.vldb.org/pvldb/vol9/p204-leis.pdf
  [2] https://spark.apache.org/docs/latest/api/scala/index.html#or...
- strictfp 9 years ago
  
  But can we do that in am efficient manner?
Blackthorn 9 years ago

To add onto this, you could theoretically call SQL a logic programming language and we've had good success teaching some of our non-programmers a different logic programming language (sort of like datalog) to do some work in.
Could be something there with logic programming. Or could not be something there.
gopalv 9 years ago

> So maybe the article is onto something about the value of the language.
The strongest feature of SQL is the lack of specificity about "How it should return results" and primarily deals with the details of what the expected results are.
I work on a SQL engine which honestly breaks so many rules of "How to do things" and basically tries to avoid nearly everything a sane SQL engine engineer would do.
But the advantage of the lack of forcing implementation is that a new idea like this could still implement the "give me expected results" part of the implementation.
Whenever it doesn't - it needs fixing, not documenting in a "vs vs" comparison.
> I actually love SQL, but it does have its limitations. I wouldn't mind a solid relational database with an alternative query language
MDX is a pretty interesting thing to think with.
Mostly because if you're used to spread-sheets, it is a more natural way of expressing what you want generally in a straight forward lookup order - get me some columns from these rows, where some condition is satisfied.
That makes index & cube lookups so much easier to detangle for an engine than a more free-form SQL tree which has so much more variety in it.
Clubber 9 years ago

Also, SQL was designed back when most programming was done in assembler or C. Compared to assembler, SQL is easy. Programming itself has gotten a lot easier since.
- skissane 9 years ago
  
  SQL was designed by IBM in the 1970s, to run on mainframes. The community by whom and for whom it was developed was focused on PL/I and COBOL and largely didn't use C.
  The first implementation of SQL in C was probably Oracle V3 (released in 1983). IBM's equivalent SQL offerings around the same time (SQL/DS for VM/CMS and DB2 for MVS) were written in PL/I dialects and assembly, and COBOL, Fortran and PL/I were supported as application languages. I believe IBM's first forays into using C with SQL was the development of OS/2 EE Database Manager in C, somewhat later in the 1980s. (The current DB2 Linux/Unix/Windows code base is originally descended from OS/2 EE Database Manager.)
  
  Clubber 9 years ago
  
  I was going by Oracle's "first commercial implementation," as a starting point. I found it it was originally written in Fortran, but later ported to C. I wouldn't have guessed fortran.
  https://asktom.oracle.com/pls/asktom/f?p=100:11:0::NO::P11_Q...
  Oracle was originally written in fortran and then redone in C, which it has been written in ever since. In fact, when we ported Oracle to the mainframe for the first time we were faced with an issue -- do we rewrite Oracle in assembler (there was no c compiler on the mainframe back then) or do we write a C compiler? We wrote a C compiler.
  On a side note, I remember watching an interview or reading an article (probably from Larry Ellison) who said IBM gave 1/3 of it's business to Intel, another 1/3 to Microsoft and the final 1/3 to Oracle. Pretty crazy.
  
  skissane 9 years ago
  
  By most accounts Oracle V2 was primarily written in PDP-11 assembler. Tom Kyte's suggestion it was written in Fortran contradicts most other sources.
  IBM's problem was that they were slow to commercialise the technology despite having a significant head start. IBM built multiple prototype RDBMS systems but they weren't willing to ship the result as a commercial product, and Oracle beat them to it. And even then when they did finally ship, their initial offerings were restricted to their mainframe systems (first VM/CMS and then MVS), while Oracle back then was open to porting their database to just about anything. IBM didn't really come to the cross-platform party until the 1990s, by which time Oracle was well established in that market.
  (Disclaimer: Former Oracle employee, although I never worked on the database side of the business, and I'm talking about stuff that happened when I was a kid, or even before I was born.)
dragonwriter 9 years ago

Tutorial D obviously isn't SQL, but its not noSQL, either; despite the name, noSQL refers to non-relational stores, not relational stores with alternative query languages.
But as much as I think it's better than SQL, I don't think Tutorial D (or D as a concept more broadly) offers enough to really displace SQL.
- ak39 9 years ago
  
  > noSQL refers to non-relational stores, not > relational stores with alternative query languages.
  Very well said. The concept of relational databases (based on relational/algebraic theory) is thoroughly independent of the dialects to access said databases. Currently it's SQL.
  
  oldandtired 9 years ago
  
  SQl is not relational for all the hype that ORACLE/IBM/Microsoft have put out over the decades.
  SQL fails to be relational because it doesn't deal with relations. At its base, it deals with bags of records and not relations of tuples.
  Tutorial D is relational but has a lot of warts. D4 was relational but changed to allow access to SQL databases.
  If my understanding is correct Rel (by Dave Voorhis) is a Java implemented Relational DBMS (a true RDBMS). There are a variety of projects also are designed to implement a true RDBMS. Another might be Andl (by David Bennett) and there is also SIRA_PRISE (by Erwin Smout). There are more and are documented at www.thethirdmanifesto.com, if you are interested.
  [EDIT - correct spelling erro]
adamnemecek 9 years ago

Check out datalog. Datomic seems promising. It's from Rick Hickey (the guy behind clojure).
EdiX 9 years ago

> SQL was originally designed for non-programmers
SQL was designed for mainframe programmers, made to look similar to COBOL and PL/I presumably with the idea that at some point in the future it could be integrated into one of those.
Its syntax has all the drawbacks of COBOL's syntax: too many useful words end up as reserved; it doesn't compose very well, leading to statements with very complex syntax rules; and it lulls users into a false sense of security by looking like natural language while being something very different (see HAVING vs. WHERE).
dgudkov 9 years ago

>I work with many of the support staff have been learning SQL to help customers pull reports from our data warehousey reporting database.
Why wouldn't you use a visual query tool for this? Tableau and similar apps generate pretty decent SQL queries nowadays. Even old-school BusinessObjects does a decent job (although requires way more initial modelling).
- richardknop 9 years ago
  
  SQL is just much more powerful. If your staff learns SQL they can pull out very complex reports easily, aggregating and combining, joining data. Expressiveness of SQL allows you to get better results you can get from any visual tool.
  
  dgudkov 9 years ago
  
  I deal a lot with SQL for reporting and analysis and wouldn't call a 300-line query easy. I found that visual tools can cover 90% of the cases for querying, merging, aggregating, filtering and calculating. They also help with hand-written SQL for the remaining 10% because you can use them to generate parts of SQL that can be used as subqueries or as a starting point in a hand-written SQL query. Also debugging complex SQL queries can be a pain, because SQL is a declarative language. You can't just put a breakpoint in a SQL query and see capture its intermediate state.
  
  richardknop 9 years ago
  
  I guess you are right. Also often you will need to aggregate data from various sources (not only relational database but log files, XML, CSV files etc). I have seen Ab Initio used for this before.
kristianp 9 years ago

I think Tutorial D would be more suitable as an ORM than a layer on top of SQL, I feel there would be less of an impedance mismatch if a Tutorial D implementation is a DSL. There is a Haskell implementation, but I'm not going to be doing work in Haskell any time soon.
coldtea 9 years ago

>I've always found it a little funny that SQL was originally designed for non-programmers
That's not true.
- magoghm 9 years ago
  
  "2. SQL is intended to be accessible to users without formal training in mathematics or computer programming. It is designed to be typed on a keyboard. Therefore it is framed in familiar English keywords, and avoids specialized mathematical concepts or symbols." http://researcher.ibm.com/researcher/files/us-dchamber/SQL-e...
  
  abiox 9 years ago
  
  Codd's paper itself says:
  "Closeness to Natural Language - Clearly, the majority of users should not have to learn either the relational calculus or algebra in order to interact with data bases. However, requesting data by its properties is far more natural than devising a particular algorithm or sequence of operations for its retrieval. Thus, a calculus-oriented language provides a good target language for a more user-oriented source language."
  
  coldtea 9 years ago
  
  I stand corrected.
  Had more in mind the relational calculus, for which SQL is just a flawed expression query language for.
matwood 9 years ago

> I used to think neither of those panned out
I've taught many business analyst to be quite functional in SQL.

HumanDrivenDev 9 years ago

NoSQL is a really terrible term. I don't know why so many people seem eager to the argue the merits of an incredibly disparate umbrella label that includes databases that have almost nothing in common. What meaningful things can you say about a category that includes Cassandara, Datalog, LDMB and Neo4J?

I think when a lot of people talk about NoSQL they just want to rant against a certain kind of strawman programmer. You know the one. Young, stupid, naiive, too arrogant to learn nth normal form or define schemas. This programmer probably uses nodejs or some other such heresy and only wants quick results, integrity be damned!

Don't get me wrong, Relational Databases are really good, and fit a lot of problems really well. But there do exist legitimate use cases (not necessarily scale!) where an RDBMS will simply be a poor fit, or a lot more work. Don't dredge up the strawman of the programmer too arrogant to learn SQL because you're too arogant to learn the merits of something that isn't SQL.

lmickh 9 years ago

I like how it pretty much just glosses over decades of familiarity. If there is anything the last 20 years prove, it is that the majority of developers will stick with what they know over what might be a good fit for the job. It goes even deeper in the SQL world down to the specific database flavor.

From the ops side I actually find RDBMS more difficult to deal with cause the power of relationships is easy to abuse and they are not anti-fragile. Instead of smartly reasoning about the data, it is all to easy to just "JOIN ALL THE THINGS WITH MEGA TEMP TABLES!". I've taken more database outages from bad queries then anything else.

There are bad implementations on both sides. There are reasons to pick both sides over the other given a set a circumstances. At the end of the day, the technical facts don't matter to most people's decision making though.

viraptor 9 years ago

> Instead of smartly reasoning about the data, it is all to easy to just "JOIN ALL THE THINGS WITH MEGA TEMP TABLES!".
This is so common, I'd love for the popular databases to add table flags that prevent it by accident. Letting me configure "this table most not full scan or file sort implicitly" would get rid of half the incident callouts I've been involved in. You could always override it in the query where needed.
- Someone 9 years ago
  
  At best (if your users aren’t allowed to write SQL), that would change your “it is slow” calls to “it doesn’t work”.
  At worst, I fear your users would learn to override it by default, as just one other part of the magic incantation needed to please the SQL gods.
  I think it would be better to have the planner send out emails “this query has to use a full scan” or, “this query is on the brink of changing strategy compared to earlier runs”
  
  viraptor 9 years ago
  
  > that would change your “it is slow” calls to “it doesn’t work”.
  But that's exactly what I'm asking for. "It's slow" means that it will work until the breaking point and then wake me up. "It doesn't work" with the right reporting allows me to teach someone about indexing during office hours.
- paulddraper 9 years ago
  
  MySQL has a slow query log with all non-indexed queries.
  
  viraptor 9 years ago
  
  I found it not very useful. Sometimes you will query without indexes. Sometimes it just doesn't matter. Sometimes it's a table with 2 rows and you don't want an index.
  In practice I don't want to know when it happens. I want an error to be raised instead so the database doesn't suddenly die.
  
  paulddraper 9 years ago
  
  So the problem is (1) it's too noisy and (2) it doesn't actually disrupt the query.
  Simultaneously too much and not enough.
nkozyra 9 years ago

> it is that the majority of developers will stick with what they know over what might be a good fit for the job
Anecdotally, I have seen this shift quite a bit in the last decade. There has been a big move toward "try whatever is new and shiny" and I think a lot of that drove the NoSQL craze.
And don't get me wrong, there are some great NoSQL options if you select the right tool for the right job.
But a lot of people who gravitated to NoSQL did so because they designed poor queries in RDBMS. Guilty myself. Often the quick reaction was "wow, this is faster" instead of "well of course it's faster, I'm not getting X, Y and Z features of a relational database. Do I need those? Did I abuse the RDBMS?" etc.

ChicagoDave 9 years ago

I had the task of replicating data from DynamoDB to Redshift on AWS on two large projects on the last few years. The primary factor in using DynamoDB is that is's cheap and scales like nothing else in the galaxy.

Then the other shoe fell. The data was needed for reports. Reports require a static schema. NoSQL (and the developers who love it) despises static schema. "I can add properties whenever I want!"

This process of analyzing and reporting on production data becomes a very time-consuming, costly, and brittle exercise.

So then you have to determine, did we save enough money on the design side (using NoSQL over SQL) and then piss it away on reporting?

I'd argue AWS and other cloud providers need to create a SQL capable relational database as a service. This would (I hope) solve the problem.

But in the meantime, let's build our micro-services on relational databases so we can actually get aggregate data to stakeholders in real-time.

Clubber 9 years ago

>So then you have to determine, did we save enough money on the design side (using NoSQL over SQL) and then piss it away on reporting?
I spent two years early in my career as a report writer. It sucked horribly, but I learned how important reports are to the business. I remember a professor telling me that any business application reporting is 50% of the value. This is something many developers don't even consider.
Having said that, the solution, as I understand it for the reporting issue with NoSQL is to move your NoSQL data to a proper warehouse for reporting/BI. You'll of course need an ETL layer to do all that, and the ETL layer will need to be smart enough to handle very nullable data and data structures that often get extra, unknown properties, etc.
This solution continues to work even when you have multiple, loosely coupled NoSQL databases handling various parts of a larger system and various separate systems.
daveFNbuck 9 years ago

> I'd argue AWS and other cloud providers need to create a SQL capable relational database as a service.
Like Google's Cloud SQL? https://cloud.google.com/sql/
curun1r 9 years ago

> Reports require a static schema
This may be true, but reports also normally require a different schema from the OLTP database. Data warehousing with star schemas is a thing and it doesn't matter whether your OLTP database is an RDBMS or a NoSQL solution...you're going to have to push data to your warehouse to run those reports. The last thing you want to be doing is running reporting queries against a database that's required for the operation of your product, even if it is off hours. And reports will never run efficiently if they use a schema that's designed around the requirements of your business' application. For one thing, your update performance will tank since every write requires a ton of index writes. You can mitigate that a bit by maintaining separate indexes on a reporting slave, but that kind of maintenance has its own set of headaches.
So I'm not seeing how reporting is a good rationale for choosing one technology over another. Reporting, when done right, is a completely separate system with a completely distinct technology choice.
- ChicagoDave 9 years ago
  
  The level of effort to move data from DynamoDB to a data-warehouse (which is what Redshift is), is not trivial. It may be that the Data Pipeline tools within AWS are still opaque and incomplete, but but I've seen the same problem with MongoDB implementations.
  Reporting is nearly always an afterthought in system architecture and it should be a first-class requirement.
  
  ianamartin 9 years ago
  
  The reason that moving data from a transactional platform to a warehouse platform is not trivial is because you actually have to have ETL processes in place to transform data from one architecture to another.
  A data warehouse has nothing to do with the mechanism that stores it. It is a design that comes with its own challenges just like transactional systems do. Transactional systems are optimized for writing. Warehouses are optimized for reading. They way you choose to execute those optimizations is up to you. But getting data from one of those systems to the other is never going to be trivial, and there is no general purpose tool to get things from one format to another because the warehouse structure is too entity-specific to generalize beyond a certain point (and that point of generalization is really the process of creating the design. See the Kimball group's books for more information.)
  Redshift is not a data warehouse anymore than Postgres is a data warehouse. In fact, Redshift is a fork of Postgres that leaves out the functionality, data types, and indexes that make it difficult to scale horizontally. Postgres also isn't a data warehouse. Nor is it transactional system. It's just a tool that you can use to build either one of those things with.
  
  scapecast 9 years ago
  
  I will print and frame your last sentence.
MBCook 9 years ago

> I'd argue AWS and other cloud providers need to create a SQL capable relational database as a service. This would (I hope) solve the problem.
Isn’t that Aurora?
- PKop 9 years ago
  
  OP probably means not worrying about instances, scaling or over/under provisioning etc, just service usage at a more fine grained level.. more like Spanner
  
  MBCook 9 years ago
  
  It’s been a while since I looked at it but isn’t that how Aurora works? Isn’t that (along with performance) their argument for why you should use it over MySQL or Postgres in RDS?
  
  ChicagoDave 9 years ago
  
  Exactly. I don't want to care about the server or the hardware or the OS. Just give me a database with tables and standard SQL querying capabilities and performance.
  
  jaequery 9 years ago
  
  +1
- nkozyra 9 years ago
  
  It's RDS in general, which supports a number of RDBMS.
rmrfrmrf 9 years ago

Putting data gathering and reporting into one database is just asking for torture. Yet, it happens all the time.
- ChicagoDave 9 years ago
  
  I wasn't suggesting that. I was saying there's no real reason transactional and reporting data can't start with schematic replication and actually is more efficient.

ww520 9 years ago

Way back when I took my first database class by Mike Stonebraker, he talked of some of the old war stories between RDBMS and network-based database, which was hot before RDBMS. He said the relational model by Date and Codd had won out back then, for the simple fact that relational data stood by themselves while the network-based data were tightly coupled with the applications. When NoSQL came around, yep, it looked like the old network-based database again. History has repeated itself.

Relational model, where SQL is merely the querying language, will win again and again, for the simple fact that it enables data to stand by themselves, and data tend to outlive applications, both in longevity and in scope.

Edit: PostgreSQL came from Postgres, the research project started by Stonebraker and others in UCB.

elvinyung 9 years ago

Stonebraker wrote MapReduce: A Huge Step Backwards [1], which basically tells exactly the story you mentioned, the "Hadoop is Codasyl" comparison.
[1] https://homes.cs.washington.edu/~billhowe/mapreduce_a_major_...

Clubber 9 years ago

So the long and short of it is SQL would be a good common language for relational and NoSQL style databases.

I would agree with that, but since they different monsters (RDBMS and NoSQL), it will take a bit of tweaking to find a good dialect (how do joins work, etc). Of course it makes sense to adapt the NoSQL databases to the existing ANSI SQL rather than make existing ANSI SQL users switch to a new type of SQL that accommodates NoSQL, but we'll see what happens.

Google create a "Standard SQL," but I'm not familiar with it.

Here is a link on Google Standard SQL:

https://cloud.google.com/bigquery/docs/reference/standard-sq...

nkozyra 9 years ago

> So the long and short of it is SQL would be a good common language for relational and NoSQL style databases.
Yes, I'm sure the inevitable "SQL for NoSQL Databases" book will be a good read :)

mmaunder 9 years ago

It's not one versus the other. That's like suggesting JSON is going to 'beat' HTML. They're different tools for different jobs and both are used alongside each other to great effect.

joeminichino 9 years ago

the whole debate always seems very confused to me. SQL refers to the language with which you interact with a database, NoSQL refers to the database system itself and it's normally used as opposed to traditional RDBSs. So really the debate should be either SQL vs NON-SQL interfaces or RDBS vs NoSQL databases. Also NoSQL meaning "not only SQL" means SQL can be a subset of NoSQL, so the debate is over. Really we should review the terminology of this entire subject...

nextInt 9 years ago

Lol SQL is making a comeback? It never left.

krf 9 years ago

It only left in terms of attention from startups I suppose. I also disagree with the reason the article suggested - SQL could not handle the loads. My opinion is that startups simply liked the idea of not having a schema as it fit their agile approach. So, they went NoSQL because it allowed them to get going faster and change easier.
- xirdstl 9 years ago
  
  That, and they like to believe they need to scale like Facebook.
  
  Myrmornis 9 years ago
  
  Sadly, this is a common reason.
20years 9 years ago

Exactly! SQL was never dead.
wernercd 9 years ago

SQL is dead. Long live SQL.
SQL2219 9 years ago

This.

methodin 9 years ago

No tool is good when abused. NoSQL was a knee-jerk reaction and the truth lies somewhere in the middle. Putting everything in relational databases is as bad as putting none of it there. If everything is treated like a tool the world becomes brighter at the cost of having to learn and understand more.

taeric 9 years ago

I do find the amusing "backronym" rather hard to take at face value. I also found the old critics of SQL databases odd focus on the query language hard to take. The problem wasn't the queries, per se. The problem was the power of the queries for interactive uses and how that doesn't scale up.

That is, the problem with most SQL databases is that you need some pretty specialized knowledge in order to construct good queries. Not shockingly, the problem with most modern key/value (or otherwise) datastores is that you need some pretty specialized knowledge in order to construct good queries.

Now, for things getting off the ground, this is probably fine. Most of the places you will go south with queries is in the ad-hoc style query. Of course, that style query is perfect for interactive use. Ideal, even. It falls on its face if it is supposed to support an automated case at high TPS. Unfortunately, automated cases at low TPS often turn into automated cases at high TPS. Worse, interactive cases at ridiculously low tps often turn into automated cases at low TPS. Which, of course, just feeds itself.

How this feeds itself, of course, is you can explore your data much more effectively if there is an ad-hoc query engine. So, we are now seeing the resurgence of ad-hoc queries and the learning that those can lead to some powerful insights.

aaron-lebo 9 years ago

The thing is ORMs have made writing easy queries easy and hard queries possible since at the very least 2005 with Rails, and they are easily extensible and composable. It's mostly a solved issue. If you want to do complex data manipulation, surprise, it takes a powerful tool. Seems a lot of the hype is the result of VC $ + naiveity/overselling about what MongoDB and co would be capable of. The marketing is natural - MongoDB made databases "easy" and easy is the only way to sell lowest common denominator tech to become a unicorn. Bad incentives for all involved.
Sql is a kind of rite of passage? If you can't learn it or can't be bothered to learn it, you don't need to go anywhere near a database because you can't be bothered to understand the right tool.
Granted, nosql has some uses, but those are few and far between and probably would have been handled more effectively and easily with old approaches like Prevlayer, but Prevlayer was a thousand lines of code and couldn't be monetized.
- pjmlp 9 years ago
  
  > The thing is ORMs have made writing easy queries easy and hard queries possible since at the very least 2005 with Rails,
  Contrary to the hype it created, Rails was not ground breaking.
  We were using ORMs with Rails like concepts and scaffolding in 1999, based on Tcl, our own mod_tcl and several DB bindings to Oracle, Informix, DB2, MS SQL Server, Sybase SQL Server, running on top of Windows NT/2000, Aix, HP-UX and Solaris.
  An in-house application server which shared many design ideas with AOLserver, designed in 1995.
  
  oblio 9 years ago
  
  Wide scale adoption is almost as good, if not better, than "ground breaking".
  
  pjmlp 9 years ago
  
  Somehow SV has a tendency to re-discover stuff.
  
  taeric 9 years ago
  
  Not unique to SV. And to be fair, fundamental constraints wax and wane, as well.
- oldandtired 9 years ago
  
  The problem is that SQL (in all its forms) has not been relational. Nor is it complete in terms of its querying ability. This means that it can do unexpected things and more so when you have nulls included in your data.
  The Third Manifesto (by Date and Darwen) highlight many of the problems of SQL and SQL DBMS. Though I find TTM to be lacking in various ways, it does, at least, start heading in the right direction in terms of a database query system.
  Over the course of decades, I have written much SQL and have had to modify and document many systems that were based on SQL DBMS's. The language is a PITA, but you do what you have to do to get things working.
  NoSQL was, to me, a retrograde step, though it has some usefulness in some situations. What I have found is that most databases have not been designed with regard to Relational Database Theory (RDT). In general, the attitude that I have observed has been that RDT is good in theory but bad in practice.
  I have built (using mickey mouse systems), fairly complex, multiuser databases based on RDT. I have also seen many databases built (using ORACLE and SQL Server as the DBMS) that have not used RDT and they have been atrocious. One such system, built by some SQL guru, took 25 hours to do a run of one process. Had it been built according to RDT, the entire process (by my estimate) would have taken less than 20 minutes to run.
  When designing any kind of database using any of the various technologies that have been developed over the 60 odd years, there are various design principles that will allow a good database (fit for purpose) to be built. It appears in many cases that none of these principles are known or if know are know then not adhered to.
  Of course, what others might have found can be quite different.
- sk5t 9 years ago
  
  One of the last things I'd want to do is fight with an ORM to build a complex query that doesn't fit the ORM's natural tendencies. A micro-ORM to take care of the boilerplate is great, but otherwise, given the choice between two unpleasant options, I'd rather write 100% of my database interaction code in sprocs than an ORM (promising never to use the raw-SQL escape hatch).
  
  majewsky 9 years ago
  
  Micro-ORM plus plain SQL for big reports is the sweet spot IMO.
  A few years ago, I was refactoring a large legacy Perl application to be much less messy and a bit more object-oriented, and I accidentally a micro-ORM without really realizing it. I think it was a 1000 lines of code in the end, and 95% of all existing SQL usages in the application could be replaced by simple method calls on the ORM's objects.
- taeric 9 years ago
  
  ORMs lure devs into thinking you can have a single schema for your app. At launch, this might work. Once you scale, though, ETL is a fact of life. And running a job from you daily stats to the weekly stats, and then to the monthly, quarterly, etc. becomes a requirement.
  Which isn't tough, but is often unplanned.

smarx007 9 years ago

I am really surprised that while everyone talks about SQL vs NoSQL, nobody has mentioned RDF as a model and SPARQL as a language. Graph-like structures based on triples allow the data relations to be represented properly (relational properties), while not limiting the data structure. This is not a shiny hot thing, but instead something developed over many years by (mostly) academics. Take a look at the tutorial: https://www.w3.org/TR/2014/NOTE-rdf11-primer-20140225/

And just to prove that RDF is a model, but not a format, look at JSON-LD as a serialisation format for RDF and at the SQLGraph [1] paper from Google to see how RDF can be implemented on top of an SQL RDBMS.

[1]: https://static.googleusercontent.com/media/research.google.c...

bobdc 8 years ago

SPARQL is also a standard from a well-known standards body with plenty of open source and commercial implementations, and it's quite popular at many commercial enterprises: http://www.sparql.club

labster 9 years ago

If SQL is to databases as Javascript is to browsers — ubiquitous and largely standard across the market — are there any languages that transpile to SQL?

People in this thread have commented that SQL is clunky, and JS definitely fit that description for a while. So I'm wondering if there are any alternatives that would prove SQL's clunkiness to me.

amirouche 9 years ago

ORMs transpile to SQL.
Just like JavaScript is not perfect, SQL is not perfect. I would rather have a general purpose programming language like Scheme to run queries against my database.
bigger_cheese 9 years ago

I'm an engineer (the non software kind) that works in an industrial plant. I often have to do ad-hoc querying manipulation etc of data from a lot of diverse sources. Usually to do fault finding and analytics - sometimes also for reporting.
I don't know about transpiling but I use SAS to work around SQL's clunkiness. I can embed SQL seamlessly into my SAS code so it works well. I use SQL for things that are simpler in SQL (selecting, filtering, aggregating data and "case when" statements etc) and I use SAS's data step language for when I have to do more complicated stuff with the data like Arrays, loops, macros, weighted averaging, time series based stuff and regressions.
The thing I love about SQL is it is the lowest common denominator for all of my organisations data it doesn't mater if data is stored in an Oracle table, a Microsoft SQL Server table, or even Access DB or spreadsheet. The language to extract/merge/join all the data is the same and standardized.
We also have things called "Data historians" which are basically time-series datastores (usually they are linked to PLC vendors) and nothing frustrates me more when they use some proprietary "sql like" language which seems similar to ANSI SQL except things like aggregations (SUM, AVG, MAX etc) don't work or they do weird things with data types (especially dates) or similar everyone of these seem to be subtly different - annoying.
pmontra 9 years ago

Every ORM is a transpiler from a language to SQL.
ActiveRecord transpiles from Ruby to SQL. Django ORM and SQLAlchemy are popular ones to write SQL in Python. Ecto is from Elixir. Every language has its own transpilers. Not all of them are easy to use and they are much harder to use than SQL for complex queries, maybe even impossible to build them.
Frankly I'm a little fed up to have to learn a new way of generating SQL for every language and framework. I'm working on or maintaining projects in Rails, Django, Web2py and Elixir for my customers and I'm coming to wish we were using plain SQL. After all I knew SQL long before any of those other technologies existed, I know how to create prepared statements and how to use positional parameters to avoid SQL injections. Then there is how to efficiently extract the data in the resultsets, create data structures in the original language and possibly cache queries. They don't look so hard (maybe handling joins gracefully.)
- mercer 9 years ago
  
  I've heard good things about Ecto's approach (Elixir). Would you say it's at least better than many other ORM's, or is it similar enough in regards to drawbacks/advantages compared SQL that this 'better' doesn't matter much?
  
  pmontra 9 years ago
  
  Ecto is well engineered but it's an overkill for simple queries and it makes them harder than it should. It's ok for average sized queries (maybe composed over a pipeline). It's useless for very complex queries, but every ORM can't do them. Think UNION, HAVING, etc. I write them in SQL and then handle the resultset, which is not in the transpiler. The transpiler is a subset of the functionality of an ORM.
evv555 9 years ago

Ecto for Elixir might fit that description. Queries are composable unlike SQL.
yenwel 9 years ago

LINQ?

bogomipz 9 years ago

The premise that SQL is "beating" NoSQL is pretty dumb. I have never been aware that such a competition between datastore designs exists.

Who is that competition between developers? Users? Most of the places I have worked in the last 5 or 6 years have had both relational and non-relational databases. This is not uncommon. In none of those shops was there a competition between the two databases but rather they were complimentary.

This article and title seem to be very self-serving for their own product. Its seems to willfully conflate SQL the interface and SQL a general term for a relational database.

NoSQL has always been something of a misnomer - non-relational would have been a better term but it doesn't sound as buzz-worthy I suppose.

The SQL like interface has been in Cassandra for longer now than it was absent. The gain in SQL like interfaces for non-relational databases is because it's familiar and works really well. Anywhere there is a database there is going to be a need for selection, filtering and projections of tuples.

>"In Amazon’s own words, its PostgreSQL- and MySQL-compatible database Aurora database product has been the “fastest growing service in the history of AWS”.

Is this really surprising that the world's largest cloud provider is selling more databases than anything else? Almost everyone needs a database, given howrelational database, there are more people around that have experience with it.

>"To understand why SQL is making a comeback ..."

No, SQL never went away. Full stop.

>"But don’t take our word for it. Take Google’s"

No, this practice of blindly adopting ideas just because they work for Google needs to stop.

This post sounds as if the author(s) themselves bought into all of the NoSQL hype that buzzword-obsessed tech journalists were spinning and they are just now having an epiphany that much of that hype was just that.

geebee 9 years ago

"I have never been aware that such a competition between datastore designs exists"
Good! There certainly shouldn't be. Wish I could say the same. Magpie developers, people who want to seem edgy and forward looking, people who like making others look backwards at meetings, people who are worried they'll look like dinosaurs, people who've been around the block long enough to know an approach is totally unnecessary for their project but will be required on the resume for the next project. Trust me, they were all out there, in force.
And of course, many data stores that aren't SQL are absolutely great for the task. Try to solve the 6 degrees of separation problem with SQL joins, then do it with a graph database. There are all kinds of applications that weren't a great fit for a relational model, and many that are. This article does mention the moment where NoSql was defined as "Not Always SQL" (and dismisses it with "nice try"), but honestly, if this had been the approach up front, it would have saved us a lot of grief.
Probably. Maybe. I hope. Oh, who am I kidding. All that stuff I described above would have happened anyway.

takeda 9 years ago

Great article, but I think the author puts too much emphasis on SQL instead on the fact that the data being stored in a relational way.

Personally I actually would like an alternative language that would be capable to be integrated with the language in such way that it could also fall with type checking.

So for example if I rename a column in a database, the type checker would highlight all places in my code that were broken by this change.

JOOQ[1] seems to do something like that, but it's only for Java.

Also, looks like QUEL[2] would be a bit easier to be integrated with a language, too bad it died.

[1] http://www.jooq.org/

[2] https://en.wikipedia.org/wiki/QUEL_query_languages

aneutron 9 years ago

By no means am I an expert, but here's the thing. You can't say "Cars are the new thing. Airplanes came to be but now they're not the shiny thing anymore."

You don't take your car to go intercontinent, and you don't take the plane to go a couple of kilometers.

One thing NoSQL databases excel at, is scaling. Scaling in Cassandra for example means adding a node, everything else is just magic. THAT is what made MySQL and the SQL language popular.

SQL may be popular with developers, but scaling MySQL was a nightmare for quite some time, even with the introduction of vitesse etc.

Point is, please do not compare apples to oranges.

emodendroket 9 years ago

I don't honestly think many people have the kind of scale where this is a sensible trade-off to make.
- sk5t 9 years ago
  
  Absolutely. Even a bigint / long sequence column so often feels a bit ridiculous.
  
  shub 9 years ago
  
  I have 5 tables which have overrun the range of int in the id column. Happily they were defined bigint. A while ago one ran out of tinyints, but doing the schema change...was pretty terrible actually due to the foreign keys. The tinyint decision wasn't me, I want to make clear. New tables all get bigint. Either you don't have many rows, so you can spare the space, or you have a lot of rows and the thought of changing the type of your primary key gives you a queasy feeling in your belly.
  
  oldandtired 9 years ago
  
  This is only a problem because of the laziness of those who have built the DBMS. Not necessarily the programmers fault but those who mandate the requirements of the DBMS (usually managers).
  If you specify that a field is an integer, you shouldn't have to care how big that integer is. That is a concern for the DBMS and should not be yours. In point of fact, your only concern should be what fields you are using not how they are to be stored. You should never ever have to concern yourself with whether a field is a tiny_int, small_int, integer or big_int.
  Any changes to the sizes that the DBMS has chosen for specific relations should be automatically propagated by the DBMS to any other area that refers to the original field. It should never require intervention by a DBA or programmer, ever.
  Over the decades, I have read lots of recommendations for DBA's about how to manage the various databases stored in the DBMS. Most of these recommendations should have been put into the DBMS as a part of the attributes by which the DBMS controlled the database. These recommendations were often portrayed as a result of running various sets of queries and other functions in specific orders. All of them could have been automated out of existence, but to keep the DBMS simple, they forced these processes onto the DBA's and database designers.
  Since we know what has to be done, surely the DBMS producers (like IBM, ORACLE, Microsoft, etc.) should be able to do so as a part of their product development.
nimchimpsky 9 years ago

I disagree with this.
And MySQL != SQL

shepardrtc 9 years ago

SQL never really left the table. People just like to try out new things and talk a lot about them. Its actually a pretty great feature of tech culture; we don't accept that everything is perfect, we always want to try for something better. And in this case, people tried to have a database with a different and potentially "easier" paradigm. Nothing wrong with that at all. But its hard to compete with years and years of tuning and optimization that SQL databases have gone through.

peterwwillis 9 years ago

SQL wins because big iron wins. At the end of the day, the big, clunky, slow, complicated, aggravating, old systems stick around because they are built to stick around.

Looking at, say, an eventually consistent distributed decentralized kv store, one might be tricked into believing it's simple enough to deploy with enough nodes and general enough interfaces that you can build a complicated system on top of it if needed, and rely on its supppsed simplicity and scalability the rest of the time. But nobody tells you about the shitty implementation. The replication that doesn't resume, the stodgy transfers, the imbalanced distribution, the consensus conflict, the infinitely expanding disk, the churning CPU. How at scale, if all the other aspects of your infra aren't scaling along with your data size, the whole thing blows.

Traditional SQL databases end up being many times simpler in practice, and because of their inherent scaling limitations, much easier to manage. And most importantly: their implementations aren't buggy.

SQL is just more reliable.

wjossey 9 years ago

As someone who helped design, then manage a massive NoSQL store which hundreds of terabytes of data, of which most of it was hot, I’ll humbly disagree and say it’s not necessarily as bad as you make it sound.
Did we have someone who managed that system? Sometimes, but mostly it just did it’s own thing. We ironically invested way more time on our MySQL database over the years because we couldn’t get that to scale the way we wanted to, but I think that was specific to a problem we were having.
Did we later invest a lot of time in that system (the NoSQL one)? Yes, because it was very cost effective for us to do so. At certain scale throwing people at optimization problems can pay huge dividends. But, this can be said of most infrstracture. It’s usually worth revisiting every year or two and seeing how what you can squeeze out.
Did we have churning CPU, infinitely expanding disk, replication issues, and more? Sure, but not very commonly and mostly it was fairly easily resolved. More importantly though, it was a solid system that was the underpinning of a colossal system, and it behaved admiraly more than 99.9% of the time.
Will most projects benefit from a hugely distributed KV store? Nope. But I’m still glad they exist!
- cachemiss 9 years ago
  
  Caveat, this comment isn't directed at you (I agree with your comment), but rather the points around what you are saying.
  One thing that helps is if people stop referring to things as SQL / NoSQL as what ends up happening is various things get conflated.
  When talking about stores, it's important to be explicit about a few things:
  1. Storage model
  2. Distribution model
  3. Access model
  4. Transaction model
  5. Maturity and competence of implementation
  What happens is people talk about "SQL" as either an NSM or DSM storage model, over either a single node, or possibly more than that in some of the MPP systems, using SQL as an access model, with linearizable transactions, and a mature competent implementation.
  NoSQL when most people refer to it can be any combination of those things, as long as the access model isn't SQL.
  I work on database engines, and it's important to decouple these things and be explicit about them when discussing various tradeoffs.
  You can do SQL the language over a distributed k/v store (not always a great idea) and other non-tabular / relational models and you can distribute relational engines (though scaling linearizable transactions is difficult and doesn't scale for certain use cases due to physics, but that's unrelated to the relational part of it).
  Generally people talk about joins not scaling in some normalized form, but then what they do is just materialize the join into whatever they are using to store things in a denormalized model, which has its own drawbacks.
  As to the comment above you, SQL vs NoSQL also doesn't have anything to do with the relative maturity of anything. Some of the newer non-relational engines have some operational issues, but that doesn't really have anything to do with their storage model or access method, it just has to due with the competence of the implementation. MongoDB is difficult operationally not because it's not a relational engine, but because it wasn't well designed.
  Just like people put SQL over non-tabular stores, you can build non-tabular / relational engines over relational engines (sharding PostgreSQL etc.). In fact major cloud vendors do just that.
  
  wjossey 9 years ago
  
  Wonderful response. Thank you. Wish I could give multiple upvotes. I’ll add some of those points to my thought process going forward.
jjirsa 9 years ago

There is so much wrong with this post I don’t know where to start
Old school sql replication is awful. It’s awful in Postgres (single threaded replay? A single vacuum can make your slaves fall behind). It’s awful in MySQL (go ahead and tell me how you do dual master without rewriting your app).
People use nosql not because it’s eady or flawless, but because doing it with sql is an order of magnitude harder.
Go ahead and describe resharding a sql DB as your company grows 100x
Or how you’ll handle a dc failover and fail back
Or how you’ll shrink your resources after the holidays - without downtime.
These are trivial with Cassandra and god damn near impossible for most MySQL users.
More reliable? Your “big iron” is still a single point of failure. If not a single cpu, a raid card or a top of rack switch or a DC in a hurricane zone.
- 6nf 9 years ago
  
  > damn near impossible for most MySQL users.
  MySQL? Nobody, and I mean NOBOBY is saying you should use MySQL. All of the things you mention is trivial using a decent SQL solution like PostgreSQL.
  
  jjirsa 9 years ago
  
  Except the person to whom I replied said to use traditional SQL databases:
  > Traditional SQL databases end up being many times simpler in practice, and because of their inherent scaling limitations, much easier to manage. And most importantly: their implementations aren't buggy.
  
  6nf 9 years ago
  
  They mean traditional SQL databases like PostgreSQL, SQL Server, DB2, Oracle etc.
  Not MySQL. MySQL has almost none of the important features of a real SQL database management system. Nobody should be using MySQL for anything.
  
  jjirsa 9 years ago
  
  Quick, tell YouTube and Facebook they’ve been doing it wrong all along
  
  6nf 9 years ago
  
  They are NOT using stock-standard MySQL and haven't done so for many years. If you are big enough to make your own programming languages the normal rules do not apply.
  And if you ask the Youtube team for a recommendation for your project, they will not point you to MySQL.
  
  jaequery 9 years ago
  
  i haven't touched mysql in years but quite frankly, 99% of websites out there (blog/cms) will run fairly fine on a Mysql database. as for the other 1%, they would be big enough to pick the right db of their choice as the time comes. so there is no need to tell everyone you should be using this or that.
  this is kind of like telling everyone that they should ALL be using emacs, when vim/nano is still sufficient for the job.
  
  6nf 9 years ago
  
  I honestly don't see a single reason why MySQL should be used over PostgreSQL for anything, thus my sweeping statement. I could be wrong but I don't think so.
- elvinyung 9 years ago
  
  You got downvoted, but I agree with you. Traditional RDBMS is inherently single-node and a pain to scale. The actual sad thing here is that there's no mature/reliable commodity (i.e. open source) distributed RDBMS for transaction processing yet.
  
  jinqueeny 9 years ago
  
  I have to bring this up again: NewSQL. There are quite a few new choices out there, Google Spanner, Cockroachdb(https://github.com/cockroachdb/cockroach), TiDB (https://github.com/pingcap/tidb). All of them are very easy to scale while maintaining the ACID transactions.
  
  elvinyung 9 years ago
  
  I said open source and mature :)
  IMO really the best option for distributed OLTP RDBMS/NewSQL right now is basically either Citus or Vitess, only because they are built on the backs of extremely mature databases. It really feels like all other options are not mature enough (except maybe VoltDB).
  
  jinqueeny 9 years ago
  
  Could you please elaborate how you define "mature"?
  
  elvinyung 9 years ago
  
  IMO, in descending order of importance to me:
  * Dependability. I can be completely certain that standard usage of the database won't lose my data or leave it in a bad state, at the very least.
  * Community. There's an active IRC channel (or the like) where I can ask questions. When an error show up, I can Google it and easily find an applicable solution most of the time.
  * Performance. Not just with synthetic (i.e. TPC-C, TPC-H, YCSB, etc.) workloads, but proven to have reasonably satisfactory QPS in many/most real production settings.
  * Ecosystem. There's a good collection of first-party (and also ideally third-party) tools, constantly being improved. (Mere wire-compatibility with MySQL or Postgres might be good, but probably isn't enough, since 100% wire compatibility seems rare.)
  
  jinqueeny 9 years ago
  
  Thanks for sharing！These criteria seems quite practical. You are probably right. Time will tell.
  
  rhencke 9 years ago
  
  CockroachDB? I'll grant you it's still young, but it pretty solidly hits all your other checkboxes.
  
  elvinyung 9 years ago
  
  IMO, it's very promising but still extremely immature.
  I still believe in Curt Monash's cardinal rule of RDBMS development: http://www.dbms2.com/2013/03/18/dbms-development-marklogic-h...
  
  yenwel 9 years ago
  
  VoltDB?
- stephenr 9 years ago
  
  Galera + HAProxy or ProxySQL should give you a multi-master MySQL with no app rewrites required, no?
  
  jjirsa 9 years ago
  
  Primary key conflicts are still a thing. Especially problematic where you're either not using auto_incrementing PKs, and/or you try to switch writes from one master to the other and app has to be aware of the switch.
  
  stephenr 9 years ago
  
  Not using auto incrementing keys is surely a rarity with MySQL projects.
  Even before galera was a thing regular MySQL replication could use auto_increment_(increment|offset) to make each server generate unique id's.
  
  jjirsa 9 years ago
  
  There exists a LOT of data models that don’t use auto incrementinf primary keys. Rarity or not, it’s a real use case and a real problem for master/master replication
  Also, even if you do use auto increment with offsets for each server, you’ll have to know how many servers you’ll eventually have, or waste offsets and eventually risk outrunning your data types
  
  stephenr 9 years ago
  
  Non auto keys will already need some way to ensure uniqueness that will involve the application expecting conflicts.
  For those that do use auto inc, galera will manage the offset/increment value based on cluster size automatically.
imtringued 9 years ago

SQL is just a querying language, the point of the article is that it's not exclusive to RDBMS, it is also used by NoSQL databases. There is also no law that says RDBMS must be strongly consistent. They can also be eventually consistent. The opposite also applies to NoSQL databases.

crimsonalucard 9 years ago

If NoSQL never worked because of a lack of a common interface, then switching back to SQL is just giving up on finding a real solution because SQL is one of the big mistakes in web development.

Allow me to elucidate: We query the database in a "query language" that is declarative and based on a expression. This SQL expression is then compiled down into imperative instructions. This is where the mistake lies: SQL is a leaky abstraction. We understand algorithms, we understand how to manipulate the Big-O of imperative instructions to get better runtimes, but in order to understand how to optimize SQL we have to understand how this "expression" compiles down into imperative code. There's no rhyme or reason for why "SELECT STAR" is bad unless you know beforehand the instructions it compiles down to. Actually, you don't even need to understand the instructions "SELECT STAR" compiles down to, you can build an entire career (DB admins) around just remembering that and other random rules as optimization hacks.

The bottleneck in web development is the database. The reason why we can use python or ruby as application servers instead super fast C++ apps is because the database will always be way slower. So it would make sense to optimize the database, yet decades of imperative algorithm theory is thrown out the window because you can't query the database imperatively. Instead of optimizing the bottleneck, we write out web apps in imperative code and our database queries in highly abstract leaky expressions.

The web app is not the bottleneck. Having a highly abstract functional language as the web app is perfectly ok as you tend to not have the web app run through complex search algorithms anyway. Instead we decide to write our webapps in optimizable imperative code and our bottlenecked database in SQL. It's all backwards.

I'm thinking the mistake with NOSQL was the fact that the databases weren't relational. Can we make a relational database with a query language that is NOSQL and imperative?

russellbeattie 9 years ago

Wow... I wish I could just wait a few years and have everyone come to their senses on other things as well.

bischofs 9 years ago

The author seems to conflate sql with single node classical rdbms'.

The query language side of things hasn't changed much but coming from automotive which is currently generating daily terabytes of long and flat data in the form of time series streams with very little relational characteristics nosql systems are very useful.

I think there is a tendency to go after old relational systems because of how they handled themselves with the data that was generated mostly by users. At this point my industry is dealing with data generated by machines and nature and thus the need for extremely scalable and distributed systems.

Whether these systems use sql i don't think is that important ( most of them will support sql or sql like )

jugg1es 9 years ago

Just like anything in the tech world, it all depends on what you are trying to do. There are scenarios where nosql makes more sense and cases where relational databases make more sense. There is no panacea in regards to data storage.

myrandomcomment 9 years ago

I started in tech at IBM supporting the finance clients. When I moved to an ISP in the early 90s I was dumb struck by the fact that things broke and it was okay. Oh the Usenet server failed again....etc...it never sat well with me. Look SNA was complex, but uptime on the system I support was measured in decades. I have done a ton of startups since then and while I love the speed at which we move the falgility of the system always rubbed me wrong. Looking at the need for speed (mongo, et.al) was always painful. I am happy to see that rational thought is winning (which is how I see this).

skc 9 years ago

I like SQL. I've never failed to do anything with it and find it very powerful.

But while that aspect of the SQL experience I'm comfortable with, I'm still just a developer and not a DBA. I don't even _want_ to be a DBA. And as someone putting the pieces of a startup together in a country where DBA skills are practically non-existent (meaning I can't reliably train someone up myself) a cloud hosted NoSQL solution seems a much better bet on paper.

I've noticed that when the topic of SQL vs NoSQL comes up, the issue of cost of infrastructure maintenance always seems to get glossed over.

botskonet 9 years ago

I've long struggled with NoSQL. My data is relational, and I've been long taught to normalize and de-dup my data, so how do I use this? I keep feeling like I'm missing the genius because I keep coming back to wishing I had a true relational table.

However, there have been times where being forced to define a schema has been painful. If our app has to store dynamic keys/values it all winds up as records in a relational db table acting like key/value store. Whereas in Mongo we can just store the document, without affecting the entire table schema.

I want the best of both worlds...

mrjn 8 years ago

SQL is not beating NoSQL. NoSQL is evolving. https://blog.dgraph.io/post/sql-vs-nosql/

mr_overalls 9 years ago

SQL has all of the benefits mentioned in the article, but a few small changes to the ANSI standard - mostly taking design cues from C#'s LINQ - would make it much more usable and modern as a query language.

https://www.linqpad.net/WhyLINQBeatsSQL.aspx

Also, it's such a small thing, but why not place the SELECT clause after the FROM clause? This would allow for easier auto-completion help from the query editor.

jonbaer 9 years ago

A bit surprised to find no mention of the GPU database field (ie: https://www.mapd.com/demos/, etc) because straight SQL on this hardware is extremely fast when optimized. Feels like all this NoSQL stuff came around when developers no longer wanted to deal w/ DBAs and do full stack(s) on their own. DBAs spent long hours on DDL and DML (stored procedures) for a good reason.

arnon 9 years ago

I'd mention an actual capable SQL database a lot before MapD...
Like SQream or Kinetica

AzzieElbab 9 years ago

Why is no one stating the obvious? 1) hardware improvments along with db hosted solutions simplified sql management and scaling 2) most data is simply useless or of very temporary value. Having said that, I do not think sql can compete with specialized solutions when it comes to unstructured or semi-structured data like EMR for example. Also, with notable exception of datalog/datomic NoSQL solutions remain kV stores hacked together by some horrible programming models

zimablue 9 years ago

It feels like everyone mixes relational databases with SQL in discussions. Relational databases are fine but SQL is disgusting. It's a query language that's very hard to parse or generate and has multiple incompatible variants. Ironically the Lambda calculus version looks much better, I'd never seen those before. I wish we'd go back to that as default, unify around a standard and expose block text SQL as an extra for users.

jeremiep 9 years ago

The one downside I see with SQL is that its a string. Would it be a data structure instead it would be incredibly easier to manipulate, without third party query builders. Even without that its still more convenient than most NoSQL.
Still, Datalog as implemented by Datomic is a breath of fresh air.
- zimablue 9 years ago
  
  It's more than just being a string though, it's the multiple incompatible versions and the stunted mini language of it's own. A good solution would be able to run arbitrary expressions in some "real" programming language. It also in my experience doesn't seem to deliver on its central premise of converting declarative expressions to fast operations. So often in a SP I've had to force a write to an intermediate table to jimmy the execution plan. I think it's like excel, it's not considered good because the thing itself is good it seems good because the thing it does is very useful (lightweight interactive dataflow programming, talking to relational databases) and it has a near monopoly. LINQ is really cool but doesn't exist in python (macropython has an unused attempt PINQ I think). Probably because a) SQL is bad b) only MS actually has the definition of the mssql language!
oldandtired 9 years ago

Not quite, the confusion is that SQL DBMS's are assumed to be relational when they are not. None of the major players have produced a relational DBMS (there is one proviso, but IBM didn't promote that specific product out of the UK over its DB2 product out of the USA).

mirekrusin 9 years ago

Why, again, this obsession with absolute statements? A doean't need to be "a killer" of B; Redis is a great NoSQL database and it occupies specific place in the stack, graph databases have their own, invaluable traversal query languages and SQL databases are perfect for many other cases. Nobody needs to "beat to death" or "be the only future". I'm a bit bored with this hyperbolic, nonsense rhetoric.

emodendroket 9 years ago

Because it's better in every single way except maybe for massive scale, a problem fewer people have than think they have, would be my answer to that question.

nimchimpsky 9 years ago

And NoSQL is only better for massive scale when you don't need to do anything with the data.

sandGorgon 9 years ago

Nobody has mentioned Postgres 10 which is going to be pretty cool in its release - https://www.ashnik.com/new-features-coming-in-postgresql-10/

Postgres is creeping up on Cassandra faster than Cassandra becomes usable. Especially the really cool work that Citus is doing.

ddorian43 9 years ago

Nah, postgresql still doesn't have a nice/easy sharding policy. Don't forget that citus is still single-master (the opensource one). Postgresql-xl also doesn't have nice/easy replication.
Scylladb is creeping up on Cassandra though.
- sandGorgon 9 years ago
  
  citus is all opensource now - https://www.citusdata.com/blog/2016/03/24/citus-unforks-goes...
  https://github.com/citusdata/citus
  Which is why Postgres is creeping up on both scylladb and cassandra.
  Plus PG 10 will have declarative partitioning built in. Pretty cool.
  
  ddorian43 9 years ago
  
  1. I know it's open-source, but they have a `citus-mx` product which has multiple masters which isn't open-source (currently you have to do coordinators manually).
  2. Kind hard to creep up on scylladb by seeing it's architecture and assuming you use it for what it's best at, the asyncness and raw performance.

fadzlan 9 years ago

Based from what I read of the article, its more about SQL the query language instead of the normal ACID compliant RDBMS over NoSQL databases.

Of course, having a common query language is a big advantage, but NoSQL varies in paradigms and implementation due to the subset of problems they choose to solve, thus having a standard query language does not make sense for NoSQL.

odammit 9 years ago

Heard a great quote once, "If you don't understand SQL, you probably need SQL."

TheVikingOwain 9 years ago

I always get a little tick when I read the nosql is a new thing. Pick databases like Unidata/Universe and mvBase have been around for a long time and experienced a lot of the problems that newer nosql implementations have/had.

Glyptodon 9 years ago

SQL is great. If it needs anything it's built in tools for data expiration and a means of creating results that are more complexly structured than just rows/columns, though both can be worked around.

known 9 years ago

https://en.wikipedia.org/wiki/Comparison_of_structured_stora...

systematical 9 years ago

For me the author nailed it when he brought up the PostgreSQL JSON datatype. I just couldn't ever get into MongoDB, but recognized the value of JSON structure data. Perfect balance for me.

sgt 9 years ago

What are the implications for apps being designed in Firebase?

turowicz 9 years ago

Comparing SQL to NoSQL is like comparing a boat to a car. They serve different purpose and each come with their own benefits and disadvantages.

amigoingtodie 9 years ago

Larry Ellison ate IBM's lunch, using Codd's idea (which was funded by IBM).

The math is solid, right?

gsylvie 9 years ago

SSD's.

jjirsa 9 years ago

If SSDs solve your scaling problem, you never had a scaling problem
- gsylvie 9 years ago
  
  NVMe SSD's are 100x faster than 15k disks. How many shops run 100+ machine NoSQL instances?
  
  jjirsa 9 years ago
  
  Lots? Certainly most adtech firms and IOT type firms
  Also: scaling isn't just speed, it's also capacity, HA, failover. Are you running 100x4T NVME in a single server?
  Edit: fun little story. Some time back, I wrote a compaction strategy for Cassandra designed for time-series data. That strategy (known as TWCS) is now widely deployed, but for the first year or so, it was an uphill battle to get it accepted into the project over the existing options (high barrier to entry for fairly obvious reasons). I had given a talk at a conference and made the code available online - my employer was going to use it, so giving it out was a good way to help iron out any wrinkles, and maybe some early adopters could help me get it into the project.
  There was a Fortune 500 company who sent me one of their products - unsolicited - as a thank-you, because they had deployed my compaction strategy, and it apparently helped them a lot. I asked how much, and their answer was something like "About $20k/month". In my mind, I thought that was great - probably let them decommission somewhere in the range of 5-20 servers, which sounded pretty good to me. Then they said "per cluster, and there's 10 clusters so far".
  Now, the $200k/month savings is cool and all, but the real point is think about how many machines are involved here to be able to SAVE $200k/month just by changing the way data files are combined.

jondubois 9 years ago

this article is overly opinionated and narrow-minded.

Many startups use NoSQL. Personally I prefer NoSQL for most use cases. That said I have nothing against SQL itself but I will NEVER go back to using an ORM - They're a dirty hack; always have been and always will be. I'd rather write the SQL by hand.

I love using MongoDB; it's very simple to use. I liked RethinkDB's ReQL even more (even though the company itself had to shut down, the open source project is still great).

The main problem with SQL databases is that the powerful features offered by SQL don't scale; so if you want to build a future-proof system, you have to avoid using certain features like joins; the advantage of NoSQL is that it's built to scale; the constraints are are feature.

arkh 9 years ago

> you have to avoid using certain features like joins
Too bad it is kinda sorta the thing with a RDBM. Relationnal. When you have to work with data which are related and don't want to do stupid shit like updating the infos of an author in each of its written article you want SQL.
But nothing prevents you from using a NoSQL database on top or even multiple ones. If you have multiple way to use your data, having multiple read or write models is a plus.
About scale: rare are the people who can't just add more hardware to the problem.
naranha 9 years ago

I really agree with you. I think ORM is responsible for a lot of failed software projects, since it leads to over-engineering. I saw a lot of projects where 90% of the mapped columns or tables could be replaced with JSON, because they were simply never queried against.
One thing I learned writing SPAs is that joins are often unnecessary because with proper state management a lot of data is already present on the client side. NoSQL Query languages (for example Mongo or CouchDB queries) are more than sufficient for these use cases.
rayascott 9 years ago

Well said. It's amazing how many engineers will make sweeping generalisations like it's a provable fact. You'd think from their training, they'd know better. But hey, he's got a blog and service, and everyone listen up now.

amirouche 9 years ago

What is good about SQL is not the programming theory behind it, or the fact that it's supposed to be simple enough for non-programmers. Just like JavaScript, the good thing about SQL is that it's a standard.

atomical 9 years ago

Most data is relational. If you put relational data in a NoSQL database you end up writing an API that is similar to SQL.

SmellTheGlove 9 years ago

Yup, and you move data logic from a stable, baked solution (your RDBMS of choice) to the application layer. Best case, you end up with something as well-implemented as SQL, but you're maintaining it as part of the app logic. Might as well let the database do it.
Clubber 9 years ago

Most structured data is relational. If you include data like (someone else's) web site content, images, sound files, video files, etc, it's a different picture (pardon the pun).
- busterarm 9 years ago
  
  Filter it through logstash (or something) and then structure it :D Literally just "collect everything", store in ES, model what you need later.
  It's a bit weird, but logstash & elasticsearch (and relating/extracting data when and where I need it into postgres) have made my life so much easier
- wvenable 9 years ago
  
  Most data is structured. A lot of other data (mages, sound, video) can be stored untransformed as binary. If there is a middle ground between those, it's really really small.
- LeoPanthera 9 years ago
  
  How is NoSQL better than simply using the filesystem?
  
  Clubber 9 years ago
  
  I'm not an expert on NoSQL, but based on general computer knowledge I would say some advantages over a filesystem would be:
  1. Single entry point. 2. Not having to share your filesystem. 3. Centralized security separated from your OS 4. Simple key/value access (no folder structure) 5. Abstraction of the filesystem (like #2) 6. Distribution of data over multiple nodes. 7. Separation of NoSQL admin and network admins. 8. Shiney new thing :)
  I'm sure now a days, an advanced filesystem/sharing protocol would handle some of these.
  
  dom0 9 years ago
  
  The biggest advantage is that you get transactions. (Or Not). Possibly faster small objects with less (memory, storage) overhead.
  
  paulddraper 9 years ago
  
  For one, should be faster for small operations.
  Some databases also let you define secondary indexes.
quotemstr 9 years ago

> If you put relational data in a NoSQL database you end up writing an API that is similar to SQL
Yep. Except sometimes without the benefits of a query planner, standardization, or comprehensibility. NoSQL is better in every way, right?

wildchild 9 years ago

Because nosql is hype and snake-oil. Barely usable tools to dump startups on ignorant investors.

primeblue 9 years ago

NoSQL is practically useless for succinct and efficient queries

newsmania 9 years ago

everybody hates on Mongodb, but at least it's sql-injection proof.