But then https://ggsql.org/faq.html has a section, "Can I use SQL queries inside the VISUALISE clause," which says, "Some parts of the syntax are passed on directly to the database".
The homepage says "ggsql interfaces directly with your database"
Yes this was my question as well, an example showing all the plumbing/dependencies to generate a graph from an external database server would be very helpful.
We certainly plan to create a few videos showing how to set it up and use it. If you use it in Positron with the ggsql extension it can interact directly with the connection pane to connect to the various backends you have there
That is fair - it is somewhat of a special concept.
ggsql connects directly with your database backend (if you wish - you can also run it with an in-memory DuckDB backend). Your visual query is translated into a SQL query for each layer of the visualisation and the resulting table is then used for rendering.
E.g.
VISUALISE page_views AS x FROM visits
DRAW smooth
will create a SQL query that calculates a smoothing kernel over the data and returns points along that. Those points are then used to create the final line chart
ggsql has the concept of a "reader", which can be thought of as the way ggsql interfaces with a SQL database. It handles the connection to the database and generating the correct dialect of SQL for that database.
As an alpha, we support just a few readers today: duckdb, sqlite, and an experimental ODBC reader. We have largely been focusing development mainly around driving duckdb with local files, though duckdb has extensions to talk to some other types of database.
The idea is that ggsql takes your visualisation query, and then generates a selection of SQL queries to be executed on the database. It sends these queries using the reader, then builds the resulting visualisation with the returned data. That is how we can plot a histogram from very many rows of data, the statistics required to produce a histogram are converted into SQL queries, and only a few points are returned to us to draw bars of the correct height.
By default ggsql will connect to an in-memory duckDB database. If you are using the CLI, you can use the `--reader` argument to connect to files on-disk or an ODBC URI.
If you use Positron, you can do this a little easier through its dedicated "Connections" pane, and the ggsql Jupyter kernel has a magic SQL comment that can be issued to set up a particular reader. I plan to expand a little more on using ggsql with these external tools in the docs soon.
I skimmed the article for an explanation of why this is needed, what problem it solves, and didn't find one I could follow. Is the point that we want to be able to ask for visualizations directly against tables in remote SQL databases, instead of having to first pull the data into R data frames so we can run ggplot on it? But why create a new SQL-like language? We already have a package, dbplyr, that translates between R and SQL. Wouldn't it be more direct to extend ggplot to support dbplyr tbl objects, and have ggplot generate the SQL?
Or is the idea that SQL is such a great language to write in that a lot of people will be thrilled to do their ggplots in this SQL-like language?
Will this ever integrate rest of the ggplot2 dependent packages described here: https://exts.ggplot2.tidyverse.org/gallery/ in the near or distant future? Sorry if it already mentioned somewhere.
I'd say it does, though it is certainly much more low-level than e.g. ggplot2. But the basic premises of the building blocks described be Leland Wilkinson is there
Maybe I skim read it too fast, but I did not find any clear description in the blog post or website docs of how this relates to SQL databases
I was kind of guessing that it doesn't run in a database, that it's a SQL-like syntax for a visualisation DSL handled by front end chart library.
That appears to be what is described in https://ggsql.org/get_started/anatomy.html
But then https://ggsql.org/faq.html has a section, "Can I use SQL queries inside the VISUALISE clause," which says, "Some parts of the syntax are passed on directly to the database".
The homepage says "ggsql interfaces directly with your database"
But it's not shown how that happens AFAICT
confused
Yes this was my question as well, an example showing all the plumbing/dependencies to generate a graph from an external database server would be very helpful.
We certainly plan to create a few videos showing how to set it up and use it. If you use it in Positron with the ggsql extension it can interact directly with the connection pane to connect to the various backends you have there
That is fair - it is somewhat of a special concept.
ggsql connects directly with your database backend (if you wish - you can also run it with an in-memory DuckDB backend). Your visual query is translated into a SQL query for each layer of the visualisation and the resulting table is then used for rendering.
E.g.
VISUALISE page_views AS x FROM visits DRAW smooth
will create a SQL query that calculates a smoothing kernel over the data and returns points along that. Those points are then used to create the final line chart
ggsql has the concept of a "reader", which can be thought of as the way ggsql interfaces with a SQL database. It handles the connection to the database and generating the correct dialect of SQL for that database.
As an alpha, we support just a few readers today: duckdb, sqlite, and an experimental ODBC reader. We have largely been focusing development mainly around driving duckdb with local files, though duckdb has extensions to talk to some other types of database.
The idea is that ggsql takes your visualisation query, and then generates a selection of SQL queries to be executed on the database. It sends these queries using the reader, then builds the resulting visualisation with the returned data. That is how we can plot a histogram from very many rows of data, the statistics required to produce a histogram are converted into SQL queries, and only a few points are returned to us to draw bars of the correct height.
By default ggsql will connect to an in-memory duckDB database. If you are using the CLI, you can use the `--reader` argument to connect to files on-disk or an ODBC URI.
If you use Positron, you can do this a little easier through its dedicated "Connections" pane, and the ggsql Jupyter kernel has a magic SQL comment that can be issued to set up a particular reader. I plan to expand a little more on using ggsql with these external tools in the docs soon.
Highly suggest leveraging adbc. I would love to use this against our bigquery tables.
I skimmed the article for an explanation of why this is needed, what problem it solves, and didn't find one I could follow. Is the point that we want to be able to ask for visualizations directly against tables in remote SQL databases, instead of having to first pull the data into R data frames so we can run ggplot on it? But why create a new SQL-like language? We already have a package, dbplyr, that translates between R and SQL. Wouldn't it be more direct to extend ggplot to support dbplyr tbl objects, and have ggplot generate the SQL?
Or is the idea that SQL is such a great language to write in that a lot of people will be thrilled to do their ggplots in this SQL-like language?
It seems to be for sql users who don’t know python or r.
Will this ever integrate rest of the ggplot2 dependent packages described here: https://exts.ggplot2.tidyverse.org/gallery/ in the near or distant future? Sorry if it already mentioned somewhere.
Love the layering approach - that solves a problem I’ve had with other sql/visual hybrids as you move past the basics charts.
It is conceivable that this could become a duckdb extension, such that it can be used from within the duckdb CLI? That would be pretty slick.
Looks intriguing. Brings plotting to Sql instead of “transforming” sql for plotting.
The new visualisation tool from Posit. Combines SQL with the grammar of graphics, known from ggplot2, D3, and plotnine
I'm one of the authors - happy to take any questions!
I don't think D3 uses grammar of graphics model?
I'd say it does, though it is certainly much more low-level than e.g. ggplot2. But the basic premises of the building blocks described be Leland Wilkinson is there
Don't forget Vega! https://vega.github.io/vega/
Wow, love this idea.
Would be awesome if somehow coupled into Evidence.dev