Using SQL Union
I had the chance recently to use
SQL Union in my everyday work, so I figured
I'd write this to remind myself of why it's so useful.
First, though, some hot takes.
Building dashboards and CSVs for internal business tools into your apps is a little silly and cumbersome. For one, every time someone wants another column added, they have to ask engineering to make a code change. And also, once the data set gets big enough, requests start timing out.
It's actually really difficult to generate quality queries with
ActiveRecord. When you're working with more complex joins across the codebase, it's probably better to dip into writing raw SQL. Although it's not as maintainable, it's often more efficient, and you end up with tabular data rather than objects hydrated with data from your database. Plus you can actually transform the data into the shape that you want by using
CASEstatements and so on.
Okay, hot takes aside, you can probably see that in some cases, it's better to expose your app's data (probably with a data pipeline) to allow it to be queryable with a business intelligence tool. So...that is what we did in this case.
Using Postico, I connected to a replica in a data warehouse and was able to
query data from across applications, which is how
SQL Union came into play.
I wanted to put data from different sources into one table to export it as a
CSV. To do this, I used the
UNION operator in SQL.
Before you use
union, you'll need to make sure that the number of columns
in each data set is equal. The data will just get mashed together
independent of any column names.
Here's a basic example and then I'll show off the query I actually used.
If for some reason you had two tables and wanted to make a single list of kids and parents, you could use
But lo and behold, the number of columns differs in each table.
That's cool: we can make them the same by adding a value to represent that column. Or we could omit that column if we wanted to. I'll do the example where we list a value.
select * from kids union select name, email, 'n/a' as age from parents
And that should allow us to get the right data in the right columns. Simple, silly example, but this sets the stage for some more complex querying.
Here's a cleaned-up version of the query I used to retrieve the set of online students who were in both our self-paced and structured programs. Data was split across many different tables from two separate apps that had their data piped into Redshift.
select users.uuid, users.email, users.first_name, users.last_name, cohort_batches.* from users join user_roles on user_roles.user_id = users.id join roles on user_roles.role_id = roles.id join (select batches.id as batch_id, batches.uuid as batch_uuid, batches.iteration as batch_iteration, cohorts.name as cohort_name, campuses.name as campus_name, cohorts.start_date as start_date from cohorts join batches on cohorts.batch_id = batches._uuid join campuses on cohorts.campus_id = campuses.id join courses on courses.id = cohorts.course_id where campuses.name = 'Online' and courses.name in ('Software Engineering', 'Data Science') union select batches.id as batch_id, batches.uuid as batch_uuid, batches.iteration as batch_iteration, null as cohort_name, # wasn't available, so fill in a default value 'Online' as campus_name, # campus name all the same for these null as start_date # no start date since self-paced from batches where batches.iteration in ('v-000', 'online-web-sp-000', 'online-ds-sp-000') ) as cohort_batches on cohort_batches.batch_id = user_roles.roleable_id where roles.name = 'student' and user_roles.roleable_type = 'Batch'
It's not the craziest query I've ever seen, but as a person unaccustomed to writing raw SQL all day, it was definitely a doozy to construct. What I found interesting was the ability to join two sets of "cohort batches" together derived from different sources and then join that set of data to our users table based on a role and roleable type.
So, lots of querying, subquerying, union-ing and yeah. I suppose it was fun? For what it's worth, dipping down into raw SQL wasn't my favorite activity, but it got progressively more fun over time. So happy querying, I guess.