Querying GitHub's GraphQL API
As part of my "Commits" app, I decided to fetch commit data from GitHub's new GraphQL API. I figured that while I'm learning something new, I might as well just try learning as much as possible, which is why you're about to see some really ugly code and a pseudo explanation of it all.
GraphQL
GraphQL stands for graph query language. "Graph" refers to the structure of the data, which consists of nodes and edges, rather than to the kind of chart you might see in a presentation. GitHub has a pretty nice intro to GraphQL if you're interested, but in summary: when you query a GraphQL API, you can fetch data from related resources in a single query to one endpoint (https://api.github.com/graphql, in GitHub's case). With a REST API, you'd likely have to hit multiple endpoints in order to gather all the data you need.
One thing to note is that needing multiple requests is usually the case with a well-maintained, public REST API. If you're working with an API you designed yourself that isn't totally RESTful, you probably haven't needed to make multiple requests in order to obtain the data you need. In Rails-land, for instance, we often define serializers that specify how we want an object to be represented after we query or change it in an endpoint.
To run a query, you send a single POST request to the endpoint. Within the body of the request you place your query as a string.
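As a rough sketch of what that request is made of, here are the three pieces an HTTP client needs. The module name RequestSketch, the token value, and the tiny viewer query are placeholders for illustration; the body map still has to be JSON-encoded (for example with Poison.encode!/1) before it is sent.

```elixir
defmodule RequestSketch do
  @endpoint "https://api.github.com/graphql"

  # Returns {url, body, headers}. The body is a map with a single
  # "query" key holding the GraphQL query as a plain string.
  def build(query_string, token) do
    body = %{"query" => query_string}

    headers = [
      {"Authorization", "bearer #{token}"},
      {"Content-Type", "application/json"}
    ]

    {@endpoint, body, headers}
  end
end

# "my-token" is a placeholder, not a real personal access token.
{url, body, headers} = RequestSketch.build("query { viewer { login } }", "my-token")
IO.inspect(url)
IO.inspect(body)
IO.inspect(headers)
```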
I played a lot with the GitHub GraphQL Explorer in order to get a sense of what my query would actually look like. One nice thing you'll notice about GraphQL is that the structure of your query basically mirrors the structure of the JSON payload you get back. Among the many other benefits of GraphQL is the ability to ask for exactly the data you need, nothing more, nothing less.
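To make the "query mirrors the response" idea concrete, here is a small sketch. The repository and owner names are made up, and the response map is a hand-written stand-in for a decoded JSON payload, not real API output.

```elixir
# A small query: repository -> ref -> name.
query = """
query {
  repository(name: "some-repo", owner: "some-owner") {
    name
    ref(qualifiedName: "master") {
      name
    }
  }
}
"""

# Hand-written stand-in for the decoded response. Note that it nests
# the same way the query does, under a top-level "data" key.
response = %{
  "data" => %{
    "repository" => %{
      "name" => "some-repo",
      "ref" => %{"name" => "master"}
    }
  }
}

# Walking the response follows the same path as the query.
branch = get_in(response, ["data", "repository", "ref", "name"])

IO.puts(query)
IO.puts(branch)
```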
GitHub
In order to get started with the GitHub API, you'll need to do a couple of setup steps, such as creating a personal access token in order to authenticate. You'll also need to grant that access token certain scopes.
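One way to make the token available to the app is through the application config. This is only a sketch: the environment variable name GITHUB_ACCESS_TOKEN is an assumption, and it uses `import Config`, which needs Elixir 1.9 or later (older projects use `use Mix.Config` instead). The :commits app name and :github_access_token key match the Application.get_env call used later in the seed module.

```elixir
# config/config.exs
import Config

# Read the token from the environment so it never lands in version control.
# GITHUB_ACCESS_TOKEN is a hypothetical variable name.
config :commits,
  github_access_token: System.get_env("GITHUB_ACCESS_TOKEN")
```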
When you're ready, it's time to query, and by query, I mean mess around with the explorer until you get something that resembles the data you want.
I decided to start from a repository. After consulting the schema documentation and experimenting some more, I was able to form a query that would go from the repository (a node), grab a ref (or branch) with the name of master, then leverage a target, in this case a Git object, and eventually use an inline fragment to extract some data about commits. Refs and branches aren't quite one and the same, but it was close enough for my purposes.
From there, I started traversing the graph using edges and nodes to grab all of the commits by one particular author. Nodes are basically different resources, and edges are the connections between them. Full disclosure: I have not read all of the docs about queries and GraphQL, but I did land on this query, which does the job fairly well.
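Once a response is decoded, the edges and nodes show up as a list of maps. Here is a minimal sketch of pulling data out of that shape; the sample values are invented and only the structure is meant to resemble a commit history.

```elixir
# Made-up sample shaped like the "history" part of a decoded response.
history = %{
  "pageInfo" => %{"hasNextPage" => false},
  "edges" => [
    %{"cursor" => "cursor-1", "node" => %{"oid" => "abc123", "message" => "First commit"}},
    %{"cursor" => "cursor-2", "node" => %{"oid" => "def456", "message" => "Second commit"}}
  ]
}

# Each edge wraps one node (the commit) plus a cursor for pagination.
messages = Enum.map(history["edges"], fn edge -> edge["node"]["message"] end)

# The cursor of the last edge is where the next page would start.
last_cursor = history["edges"] |> List.last() |> Map.get("cursor")

IO.inspect(messages)     # ["First commit", "Second commit"]
IO.inspect(last_cursor)  # "cursor-2"
```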
For reference, here's the function I created that will construct the query I want while passing in some variables. The cursor I ended up using was the SHA of the most recent commit at the time I was writing this.
# Builds the GraphQL query string for one page of an author's commits.
def query(repoName, owner, cursor, authorId) do
  "query {
    repository(name: \"#{repoName}\", owner: \"#{owner}\") {
      name
      ref(qualifiedName: \"master\") {
        target {
          ... on Commit {
            id
            history(first: 100, author: {id: \"#{authorId}\"}, after: \"#{cursor}\") {
              pageInfo {
                hasNextPage
              }
              edges {
                node {
                  oid
                  message
                  author {
                    name
                    date
                  }
                }
                cursor
              }
            }
          }
        }
      }
    }
  }"
end
Now, because I wanted to query a number of repositories and run a script to seed my database easily, I decided to write a module in the seeds file, /commits/priv/repo/seeds.exs. The content of the module follows below:
defmodule Seed do
  # Builds the GraphQL query string for one page of an author's commits.
  def query(repoName, owner, cursor, authorId) do
    "query {
      repository(name: \"#{repoName}\", owner: \"#{owner}\") {
        name
        ref(qualifiedName: \"master\") {
          target {
            ... on Commit {
              id
              history(first: 100, author: {id: \"#{authorId}\"}, after: \"#{cursor}\") {
                pageInfo {
                  hasNextPage
                }
                edges {
                  node {
                    oid
                    message
                    author {
                      name
                      date
                    }
                  }
                  cursor
                }
              }
            }
          }
        }
      }
    }"
  end

  # Fetches one page of commits, saves them, then recurses with the new cursor.
  def get_commits(repoName, owner, cursor, hasNextPage, authorId) when hasNextPage == true do
    token = Application.get_env(:commits, :github_access_token)
    payload = Poison.encode!(%{"query" => query(repoName, owner, cursor, authorId)})
    headers = [{"Authorization", "bearer #{token}"}, {"Content-Type", "application/json"}]

    {:ok, %HTTPoison.Response{status_code: 200, body: body}} =
      HTTPoison.post("https://api.github.com/graphql", payload, headers)

    # Decode the body once and read both the page info and the edges from it.
    history =
      body
      |> Poison.decode!()
      |> get_in(["data", "repository", "ref", "target", "history"])

    nextPage = history["pageInfo"]["hasNextPage"]
    commits = history["edges"]

    Enum.each(commits, fn c -> save(c, repoName) end)
    last_cursor = List.last(commits)["cursor"]
    get_commits(repoName, owner, last_cursor, nextPage, authorId)
  end

  # Base case: no more pages, so print the last cursor and stop.
  def get_commits(_repoName, _owner, cursor, hasNextPage, _authorId) when hasNextPage == false do
    IO.puts cursor
  end

  # Stores a single commit, skipping any message that contains "Merge".
  def save(commit, repoName) do
    message = commit["node"]["message"]
    date = commit["node"]["author"]["date"]

    unless String.contains?(message, "Merge") do
      IO.puts message
      {:ok, formatted_date} = NaiveDateTime.from_iso8601(date)
      Commits.Repo.insert!(%Commits.Logs.CommitMessage{content: message, committed_at: formatted_date, repo_name: repoName})
    end
  end
end
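Two small pieces of save are easier to see in isolation: the "Merge" filter and the date parsing. The module name SaveSketch and the sample strings below are made up for illustration.

```elixir
defmodule SaveSketch do
  # Same check as in save: keep a message only if it does not
  # contain the substring "Merge" (the match is case-sensitive).
  def keep?(message), do: not String.contains?(message, "Merge")

  # Parses an ISO 8601 timestamp. NaiveDateTime has no time zone,
  # so any UTC offset in the string is discarded, not applied.
  def parse_date(date) do
    {:ok, naive} = NaiveDateTime.from_iso8601(date)
    naive
  end
end

IO.inspect(SaveSketch.keep?("Merge branch 'master' into feature"))  # false
IO.inspect(SaveSketch.keep?("Fix typo in README"))                  # true
IO.inspect(SaveSketch.parse_date("2017-05-01T08:30:00-04:00"))      # ~N[2017-05-01 08:30:00]
```

Because the filter is a plain substring match, it also drops ordinary commits whose message happens to include the word "Merge".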
Looping in Elixir?
In Ruby, you'd probably write a loop to iterate over the paginated results you get from an API, whereas in Elixir you'd more likely use recursion, which in simple terms is when a function keeps calling itself until it gets to a base case and stops.
Above, you'll see that instead of writing a loop, I wrote two clauses of a function named get_commits, one that matches when there is a next page and one that matches when there isn't. While there are still pages of results left, the first clause keeps calling get_commits, and when the results run out, the second clause ends the recursion.
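The same pattern can be sketched without any network calls. The PagerSketch module and its fake pages are invented for illustration; a map from cursor to page stands in for the API.

```elixir
defmodule PagerSketch do
  # Fake "API": maps a cursor to {items, next_cursor, has_next_page}.
  @pages %{
    nil => {["a", "b"], "c1", true},
    "c1" => {["c", "d"], "c2", true},
    "c2" => {["e"], "c3", false}
  }

  # Recursive clause: fetch a page, then call ourselves with the new cursor.
  def fetch_all(cursor, true, acc) do
    {items, next_cursor, has_next} = Map.fetch!(@pages, cursor)
    fetch_all(next_cursor, has_next, acc ++ items)
  end

  # Base case: there is no next page, so return what was collected.
  def fetch_all(_cursor, false, acc), do: acc
end

IO.inspect(PagerSketch.fetch_all(nil, true, []))  # ["a", "b", "c", "d", "e"]
```

Pattern matching on true and false in the function head does the job that the when guards do in get_commits; which clause runs is decided by the arguments alone.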
At the end of my seed file, I called the function Seed.get_commits, passing in the repo, owner, cursor, a hasNextPage value of true, and authorId, for a bunch of different repos to which I have access.
Running the Seed File
Running the seed file is easy from the command line with:
mix run priv/repo/seeds.exs
And right after that, I had a pretty full database ready to be queried in a non-GraphQL fashion.
All of this only touches the surface of GraphQL and somewhat poorly demonstrates my current knowledge of Elixir. But hey, we all need to start somewhere.