2 May 2022
GraphQL makes it all too easy to overload your database with requests. Writing resolvers is at the core of any GraphQL server, and if you're not careful, you can very easily bring down your database in a single GraphQL query.
DataLoader was introduced to help with batching and caching requests to prevent sending unnecessary requests to your database. Using DataLoader we can create a "loader" that we use inside of our resolvers. If the loader has a value with the key already, that will be used, otherwise it'll execute your loader only once, unless you clear it.
We'll use an existing GraphQL server using GraphQL Yoga, and some static data, we'll replicate the overloading problem.
Let's take the following SDL:
type Query {
users: [User!]!
}
type User {
id: ID!
name: String!
bestFriend: User!
}
I have an array of users that I have the following function that replicates the simple behaviour of a database fetching this item.
const getUserById = (id: string): User => {
console.log(`Calling getUserById for id: ${id}`);
return users.find((d) => d.id === id);
};
Then inside of our GraphQL resolvers, we have a field resolver for User.bestFriend
that calls the getUserById
function above.
const resolvers = {
Query: {
users: () => users,
},
User: {
bestFriend: async ({ bestFriendId }) => {
const user = await getUserById(bestFriendId);
return user;
},
},
};
If we now execute a query to fetch a user's best friend, their best friend, and their best friend... Wherever the same user is referenced, we will call getUserById
multiple times.
Here's the query, and while it probably abuses nested queries, it demonstrates quite clearly the issue dataloader can solve:
{
users {
name
bestFriend {
name
bestFriend {
name
bestFriend {
name
}
}
}
}
}
We can see the output of console.log
in our terminal of this happening:
Calling getUserById for id: 2
Calling getUserById for id: 3
Calling getUserById for id: 2
Calling getUserById for id: 3
Calling getUserById for id: 2
Calling getUserById for id: 3
Calling getUserById for id: 2
Calling getUserById for id: 3
Calling getUserById for id: 2
We can see here that we made far too many requests to the same function getUserBydId
for a single user.
Thankfully DataLoader gives us a simple batching mechanism that we can use to reduce the multiple requests to getUserById
.
Next we'll create a function that allows us to pass multiple id
's that it can then pass onto getUserById
. This function must return a Promise
. It's what is expected of DataLoader.
const getUsersByIds = async (ids: string[]): Promise<User[]> => {
return ids.map((id) => getUserById(id));
};
Now using something like GraphQL Yoga we can instantiate a new DataLoader
and pass it our getUsersByIds
function in it's constructor:
const server = createServer({
schema: {
typeDefs,
resolvers,
},
context: () => ({
userLoader: new DataLoader(getUsersByIds),
}),
});
Now inside of our resolver code we can get userLoader
, and replace our previous implementation with something that looks like this:
const resolvers = {
Query: {
users: () => users,
},
User: {
bestFriend: async ({ bestFriendId }, _, { userLoader }) => {
const user = await userLoader.load(bestFriendId);
return user;
},
},
};
Now when we make the same query as we did above, we should see significantly less console.log
statements inside the terminal:
Calling getUserById for id: 2
Calling getUserById for id: 3
If some resolvers require authorization logic, have no fear! Using GraphQL Yoga you have access to the Request
object that is sent, and you can .get()
things from the headers
to pass to your business logic.
const server = createServer({
schema: {
typeDefs,
resolvers,
},
context: ({ req }) => {
const authorizationToken = req.headers["authorization"];
return {
userLoader: new DataLoader((ids) =>
getUsersByIds(ids, authorizationToken)
),
};
},
});
I should end by saying that DataLoader doesn't replace the need for a cache, so you'll want to make sure you're implementing something like that as well.