GraphQL Schema Design Best Practices
Thoughts from 19th December 2019
Alternative Title: I read tons of GraphQL blog articles and these are my notes.
Build your schema based on existing requirements and your domain model
- It's tempting to try to define the "perfect schema" for all of your data up front, but what makes the graph valuable is the degree to which it follows user requirements - which are constantly changing. Therefore the true perfect schema makes it easy for the graph to evolve in response to changing needs, without breaking existing clients.
- All fields and mutations in your schema should be driven by the needs of your API's consumers. Fields shouldn't be added to the schema speculatively. Build your schema incrementally based on actual requirements, and evolve it over time.
- Start with the schema first, and design it purely based on the domain, not on what's behind the fields (databases, REST endpoints, etc.). To do this, you either have to be a domain expert yourself, or work closely with one.
- The schema should never leak implementation details (for example, use `friends` instead of `unfilteredFriendConnection` as a field name).
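When a field does have to change, GraphQL's built-in `@deprecated` directive lets the schema evolve without breaking existing clients; a small sketch (the `User` fields are made up for illustration):

```graphql
type User {
  # Kept alive for existing clients while they migrate
  name: String! @deprecated(reason: "Use `firstName` and `lastName` instead.")
  firstName: String!
  lastName: String!
}
```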
Don't try to build a one-size-fits-all schema
- One big benefit of GraphQL is that each client can select exactly what they need and want, instead of being forced to consume what the API designer cooked up. You should embrace the different use cases and clients.
- Prefer optimized, exact fields over "smart" fields, whenever distinct use-cases exist.
# Do!
userById(id: ID!): User!
userByName(name: String!): User!
# Don't!
user(id: ID, name: String): User!
- Use unions of specific types instead of flags on general types
# Do!
union Shipping = Pickup | Mail
# Don't!
type Shipping {
  isPickup: Boolean!
}
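Clients then pick the concrete shipping type with inline fragments; a sketch, assuming a hypothetical `order` query and illustrative fields on `Pickup` and `Mail`:

```graphql
query {
  order(id: "1") {
    shipping {
      ... on Pickup {
        storeAddress
      }
      ... on Mail {
        trackingNumber
      }
    }
  }
}
```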
Use consistent naming conventions
enum Region {
  EUROPE
  NORTH_AMERICA
}

type MarketingCampaignConnection {
  edges: [MarketingCampaignEdge!]!
}

type MarketingCampaignEdge {
  node: MarketingCampaign
}

type MarketingCampaign {
  trackingUrl: String!
  region: Region!
}

type Query {
  marketingCampaigns(filter: MarketingCampaignFilter): MarketingCampaignConnection!
}

type Mutation {
  deleteMarketingCampaign(input: DeleteMarketingCampaignInput!): DeleteMarketingCampaignPayload!
}

type Subscription {
  deleteMarketingCampaignEvent: DeleteMarketingCampaignPayload!
}
- Field names should use `camelCase`.
- Type names should use `PascalCase`.
- Enum names should use `PascalCase`.
- Enum values should use `ALL_CAPS`, because they are similar to constants.
- Apply the `(Action)(Type)(Modifier)` format to everything.
- Use names that are as specific as possible (e.g. `imageUrl` instead of `image`, or `onlineStoreUrl` instead of `url`), so that when requirements change, you can evolve the schema without having to introduce breaking changes.
Use Object types instead of simple types whenever possible
# Do!
type Location {
  city: String!
  zipCode: String!
}

type Customer {
  location: Location!
}

# Don't!
type Customer {
  locationCity: String!
  locationZipCode: String!
}
- It may seem ugly at first, but nesting is a virtue in GraphQL schema design. You will want to nest as much as reasonable.
- The rationale for this is that nesting types gives you the most flexibility to evolve your schema in a sensible way without having a breaking version in the future.
- You should heavily consider applying this rule to fields that have a prefix or suffix.
- Another common example is `image`, which should likely not be a simple `String` field but an object, to allow, for example, resizing the image, providing text for the `alt` attribute, etc.
# Do!
type Image {
  url(width: Int!, height: Int!): String!
  title: String!
}

type Product {
  image: Image!
}

# Don't!
type Product {
  image: String!
}
Nest your types, do not reference their IDs
# Do!
type Book {
  author: Author!
}

# Don't!
type Book {
  authorId: ID!
}
- Coming from a REST API development standpoint, it might be reasonable to embed IDs into the response types.
- This is a big anti-pattern in GraphQL, because you lose the ability to fetch related resources in one API call via nesting in the graph, and (as with REST) need another round-trip to fetch the resources by their IDs.
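With the nested version, a client can fetch a book together with its author in a single round-trip (assuming a `book(id:)` query and a `name` field on `Author`):

```graphql
query {
  book(id: "1") {
    title
    author {
      name
    }
  }
}
```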
Use an `input` object type for mutations
# Do!
input AuthorInput {
  id: ID!
  firstName: String!
  lastName: String!
}

input UpdateAuthorInput {
  author: AuthorInput!
}

type Mutation {
  updateAuthor(input: UpdateAuthorInput!): UpdateAuthorPayload!
}

# Don't!
type Mutation {
  updateAuthor(id: ID!, firstName: String!, lastName: String!): UpdateAuthorPayload!
}
- Having only one `input` object makes it much easier to make the mutation dynamic with variables (one variable in total vs. one variable per field).
- Nesting the input data into an additional object (e.g. `author`) allows for more flexibility later (for example, if you add a `sendTeamNotificationEmail` flag, you can nest it under `flags` instead of having to add it directly to the `Input` type).
- This is also required for Relay's `clientMutationId`, which is used to correlate mutations with their responses.
- Don't reuse the `Type` (e.g. `Author`) as the `InputType`, because it may contain circular references, properties you may not want the user to be able to set, computed properties, etc.
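With a single `input` argument, the mutation document on the client stays constant and only the variable payload changes:

```graphql
mutation UpdateAuthor($input: UpdateAuthorInput!) {
  updateAuthor(input: $input) {
    author {
      id
    }
  }
}
```

sent with variables such as `{"input": {"author": {"id": "123", "firstName": "Jane", "lastName": "Doe"}}}` (this assumes the payload nests the updated `author`, as described in the next section).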
Return affected objects as payloads for mutations
# Do!
type UpdateAuthorPayload {
  author: Author!
}

type DeleteAuthorPayload {
  id: ID!
}

type Mutation {
  updateAuthor(input: UpdateAuthorInput!): UpdateAuthorPayload!
}

# Don't!
type Mutation {
  updateAuthor(input: UpdateAuthorInput!): ID!
}
- Return the affected resources for `create` and `update` mutations. This makes it easier to directly consume the change in the client without having to send an additional request.
- For `delete` mutations, only return the deleted ID, since resolving the relations of the (now deleted) resource can cause errors that are confusing to the consumer.
- Nesting the payload data into an additional object (e.g. `author`) allows for more flexibility later (for example, if you want to return additional data like the ID of a queued backend job).
- This is also required for Relay's `clientMutationId`, which is used to correlate mutations with their responses.
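Because the payload nests the affected object, the client can read back exactly the fields it cares about in the same request:

```graphql
mutation {
  updateAuthor(input: {author: {id: "123", firstName: "Jane", lastName: "Doe"}}) {
    author {
      id
      firstName
      lastName
    }
  }
}
```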
Don't forget about computed fields
- Since clients have to specify exactly which fields they need, don't be shy about adding behaviour-driven fields that answer specific client use cases and help reduce behavioural logic in the client (for example, an `isMergeable` field that is computed server-side).
- Other good candidates for computed fields are fields based on the authentication context (for example `isMe`, `iAmFollowing`, `myLessons`, etc.).
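A sketch of what such fields could look like (the type and field names are illustrative):

```graphql
type PullRequest {
  # Computed server-side from branch state, CI status, etc.
  isMergeable: Boolean!
}

type User {
  # Both are resolved against the currently authenticated viewer
  isMe: Boolean!
  iAmFollowing: Boolean!
}
```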
Use connections for pagination, and use pagination for everything that is a list
type Product {
  recommendedProducts(first: Int, last: Int, after: String, before: String): ProductConnection!
}

type ProductConnection {
  edges: [ProductEdge!]!
  pageInfo: PageInfo!
}

type ProductEdge {
  cursor: String!
  node: Product!
  # Optional: additional data for the relationship
  boughtTogetherPercentage: Float!
}

type PageInfo {
  hasNextPage: Boolean!
  hasPreviousPage: Boolean!
  startCursor: String
  endCursor: String
}
# Usage:
product(id: 1) {
  title
  recommendedProducts(first: 5, after: "4e025") {
    edges {
      cursor
      boughtTogetherPercentage
      node {
        title
      }
    }
    pageInfo {
      hasNextPage
      hasPreviousPage
      startCursor
      endCursor
      # Additional optional fields you could implement:
      pageCount
      hasNextPages(amount: Int!)
      hasPreviousPages(amount: Int!)
    }
  }
}
- Never use `authors: [Author!]!` like you see in many tutorials. It will not scale.
- These connections are more forward-compatible, as they allow for adding metadata to the association itself (e.g. for pagination) and to the "edge" (the relation between the parent entity and the associated entity).
- You should use these connections for everything that returns a list: top-level queries, 1:N relationships, and N:M relationships.
- If you don't like how this looks (like me), take five minutes and watch the explanation here, which helped me understand the reasons behind this design. You can also read more about this here.
- For mutations, use a naming strategy like `(Action)(TypeLeft)(TypeRight)Edge(Modifier)`, e.g. `createUserFriendsEdgePayload`.
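An edge mutation following that naming strategy could look like this (all types are hypothetical):

```graphql
input CreateUserFriendsEdgeInput {
  userId: ID!
  friendId: ID!
}

type CreateUserFriendsEdgePayload {
  edge: UserFriendsEdge!
}

type Mutation {
  createUserFriendsEdge(input: CreateUserFriendsEdgeInput!): CreateUserFriendsEdgePayload!
}
```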
Consider adding filter & sort to connections
products(filter: "createdAt < 2019", sort: {field: "createdAt", direction: "ASC"}, first: 5) {
  edges {
    node {
      title
    }
  }
}
- In a lot of cases, it is very useful to be able to filter and order the resources inside of connections, similar to the `WHERE`/`ORDER BY` clauses in SQL. This generic interface allows many different features to be implemented without having to update the schema.
- How exactly the syntax for these filters should be defined is up for discussion, but most APIs take inspiration from SQL or MongoDB's `find` syntax.
- Keep in mind that the implementation of these filters can get very complex, for example if you decide to allow filtering based on the data of relations (e.g. `stores(filter: "products.title = 'Foo'")` to filter all stores that sell a product with a specific title). Limit the complexity based on your requirements.
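An alternative to free-form filter strings is a structured filter input, which trades some expressiveness for type safety and easier validation; a hypothetical sketch:

```graphql
input ProductFilter {
  titleContains: String
  createdBefore: String
  and: [ProductFilter!]
  or: [ProductFilter!]
}

type Query {
  products(filter: ProductFilter, first: Int, after: String): ProductConnection!
}
```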
Provide top level queries for "get from ID" and for "get from filters"
product(id: ID!): Product!
products(filter: FilterString): ProductConnection!
- Providing both a query to handle single resources (e.g. for detail pages), as well as a more general, filterable, paginated query for reading out multiple resources (e.g. for overview pages) handles most common use cases.
- These queries should be provided for all types that can be used as starting points into the graph.
(Optional) Global object identification
- If your application grows very large, it can help to provide only a single top-level `node()` query for handling single resources, one that can resolve any resource from a global identifier. This reduces query complexity because it only needs a single resolver instead of one per type.
- This global identifier has the type encoded into it, for example via `base64("Author:123")`, and has to be provided on all resources.
- You can read more about how to implement this here.