Pathways: The What, the Why and the How

Pathways aims to be a curated collection of “pathways” for learning any skill. Every pathway is a self-sufficient guide to gaining proficiency in a skill or topic.

The Why

The internet has an abundance of resources of varying quality. However, there is no guide for deciding which resource is best for someone at a particular level. These resources are often scattered across courses, blogs, GitHub repositories, etc. There is no need for additional resources, but there is a strong need for the existing ones to be arranged so that they make sense to someone with zero knowledge of a topic.

The Problem with Existing Resources

As an example, suppose you are learning Django. If you choose not to follow a course, you might end up doing something like this:

  • You Google “Learn Django”
  • Read an article that tells you Django follows the “MVT” pattern
  • Google what MVT means. “Model View Template”? What the f-ck does that mean?
  • What’s a model? What’s a view? What’s a template?
  • Oh so a template is a frame in which you put data
  • Wait, where is this data coming from? A model?
  • So models = Database?
  • And what are Django forms? Am I supposed to use them or not?

It can go on and on until all those articles have pushed you around like a pinball and things finally begin to make sense. If only you knew what to read first. Or even what to read at all. This is exactly the problem Pathways aims to solve: it tells you what to read first, and what to leave for later.

The What

The crux of Pathways is, well, Pathways. A Pathway is an ordered collection of steps. A step in itself can be:

  • An external resource like a tutorial, a blog, a YouTube video, etc
  • Another Pathway

Instead of dumping everything you need to learn before following a tutorial into the prerequisites, a Pathway can introduce an external topic in a step exactly when you need to learn it. This way, Pathways can be self-sufficient, with virtually no prerequisites. The progress of the user is also tracked across Pathways. So if you have completed a step titled Python in the Django Pathway, you won’t have to repeat it while following the Flask Pathway. This lets the author of a Pathway make fewer assumptions about the reader and write more comprehensive Pathways.
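The nesting and shared progress described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's code: the `pathways` object, the `flatten` and `remaining` helpers, the step titles and the example.com URLs are all hypothetical.

```javascript
// Hypothetical in-memory model: a Pathway is an ordered list of steps.
// A step is either external content (url) or a reference to another Pathway.
const pathways = {
  python: [
    { title: "Python basics", url: "https://example.com/python-basics" },
  ],
  django: [
    { title: "Python", pathway: "python" },
    { title: "Models, views and templates", url: "https://example.com/mvt" },
  ],
  flask: [
    { title: "Python", pathway: "python" },
    { title: "Routing in Flask", url: "https://example.com/flask-routing" },
  ],
};

// Expand nested Pathways into a flat, ordered list of content steps.
function flatten(name, seen = new Set()) {
  // Skip Pathways that were already expanded (this also prevents cycles).
  if (seen.has(name)) return [];
  seen.add(name);
  return pathways[name].flatMap((step) =>
    step.pathway ? flatten(step.pathway, seen) : [step]
  );
}

// Progress is keyed by content, not by Pathway, so it carries over.
function remaining(name, completed) {
  return flatten(name).filter((step) => !completed.has(step.url));
}

// Complete the whole Django Pathway, then see what Flask still requires.
const completed = new Set();
flatten("django").forEach((step) => completed.add(step.url));

console.log(remaining("flask", completed).map((step) => step.title));
// [ 'Routing in Flask' ]
```

Because the Python content was already completed through the Django Pathway, only the Flask-specific step is left.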

Pathways are also community-driven. All content in Pathways is stored in a public GitHub repository. People can contribute using GitHub’s PR model and are credited for their contributions on Pathways.

The How

Now we come to the juicy tech stuff. Pathways is built using the GRANDstack, i.e. GraphQL, React, Apollo and Neo4j Database. The project was bootstrapped using the GRANDstack Starter.

The Database and the Data Modelling

Knowledge in its current state is best represented as a graph. Topics are often interlinked rather than sequential. Thus, it was pretty obvious that we had to use a graph database to store data for Pathways.

Neo4j is a native graph database, built from the ground up to leverage not only data but also data relationships. Neo4j connects data as it’s stored, enabling queries never before imagined, at speeds never thought possible.

Neo4j is by far the most popular choice among graph databases. Cypher, the query language for Neo4j, is refreshingly simple to use. We wrote raw Cypher queries for all writes to the database. Reads are handled mostly through GraphQL.

Pathways is built upon four fundamental entities:

  • The User: Someone who interacts with Pathways
  • Pathways: A collection of steps along with other metadata
  • Steps: A link to a Pathway or to Content, with data such as time required, index in the Pathway, etc.
  • Content: Reusable markdown content that is abstracted behind a step

These are represented as nodes in the database and interact with each other using relationships. For example, the relation between a Pathway and a step is expressed as:

(s:Step)-[r:HAS_PARENT_PATHWAY]->(p:Pathway)
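To give a feel for what a raw Cypher write against this relationship might look like, the sketch below builds a parameterized query that attaches a new Step to an existing Pathway. The property names (`id`, `index`, `time`) and the `buildAddStepQuery` helper are assumptions for illustration; the real data model may differ.

```javascript
// Build a parameterized Cypher statement that creates a Step and links it
// to its parent Pathway. Property names are assumed, not the real schema.
function buildAddStepQuery({ pathwayId, stepId, index, time }) {
  const query = `
    MATCH (p:Pathway {id: $pathwayId})
    CREATE (s:Step {id: $stepId, index: $index, time: $time})
    CREATE (s)-[:HAS_PARENT_PATHWAY]->(p)
    RETURN s
  `;
  // Values travel as parameters rather than being interpolated into the
  // query text, which avoids Cypher injection.
  return { query, params: { pathwayId, stepId, index, time } };
}

const { query, params } = buildAddStepQuery({
  pathwayId: "django",
  stepId: "django-models",
  index: 2,
  time: 30,
});

console.log(query.trim());
console.log(params);
```

With the official `neo4j-driver` package, a pair like this would typically be passed to `session.run(query, params)`.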

Creating The GraphQL API

The main problem with using only Apollo for the GraphQL API was that we would have had to write a resolver for each subquery. This would have required multiple database calls, completely dissolving the advantages of Neo4j. Thankfully, GRANDstack comes with a very helpful library called neo4j-graphql-js. The library translates GraphQL queries into the corresponding Cypher queries, fetching exactly what we need in a single database call most of the time.
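To see why a resolver per subquery is costly, consider this toy comparison. It uses a fake in-memory `db` with a call counter instead of Neo4j, and every name in it is hypothetical. It only illustrates the difference in round trips, not how neo4j-graphql-js works internally.

```javascript
// Fake data store that counts how many times it is queried.
const db = {
  calls: 0,
  pathways: { django: { name: "Django", stepIds: ["s1", "s2", "s3"] } },
  steps: {
    s1: { title: "Python" },
    s2: { title: "Models" },
    s3: { title: "Views" },
  },
  getPathway(id) {
    this.calls += 1;
    return this.pathways[id];
  },
  getStep(id) {
    this.calls += 1;
    return this.steps[id];
  },
  // Stand-in for one generated query that returns the whole tree at once.
  getPathwayWithSteps(id) {
    this.calls += 1;
    const pathway = this.pathways[id];
    return {
      name: pathway.name,
      steps: pathway.stepIds.map((stepId) => this.steps[stepId]),
    };
  },
};

// Resolver-per-field style: one call for the Pathway, then one per Step.
function resolveNaively(id) {
  const pathway = db.getPathway(id);
  return {
    name: pathway.name,
    steps: pathway.stepIds.map((stepId) => db.getStep(stepId)),
  };
}

db.calls = 0;
const naive = resolveNaively("django");
const naiveCalls = db.calls;

db.calls = 0;
const single = db.getPathwayWithSteps("django");
const singleCalls = db.calls;

console.log({ naiveCalls, singleCalls });
// { naiveCalls: 4, singleCalls: 1 }
```

Both approaches return the same data, but the naive one grows by one call for every step in the Pathway.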

Nested GraphQL queries are handled with ease without any custom resolvers. We only had to define the schema that corresponds to the data in the database.
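A schema for the Pathway and Step relationship might look roughly like the sketch below. The field names are assumptions. The `@relation` directive is how neo4j-graphql-js maps a GraphQL field to a Neo4j relationship, though its exact argument syntax has varied between library versions. In a real server the string would be handed to the library's `makeAugmentedSchema({ typeDefs })`, which is left out here so the snippet runs without dependencies.

```javascript
// GraphQL type definitions as a plain string. Field names are illustrative.
const typeDefs = `
  type Pathway {
    id: ID!
    name: String!
    # Steps point at their Pathway, so from the Pathway's side the
    # relationship is incoming.
    steps: [Step] @relation(name: "HAS_PARENT_PATHWAY", direction: IN)
  }

  type Step {
    id: ID!
    index: Int!
    time: Int
    pathway: Pathway @relation(name: "HAS_PARENT_PATHWAY", direction: OUT)
  }
`;

// A nested query of this shape would be answered from the schema alone,
// with no hand-written resolver for the "steps" field.
const exampleQuery = `
  {
    Pathway(id: "django") {
      name
      steps { index time }
    }
  }
`;

console.log(typeDefs, exampleQuery);
```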

While the library works great most of the time, letting it handle data fetching means that we are left with little control. This made handling business logic a bit tricky. On top of this, the documentation for most of the stack is sparse and incomplete. To handle fetching that involved some business logic, we wrote our own resolvers. This meant breaking the single-database-call rule, but the handy delegateToSchema function in graphql-tools helped us do most of it in at most 2-3 database calls.

Note: We wouldn’t have had to write our own resolvers if we had just used @cypher directives, but that would have meant writing database queries inside our GraphQL definitions. We chose to maintain the code structure and the separation of the DB layer from the API layer.
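For context, this is roughly what a `@cypher` directive looks like in a neo4j-graphql-js schema: the Cypher statement lives inside the type definition itself. The `stepCount` field is hypothetical and only serves to show the kind of mixing the note refers to.

```javascript
// Schema fragment with a computed field backed by an inline Cypher statement.
const typeDefsWithCypher = `
  type Pathway {
    id: ID!
    name: String!
    # "this" is bound by the library to the current Pathway node.
    stepCount: Int
      @cypher(
        statement: "MATCH (this)<-[:HAS_PARENT_PATHWAY]-(s:Step) RETURN count(s)"
      )
  }
`;

console.log(typeDefsWithCypher);
```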

The Client

The client is written in ReactJS. It uses Apollo Client to query the GraphQL server. It supports live markdown editing for creating Pathways.

The Containerization and the Deployment

Containerizing the project using Docker helped us speed up development. People working on the client don’t have to learn how to set up environment variables, host the server and connect it to the database. A simple docker-compose up gets the whole project up and running. Another benefit is that we don’t have to maintain separate production and development versions of the project. We simply extract the configuration into environment variables.
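Extracting configuration into environment variables can be as simple as the sketch below. The variable names (`NEO4J_URI`, `NEO4J_USER`, `NEO4J_PASSWORD`, `GRAPHQL_PORT`), the defaults and the `loadConfig` helper are assumptions for illustration, not the project's actual settings.

```javascript
// Read settings from an environment-like object, falling back to
// development defaults. All variable names here are assumed.
function loadConfig(env) {
  return {
    neo4jUri: env.NEO4J_URI || "bolt://localhost:7687",
    neo4jUser: env.NEO4J_USER || "neo4j",
    neo4jPassword: env.NEO4J_PASSWORD || "",
    port: Number(env.GRAPHQL_PORT || 4001),
  };
}

// In the running app this would be loadConfig(process.env).
const development = loadConfig({});
const production = loadConfig({
  NEO4J_URI: "bolt://db.internal:7687",
  GRAPHQL_PORT: "8080",
});

console.log(development.neo4jUri, production.neo4jUri, production.port);
```

The same code runs in both environments; only the values supplied by Docker Compose or the host differ.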

Pathways is currently deployed on a DigitalOcean droplet. We used GitHub Actions to set up a one-click deployment from the repository.

The GitHub Repository

Along with the Neo4j database, all data of every Pathway is stored in a reproducible format in a public GitHub repository. This enables Pathways to support community-driven contributions. Anyone can make changes to the content on Pathways and be credited for it. The GitHub repository and the Neo4j database are kept in sync both ways. The GitHub API is thoroughly documented and, in our case, using it was fairly simple.
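One way to picture a “reproducible format” is a serializer that always emits the same text for the same data, so a round trip between the database and the repository produces no spurious diffs. The sketch below is a guess at the idea, not the format Pathways actually uses: the JSON layout, the field names and both helper functions are hypothetical.

```javascript
// Serialize a Pathway with a fixed key order and steps sorted by index,
// so identical data always yields identical text.
function serializePathway(pathway) {
  const steps = [...pathway.steps]
    .sort((a, b) => a.index - b.index)
    .map(({ index, title, time }) => ({ index, title, time }));
  return JSON.stringify({ id: pathway.id, name: pathway.name, steps }, null, 2);
}

function parsePathway(text) {
  return JSON.parse(text);
}

// Data as it might come back from the database, in arbitrary order.
const fromDatabase = {
  name: "Django",
  id: "django",
  steps: [
    { title: "Models", time: 30, index: 1 },
    { index: 0, title: "Python", time: 60 },
  ],
};

const file = serializePathway(fromDatabase);

// Round trip: parsing the file and serializing again yields identical text.
console.log(serializePathway(parsePathway(file)) === file);
// true
```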

Pathways is currently under heavy development and is welcoming contributions. Check out the GitHub repository at Pathways.