Today, the new category system, which allows an unrestricted number of category levels, has been installed on PhysicsOverflow. This system was required to categorize submissions beyond the four levels provided by the current Question2Answer (Q2A) framework. The new system is almost invisible to the user, but it serves the important purpose of making a (future) large number of submissions searchable. Developing this system took about two and a half months, so you had to wait a long time. I would therefore like to give you some insight into the work behind it.
The original category system of the Q2A framework uses a hierarchical database model (often called an adjacency list): every node (a category) is linked to its parent by the parent's ID. Such a model makes writing new nodes fast (you just create the node and link it to its parent), while queries through the tree are usually slow, because the tree has to be walked one level at a time. Q2A solves this issue by fixing the number of category levels at four. To find the parents of a post quickly, the path through the category tree is stored directly in every post. This means that every post contains four indices that record the way back through the tree. This is a clever and fast solution. However, it cannot be extended to an unrestricted number of category levels, and it even becomes slow if extended to, for instance, eight levels. The category system is part of Q2A's core code and is spread over a large part of the system.
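To make the trade-off concrete, here is a minimal in-memory sketch of a parent-linked (adjacency list) category table and of a fixed-depth path stored on each post. It only illustrates the idea described above; the class, field and function names (Category, parent_id, ancestors, path_columns, MAX_LEVELS) and the example categories are hypothetical and are not Q2A's actual schema or code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Category:
    id: int
    title: str
    parent_id: Optional[int]  # adjacency list: each node only knows its parent


# Hypothetical category table, keyed by ID.
CATEGORIES = {
    1: Category(1, "Physics", None),
    2: Category(2, "Theoretical Physics", 1),
    3: Category(3, "Quantum Field Theory", 2),
    4: Category(4, "Renormalization", 3),
}

MAX_LEVELS = 4  # fixed depth, as in the original Q2A design


def ancestors(cat_id: int) -> list[int]:
    """Walk up the tree one parent at a time.

    In a database, each step of this loop would be a separate lookup,
    which is why deep trees are slow to query in this model.
    """
    path = []
    current: Optional[int] = cat_id
    while current is not None:
        path.append(current)
        current = CATEGORIES[current].parent_id
    return list(reversed(path))  # root first


def path_columns(cat_id: int) -> list[Optional[int]]:
    """Denormalized path stored on each post: one slot per level, padded with None."""
    path = ancestors(cat_id)
    if len(path) > MAX_LEVELS:
        raise ValueError("category is deeper than the fixed number of levels")
    return path + [None] * (MAX_LEVELS - len(path))
```

For example, a post in "Quantum Field Theory" would store the path [1, 2, 3, None], so its parent categories can be read without walking the tree. The price is that the number of slots is fixed in advance: supporting more levels means adding more columns to every post.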
Another issue that arises when we increase the number of category levels is the user interface. Currently, when users ask a question, they have to select the category for their post. The user interface uses select tags like:
Clearly, for eight category levels, for instance, there is not enough room on the page to display all these tags side by side. Therefore the user interface had to be changed as well. The new user interface takes up much less room and looks like this:
For the new category system, I have implemented a different database model, the nested set model. It allows an unrestricted number of category levels. Each category stores a left and a right index, and the descendants of a category are exactly those categories whose indices lie between its own. This makes queries through the categories very fast, but inserting a node is slow, because the indices of a large part of the tree (in the worst case, the whole tree) have to be renumbered. However, the categories will be changed far less frequently than they are read, for instance to display the tree, so this model is well adapted to our needs.
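The following is a minimal in-memory sketch of the textbook nested set model, not the PhysicsOverflow plugin code: the names (Node, build_nested_set, descendants, ancestors, insert_last_child) and the example tree are hypothetical. It numbers the tree with a depth-first walk, answers subtree and ancestor queries with simple range comparisons, and shows why an insertion has to renumber many existing nodes.

```python
from dataclasses import dataclass


@dataclass
class Node:
    title: str
    lft: int
    rgt: int


# Hypothetical example tree, given as parent -> children.
EXAMPLE_TREE = {
    "Physics": ["Theoretical", "Experimental"],
    "Theoretical": ["QFT", "GR"],
    "Experimental": ["Particle"],
}


def build_nested_set(children: dict[str, list[str]], root: str) -> dict[str, Node]:
    """Number the tree by a depth-first walk: lft on entry, rgt on exit."""
    nodes: dict[str, Node] = {}
    counter = 0

    def visit(title: str) -> None:
        nonlocal counter
        counter += 1
        lft = counter
        for child in children.get(title, []):
            visit(child)
        counter += 1
        nodes[title] = Node(title, lft, counter)

    visit(root)
    return nodes


def descendants(nodes: dict[str, Node], title: str) -> list[str]:
    """All nodes strictly inside [lft, rgt]: one range query, no recursion."""
    n = nodes[title]
    inside = [m for m in nodes.values() if n.lft < m.lft and m.rgt < n.rgt]
    return [m.title for m in sorted(inside, key=lambda m: m.lft)]


def ancestors(nodes: dict[str, Node], title: str) -> list[str]:
    """All nodes whose interval encloses this one, root first."""
    n = nodes[title]
    outside = [m for m in nodes.values() if m.lft < n.lft and m.rgt > n.rgt]
    return [m.title for m in sorted(outside, key=lambda m: m.lft)]


def insert_last_child(nodes: dict[str, Node], parent: str, title: str) -> int:
    """Insert a new leaf as the last child of parent.

    Every existing index at or beyond the parent's rgt is shifted by 2.
    Returns how many existing nodes had to be renumbered.
    """
    pos = nodes[parent].rgt
    touched = 0
    for m in nodes.values():
        changed = False
        if m.lft >= pos:
            m.lft += 2
            changed = True
        if m.rgt >= pos:
            m.rgt += 2
            changed = True
        if changed:
            touched += 1
    nodes[title] = Node(title, pos, pos + 1)
    return touched
```

In this example, listing everything below "Theoretical" is a single comparison on each node's interval, whereas adding one leaf under "Theoretical" renumbers four of the six existing nodes. In SQL the same insertion corresponds to updating every row whose left or right index lies at or beyond the insertion point, which is why writes are expensive and reads are cheap.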
As already mentioned, the original category system is part of the core code of Q2A and is active in almost all pages provided by the framework (even in pages where I never expected it). I had to replace 9 files completely, rewrite every database select specification that accesses the category system and, naturally, write the code for the nested set model. To give you an impression of the size of the task, here are some numbers: the original Question2Answer framework consists of about 36’000 lines of code. To date, I have written 18’100 new lines of code for PhysicsOverflow in the form of plugins, layers, overrides and changes to the core code, about half the size of the framework itself. The new category system required 5’800 additional lines of code, which explains why it took so long to implement. I hope that I have been able to test all use cases, so that at most minor bugs remain.
The next development step will be a surprise. Stay tuned!