Schrodinger's Cat Strikes Back


Author Archives: polarkernel

A long step towards the adult PhysicsOverflow

Today, the new category system, which allows an unrestricted number of category levels, has been installed on PhysicsOverflow. This system was required to enable the categorization of submissions beyond the four levels provided by the current Question2Answer (Q2A) framework. The new system is almost invisible to the user, but it has the important function of keeping a (future) large number of submissions searchable. It took about two and a half months to develop, so you had to wait for a long time. Therefore I would like to give you some insights.

The original category system of the Q2A framework is a hierarchical database model. Every node (a category) is linked to its parent by the ID of the parent node. Such a model makes writing new nodes fast (you just create the node and link it to its parent), while queries through the tree are usually slow. Q2A solved this issue by fixing the number of category levels at four. To be able to find parents quickly, the path to the parent is hard-coded in every post: every post contains four indices storing the way back through the tree. This is a clever and fast solution. However, it cannot be extended to an unrestricted number of categories, and it even gets slow if extended to, for instance, eight levels. The category system is written in the core code of Q2A and spread over a large part of the system.
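To make the parent-link model concrete, here is a minimal sketch in Python. It is not the Q2A code: the table contents, the names PARENT, path_to_root and fixed_path, and the padding with None are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

MAX_DEPTH = 4  # the fixed number of category levels described above

# Hypothetical category table: category id -> parent id (None for a top-level category).
PARENT: Dict[int, Optional[int]] = {
    1: None,  # e.g. "Physics"
    2: 1,     # e.g. "Theoretical Physics"
    3: 2,     # e.g. "Quantum Field Theory"
    4: 3,     # e.g. "Renormalization"
}


def path_to_root(category_id: int) -> List[int]:
    """Walk the parent links upwards; this needs one lookup per level."""
    path: List[int] = []
    current: Optional[int] = category_id
    while current is not None:
        path.append(current)
        current = PARENT[current]
    return path


def fixed_path(category_id: int) -> List[Optional[int]]:
    """Return the path padded to MAX_DEPTH slots, as it could be stored with a post."""
    path = path_to_root(category_id)
    if len(path) > MAX_DEPTH:
        raise ValueError("category tree is deeper than the fixed number of levels")
    return path + [None] * (MAX_DEPTH - len(path))
```

Storing the result of fixed_path with every post avoids the upward walk at query time, but the number of slots is fixed in advance, which is exactly the restriction discussed above.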

Another issue that arises when we increase the number of category levels is the user interface. Currently, when the user asks a question, he has to select the category for this post. The user interface uses select tags like:

categories_old

It is clear that for eight category levels, for instance, there is not enough room on the page to display all these tags side by side. Therefore the user interface also had to be changed. The new user interface takes much less room and looks like this:

categroies_new

For the new category system, I have implemented another database model, called the nested set model. It allows for an unrestricted number of category levels. While queries through the categories become very fast, the insertion of nodes is slow, because indices throughout the tree have to be changed. However, the categories will be changed much less frequently than, for instance, the tree is displayed. In this respect, the model is well adapted to our needs.
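The idea of the nested set model can be sketched in a few lines of Python. This is only an illustration under assumptions, not the PhysicsOverflow implementation: the example tree and the names TREE, number_tree, descendants and ancestors are hypothetical. Every node gets a left and a right index from a depth-first walk, and a node lies below another one exactly if its index pair is enclosed by the other pair.

```python
from typing import Dict, List, Tuple

# Hypothetical category tree: name -> list of child names.
TREE: Dict[str, List[str]] = {
    "Physics": ["Theory", "Experiment"],
    "Theory": ["QFT", "Gravity"],
    "Experiment": [],
    "QFT": [],
    "Gravity": [],
}

Bounds = Dict[str, Tuple[int, int]]


def number_tree(root: str) -> Bounds:
    """Assign (left, right) indices to every node by a depth-first walk."""
    bounds: Bounds = {}
    counter = 0

    def visit(node: str) -> None:
        nonlocal counter
        counter += 1
        left = counter
        for child in TREE[node]:
            visit(child)
        counter += 1
        bounds[node] = (left, counter)

    visit(root)
    return bounds


def descendants(bounds: Bounds, node: str) -> List[str]:
    """All nodes whose index pair lies strictly inside the pair of `node`."""
    left, right = bounds[node]
    return sorted(n for n, (l, r) in bounds.items() if left < l and r < right)


def ancestors(bounds: Bounds, node: str) -> List[str]:
    """All nodes whose index pair encloses the pair of `node`, root first."""
    left, right = bounds[node]
    found = [n for n, (l, r) in bounds.items() if l < left and right < r]
    return sorted(found, key=lambda n: bounds[n][0])
```

Both queries are simple range comparisons with no dependence on the depth of the tree, which is why reading is fast. Inserting a node, on the other hand, shifts the indices of other nodes, which is the price mentioned above.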

As already mentioned, the original category system sits in the core code of Q2A and is active in almost all pages provided by the framework (even in pages where I never expected it). I had to replace 9 files completely, all select specifications for database accesses to the category system had to be rewritten and, naturally, the code for the nested set model had to be written. To give you an impression of the size of the task, here are some numbers: The original Question2Answer framework consists of about 36’000 lines of code. Until today, I have written 18’100 new lines of code for PhysicsOverflow in the form of plugins, layers, overrides and changes in the core code, about half of the size of the original system. The new category system required 5’800 additional lines of code, which explains why it took so long to realize. I hope that I was able to test all use cases, so that only minor bugs, if any, remain.

The next development step will be a surprise, stay tuned!

polarkernel

PhysicsOverflow is Living and Animated

Wondering what happened on PhysicsOverflow while I was developing the preliminary phase of PhysicsOverflow’s reviews section, Reviews I, I looked at the host provider’s statistics and ran some queries on the event log of our database. Here are some numbers illustrating the pleasant activity on our site:

Number of visits in May 2014 (provider’s statistics):

visits_may_2014

As you can see, the number is growing slightly. The number of visits has been above 300 all the time and is now approaching 400 by the end of the month. Maybe somebody has an explanation for the peak after the second weekend?

In the event log table of the database, 14’482 events were logged in May 2014. Some interesting numbers are:

  • 4126 times, a user has logged in.
  • 200 questions have been written.
  • 353 answers have been posted.
  • 1002 comments have been added.
  • 4995 upvotes have been given.
  • 86 new users have registered on the site.

I think this is really not bad for the second month after publishing the site. Maybe we will already have quite a number of submissions in the reviews section by the end of next month? Interested? Submit a paper to PhysicsOverflow!

We Have Liftoff!

Liftoff

The PhysicsOverflow public beta has now been online for 17 days. After a turbulent start with some database issues on our host and spam attacks, the site is now stable and working fine. With ongoing improvement of details, the section with Q&A on physics now accounts for the main part of the activity on the site. Since we went online on 4 April, more than 150 new users have registered on the site, and many of them are already quite active. The number of visits per day has exceeded the number of visits on Theoretical Physics on SE since the first day, as can be seen in the following graphic:

Visits_April_2014

This is great, since we aren’t even part of a huge network! We get about 5.6 questions per day (excluding imported posts), which is a lot more than TP.SE had. However, as already stated, the site does not depend on these figures; there is no deadline, as there was on TP. Also, the term beta does not mean that the site will go away; it only means that we still have many more ideas to implement before we reach the full-fledged version of PhysicsOverflow.

Please Contribute

Now is the time when the site gets shaped. If you participate now with your votes, ideas, opinions, questions and answers, you help to build a site with contours as you like them. Don’t stay outside: have a look at PhysicsOverflow, register there if you like it, or contribute here on this blog.

Review Section

We are still a small team. However, we are working at full power to be able to leave the beta state and to complete the site as intended. The main part of this will be the Review Section, as already announced in this blog. This is not only a reconfiguration of the site; it requires a considerable amount of new development, because such a feature is not provided for in the Question2Answer framework. In detail, the following main functionalities have to be developed:

  • Integration of new pages, called Submission and Review.
  • Two voting criteria, one for originality and one for accuracy.
  • New page design, enabling display of a score value from the votings.
  • New feature to add multiple authors.
  • Redesign of voting mechanism, distributing votes to multiple authors.
  • Integration in the rep update and recount system.
  • Adding and managing the required database tables.
  • Increasing the category depth to realize a hierarchical tagging system.
  • Integration of score as new sorting criterion.
  • Software for mass import from arXiv.
  • Software for daily import from arXiv.

We are on the way with all that and look forward to realizing these steps within a reasonable time. Stay tuned!


Fine Tuning

I know everybody is impatient to see our site running, and I apologize for the delay. During several tests I found some issues that had to be corrected before new questions are entered in our database. In my opinion we will be faster this way, because otherwise we would have risked an inconsistent database or even losing questions while repairing it. Preparing the technical beta test phase turned out to be more complicated than we thought. Additionally, Dilaton has been severely handicapped by an extremely poor internet connection, which made communication very difficult. But now we are very close! I assume we can go online for the test users during the coming week. Dilaton will advise them in an email how to connect to the beta site and how to contribute.

These have been the main issues:

  • The way Question2Answer handles access restrictions on plugins is vulnerable. A plugin may be set to be invisible to unauthorized users, but this does not really prevent access. Therefore I had to add code that enforces a strict check of authorization.
  • In the A51 import, several users had multiple (up to 4) accounts with the same username. This is not allowed under Question2Answer. I was not able to determine whether these users with the same name are really the same person. As a temporary solution I have manually corrected the database and renamed these users to USER_1, USER_2, … and so on. Currently I am developing a simple plugin “Merge User Account”. This will be very delicate, and I am not yet sure whether it can really be handled properly. Users with multiple accounts should please contact me directly. However, this plugin will not yet be required for the beta tests, and therefore this issue will not slow down our project.

Registration as a test user is still open!

Going Online :-)

As you have already seen, we now have a host with a first short introduction for the new site. Don’t worry about details for now (the link is now blue); I had very little time available, and in my ecstasy I wanted to show you something already running. For a computer scientist, an automatic web editor, as it is available on our host, is sometimes quite confusing. I do not like these automatisms. However, in let’s say two weeks, I will have set up a first version of our new site and will not require such an editor to do that. As a first step I would like to set up a technical beta version with the following purposes:

  • Debugging: I have more than 40 years of practical experience with IT projects, and I am sure that there will be some bugs in my code. It will be the task of some dedicated test users (you?) to help me find these bugs.
  • Parametrization: Setting all the parameters of the site by the super-administrator.
  • Completion: Many details are not yet ready: logo, introductory text, texts in emails, etc. During the technical beta, the site should be completed for takeoff.
  • Organization: During this phase, the first organization should be prepared (administrators, moderators, ?).

During this phase, new registrations will be prohibited; only test users will have access to the site. However, the site will be visible from outside (which could serve as some advertising?). I could also place it at a secret location known only to the test users. I could also exclude robots (Google etc.) from the site during this run. Note: There is a (small) probability that the content of the site has to be deleted at the end of this phase. This could only happen if tables in the database had to be altered, so the risk is small.

So that we can organize the technical beta, please contact Dilaton if you are interested in participating. We will contact you with details for the registration as soon as the site is up.

I will finance the hosting of our site for a while. However, we should also start to think about the responsibility for the site. Currently, I am the only one who is no longer anonymous, because according to ICANN I had to provide my complete contact data. So, from a legal point of view, at the moment I am the only one who is responsible for the content of this site. I would not like to stay alone in this position. At the latest by the end of the beta phase, we should have some regulation (legal notes, about us, etc.?) on this subject. Has somebody already thought about that?


Import of Endangered SE-Questions

As announced in my last post, I would like to introduce the prototype of our new Q2A plugin for the import of endangered SE questions. For the user, it has become the simplest and most comfortable solution I can imagine. The starting point is the link to the question on any SE site, loaded in your browser, as for example:

SE link

Copy this link. Note that the complete link is required; do not use the share links at the bottom of the questions. Then select the menu option “Import SE-Question” on our Physics Overflow site, which is only visible and accessible to dedicated users like administrators or moderators (selectable by the super administrator):

PO menu

Paste the link copied from the SE-site into the appropriate field of the import dialog:

Import SE

Select the desired Physics Overflow category and click the import button. After a little while, the process announces the successful import of the complete thread containing the question and all answers and comments:

Import SE done

The import is made using the StackExchange API. This API implements throttles, which limit the number of daily calls to 300 for a single IP, as long as the application has no valid access token. If the application has an access token (obtained by authenticating a user), the limit is 10’000 calls per day and per IP. My plugin typically requires two calls for each import (one for the thread and a second for the user data), as long as no more than 30 users have contributed to the question. For every additional 30 users, one more call is required (I have found questions with more than 100 contributing users). This means that without an access token, about 150 questions per day may be imported. I have no idea what happens when this quota is exceeded. The API returns the remaining quota of calls, which is divided by two in our plugin and indicated in the dialog window (see image above). A part of an example import is shown in the next picture:
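The quota arithmetic above can be written down as a small Python sketch. The function names are hypothetical and the code does not call the StackExchange API; it only reproduces the calculation described in the text, assuming one call for the thread plus one call per started batch of 30 contributing users.

```python
import math

USERS_PER_CALL = 30          # user records fetched per call, as described above
QUOTA_WITHOUT_TOKEN = 300    # daily calls per IP without an access token
QUOTA_WITH_TOKEN = 10_000    # daily calls per IP with an access token


def calls_for_import(contributing_users: int) -> int:
    """One call for the thread plus one call per started batch of 30 users."""
    user_calls = max(1, math.ceil(contributing_users / USERS_PER_CALL))
    return 1 + user_calls


def estimated_imports(remaining_quota: int) -> int:
    """Remaining quota divided by two, i.e. assuming the typical two calls per import."""
    return remaining_quota // 2
```

For a question with 100 contributing users, calls_for_import gives five calls instead of two, so the figure shown in the dialog is an upper estimate rather than a guarantee.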

Attribution1

Attribution

Attribution is regulated in the API terms of use, which point to the Stack Exchange Terms of Service. As far as I understand, we are allowed to copy content from SE sites, as long as we follow the rules under this last link. My proposal is to put an attribution line under every imported question, answer and comment that looks like this:

Attribution details

In this way, the SE rules and the rules of the Creative Commons Attribution Share Alike license should, in my opinion, be fulfilled. The exact date and time of the import is added, because it is not possible to synchronize edits that are made on SE after the import. So the import is a snapshot of the state at the time indicated by this date/time. The API also provides no way to import the edit history of the questions.

If anybody has more knowledge about attribution to SE, I would be glad to get some feedback. By the way, shouldn’t we also think about terms of use for our site?

Remaining Issues

There are some issues with importing user identities, which I try to explain below. Users are imported in exactly the same way as during the migration of the closed SE.TP, with their display name and email hash. The following cases may occur:

  • User no longer registered on the SE site. In this case, there is no link to the user profile on the SE site. The plugin then assigns the post to a user “UnknownToSE”, which is hidden in the list of users, similar to the voter introduced for the import of SE.TP questions.
  • Collision with an existing user name on Physics Overflow. A user has registered on PO with the same display name as the user to be imported. In this case, the plugin checks the email hashes of both users. In case of a match, the imported user is assigned to the existing user. If the hashes are different, I do not yet have a useful solution. Currently, I again use the user “UnknownToSE”, but this is not a good solution. Any ideas?
  • Collision between identical users from different SE sites. A StackExchange user may post, for instance, on SE Physics and also on SE Math, but using different email addresses. I have observed that such cases appear quite often. In contrast to user IDs on different sites, the only stable ID is the account ID of a user. Using the StackExchange API, it is possible to find this ID for active SE users. However, the Area 51 dump did not provide this ID.
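The first two cases can be summarized in a short Python sketch. It is only an illustration of the decision logic described in the list, not the plugin code: the class ImportedUser, the function resolve_user and the dictionary of existing users are assumptions, and the third case (one person with different email addresses on different SE sites) is deliberately not handled.

```python
from dataclasses import dataclass
from typing import Dict

UNKNOWN_USER = "UnknownToSE"  # hidden fallback user named in the text


@dataclass
class ImportedUser:
    display_name: str
    email_hash: str
    has_profile: bool  # False if the user is no longer registered on the SE site


def resolve_user(user: ImportedUser, existing: Dict[str, str]) -> str:
    """Return the PO user name that receives the imported posts.

    `existing` maps PO display names to email hashes (hypothetical storage).
    """
    if not user.has_profile:
        return UNKNOWN_USER                   # case 1: no profile on the SE site
    known_hash = existing.get(user.display_name)
    if known_hash is None:
        existing[user.display_name] = user.email_hash
        return user.display_name              # no collision: create the user
    if known_hash == user.email_hash:
        return user.display_name              # case 2, hashes match: same user
    return UNKNOWN_USER                       # case 2, hashes differ: temporary fallback
```

The last branch is the unsatisfactory one: two different people with the same display name both end up behind the fallback user, which is why a better rule is still wanted.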

Any ideas for the solution of these issues are helpful.

Next Steps

I think it is about time to prepare the takeoff of Physics Overflow. In my next post I will make a proposal for this process. I hope Dilaton will recover soon and will be on board again. Get well soon!

Approaching the Goal, Technical End of Year Report

Soon we will reach the end of this year, and it is time to give you some information about the technical state of the project. There have been many hours of frustration, but I think I nonetheless have some fine results to present. Let me go through the different subjects studied or realized:

LaTeX in Markdown Editor
The issue with this editor is that Markdown uses so-called escape sequences, as known from the programming languages C and C++. For instance, \n is an end-of-line or \t is a tab. In order to be able to represent the backslash character used for these sequences, a double backslash \\ is used. During the transformation of an edited text into HTML-coded text, the Markdown Editor “eats” one of these backslashes, which was the reason for the issues that Dilaton already mentioned in this blog. I have found a way of preprocessing the edited text by replacing LaTeX sequences with tags and inserting these blocks again at the end in a postprocessing step. This worked fine for the live preview, but for unknown reasons not for the real posting. After some days of frustration and anger, I gave up.
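The preprocessing and postprocessing idea can be sketched in Python as follows. This is not the code I tried in the editor: the placeholder format @@LATEX0@@, the regular expression and the function names are assumptions, and eat_backslashes is only a stand-in for the part of the Markdown transformation that removes one backslash of each pair.

```python
import re
from typing import Callable, List, Tuple

# Assumed delimiters: $$...$$ for blocks and $...$ for inline formulas.
LATEX = re.compile(r"\$\$.+?\$\$|\$.+?\$", re.DOTALL)
PLACEHOLDER = re.compile(r"@@LATEX(\d+)@@")


def protect(text: str) -> Tuple[str, List[str]]:
    """Replace every LaTeX sequence by a numbered placeholder tag."""
    stash: List[str] = []

    def repl(match: "re.Match[str]") -> str:
        stash.append(match.group(0))
        return f"@@LATEX{len(stash) - 1}@@"

    return LATEX.sub(repl, text), stash


def restore(html: str, stash: List[str]) -> str:
    """Put the untouched LaTeX sequences back in place of the placeholders."""
    return PLACEHOLDER.sub(lambda m: stash[int(m.group(1))], html)


def eat_backslashes(text: str) -> str:
    """Stand-in for the Markdown step: collapse each double backslash into one."""
    return text.replace("\\\\", "\\")


def render(text: str, transform: Callable[[str], str]) -> str:
    """Protect the LaTeX, run the transformation, then restore the LaTeX."""
    protected, stash = protect(text)
    return restore(transform(protected), stash)
```

With this wrapper, a line break \\ inside a formula survives the transformation, while a double backslash outside of formulas is still treated as an escape sequence. A dollar sign used as a currency symbol would confuse this simple pattern.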

Dummy Voter
I was not happy with the solution of thousands of dummy voters, which escape at the end of the migration but leave their votes behind until a recount shows that the cat is dead. This is too close to Schrodinger’s cat for me. Now I have replaced this solution with a single voter (can you guess its name?) with increased rights, who inserts all votes from TP and makes itself invisible to everybody after that. In this way, the solution will survive all recounts during the life of the site. For this solution I required one single line of core hack in Q2A’s code (this is the first one up to now).

New editor, LaTeX enabled
The WYSIWYG editor delivered with Q2A is based on CKEditor. Unfortunately, the version of CKEditor it uses does not support LaTeX. In the meantime, a new version of this editor exists which supports LaTeX, but it was not possible to introduce it into the WYSIWYG editor plugin. Therefore I have developed a new Q2A plugin which supports this new version. This new plugin supports many nice features, as can be seen in the following screenshot:

editor_capabilities

LaTeX code may be inserted using a special window, where the code can be written (without $ or $$) with a live preview, either as a block or inline:

latex_editor       latex_editor_inline

A source editor enables the user to insert any desired HTML code:

editor_source

I am sure you will like this new editor plugin!

Regaining Accounts for former TP users
This issue is solved! I have written a plugin that allows former users of SE.TP to regain their account on the migrated Q2A site. I have added a login link to the side panel near the attribution message:

regain_side_panel

The user then sees a login page with a short explanation of the login procedure:

regain_login_form

If he enters the same credentials as he used on the former SE.TP site (checked against the MD5 hash of his email address), he is logged in as a normal Q2A user:

regain_logged

He can then change his account data (email address, password, etc.) and may access his old posts.
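The check against the stored hash can be sketched in Python like this. It is not the plugin code (the plugin is written for Q2A), and the normalization of the address (trimming and lower-casing before hashing) is an assumption; the dump may have been built differently. The function names are hypothetical.

```python
import hashlib
import hmac


def email_hash(email: str) -> str:
    """MD5 hex digest of the address; trimming and lower-casing are assumptions."""
    normalised = email.strip().lower()
    return hashlib.md5(normalised.encode("utf-8")).hexdigest()


def may_regain_account(entered_email: str, stored_hash: str) -> bool:
    """Compare the hash of the entered address with the hash from the dump."""
    return hmac.compare_digest(email_hash(entered_email), stored_hash.lower())
```

Note that MD5 is a one-way hash and not an encryption: the address cannot be recovered from the dump, it can only be confirmed when the user enters it again.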

That’s it for this year!

I wish all of you a Merry Christmas and a happy new year with a successful start of this new site!

Migration of SE.TP to Q2A, part II

As always in IT projects, the devil is in the details. Looking more deeply into the results of the first migration, I found some issues that had to be corrected. In particular, the curious LaTeX phrases I mentioned in the last blog post have been a pain to solve. The unwanted changes of character sets during the whole process (reading XML, treating text in PHP, transfer to the database, compilation to HTML pages in Q2A) and the treatment of HTML tags have been awkward as well. However, we now have a local site with all posts and votes (still using the dummy user hack) of SE.TP migrated to Q2A, as shown in the following example:

Q2Aexample

Note that two subcategories, SE.TP and SE.TP.Meta, have been created. As an example of how attribution could look (at least for SE.TP, but not yet for SE.Physics), see the text in the right side panel. LaTeX now looks fine in all posts. Meta has now been included as well, as shown in the post from Shog9 with its far-reaching consequences (the SE.TP site was closed a short time later):

SE

I’d like to discuss some points concerning the continuation of this project:

Attribution

We now have two subcategories, SE.TP and SE.TP.Meta, from the closed SE beta page on Area 51 Stack Exchange. A simple method to handle attribution for these posts would be a text in the side bar, as in the sample above, combined with preventing users from inserting additional posts in these two categories. Binding attributions to single posts would require a core hack in the Q2A code and would involve much time and risk. Currently I do not yet know how imported posts from the running SE.Physics site could be handled.

History

The SE.TP dump contains an additional file called history.xml. It contains the change history with the following type IDs:

  1. Edit Title – A question’s title has been changed.
  2. Edit Body – A post’s body has been changed, the raw text is stored here as markdown.
  3. Edit Tags – A question’s tags have been changed.
  4. Rollback Title – A question’s title has reverted to a previous version.
  5. Rollback Body – A post’s body has reverted to a previous version – the raw text is stored here.
  6. Rollback Tags – A question’s tags have reverted to a previous version.
  7. Post Closed – A post was voted to be closed.
  8. Post Reopened – A post was voted to be reopened.
  9. Post Deleted – A post was voted to be removed.
  10. Post Undeleted – A post was voted to be restored.
  11. Post Locked – A post was locked by a moderator.
  12. Post Unlocked – A post was unlocked by a moderator.
  13. Community Owned – A post has become community owned.
  14. Post Migrated – A post was migrated.
  15. Question Merged – A question has had another, deleted question merged into itself.
  16. Question Protected – A question was protected by a moderator.
  17. Question Unprotected – A question was unprotected by a moderator.
  18. Post Disassociated – An admin removes the OwnerUserId from a post.
  19. Question Unmerged – A previously merged question has had its answers and votes restored.

In my opinion it could be very complicated to write a program flow that inserts all these changes into the current site. What is your opinion, is that really required? It may delay the start of the new site for a long time if we implement all these types. To give you an idea: the simple migration of the current local site alone required more than 600 lines of code. Does anybody know whether the dumped posts from SE contain the posts before or after these changes?

Account recovery

As proposed by Dimension10, the easiest way to recover the accounts of former users would be a check against the existing MD5 hashes. I will write a plugin where the user may enter his email address and password as he used them on SE. Both will then be checked against the MD5 hashes in the SE.TP dump. If the check is successful, the account will be restored automatically. Writing such a plugin should be quite easy (except for the devil …).

Migration of posts from SE.Physics

I will study direct access to SE.Physics using Stack.PHP and the Stack Exchange API. If the primary keys of the database (user ID and post ID) are available, it should also be possible to write a plugin that enables users to insert single posts directly from SE.Physics. However, the issue of attribution is not yet solved. The community should also discuss who shall have the right to migrate such questions.

Migration of SE.TP to Q2A, first results

Hi all. I am the “nice, friendly, and very competent informatics expert” (thanks to Dilaton!) who supported the installation of the Q2A test site on Dilaton’s laptop. Dilaton has made me an author of this blog so that I may report here about my progress directly to this community. I will support Dilaton and this community in the technical setup of the new site. Currently, I am writing a PHP script for the migration of SE dumps into Q2A and got the following results:

Good news first: It was possible to migrate users, questions, answers and comments properly into the database of Q2A. The posts look fine, apart from some curious LaTeX phrases (identical to the original text in the dump) that I do not yet understand (although I know LaTeX very well). I will take care of that later. The posts are searchable; the database seems to be built completely correctly. Also the statistics (number of questions and answers and the related points for them) for each user are correct.

User emails and passwords in the SE dump are encrypted. We will have to set up a process which allows all old users to reclaim their original posts, while hackers and trolls remain outside. This will not be very simple. Maybe the “forgotten password” utility will provide a means for that, but then the correct email addresses of the users will be required. Maybe one of you has an idea?

There is an issue with the introduction of the original SE.TP votes. Q2A stores votes in relation to the posts (questions and answers) and additionally to the users who gave the vote. It does not allow a user to vote more than once for a post. The relation of the votes to the users is only visible to the administrator. However, the SE.TP dump does not provide this relation. There is only a relation between votes and posts; the voting user is unknown. In order to insert these votes, I had to create several thousand dummy users, each one voting for only one post, and to delete all these users at the end. After that, the corresponding points for the users and their statistics are set properly, as long as the admin does not recount the posts in the database. However, this recalculation could be required sometimes to keep the database free of corruption. Maybe I should provide the admin with a script that resets these old votes after such a recount.

As a next step I will take care of the issues in the compilation of LaTeX within the Markdown editor. I hope that it is not required to write a new Markdown editor 😉