Schrodinger's Cat Strikes Back


Category Archives: Technical issues

Hosting PhysicsOverflow at Bielefeld University

Thanks to Christian Pietsch http://www.ub.uni-bielefeld.de/~cpietsch/ from the library of Bielefeld University, who is also the founder of the OpenScience Q&A community https://openscience.uni-bielefeld.de/ (restarted outside the SE network after an unsuccessful SE beta, with support from our side), we have received an offer to move PhysicsOverflow to a server of the library of Bielefeld University.

The moderators agreed to accept the offer; thus in the near future, PhysicsOverflow will be migrated to a server at Bielefeld University. To avoid any tiresome administrative procedures or even obstacles, this will for now not include an official endorsement of PhysicsOverflow by Bielefeld University. In the long run, we should probably try to obtain such an endorsement.

Moving to the university library of Bielefeld will have several advantages:

  1. It will do away with the technical issues and glitches caused by our current hosting provider One.com that polarkernel had to deal with without being given proper access to the server.
  2. It will provide direct and personal technical support as well as root access for polarkernel to the new server. This will ease debugging in case of future technical problems with the server.
  3. It will ensure additional support for keeping the PO-server running. As Christian Pietsch is the administrator of the new server, he will be able and willing to help if needed to ensure that the PO-server runs smoothly, for example in case (as last year) polarkernel is on holiday without a good internet connection.
  4. The move will be a big step towards ensuring a long-term perspective for PhysicsOverflow and towards solving our long-standing issue that only a single person can technically ensure the smooth running of the site.

Generally, users should not be negatively affected by the upcoming migration. Users should hardly notice anything special during the migration, certainly less than during past problems with the current provider. In the worst case there will be a break of at most 24 hours until the new IP address has propagated to all domain name servers in the world. Our domain name “physicsoverflow.org” itself will not be changed by this migration.

In case of such a break, we will report here on this blog about the current status and what is going on.

This post is a slightly adapted version of the PO Meta Announcement

https://physicsoverflow.org/38745/hosting-physicsoverflow-at-bielefeld-university

Edit May 15, 2017 by polarkernel:

Christian Pietsch and I have finally prepared all aspects of the announced migration of PhysicsOverflow to the University of Bielefeld. It will take place on Wednesday, 17 May 2017, starting at about 07:30 UTC. Please save all your drafts before this time. During the migration, we have to shut down the server, move the content to the new server and also move the domain name to another registrar. For users far away from Germany, this process may take up to 24 hours, which is the time required for the new IP to propagate to all DNS servers in the world. We will keep you informed about the status of the migration below this edit.

After the migration, Christian Pietsch will keep the server running in case of any issues and will stand in for me when I am absent. Therefore I have raised his level on PhysicsOverflow to SuperAdministrator. He will also be our advisor for questions around our webserver. It is a great relief for me to be released from this responsibility during my vacations, and I would like to thank Christian for his commitment. Naturally, I will continue to take care of the code and its future development. However, in the long run, we will also need a replacement for this job.

Status of the migration:

May 17, 07:30 UTC: Planned start of the migration.
May 17, 07:40 UTC: Maintenance mode on, backup and migration of the database started.
May 17, 08:05 UTC: Database successfully moved. Starting domain migration.
May 17, 08:45 UTC: Got certificate, passed it on to Christian Pietsch, who will install it.
May 17, 09:15 UTC: Email accounts generated, but not yet activated.
May 17, 09:35 UTC: Certificate is installed. Domain transfer activated.
May 17, 10:00 UTC: Still waiting for domain transfer confirmation.
May 17, 10:04 UTC: Got confirmation. Propagation of new IP to all DNS-servers started.
May 17, 10:30 UTC: The domain host provider has still not connected the domain.
May 17, 12:10 UTC: Much later than expected, the new IP starts to propagate now.
May 17, 12:30 UTC: First contact to the new server: It’s done!!

A long step towards the adult PhysicsOverflow

Today, the new category system, which allows an unrestricted number of category levels, has been installed on PhysicsOverflow. This system was required to enable the categorization of submissions beyond the four levels provided by the current Question2Answer (Q2A) framework. The new system is almost invisible to the user, but it has the important function of keeping a (future) large number of submissions searchable. It took about two and a half months to develop, so you had to wait for a long time. Therefore I would like to give you some insights.

The original category system of the Q2A framework is a hierarchical database model. Every node (a category) is linked to its parent by the ID of the parent node. Such a model makes writing new nodes fast (you just create the node and link it to its parent), while queries through the tree are usually slow. Q2A solved this issue by fixing the number of category levels at four. To be able to find parents quickly, the path to the parent is hard-coded in every post: every post contains four indices storing the way back through the tree. This is a clever and fast solution. However, it cannot be extended to an unrestricted number of levels, and it already gets slow if extended to, for instance, eight levels. The category system is written into the core code of Q2A and spread over a large part of the system.
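To make the idea concrete, here is a minimal sketch in Python (not Q2A's actual PHP code or schema). The names Category, path_to_root and stored_path are hypothetical and chosen only for illustration. It shows a parent-linked tree and the fixed-size path of four indices that would be stored with a post:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

MAX_LEVELS = 4  # fixed depth, as described in the text above


@dataclass
class Category:
    id: int
    parent_id: Optional[int]  # None marks a top-level category
    title: str


def path_to_root(categories: Dict[int, Category], category_id: int) -> List[int]:
    """Walk the parent links upwards; one lookup per level of the tree."""
    path = []
    current: Optional[int] = category_id
    while current is not None:
        path.append(current)
        current = categories[current].parent_id
    path.reverse()  # top-level category first
    return path


def stored_path(categories: Dict[int, Category], category_id: int) -> Tuple[Optional[int], ...]:
    """Return the fixed-size path that would be stored with every post."""
    path = path_to_root(categories, category_id)
    if len(path) > MAX_LEVELS:
        raise ValueError("category is deeper than the fixed number of levels")
    # Pad with None so that every post carries exactly MAX_LEVELS slots.
    return tuple(path + [None] * (MAX_LEVELS - len(path)))


categories = {
    1: Category(1, None, "Physics"),
    2: Category(2, 1, "Quantum field theory"),
    3: Category(3, 2, "Gauge theories"),
}
print(stored_path(categories, 3))  # (1, 2, 3, None)
```

With the path stored in the post, finding all ancestors needs no walk through the tree, but the number of slots limits the depth.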

Another issue that arises when we increase the number of category levels is the user interface. Currently, when a user asks a question, he has to select the category for the post. The user interface uses select tags like:

categories_old

It is clear that with, for instance, eight category levels, there is not enough room on the page to display all these tags side by side. Therefore the user interface had to be changed as well. The new user interface takes much less room and looks like this:

categroies_new

For the new category system, I have implemented another database model, called the nested set model. It allows for an unrestricted number of category levels. While queries through the categories become very fast, the insertion of nodes is slow, because indices throughout the tree have to be changed. However, the categories will be changed much less frequently than, for instance, the tree is displayed. In this respect, the model is well adapted to our needs.
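The following Python sketch illustrates the nested set idea under simplifying assumptions: the tree lives in a dictionary instead of database tables, and the function names are hypothetical, not those of the PhysicsOverflow code. Every node gets a left and a right index from a depth-first walk. Descendants are then found by a simple range comparison, while an insertion has to shift the indices of every node to the right of the insertion point:

```python
from typing import Dict, List, Tuple

Bounds = Dict[str, Tuple[int, int]]


def build_nested_set(children: Dict[str, List[str]], root: str) -> Bounds:
    """Assign (left, right) indices to every node by a depth-first walk."""
    bounds: Bounds = {}
    counter = 1

    def visit(node: str) -> None:
        nonlocal counter
        left = counter
        counter += 1
        for child in children.get(node, []):
            visit(child)
        bounds[node] = (left, counter)
        counter += 1

    visit(root)
    return bounds


def descendants(bounds: Bounds, node: str) -> List[str]:
    """All nodes whose interval lies strictly inside the node's interval."""
    left, right = bounds[node]
    inside = [n for n, (l, r) in bounds.items() if left < l and r < right]
    return sorted(inside, key=lambda n: bounds[n][0])


def insert_child(bounds: Bounds, parent: str, new_node: str) -> None:
    """Insert as last child; shifts every index at or beyond the parent's right."""
    right = bounds[parent][1]
    for n, (l, r) in list(bounds.items()):
        bounds[n] = (l + 2 if l >= right else l, r + 2 if r >= right else r)
    bounds[new_node] = (right, right + 1)


tree = {"Physics": ["Theory", "Experiment"], "Theory": ["QFT", "Strings"]}
bounds = build_nested_set(tree, "Physics")
print(bounds["Theory"])               # (2, 7)
print(descendants(bounds, "Theory"))  # ['QFT', 'Strings']
```

In a relational database, the range comparison in descendants corresponds to a single query without recursion, which is why reading the tree is cheap while writing is expensive.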

As already mentioned, the original category system is located in the core code of Q2A and is active in almost all pages provided by the framework (even in pages where I never expected it). I had to replace 9 files completely, all select specifications for database accesses to the category system had to be rewritten and, naturally, the code for the nested set model had to be written. To give you an impression of the size of the task, here are some numbers: The original Question2Answer framework consists of about 36’000 lines of code. Until today, I have written 18’100 new lines of code for PhysicsOverflow in the form of plugins, layers, overrides and changes in the core code, which is about half the size of the original framework. The new system required 5’800 additional lines of code, which explains why it took so long to implement. I hope that I was able to test all use cases, so that at most minor bugs remain.

The next development step will be a surprise, stay tuned!

polarkernel

PhysicsOverflow is Living and Animated

Wondering what had happened on PhysicsOverflow while I was developing the preliminary phase of PhysicsOverflow’s reviews section, Reviews I, I had a look at the host provider’s statistics and ran some queries on the event log of our database. Here are some numbers illustrating the pleasant activity on our site:

Number of visits in May 2014 (provider’s statistic):

visits_may_2014

As you can see, the number is growing slightly. The number of visits has been above 300 all the time and is now approaching 400 towards the end of the month. Maybe somebody has an explanation for the peak after the second weekend?

In the event log table of the database, 14’482 events were logged in May 2014. Some interesting numbers are:

  • 4126 times, a user has logged in.
  • 200 questions have been written.
  • 353 answers have been posted.
  • 1002 comments have been added.
  • 4995 upvotes have been given.
  • 86 new users have registered on the site.

I think this is really not bad for the second month after publishing the site. Maybe we will already have quite a number of submissions in the reviews section by the end of next month? Interested? Submit a paper to PhysicsOverflow!

We Have Liftoff!

Liftoff

The PhysicsOverflow public beta has now been online for 17 days. After a turbulent start with some database issues on our host and spam attacks, the site is now stable and working fine. After continuous improvement of details, the Q&A section on physics now accounts for the main part of the activity on the site. Since we went online on 4 April, more than 150 new users have registered on the site and many of them are already quite active. The number of visits per day has exceeded the number of visits on Theoretical Physics on SE since the first day, as can be seen in the following graph:

Visits_April_2014

This is great, since we aren’t even part of a huge network! Our rate of about 5.6 questions per day (excluding imported posts) is a lot higher than TP.SE’s. However, as already stated, the site does not depend on these figures; there is no deadline, as there was on TP. Also, the term beta does not mean that the site will go away; it only means that we still have many more ideas to implement before we reach the full-fledged version of PhysicsOverflow.

Please Contribute

Now is the time when the site gets shaped. If you participate now with your votes, ideas, opinions, questions and answers, you help to build a site shaped the way you like it. Don’t stay outside: have a look at PhysicsOverflow, register there if you like it, or contribute here on this blog.

Review Section

We are still a small team. However, we are working at full power to be able to leave the beta state and to complete the site as intended. The main part of this will be the Review Section, as already announced in this blog. This is not only a reconfiguration of the site; it requires a considerable amount of new development, because such a feature is not provided by the Question2Answer framework. In detail, the following main functionalities have to be developed:

  • Integration of new pages, called Submission and Review.
  • Two voting criteria, one for originality and one for accuracy.
  • New page design, enabling display of a score value from the votes.
  • Feature to add multiple authors.
  • Redesign of the voting mechanism, distributing votes to multiple authors.
  • Integration in the rep update and recount system.
  • Adding and managing the required database tables.
  • Increasing the category depth to realize a hierarchical tagging system.
  • Integration of the score as a new sorting criterion.
  • Software for mass import from arXiv.
  • Software for daily import from arXiv.

We are on the way with all that and look forward to completing these steps within a reasonable time. Stay tuned!

 

Technical problems with the MySQL database server :-(

Dear PhysicsOverflow community,

Today we seem to be facing (for the first time within a period of about two months) some problems with the MySQL database server on the host side. This led to a short downtime of PhysicsOverflow of about 10 minutes some hours ago, and to a new, longer downtime of an hour just now. This is very annoying, in particular as we have just started our public beta … :-/

So we apologize for this inconvenience, but all we could do is complain to the administrators of our hosting provider, which is what Polarkernel did:

 

Hi
There seem to be problems with the database on our host. Sorry, but I can’t change it. Here is the transcript of my contact with our host provider (translated by Google):
Now talk to ‘Elodie’
Elodie: Hi – how can I help you?
You: I got today on my site “www.physicsoverflow.org” for the second time a “database connect error”. Can you figure out what’s going on?
Elodie: Yes, actually, there are some difficulties with the MySQL Server today.
Elodie: The technicians are already aware and working on it.
You: OK, I hope the issue can be resolved soon. I have opened my site tonight and it’s a shame, especially when such errors occur at the beginning.
Elodie: I see. I do not think it takes a long time. But at present we have not yet received any specific information from the technicians. Except to the confirmation that they know it and work on it.
You: OK, then I’ll wait. Thank you!
Elodie: No cause 🙂
polarkernel
PhysicsOverflow seems to work again for now,
but new downtimes cannot be excluded until the technicians of our host state that they have successfully resolved the issue …
We will report here as well as on the site itself as soon as everything is ok again.
But let’s hope that the worst is over now; I am so excited and happy about all the great, nice, wise people who have joined so far 🙂
Cheers

Fine Tuning

I know, everybody is impatient to see our site running, and I apologize for the delay. During several tests I found some issues that had to be corrected before new questions are entered into our database. In my opinion we will be faster this way, because otherwise we would have risked an inconsistent database or even losing questions while repairing it. Preparing the technical beta test phase turned out to be more complicated than we thought. Additionally, Dilaton has been severely handicapped by an extremely poor internet connection, which made communication very difficult. But now we are very close! I assume we can go online for the test users during the coming week. Dilaton will tell them in an email how to connect to the beta site and how to contribute.

These have been the main issues:

  • The way Question2Answer handles access restrictions on plugins is vulnerable. A plugin may be set to be invisible to unauthorized users, but this does not really prevent access. Therefore I had to add code that enforces a proper check of the user’s authorization.
  • In the A51 import, several users had multiple (up to 4) accounts with the same username. This is not allowed under Question2Answer. I was not able to determine whether these users with the same name really were the same person. As a temporary solution I have manually corrected the database and renamed these users to USER_1, USER_2, … and so on. Currently I am developing a simple plugin “Merge User Account”. This will be very delicate and I am not yet sure if it can really be handled properly. Users with multiple accounts should please contact me directly. However, this plugin is not yet required for the beta tests, and therefore this issue will not slow down our project.

The contact to register as test-user is still open!

Going Online :-)

As you have already seen, we now have a host with a first short introduction for the new site. Don’t bother about details now (the link is now blue); I had only very little time available, and in my excitement I wanted to show you something that is already running. For a computer scientist, an automatic web editor, as it is available on our host, is sometimes quite confusing. I do not like these automatisms. However, in let’s say two weeks, I will have set up a first version of our new site and will not require such an editor to do that.

As a first step I would like to set up a technical beta version with the following purposes:

  • Debugging: I have more than 40 years of practical experience with IT projects and I am sure that there will be some bugs in my code. It will be the task of some dedicated test users (you?) to help me find these bugs.
  • Parametrization: Setting all the parameters of the site by the super-administrator.
  • Completion: Many details are not yet ready: logo, introductory text, texts in emails, etc. During the technical beta, the site should be completed for takeoff.
  • Organization: During this phase, the first organization should be prepared (administrators, moderators, ?).

During this phase, new registrations will be prohibited; only test users will have access to the site. However, the site will be visible from outside (which could be some advertising?). I could also place it at a secret place known only to the test users. I could also exclude robots (Google etc.) from the site during this run. Note: There is a (small) probability that the content of the site has to be deleted at the end of this phase. This could only happen if tables in the database had to be altered, so the risk is small.

In order that we are able to organize the technical beta, please contact Dilaton if you are interested to participate. We will contact you with details for the registration, as soon as the site is up.

I will finance the hosting of our site for a while. However, we should also start to think about the responsibility for the site. Currently, I am the only one who is no longer anonymous, because according to ICANN I had to provide my complete contact data. So, from a legal point of view, at the moment I am the only one who is responsible for the content of this site. I would not like to stay alone in this position. At the latest by the end of the beta phase, we should have some regulation (legal notes, about us, etc.?) on this subject. Has somebody already thought about that?

 

 

Import of Endangered SE-Questions

As announced in my last post, I would like to introduce the prototype of our new Q2A plugin for the import of endangered SE questions. For the user it has become the simplest and most comfortable solution I can imagine. The starting point is the link to the question on any SE site loaded in your browser, for example:

SE link

Copy this link. Note that the complete link is required; do not use the shared links at the bottom of the questions. Then you may select the menu option “Import SE-Question” on our Physics Overflow site, which is only visible and accessible to dedicated users like administrators or moderators (selectable by the super administrator):

PO menu

Paste the link copied from the SE-site into the appropriate field of the import dialog:

Import SE

Select the desired Physics Overflow category and click the import button. In a little while, the process announces the successful import of the complete thread containing the question and  all answers and comments:

Import SE done

The import is made using the StackExchange API. This API implements throttles, which limit the number of daily calls to 300 for a single IP, as long as the application has no valid access token. If the application has an access token (obtained by authenticating a user), this number is 10’000 calls per day and per IP. My plugin typically requires two calls for each import (one for the thread and a second for the user data), as long as no more than 30 users have contributed to the question. For every further 30 users, one more call is required (I have found questions with more than 100 contributing users). This means that without an access token, about 150 questions per day may be imported. I have no idea what happens when this quota is exceeded. The API returns the remaining quota of calls, which is divided by two in our plugin and displayed in the dialog window (see image above). A part of an example import is shown in the next picture:

Attribution1
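The quota arithmetic described above can be written down as a small Python sketch. The function names and the constant are hypothetical and only restate the numbers given in the text (one call for the thread, one call per started group of 30 contributing users); no real API call is made here:

```python
import math

USERS_PER_CALL = 30   # user records fetched per API call, as stated in the text
QUOTA_NO_TOKEN = 300  # daily calls per IP without an access token


def calls_for_import(contributing_users: int) -> int:
    """One call for the thread plus one call per started group of 30 users."""
    user_calls = max(1, math.ceil(contributing_users / USERS_PER_CALL))
    return 1 + user_calls


def importable_questions(remaining_quota: int) -> int:
    """Estimate shown in the dialog: remaining quota divided by two."""
    return remaining_quota // 2


print(calls_for_import(12))                  # 2
print(calls_for_import(100))                 # 5
print(importable_questions(QUOTA_NO_TOKEN))  # 150
```

The estimate of 150 questions per day is therefore an upper bound; questions with many contributing users consume more of the quota.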

Attribution

Attribution is regulated by the API terms of use, which point to the Stack Exchange Terms of Service. As far as I understand, we are allowed to copy content from SE sites, as long as we follow the rules under this last link. My proposal is to put an attribution line under every imported question, answer and comment, which looks like this:

Attribution details

In my opinion, this should fulfil the SE rules and the rules of the Creative Commons Attribution Share Alike license. The exact date and time of the import is added, because it is not possible to synchronize edits that are made on SE after the import. So the import is a snapshot of the state at the time indicated by this date/time. The API also provides no way to import the edit history of the questions.

If anybody has more knowledge about attribution to SE, I would be glad to get some feedback. By the way, shouldn’t we also think about terms of use for our site?

Remaining Issues

There are some issues with importing user identities, which I try to explain below. Users are imported in exactly the same way as during the migration of the closed SE.TP, with their display name and email hash. The following cases may occur:

  • User no longer registered on the SE site. In this case, there exists no link to the user profile on the SE site. The plugin then assigns the post to a user “UnknownToSE”, which is hidden in the list of users, similar to the voter introduced for the import of SE.TP questions.
  • Collision with an existing user name on Physics Overflow. A user has registered on PO with the same display name as the user to be imported. In this case, the plugin checks the email hashes of both users. In case of a match, the imported user is assigned to the existing user. If the hashes are different, I do not yet have a useful solution. Currently, I again use the user “UnknownToSE”, but this is not a good solution. Any ideas?
  • Collision between identical users from different SE-sites. A StackExchange user may post for instance on SE Physics and also on SE Math, but using different email addresses. I have observed that such cases appear quite often. In contrast to user IDs on different sites, the only stable ID is the account ID of a user. Using the StackExchange API, it is possible to find this ID for active SE-users. However, the Area 51 dump did not provide this ID.

Any ideas for solving these issues are welcome.
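The first two cases can be summarized in a short Python sketch. It is not the plugin code; ImportedUser and resolve_user are hypothetical names, and the PO user table is reduced to a dictionary from display name to email hash:

```python
from dataclasses import dataclass
from typing import Dict, Optional

FALLBACK_USER = "UnknownToSE"  # hidden placeholder user described in the text


@dataclass
class ImportedUser:
    display_name: str
    email_hash: str


def resolve_user(imported: Optional[ImportedUser], po_users: Dict[str, str]) -> str:
    """Return the PO user name that an imported post should be assigned to.

    po_users maps display names already present on PO to their email hashes.
    """
    if imported is None:
        # User no longer registered on the SE site: no profile to import.
        return FALLBACK_USER
    known_hash = po_users.get(imported.display_name)
    if known_hash is None:
        # No collision: create the user on PO.
        po_users[imported.display_name] = imported.email_hash
        return imported.display_name
    if known_hash == imported.email_hash:
        # Same name and same hash: treat as the same person.
        return imported.display_name
    # Same name but different hash: unresolved, fall back for now.
    return FALLBACK_USER


po_users = {"alice": "hash-a"}
print(resolve_user(ImportedUser("alice", "hash-a"), po_users))  # alice
print(resolve_user(ImportedUser("alice", "hash-x"), po_users))  # UnknownToSE
```

The third case (the same person with different email addresses on different SE sites) cannot be detected by this comparison at all, which is why a stable account ID would be needed.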

Next Steps

I think it is slowly becoming time to prepare the takeoff of Physics Overflow. In my next post I will make a proposal for this process. I hope Dilaton will recover soon and be on board again. Get well soon!

Admin Dashboard Settings Part 9: Spam settings

The Spam menu in the Admin Dashboard lets you tick certain check boxes to keep PhysicsOverflow from getting clogged with spam and noise, or generally trolled by ill-meaning folks, which I think is important. Also, from a legal point of view we will be responsible for what gets posted on our site, so there will certainly be a need for people who watch out a bit and help with detecting and removing really bad things …

The first few options deal with settings concerning the registration and confirmation of emails by new users:

  • Request confirmation of user emails: I will have to ask Polarkernel if this is needed for the Regain TP account plugin, currently the check mark is set
  • All new users must confirm their email: Stack Exchange does not have this and I am not sure if we need it either
  • Enable moderation (approval) of users: this seems too restrictive IMHO
  • Email me when a new user registers: I think we don’t need this
  • Temporarily suspend new user registrations: this might be a good option for our upcoming private technical beta, to prevent too many users from registering already. The intention behind limiting registration during this test phase is that we do not want to worry too much about losing nice serious content while testing. Polarkernel will probably write more about this soon …

The next few options allow enabling CAPTCHAs under different conditions. I personally think that having to deal with CAPTCHAs all the time in cyberspace (to comment in blogs, etc.) sucks; but who knows, if bad spam issues should arise, we might use them? For example, if certain “EnergyNumbers” and other “friends” decide to massively spam attack PhysicsOverflow, they might come in handy 😉

  • Use captcha for user registration:
  • Use captcha on reset password form:
  • Use captcha for anonymous posts:
  • Use captcha if email not confirmed:
  • Use captcha on feedback form:
  • Use captcha module: here you can only choose reCAPTCHA whatever this means …

Then you can use moderation for things that might get troublesome (but normally they shouldn’t IMHO). I am not sure if we need these, and rather think we should avoid copying the paranoid negative attitude of the Meta Stack Overflow (MSO) crowd who always assume the worst about the character and intentions of other people, which leads to the well-known large bureaucratic overhead, draconian punishments, and oppressive overmoderation of most parts of the SE network …

  • Use moderation for anonymous posts:
  • Use moderation if email not confirmed:
  • Use moderation for users with few points:

Next comes a text field, where you can list the IP addresses of trolls that should be banned. Maybe we could gather in advance the IP addresses of people whom we do not want to see on PhysicsOverflow under any circumstances; I have several (user) names in mind (joking) … ;-P

Finally, you can rate limit many actions, which I think is good for keeping spam bots out. Unfortunately these options are not effective in keeping out the spam bots developed by a certain aggressive and destructive System Administrator who tests and trains his malware (which is intended to interrupt any serious theoretical physics discussions everywhere on the internet as soon as certain keywords like SUSY or ST appear) on his personal homepage … Such spam still has to be taken care of manually … ;-).

  • Rate limit for user registrations: 5 per IP/hour
  • Rate limit for logging in: 20 per IP/hour
  • Rate limit for asking questions: 20 per IP/hour, 10 per user/hour
  • Rate limit for adding answers: 50 per IP/hour, 25 per user/hour
  • Rate limit for posting comments: 40 per IP/hour,  20 per user/hour
  • Rate limit for voting: 600 per IP/hour, 300 per user/hour
  • Rate limit for flagging posts:  10 per IP/hour, 5 per user/hour
  • Rate limit for uploading files:  20 per IP/hour, 10 per user/hour
  • Rate limit for private and wall messages: 30 per IP/hour, 5 per user/hour
    These limits might need to be set to higher values, if we want to use user walls as chat rooms
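To illustrate how such per-IP and per-user limits work in principle, here is a small sliding-window sketch in Python. It is not how Question2Answer implements its limits; the class RateLimiter and the action names are hypothetical, and only a few of the values listed above are included:

```python
import time
from collections import defaultdict, deque
from typing import Dict, Optional

# Limits per hour, taken from the list above (only a few actions shown).
LIMITS: Dict[str, Dict[str, int]] = {
    "register": {"ip": 5},
    "ask": {"ip": 20, "user": 10},
    "flag": {"ip": 10, "user": 5},
}


class RateLimiter:
    def __init__(self, limits: Dict[str, Dict[str, int]], window_seconds: float = 3600.0):
        self.limits = limits
        self.window = window_seconds
        self.events = defaultdict(deque)  # (action, kind, id) -> timestamps

    def allow(self, action: str, ip: str, user: Optional[str] = None,
              now: Optional[float] = None) -> bool:
        """Return True and record the action if no limit is exceeded."""
        now = time.time() if now is None else now
        keys = [("ip", ip)] + ([("user", user)] if user is not None else [])
        keys = [(kind, ident) for kind, ident in keys if kind in self.limits[action]]
        for kind, ident in keys:
            queue = self.events[(action, kind, ident)]
            while queue and queue[0] <= now - self.window:
                queue.popleft()  # forget events older than the window
            if len(queue) >= self.limits[action][kind]:
                return False
        for kind, ident in keys:
            self.events[(action, kind, ident)].append(now)
        return True


limiter = RateLimiter(LIMITS)
results = [limiter.allow("flag", "192.0.2.1", "alice", now=float(t)) for t in range(6)]
print(results)  # [True, True, True, True, True, False]
```

The per-user limit blocks the sixth flag within the hour although the per-IP limit of 10 has not been reached yet; whichever limit is hit first wins.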

Approaching the Goal, Technical End of Year Report

The end of the year is near, and it is time to give you some information about the technical state of the project. There have been many hours of frustration, but I think I nonetheless have some fine results to present. Let me go through the different subjects studied or implemented:

LaTeX in Markdown Editor
The issue with this editor is that Markdown uses so-called escape sequences, as known from the programming languages C or C++. For instance, \n is an end-of-line or \t is a tab. In order to be able to represent the backslash character used for these sequences, a double backslash \\ is used. During the transformation of an edited text into HTML-coded text, the Markdown Editor “eats” one of these backslashes, which was the reason for the issues that Dilaton already mentioned in this blog. I have found a way of preprocessing the edited text by replacing LaTeX sequences with tags and re-inserting these blocks at the end in a postprocessing step. This worked fine for the live preview but, for unknown reasons, not for the real posting. After some days of frustration and anger, I gave up.
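The pre- and postprocessing idea can be sketched in Python as follows. This is only an illustration under assumptions: the placeholder format, the function names and the toy converter (which merely imitates the loss of one backslash) are hypothetical and are not the code that was tried on the site:

```python
import re
from typing import List, Tuple

# Matches $$...$$ blocks first, then inline $...$ formulas.
MATH_PATTERN = re.compile(r"\$\$.+?\$\$|\$.+?\$", re.DOTALL)


def protect_math(text: str) -> Tuple[str, List[str]]:
    """Replace every LaTeX sequence by a placeholder tag and remember it."""
    blocks: List[str] = []

    def stash(match: "re.Match[str]") -> str:
        blocks.append(match.group(0))
        return f"@@MATH{len(blocks) - 1}@@"

    return MATH_PATTERN.sub(stash, text), blocks


def restore_math(html: str, blocks: List[str]) -> str:
    """Put the remembered LaTeX sequences back in place of the tags."""
    return re.sub(r"@@MATH(\d+)@@", lambda m: blocks[int(m.group(1))], html)


def toy_markdown(text: str) -> str:
    """Stand-in for the real converter: it 'eats' one backslash of each pair."""
    return text.replace("\\\\", "\\")


source = r"Rows end with $a \\ b$ here."
protected, blocks = protect_math(source)
print(toy_markdown(source))                            # Rows end with $a \ b$ here.
print(restore_math(toy_markdown(protected), blocks))   # Rows end with $a \\ b$ here.
```

Because the formulas never pass through the converter, their backslashes survive unchanged.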

Dummy Voter
I was not happy with the solution of thousands of dummy voters, which escaped at the end of the migration but left their votes until a recount showed that the cat was dead. This is too close to Schrodinger’s cat for me. Now I have replaced this solution by a single voter (can you guess its name?) with increased rights, who inserts all votes from TP and makes itself invisible to everybody after that. This way, the solution will survive all recounts during the life of the site. For this solution I needed one single line of core hack in Q2A’s code (this is the first time up to now).

New editor, LaTeX enabled
The WYSIWYG editor delivered with Q2A is based on CKEditor. Unfortunately, the implemented version of CKEditor does not support LaTeX. In the meantime, there exists a new version of this editor, which supports LaTeX, but it was not possible to introduce it into WYSIWYG. Therefore I have developed a new Q2A plugin, which supports this new version. This new plugin supports many nice features, as can be seen in the following screenshot:

editor_capabilities

LaTeX code may be inserted using a special window, where the LaTeX code can be written (without $ or $$) with a live preview, either as a block or inline:

latex_editor       latex_editor_inline

A source editor enables the user to insert any desired html-code:

editor_source

I am sure you will like this new editor plugin!

Regaining Accounts for former TP users
This issue is solved! I have written a plugin that allows former users of SE.TP to regain their account in the migrated Q2A site. I have added a login link into the side panel near the attribution message:

regain_side_panel

The user sees then a login page with a short explanation of the login procedure:

regain_login_form

If he enters the same credentials that he used on the former SE.TP site (checked against the MD5 hash of his email address), he is logged in as a normal Q2A user:

regain_logged

He is then able to change his account data (email address, password, etc.) and may access his old posts.
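The check of the email address can be sketched in Python with the standard hashlib module. The function names are hypothetical, and the normalization (trimming and lower-casing the address before hashing) is an assumption, not something confirmed for the plugin:

```python
import hashlib
import hmac


def email_hash(email: str) -> str:
    """MD5 hex digest of the normalized address (normalization is assumed)."""
    normalized = email.strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def may_regain_account(entered_email: str, stored_hash: str) -> bool:
    """Compare the hash of the entered address with the imported hash."""
    return hmac.compare_digest(email_hash(entered_email), stored_hash.lower())


stored = email_hash("alice@example.org")  # stands in for the imported hash
print(may_regain_account(" Alice@Example.org ", stored))  # True
print(may_regain_account("bob@example.org", stored))      # False
```

Note that MD5 is a hash function, not an encryption: the site never needs to know the original address, it only compares digests.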

That’s it for this year!

I wish all of you a Merry Christmas and a happy new year with a successful start of this new site!