Schrodinger's Cat Strikes Back


Migration of SE.TP to Q2A, first results

Hi all. I am the “nice, friendly, and very competent informatics expert” (thanks to Dilaton!) who supported the installation of the Q2A test site on Dilaton's laptop. Dilaton has made me an author of this blog so that I can report my progress directly to this community. I will support Dilaton and this community with the technical setup of the new site. Currently, I am writing a PHP script for the migration of SE dumps into Q2A, and I have got the following results:

Good news first: it was possible to migrate users, questions, answers and comments properly into the Q2A database. The posts look fine, apart from some curious LaTeX fragments (identical to the original text in the dump) that I do not yet understand (although I know LaTeX very well). I will take care of that later. The posts are searchable, and the database seems to have been built correctly. The statistics for each user (the number of questions and answers and the points awarded for them) are also correct.

User emails and passwords in the SE dump are encrypted. We will have to set up a process that allows all old users to reclaim their original posts while keeping hackers and trolls out. This will not be very simple. Maybe the “forgotten password” utility will provide a means for that, but then the correct email addresses of the users will be required. Maybe one of you has an idea?

There is an issue with importing the original SE.TP votes. Q2A stores each vote together with the post (question or answer) it applies to and the user who cast it, and it does not allow a user to vote more than once on a post. The relation between votes and users is visible only to the administrator. However, the SE.TP dump does not provide this relation: it only links votes to posts, and the voting user is unknown. In order to insert these votes, I had to create several thousand dummy users, each one voting on only one post, and delete all these users at the end. After that, the corresponding points for the users and their statistics are set properly, as long as the admin does not recount the posts in the database. However, such a recount may sometimes be required to keep the database free of corruption. Maybe I should provide the admin with a script that sets these old votes again after such a recount.
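The dummy-user workaround and the recount problem can be illustrated with a small model. This is only a sketch in Python with an in-memory SQLite database, not the PHP migration script itself. The tables `users`, `posts` and `uservotes` are simplified stand-ins invented for this example and are not Q2A's real schema. It also assumes that the vote rows disappear together with the dummy users, which is what makes a later recount lose the imported votes.

```python
import sqlite3


def make_demo_db():
    """Create a tiny in-memory database (hypothetical schema, not Q2A's)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (userid INTEGER PRIMARY KEY AUTOINCREMENT,
                            handle TEXT UNIQUE);
        CREATE TABLE posts (postid INTEGER PRIMARY KEY,
                            upvotes INTEGER DEFAULT 0);
        -- one vote per user and post, as described in the text above
        CREATE TABLE uservotes (postid INTEGER, userid INTEGER, vote INTEGER,
                                UNIQUE (userid, postid));
        """
    )
    return conn


def recount_votes(conn):
    """Recompute the cached vote count of every post from the vote rows."""
    conn.execute(
        "UPDATE posts SET upvotes = (SELECT COUNT(*) FROM uservotes "
        "WHERE uservotes.postid = posts.postid AND uservotes.vote = 1)"
    )
    conn.commit()


def import_anonymous_votes(conn, vote_counts):
    """Insert votes whose voters are unknown.

    vote_counts maps a post id to the number of upvotes found in the dump.
    Returns the number of dummy users that were created and deleted again.
    """
    cur = conn.cursor()
    dummy_ids = []
    for post_id, n_votes in vote_counts.items():
        for i in range(n_votes):
            # one dummy user per vote, so the uniqueness rule is never violated
            cur.execute("INSERT INTO users (handle) VALUES (?)",
                        (f"dummy_{post_id}_{i}",))
            dummy_ids.append(cur.lastrowid)
            cur.execute(
                "INSERT INTO uservotes (postid, userid, vote) VALUES (?, ?, 1)",
                (post_id, cur.lastrowid),
            )
    recount_votes(conn)  # cached counts now reflect the imported votes
    # Remove the dummy users and (assumption) their vote rows;
    # the cached counts on the posts are left untouched.
    cur.executemany("DELETE FROM uservotes WHERE userid = ?",
                    [(u,) for u in dummy_ids])
    cur.executemany("DELETE FROM users WHERE userid = ?",
                    [(u,) for u in dummy_ids])
    conn.commit()
    return len(dummy_ids)
```

In this model the posts keep their vote counts after the import, but calling `recount_votes` again afterwards sets them back to zero, because the rows they were derived from are gone. A script that sets the old votes again would have to repeat the import step after every recount.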

As a next step I will take care of the issues with LaTeX compilation within the Markdown editor. I hope it will not be necessary to write a new Markdown editor 😉

 


16 Comments

  1. Hi, thanks a lot for looking into these issues.

    Maybe I’m being totally stupid, but to retrieve the correct emails, can’t we just ask the TP.SE users to submit their email addresses, hash each address with MD5, and check it against the email hash database? If the hash is found, their login details can be e-mailed to them.

    Thanks again.

    • polarkernel says:

      Hi Dimension10
      Your proposal isn’t stupid at all. If SE used MD5 for hashing, this would be a secure way to verify the old email addresses. I haven’t looked up which algorithm SE uses. It would be impractical, though, for users who have changed their email address since the end of SE.TP.

      • Yes, it *is* MD5-hashed.

        I tried hashing my own email address with an MD5 tool and got “dc183fa5ee5f0d66aa6f1797d6992c9f”. This is exactly what is found in the Phys.SE data dump.
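The check discussed in this thread could be sketched as follows, here in Python rather than in the PHP of the migration script. The function names and the `hash_to_user` mapping are hypothetical, and trimming and lowercasing the address before hashing is an assumption about how the hashes in the dump were produced, not something confirmed above.

```python
import hashlib


def email_md5(email):
    """Return the hex MD5 digest of an email address.

    Assumption: the address is trimmed and lowercased before hashing.
    """
    normalized = email.strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def find_old_account(email, hash_to_user):
    """Look up the old account whose stored email hash matches `email`.

    hash_to_user maps an email hash from the dump to an old user id.
    Returns the user id, or None if no hash matches.
    """
    return hash_to_user.get(email_md5(email))
```

A match only shows that the submitted address belongs to an old account, not that the person submitting it owns the address. That is why the login details should be e-mailed to that address, as suggested above, rather than shown directly.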

  2. By the way, what does the script for which you created the thousands of “dummy users” do? Does it prevent voting by the existing users on the imported questions?

    • polarkernel says:

      No. Because a user has only one vote per post, I had to simulate several users voting on the same post. There is no information available about which user voted on a given post. It is not possible to prevent existing users from voting again on the imported posts.

  3. Dilaton says:

    Hi polarkernel, welcome here 🙂

    Thanks for this nice post containing the good news and for your continuous vigorous support!

  4. Dilaton says:

    I guess that to allow former TP users to reclaim their posts (I hope at least some will come to the new site), we could build some kind of workaround if everything else fails:

    My idea is simply to set up a question in the Meta category, inviting former TP users (one per answer) to ask us to merge their new accounts (for which they have the login data) with their old ones.

  5. Dilaton says:

    Concerning the procedure of importing questions from SE data dumps, I have a general question:

    You once said that this can only be done just after installing the Q2A site, before anything else is customised etc … Is this still true?

    It would be nice to be able to import new questions, from newer Physics SE data dumps for example, even while the site is already running online later…

    • polarkernel says:

      Hi Dilaton
      This is no longer required. With minor changes in the configuration, the script can import new questions from other SE data dumps at any time, even if the site is already running. The issue is that, in other dumps, the corresponding users could have different IDs than in the area51 dump. You would have to distinguish users that are already registered from the SE.TP dump from users that still have to be created. This makes it somewhat complicated.
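The user-matching step described here might look roughly like the following Python sketch. Everything in it is hypothetical: the record fields `id` and `email_hash`, the function name, and above all the choice of the email hash as the matching key, which is only one possible way to recognise the same person across two dumps.

```python
def split_users(existing_by_hash, new_dump_users):
    """Separate users of a new dump into known users and users to create.

    existing_by_hash maps an email hash to the local id of a user that is
    already registered. new_dump_users is a list of dicts with the
    (hypothetical) keys "id" and "email_hash".
    Returns (id_map, to_create): id_map translates dump ids to local ids,
    to_create lists the records that have no local account yet.
    """
    id_map = {}
    to_create = []
    for user in new_dump_users:
        local_id = existing_by_hash.get(user["email_hash"])
        if local_id is not None:
            id_map[user["id"]] = local_id  # same person, different dump id
        else:
            to_create.append(user)
    return id_map, to_create
```

Posts from the new dump would then be attributed through `id_map` instead of through the raw dump IDs.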

  6. Dilaton says:

    I am not sure if I understand the issue concerning the votes of the TP questions correctly …

    Are you saying that, as things are at present, all the rep and votes the TP users obtained in their previous life would be lost as soon as we have to recount the posts for some reason? That would be bad for them indeed, at least without some efficient workaround such as the script that resets the votes as you suggest :-/

    About relating the votes to the users that gave them, I am not sure how important this is, what do others think?

    As we are trying to build a serious and reasonable community (where fun and jokes are nevertheless allowed and appreciated of course), my hope is that there will be less need to deal with voting fraud, serial votes, or generally trolls and rascals playing havoc with voting etc …

    But maybe I am too naive, and at least to deal with such things, the possibility to see who voted how on what (at least for the trusted mods / admins) would be needed.

  7. Mitchell Porter says:

    Thanks for helping this heroic effort led by Dilaton. 🙂

  8. […] As already explained, the early data base of the new physics site will contain all of the Theoretical Physics SE questions, plus selected questions from elsewhere (mostly Physics SE I guess for now). There are some attribution issues we have to respect (some boring tasks can hopefully be automated), and importing Theoretical Physics questions works already reasonably well […]

  9. […] Polarkernel pointed out here, there are some issues concerning the correct installation of the votes for from TP.SE imported […]
