Schrodinger's Cat Strikes Back


Migration of SE.TP to Q2A, part II

As always in IT projects, the devil is in the details. Looking more deeply into the results of the first migration, I found some issues that had to be corrected. Above all, the curious LaTeX phrases I mentioned in the last blog post were a pain to fix. The main culprits were unwanted character-set changes along the whole pipeline (reading the XML, processing the text in PHP, writing it to the database, and rendering it as HTML pages in Q2A) and the handling of HTML tags. However, we now have a local site with all posts and votes of SE.TP migrated to Q2A (still using the dummy-user hack), as shown in the following example:
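Character-set damage of this kind usually shows up as "mojibake": UTF-8 bytes that were read back as ISO-8859-1 (Latin-1) at some step, so that "ö" turns into "Ã¶". Below is a minimal sketch of detecting and reversing one such layer. It is written in Python for brevity rather than in the PHP used for the migration. It assumes that Latin-1, not Windows-1252, was the wrong decoding, and the function name is hypothetical:

```python
def repair_double_utf8(text: str) -> str:
    """Undo one layer of "UTF-8 bytes decoded as Latin-1" damage.

    Assumption: the text was UTF-8 encoded, then wrongly decoded as
    ISO-8859-1 somewhere in the pipeline. If the string cannot be
    re-encoded as Latin-1, or the resulting bytes are not valid UTF-8,
    it is returned unchanged.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


if __name__ == "__main__":
    print(repair_double_utf8("Schr\u00c3\u00b6dinger"))  # Schrödinger
    print(repair_double_utf8("Schr\u00f6dinger"))  # already correct, unchanged
```

Some strings fail the round trip: those that are already correct, those that are pure ASCII, and those containing characters outside Latin-1 (such as ∫). These come back unchanged, so the function can be applied to every field. It remains a heuristic, though. Genuine Latin-1 text that happens to form valid UTF-8 byte sequences would also be altered.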

[Screenshot: example of a migrated SE.TP post on the local Q2A site]

Note that two subcategories, SE.TP and SE.TP.Meta, have been created. For an example of how attribution could look (at least for SE.TP, but not yet for SE.Physics), see the text in the right-hand side panel. LaTeX now renders correctly in all posts. Meta has also been included, as shown by the post from Shog9 with its far-reaching consequences (the SE.TP site was closed a short time later):

[Screenshot: Shog9's post on SE.TP Meta]

I’d like to discuss some points regarding the continuation of this project:

Attribution

We now have two subcategories, SE.TP and SE.TP.Meta, from the closed SE beta site on Area 51 Stack Exchange. A simple way to handle attribution for these posts would be a side-bar text, as in the sample above, combined with preventing users from adding new posts to these two categories. Binding attribution to individual posts would require a core hack of the Q2A code, which would take much time and carry risk. I do not yet know how posts imported from the still-running SE.Physics site could be handled.

History

The SE.TP dump contains an additional file called history.xml. It holds the change history of the posts and covers the following history types:

  1. Edit Title – A question’s title has been changed.
  2. Edit Body – A post’s body has been changed, the raw text is stored here as markdown.
  3. Edit Tags – A question’s tags have been changed.
  4. Rollback Title – A question’s title has reverted to a previous version.
  5. Rollback Body – A post’s body has reverted to a previous version – the raw text is stored here.
  6. Rollback Tags – A question’s tags have reverted to a previous version.
  7. Post Closed – A post was voted to be closed.
  8. Post Reopened – A post was voted to be reopened.
  9. Post Deleted – A post was voted to be removed.
  10. Post Undeleted – A post was voted to be restored.
  11. Post Locked – A post was locked by a moderator.
  12. Post Unlocked – A post was unlocked by a moderator.
  13. Community Owned – A post has become community owned.
  14. Post Migrated – A post was migrated.
  15. Question Merged – A question has had another, deleted question merged into itself.
  16. Question Protected – A question was protected by a moderator.
  17. Question Unprotected – A question was unprotected by a moderator.
  18. Post Disassociated – An admin removes the OwnerUserId from a post.
  19. Question Unmerged – A previously merged question has had its answers and votes restored.
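Before deciding how much of this history to migrate, it may help to see which types actually occur in the dump. The sketch below counts history rows per type. It assumes two things that should both be checked against the SE.TP dump itself. First, the rows look like those of the public Stack Exchange data dumps, i.e. `<row>` elements with a `PostHistoryTypeId` attribute. Second, the IDs follow the numbering in those dumps' readme, where IDs 1 to 3 are the Initial Title/Body/Tags rows and the types listed above appear as IDs 4 to 22. If so, the numbers in the list above are list positions rather than IDs.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumed ID numbering (public data-dump readme); verify against the SE.TP dump.
HISTORY_TYPES = {
    1: "Initial Title", 2: "Initial Body", 3: "Initial Tags",
    4: "Edit Title", 5: "Edit Body", 6: "Edit Tags",
    7: "Rollback Title", 8: "Rollback Body", 9: "Rollback Tags",
    10: "Post Closed", 11: "Post Reopened", 12: "Post Deleted",
    13: "Post Undeleted", 14: "Post Locked", 15: "Post Unlocked",
    16: "Community Owned", 17: "Post Migrated", 18: "Question Merged",
    19: "Question Protected", 20: "Question Unprotected",
    21: "Post Disassociated", 22: "Question Unmerged",
}


def tally_history(root: ET.Element) -> Counter:
    """Count history rows per type name; unknown IDs stay visible."""
    counts = Counter()
    for row in root.iter("row"):
        type_id = int(row.get("PostHistoryTypeId", "0"))
        counts[HISTORY_TYPES.get(type_id, f"Unknown ({type_id})")] += 1
    return counts


if __name__ == "__main__":
    # With the real file: root = ET.parse("history.xml").getroot()
    sample = ET.fromstring(
        "<posthistory>"
        '<row Id="1" PostId="10" PostHistoryTypeId="5" />'
        '<row Id="2" PostId="11" PostHistoryTypeId="5" />'
        '<row Id="3" PostId="10" PostHistoryTypeId="10" />'
        "</posthistory>"
    )
    for name, count in tally_history(sample).most_common():
        print(f"{name}: {count}")
```

Types that turn out to be rare or absent could simply be skipped, which would shrink the migration effort considerably.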

In my opinion, it could be very complicated to write a program flow that inserts all these changes into the current site. Do you think this is really required? Implementing all these types might delay the start of the new site for a long time. To give you an idea: the simple migration of the current local site alone required more than 600 lines of code. Does anybody know whether the posts in the SE dump reflect their state before or after these changes?
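One way to answer that last question from the data itself is to find the newest body revision of each post in the history and compare it with what the migration imported. Because the history stores markdown rather than HTML, comparing timestamps is more practical than comparing raw strings. Below is a minimal sketch that makes the same assumptions about row layout and type IDs as above, taking IDs 2, 5 and 8 to be Initial Body, Edit Body and Rollback Body. It further assumes that `CreationDate` values are ISO 8601 strings in one consistent format, so that plain string comparison orders them chronologically:

```python
import xml.etree.ElementTree as ET

# Initial Body, Edit Body, Rollback Body (assumed data-dump numbering).
BODY_TYPES = {2, 5, 8}


def latest_body_revisions(root: ET.Element) -> dict:
    """Map PostId -> (CreationDate, Text) of its newest body revision."""
    latest = {}
    for row in root.iter("row"):
        if int(row.get("PostHistoryTypeId", "0")) not in BODY_TYPES:
            continue
        post_id = int(row.get("PostId", "0"))
        stamp = row.get("CreationDate", "")
        # ISO 8601 strings in a single format sort chronologically as text.
        if post_id not in latest or stamp > latest[post_id][0]:
            latest[post_id] = (stamp, row.get("Text", ""))
    return latest
```

Comparing these dates with the edit dates of the imported posts should show whether the dump holds the final versions.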

Account recovery

As proposed by Dimension10, the easiest way to recover the accounts of former users would be to check against the existing MD5 hashes. I will write a plug-in where users can enter the email address and password they used on SE. Both will then be checked against the MD5 hashes in the SE.TP dump. If the check succeeds, the account will be restored automatically. Writing such a plug-in should be quite easy (apart from the devil in the details …).
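The check itself is short. Below is a minimal sketch with the following assumptions:

- The dump holds unsalted, hex-encoded MD5 digests.
- Email addresses were trimmed and lower-cased before hashing.
- The record fields are called `email_md5` and `password_md5`. These are hypothetical names; the real dump columns and any salting scheme must be taken from its documentation.

`hmac.compare_digest` is used so that the comparison time does not reveal how many characters matched:

```python
import hashlib
import hmac


def md5_hex(value: str) -> str:
    """Hex-encoded MD5 digest of a UTF-8 string."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def md5_matches(value: str, stored_hash: str) -> bool:
    """Constant-time comparison of md5(value) with a stored hex digest."""
    return hmac.compare_digest(md5_hex(value), stored_hash.strip().lower())


def verify_former_user(email: str, password: str, record: dict) -> bool:
    """True if both the email and the password match the dump record.

    Hypothetical record keys: "email_md5" and "password_md5".
    Assumption: emails were trimmed and lower-cased before hashing,
    passwords were hashed exactly as entered, and no salt was used.
    """
    email_ok = md5_matches(email.strip().lower(), record["email_md5"])
    password_ok = md5_matches(password, record["password_md5"])
    return email_ok and password_ok
```

Both comparisons are evaluated before combining them, so a wrong email and a wrong password take the same time to reject.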

Migration of posts from SE.Physics

I will look into accessing SE.Physics directly using Stack.PHP and the Stack Exchange API. If the primary keys of the database (user ID and post ID) are available, it should also be possible to write a plug-in that lets users import individual posts directly from SE.Physics. However, the issue of attribution is not yet solved. The community should also discuss who should have the right to migrate such questions.
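Below is a minimal sketch of the API side. It uses plain HTTP from Python's standard library rather than Stack.PHP. It calls the public `/questions/{ids}` endpoint of the Stack Exchange API (version 2.3 here) with the built-in `withbody` filter. It then maps the returned question object to the fields a Q2A import and its attribution would need; the output keys and helper names are hypothetical. API responses are compressed, so the payload is gunzipped when it starts with the gzip magic bytes:

```python
import gzip
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.stackexchange.com/2.3"


def question_url(question_id: int, site: str = "physics") -> str:
    """URL for one question, including its body ('withbody' filter)."""
    query = urllib.parse.urlencode({"site": site, "filter": "withbody"})
    return f"{API_ROOT}/questions/{question_id}?{query}"


def decode_payload(raw: bytes) -> dict:
    """Decompress (if gzip) and parse an API response body."""
    if raw[:2] == b"\x1f\x8b":  # gzip magic number
        raw = gzip.decompress(raw)
    return json.loads(raw.decode("utf-8"))


def fetch_json(url: str) -> dict:
    """GET an API URL and return the parsed JSON."""
    req = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return decode_payload(resp.read())


def to_import_record(item: dict) -> dict:
    """Keep what an import and its attribution need (hypothetical keys)."""
    owner = item.get("owner", {})  # may lack user_id for deleted users
    return {
        "source_id": item["question_id"],
        "source_user_id": owner.get("user_id"),
        "author": owner.get("display_name"),
        "title": item["title"],
        "body_html": item.get("body", ""),
        "source_link": item["link"],
    }
```

For example, `fetch_json(question_url(88053))` retrieves the question linked in the comments below, and `to_import_record` can then be applied to each entry of its `items` list. Keeping `source_link` and `author` with every record would leave the door open for per-post attribution later.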


14 Comments

  1. Firstly, thanks a *lot* for all your help.

    Now, about the post history, I don’t think I understand what you mean…

    Do you mean that we should discard the post history? That isn’t really possible;
    the post history would be required for a few reasons:
    – For attributing (even though not required legally) editors,
    – To make sense of comments which comment on an earlier revision, and
    – To allow one to reverse some non-constructive edits made on SE.
    But maybe some things, like rollbacks, could be simplified by omitting edits that were rolled back…?

    Or do you mean that we don’t need to preserve the “type” of edit? That is ok, in my opinion, and I agree with it.

    • Dilaton says:

      For TP questions, the Area51 link to the closed site is enough, so linking to it in the side bar should already be fine.

      Concerning the attribution of edits generally, they were recently discussing this on MSO:

      http://meta.stackoverflow.com/q/208904/184300

    • polarkernel says:

      Maybe there was a misunderstanding. Of course post history is an important requirement, and it can easily be provided for all new questions using the Q2A edit history plug-in. I was only unsure whether the post history of the closed site is a requirement. If the migrated site already shows the final posts, after all corrections made by their authors, are you really interested in the history of these changes? Admittedly, understanding comments on earlier versions is a real argument.

      Well, if you consider this history a requirement, I will implement its migration as far as Q2A allows. However, I want to warn you that the development will take considerable time, partly because the way the Q2A edit history plug-in represents history in the Q2A database will have to be reverse-engineered; there is no documentation on that subject.

      What do you think about “freezing” the imported categories SE.TP and SE.TP.Meta?

      • Dilaton says:

        I think the edit history of the TP questions is dispensable (at least for now), because making it work would be too time-consuming. And, as far as I understand it, it is not needed for the attribution of the TP questions.

        I agree that no new questions should be posted in the SE.TP and SE.TP.Meta categories. We will have our own Meta and Main categories, each with subcategories. But I am not sure whether the questions in SE.TP and SE.TP.Meta should be locked in some way too. People might want to post new comments and answers to these questions once we are online, so I would rather not lock them, but make clear that SE.TP and SE.TP.Meta should NOT be used to categorize new questions …
        And if, hopefully, some nice TP users come to our site and reclaim their posts, they should have full freedom to edit them further, etc.

      • Ok, I see. I agree with you. There was likely very little external SE influence or irritating stuff going on at TP.SE anyway, and the users there would be intelligent enough not to require anyone to edit their posts *much*…

        But they would be required for the Phys.SE questions, of course.

        But I don’t understand why they should be frozen. Isn’t it better to let people answer and comment on those questions too? Or is there some difficulty in adding input forms to the TP.SE and TP.SE Meta questions?

        I also think that the “SE.TP” and “SE.P” and “…” categories should only be temporary, as users with the relevant privilege should be able to recategorise these questions over time.

        • Dilaton says:

          I also think that the TP questions (Main and Meta) should be allowed to attract new answers and comments.

          As you can see from the figures, keeping the two special categories helps with the attribution for these questions. Off the top of my head I am not sure whether a question can have more than one category in Q2A (I think not). Otherwise we could add our conventional Main/Theoretical Physics subcategory too, if needed …

          • polarkernel says:

            Unfortunately, Q2A definitely does not allow two categories per post: the category ID is a single integer column in the posts database table. 😦

            • Dilaton says:

              That is OK; I basically see no harm in keeping the special categories. We can still tag these questions with the normal (theoretical physics) tags we use for questions asked directly on the new site, so that they also appear, for example, in the list of related questions (a widget), etc …

        • Dilaton says:

          Off topic but nice:

          This question has now a nice answer from Lumo 🙂

          http://physics.stackexchange.com/q/88053

        • polarkernel says:

          OK, I see your point. My only concern was attribution. If an SE.TP post is recategorized by its original author, the simple attribution text in the side panel will no longer apply, but that author will not mind. SE would then no longer be attributed for this post, which should not be serious in my opinion. If new questions or answers are posted in the SE.TP and SE.TP.Meta categories, the attribution text again no longer strictly applies, but who cares?

  2. Dilaton says:

    Thanks for this very nice and encouraging post :-)))

    Does LaTeX now work bug-free?

    In my opinion, the issue of the history files could be put aside a bit. Some actions that could be applied to a question on SE, such as for example locking, are not even possible with standard Q2A (not sure if these things could be enabled by plugins)…
    Instead, maybe it would be possible to make the Q2A edit history plugin work bug-free

    https://tpproposal.wordpress.com/2013/10/26/what-plugins-do-we-need/

    so that the edit history works correctly for new posts?

    As I understand it, the attribution for the TP questions looks perfect; they explicitly told me on MSO that for closed sites only the link to Area51 is needed, and nobody objected …

    Migration of questions from Physics SE could maybe be decided in an appropriate “Request for migration votes” meta post too …

    • polarkernel says:

      As far as I have tested, LaTeX now works fine within the migrated posts. I have not yet studied its behavior in the Q2A edit history plug-in; that is one of my next goals. Unfortunately, there is no documentation on the internals of this plug-in.
