Schrodinger's Cat Strikes Back


About importing data dumps, and attribution (the script that AWB runs on may help us solve a couple of things)


(Note to users who are a bit confused: Dilaton has just made me an author on this blog, which explains how I am posting this)

To have a good base of posts, we will likely be importing two sets of data dumps from Stack Exchange:

  1. The entire Theoretical Physics Stack Exchange (TP.SE) data dump
  2. Part of the Physics Stack Exchange (Phys.SE) data dump

The first is very easy, in that we don’t need to worry much about anything. As has been mentioned elsewhere, one simply needs to add a link to the closed SE beta page on Area 51 Stack Exchange (A51.SE).

The TP.SE proposal

Also, there are only 413 questions from TP.SE, whereas I have estimated that there are around 4336 possibly interesting questions from Physics.SE.

To attribute posts from Physics.SE, however, we need to provide:

  • The link to the question on Physics.SE
  • The link to the OP’s user profile on Physics.SE
  • A note indicating that the question is from Physics.SE

At least that’s how we are told to interpret CC BY-SA.

I’d also think that it is important to attribute the editors (not for legal reasons, but just to acknowledge their effort), though without giving links to their profiles.

Now, I ran a query on Data Stack Exchange (Data.SE), and there happen to be 12054 interesting questions.

However, luckily for us, there is a lot of double-counting in that number, since a question is counted once for each of its tags. Using an existing query, it appears that each question has about 2.78 tags on average, so I estimate that we will be importing around 4336 interesting questions from Phys.SE.
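To make the arithmetic behind this estimate explicit, here is a small sketch. The two numbers come from the queries mentioned above; the function name is only illustrative.

```python
def estimate_unique_questions(tag_hits, avg_tags_per_question):
    """Estimate the number of distinct questions from a per-tag total.

    Assumes each question was counted once per tag, so dividing the total
    by the average number of tags per question undoes the double-counting.
    """
    return tag_hits / avg_tags_per_question


if __name__ == "__main__":
    estimate = estimate_unique_questions(12054, 2.78)
    print(round(estimate))  # 4336
```

This is only a rough estimate: it assumes that every tag on an interesting question was itself one of the tags counted. If only some of a question's tags were counted, the real number of questions would be somewhat higher.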

Now, I think it is totally impractical for us to go about manually tagging all the posts with the right attributions.

So, we’d need a script to help us tag all imported questions as such.
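As a rough idea of what such a script could look like, here is a minimal sketch. It is not the AWB script and not whatever will actually be used on the site. It assumes the dump is an XML file of `row` elements with `Id`, `PostTypeId`, `OwnerUserId` and `Body` attributes, that `PostTypeId="1"` marks a question, and that question and profile links have the form shown below. The sample rows and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

SITE_NAME = "Physics.SE"
SITE_URL = "https://physics.stackexchange.com"  # assumed base URL

# Made-up rows in the assumed style of a Stack Exchange posts dump.
SAMPLE_DUMP = """<posts>
  <row Id="101" PostTypeId="1" OwnerUserId="7" Body="What is a dilaton?" />
  <row Id="102" PostTypeId="2" OwnerUserId="8" Body="An answer." />
  <row Id="103" PostTypeId="1" Body="Question without an owner id." />
</posts>"""


def attribution_note(question_id, owner_id):
    """Build the note: source site, question link, and OP profile link."""
    lines = [
        f"This question was imported from {SITE_NAME}.",
        f"Original question: {SITE_URL}/questions/{question_id}",
    ]
    if owner_id is not None:  # the owner id may be missing from a row
        lines.append(f"Asked by: {SITE_URL}/users/{owner_id}")
    return "\n".join(lines)


def tag_imported_questions(xml_text):
    """Return {question id: body with the attribution note appended}."""
    tagged = {}
    for row in ET.fromstring(xml_text).iter("row"):
        if row.get("PostTypeId") != "1":  # assumed: "1" marks a question
            continue
        qid = row.get("Id")
        note = attribution_note(qid, row.get("OwnerUserId"))
        tagged[int(qid)] = row.get("Body", "") + "\n\n" + note
    return tagged


if __name__ == "__main__":
    for qid, body in tag_imported_questions(SAMPLE_DUMP).items():
        print(qid, body, sep="\n", end="\n\n")
```

The note covers the three items listed above for Phys.SE posts. Editor names (without profile links) could be added as one more line in `attribution_note` if we decide to credit them.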

I think that the next important post would be to discuss the different settings to enable on the Admin Dashboard of the site.



  1. Funny, someone viewed this blog by searching for “important features in a toy”. : )

  2. Dilaton says:

    Congratulations on this nice post, I like it 🙂

    Your successful fight with the crackpot made me LOL, and the term poophead is very funny, I’ll probably have to quote it too from time to time 😀

    The illustration of how certain tasks can be automated with the AWB script is very instructive. I am sure polarkernel will be able to develop a similar solution for our purpose of attribution, if such a function to process the database does not already exist.

    Editors can do very helpful work too, I agree. But if we successfully install the links to the original Physics SE question, maybe it will be enough that people can follow these links and appreciate the edits directly at Physics SE to honor them?

    Anyway, this post is by no means wasted but very funny to read instead 😀


  3. […] for now). There are some attribution issues we have to respect (some boring tasks can hopefully be automated), and importing Theoretical Physics questions works already reasonably […]

  4. Dilaton says:

    This might be kind of important for us too …:
