Hey, I haven't blogged in forever because I didn't want to blog until Pokemon Twilight V5 was ready, but Game Fortress has delayed it so long that I'm going to blog anyway.
So I've been making some good progress on my website stuff since my last blog. I reached version 1.0 of the Pyco Shoutbox ^_^ That makes it my first project to reach 1.0 in a very long time. For comparison, PycoForums is only at 0.49, though that may not be an accurate representation of the work done on it, given how complex it is and how many changes I keep making to older systems.

I also recently began work on PycoWiki, but all I have done so far is plan out some of the MySQL databases and do some research on how the wikipedia software works. I still have plans for an awesome game-related wiki, but right now the forum system is still my main focus.

Something really funny I found out is that the wikia software apparently saves a full text copy of every edit of a page. As far as I can tell, it never deletes any of them either. I went back on one page and it had at least 500 fully saved revisions (probably every single revision ever made, but I didn't look that far back). I can't imagine how much storage it takes to keep thousands upon thousands of copies of each page all stored in the same database. I wish I could afford awesome servers like that =p

The important part:

Anyway, I'm blogging because I need some more ideas for small website features like PycoShout. I need some smaller projects to work on when I get bored of the forums. Right now my only other small feature in development is a random name generator, which I will put on the main Pyco site. I guess you could also consider my chatbot to be one of these features, but it could almost be considered part of the shoutbox.

So if anybody has ideas for random standalone website features I could create, that would be awesome. The only limitation is that they can't be any form of file sharing, because that goes against my host's rules.

Hopefully you guys have some ideas, I'm in desperate need of ad money =p
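Since I mentioned planning out the MySQL tables for PycoWiki: here's a rough sketch of what that "keep a full copy of every edit" approach could look like. It's Python/sqlite just for illustration, and the table and column names are placeholders I made up, not what the wikipedia software actually uses.

```python
import sqlite3

# Minimal sketch: every edit inserts a complete copy of the page text as a
# new row, so old revisions are never overwritten or deleted.
db = sqlite3.connect("pycowiki.db")
db.execute("""
    CREATE TABLE IF NOT EXISTS revisions (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,
        page      TEXT NOT NULL,      -- page title
        body      TEXT NOT NULL,      -- full text of the page at this edit
        edited_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")

def save_edit(page, body):
    """Record an edit by inserting a brand new full-text copy."""
    db.execute("INSERT INTO revisions (page, body) VALUES (?, ?)", (page, body))
    db.commit()

def history(page):
    """Return every saved revision of a page, newest first."""
    return db.execute(
        "SELECT id, edited_at, body FROM revisions WHERE page = ? ORDER BY id DESC",
        (page,)
    ).fetchall()

save_edit("Cheese", "Cheese is a dairy product.")
save_edit("Cheese", "Cheese is a dairy product made from milk.")
print(len(history("Cheese")))  # 2 full copies stored, one per edit
```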
Think about it, though. Plain text? Even huge articles take up tiny amounts of space. Complex algorithms run thousands of times a day? Now THAT'll impact your server, and on a site as popular as wikipedia you need every CPU cycle you can get.
It makes sense to store every article as plain text.
Even if every article was 500 KB, and you saved 500 versions of each, it would take 'only' about 750 TB to store 3 million articles. That's no small amount, but considering you can go buy a 2 TB drive for your desktop right now, that size is probably tiny in the datacenter world. I don't know if that made sense, but at least I liked doing the math :D

Well, I only went back 500; in reality they save every single revision, so the numbers would probably be much higher. Of course that only applies to commonly used pages, obscure pages would have much lower revision counts.
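For anyone who wants to check that back-of-the-envelope math, here it is spelled out; the 500 KB, 500-revision, and 3 million-article figures are just the rough assumptions from the comment above.

```python
# Rough storage estimate for keeping the full text of every revision.
article_size_bytes = 500 * 1000     # assume ~500 KB per article revision
revisions_per_article = 500         # assume 500 saved versions each
articles = 3_000_000                # assume ~3 million articles

total_bytes = article_size_bytes * revisions_per_article * articles
print(total_bytes / 1000**4, "TB")  # -> 750.0 TB (decimal terabytes)
```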
There's 6806 revisions of the cheese page, and 19959 revisions for the hitler page. I dunno, it just seems like a lot of useless data; what the page looked like 8 years ago really doesn't matter.

Did the survey.
It was deep and meaningful.

Survey is down for maintenance.
It's back now
edit: I reached my goal for the number of answers, so I took down the link; I don't want the information to change after I use it in my paper.

You don't need complex algorithms to do such things.
A very easy way to do this:

- Split each page into pieces.
- Only update (and thus back up) the pieces that were changed by the edit.

Nothing complex about that (a rough sketch is below).

Ah, but bryan, you could save a lot more space with a lookup table, which is indeed getting more complex.
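Here's a rough Python sketch of that "split each page into pieces and only re-save the changed pieces" idea. Chunking by paragraph and hashing each piece are just illustrative assumptions, not how any real wiki engine does it.

```python
import hashlib

# Store each unique piece (here: paragraph) once, keyed by its hash.
piece_store = {}   # hash -> piece text
revisions = {}     # page -> list of revisions, each a list of piece hashes

def save_revision(page, text):
    """Split the page into pieces and only store pieces we haven't seen before."""
    hashes = []
    for piece in text.split("\n\n"):                       # one piece per paragraph
        h = hashlib.sha1(piece.encode("utf-8")).hexdigest()
        if h not in piece_store:                           # unchanged pieces are reused
            piece_store[h] = piece
        hashes.append(h)
    revisions.setdefault(page, []).append(hashes)

def load_revision(page, n):
    """Rebuild the full text of revision n from its stored pieces."""
    return "\n\n".join(piece_store[h] for h in revisions[page][n])

save_revision("Cheese", "Intro paragraph.\n\nHistory of cheese.")
save_revision("Cheese", "Intro paragraph.\n\nHistory of cheese, now with more detail.")
# Two revisions, but only three unique pieces actually get stored:
print(len(revisions["Cheese"]), len(piece_store))   # 2 3
```

The `piece_store` dict here is already a crude in-memory version of the kind of lookup table mentioned above; a real setup would presumably keep it in the database instead.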