Monday, August 27, 2007

Armory to Data Miners - It's ok!

I finally got around to looking at the recent batch of Armory updates and noticed this little gem:

XML file size reduction: The XML files used by the Armory have been optimized for speed, partly by streamlining the amount of code in the files. As a result, third-party sites that mine Armory data may need to make adjustments to account for the new file configurations.

There you go, implicit permission from Blizzard to suck down data.

Friday, August 17, 2007

Report - 20070814232234

Eventually there will be 7 reports. The base report consists of all level 70s broken down by class, with the top 20 specs for each class. The remaining 6 reports each break that down further by one of the following metrics: battlegroup, faction, gender, race, realm, and realm type. There is also an index page as a jumping-off point to all the reports. Unfortunately, the battlegroup and realm breakdowns aren't working properly at this time. I need to design custom XSL templates for them because they don't mesh with the other reports as easily. Once those templates are created I'll link to them here. Enjoy!
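At its core, each report is a tally: group the level 70s by class (plus, optionally, one metric such as realm type), count each spec, and keep the 20 most common. Here is a minimal Python sketch of that tally. It assumes each character has already been reduced to a record with hypothetical `class`, `spec` and metric fields. The sample records are toy values, not real Armory data.

```python
from collections import Counter, defaultdict

def top_specs(characters, group_key=None, n=20):
    """Return {(group, class): [(spec, count), ...]} with the n most common specs.

    group_key names an optional metric field (e.g. "realm_type"); without it,
    every character falls into a single "all" group (the base report).
    """
    counts = defaultdict(Counter)
    for char in characters:
        group = char[group_key] if group_key else "all"
        counts[(group, char["class"])][char["spec"]] += 1
    return {key: counter.most_common(n) for key, counter in counts.items()}

# Toy records with hypothetical field names.
sample = [
    {"class": "Hunter", "spec": "41/20/0", "realm_type": "PvP"},
    {"class": "Hunter", "spec": "41/20/0", "realm_type": "PvP"},
    {"class": "Hunter", "spec": "0/21/40", "realm_type": "PvP"},
    {"class": "Hunter", "spec": "41/20/0", "realm_type": "PvE"},
]
print(top_specs(sample))                      # base report
print(top_specs(sample, "realm_type", n=1))   # realm type breakdown
```

Each of the other breakdowns is the same call with a different `group_key`.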

Thursday, August 16, 2007

Report - Spec Breakdown by Realm Type

OK, I've got all the characters from my first run loaded with updated Armory data. This time I store the raw XML (and many props to Okoloth for a nice method to merge all Armory data into a single XML file). Then, when I want to run a report at a point in time, I export the information I'm interested in to a database for quick searching. After writing insane scripts to generate HTML from the database last time, I've learned my lesson. Now the data is exported as XML and I'm using XSL to transform it into tables. The first report I've generated is a breakdown of each class's top 20 specs by realm type.
For each class you'll see the top 20 specs for RP-PvP, PvE, PvP and RP servers. Unfortunately, to make sure the output shows up properly in all browsers, I'm not serving the raw XML with an XSL stylesheet directly. Instead, I'm using PHP to load the XML and XSL and output the resulting HTML. The raw XML is available here. You'll note that the XML also contains the top 10 builds for each spec. I didn't use that data in the HTML report because I wanted to keep the tables fairly simple. With XSL, I can tweak the stylesheet to include that data once I figure out a good way to represent it. (Plus I can make it prettier much more easily, if I ever learn how.)
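You can sketch the idea of merging per-character Armory pages into a single XML document with Python's standard `xml.etree.ElementTree`. This is not Okoloth's method. It is only an illustration, and it assumes each page is a well-formed XML string. The `<characters>`/`<character key=...>` wrapper and the sample element names are made up.

```python
import xml.etree.ElementTree as ET

def merge_character_xml(docs):
    """Wrap per-character XML pages in one <characters> root element.

    docs maps a character key (here "Realm/Name") to a list of raw XML
    strings, one per Armory page. Wrapper names are hypothetical.
    """
    root = ET.Element("characters")
    for key, pages in docs.items():
        char_el = ET.SubElement(root, "character", {"key": key})
        for raw in pages:
            # Parse each page and graft its root element under the character.
            char_el.append(ET.fromstring(raw))
    return ET.tostring(root, encoding="unicode")

merged = merge_character_xml({
    "ExampleRealm/Examplehunter": ["<sheet level='70'/>", "<talents spec='41/20/0'/>"],
})
print(merged)
```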

Taking a quick look at the data for Hunters, 41/20/0 dominates on both PvE and PvP realms. Interestingly enough, though, a larger percentage of Hunters with that spec are on PvP servers (7920 vs. 5938, or 7.861% vs. 5.894%). Also, while the first Survival spec on PvP servers is only ranked #9 (0/21/40), the first PvE Survival spec is at #7. To be fair, though, that PvP Survival spec accounts for a higher percentage of the class. In fact, it's pretty clear that among the Hunters represented in this table, there are more PvP Hunters than PvE.

Please feel free to talk about this report either here or on our forums.

Wednesday, August 15, 2007

New Build Shop up - Warlock ??/41+/??

WoW Insider has a new Build Shop up, this time focusing on a Demonology spec. This one's a bit open-ended, since the only requirement is 41+ points in Demonology. I've updated the 70s I currently have and fetched their (relatively) current character sheets. Scanning the top 20 Warlock specs on my current report, #5 is 7/43/11 and #6 is 20/41/0. Coming in at #16 is 18/43/0, #18 is 5/45/11, and #20 is 9/41/11.
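Checking a spec against the Build Shop rule only needs the point split. Here is a small sketch. It assumes specs are written as `a/b/c` in Warlock tree order (Affliction/Demonology/Destruction), which makes Demonology the second number. The helper names are hypothetical.

```python
def parse_spec(spec):
    """Split an 'a/b/c' spec string into points per tree, in tree order."""
    trees = tuple(int(p) for p in spec.split("/"))
    if len(trees) != 3:
        raise ValueError(f"expected three trees, got {spec!r}")
    return trees

def has_min_points(spec, tree=1, minimum=41):
    """True if the spec has at least `minimum` points in tree index `tree` (0-based)."""
    return parse_spec(spec)[tree] >= minimum

specs = ["7/43/11", "20/41/0", "18/43/0", "5/45/11", "9/41/11"]
print([s for s in specs if has_min_points(s)])
print([sum(parse_spec(s)) for s in specs])  # points spent in each spec
```

All five specs listed above qualify, and each one spends the full 61 points a level 70 has.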

Obviously, I've got a new report out, but I'm redesigning the report generation to be more general and flexible. The goal is to support live queries against the database on various criteria. I know for a fact that making this live will kill my server, so I need to figure out a way to make sure that doesn't happen. In any case, you should eventually be able to get the top N specs/builds per class, with these additional variables: faction, race, gender, realm, realm type and battlegroup. I think a lot of people will be interested in the realm type breakdown, and in whether there truly is a difference in how PvE and PvP players choose their talents.
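One way to keep live queries from hurting the server is to parameterize them, whitelist the filter columns, and cache repeated results. The sketch below uses Python's `sqlite3` and `functools.lru_cache`. The `characters` table and its columns are assumptions, not the real schema, and the rows are toy data.

```python
import sqlite3
from functools import lru_cache

# Filter columns a caller may use; anything else is rejected.
ALLOWED_FILTERS = {"faction", "race", "gender", "realm", "realm_type", "battlegroup"}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE characters (class TEXT, spec TEXT, faction TEXT, race TEXT,"
    " gender TEXT, realm TEXT, realm_type TEXT, battlegroup TEXT)"
)
conn.executemany(
    "INSERT INTO characters VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("Hunter", "41/20/0", "Horde", "Orc", "Male", "RealmA", "PvP", "BG1"),
        ("Hunter", "41/20/0", "Alliance", "Dwarf", "Female", "RealmB", "PvE", "BG2"),
        ("Hunter", "0/21/40", "Horde", "Troll", "Male", "RealmA", "PvP", "BG1"),
    ],
)

@lru_cache(maxsize=1024)
def top_specs(char_class, n=20, filters=()):
    """Top n (spec, count) pairs for a class; filters is a tuple of (column, value)."""
    where, params = ["class = ?"], [char_class]
    for column, value in filters:
        if column not in ALLOWED_FILTERS:  # column names can't be bound as parameters
            raise ValueError(f"unsupported filter: {column}")
        where.append(f"{column} = ?")
        params.append(value)
    sql = (
        "SELECT spec, COUNT(*) AS cnt FROM characters WHERE "
        + " AND ".join(where)
        + " GROUP BY spec ORDER BY cnt DESC, spec LIMIT ?"
    )
    return tuple(conn.execute(sql, params + [n]).fetchall())

print(top_specs("Hunter"))
print(top_specs("Hunter", 20, (("realm_type", "PvE"),)))
```

Because each report is a point-in-time snapshot, a cache that never invalidates is acceptable here. If the data changes underneath, call `top_specs.cache_clear()` after each import.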

Tuesday, August 7, 2007

765k * 5 XML files == 7.5 gigs of data

I'm almost a third of the way through reimporting all the characters from the first run into the new db. At the moment I'm only sucking down and storing the raw XML from the Armory and haven't done any processing. While that's going on, I'm designing a new database table structure with tracking characters over time in mind.
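A common way to track characters over time is to separate identity from observations: one row per character, plus one row per fetch. Below is a minimal sketch using `sqlite3`. The table and column names are hypothetical.

```python
import sqlite3

# Hypothetical schema: characters hold identity; snapshots hold one row per fetch.
SCHEMA = """
CREATE TABLE IF NOT EXISTS characters (
    id    INTEGER PRIMARY KEY,
    realm TEXT NOT NULL,
    name  TEXT NOT NULL,
    UNIQUE (realm, name)
);
CREATE TABLE IF NOT EXISTS snapshots (
    character_id INTEGER NOT NULL REFERENCES characters(id),
    fetched_at   TEXT NOT NULL,  -- ISO 8601 timestamp of the fetch
    level        INTEGER,
    spec         TEXT,           -- e.g. '41/20/0'
    PRIMARY KEY (character_id, fetched_at)
);
"""

def record_snapshot(conn, realm, name, fetched_at, level, spec):
    """Create the character row if needed, then store this fetch's values."""
    conn.execute("INSERT OR IGNORE INTO characters (realm, name) VALUES (?, ?)", (realm, name))
    (char_id,) = conn.execute(
        "SELECT id FROM characters WHERE realm = ? AND name = ?", (realm, name)
    ).fetchone()
    conn.execute(
        "INSERT OR REPLACE INTO snapshots VALUES (?, ?, ?, ?)",
        (char_id, fetched_at, level, spec),
    )

def spec_history(conn, realm, name):
    """All snapshots for one character, oldest first."""
    return conn.execute(
        "SELECT s.fetched_at, s.level, s.spec FROM snapshots s "
        "JOIN characters c ON c.id = s.character_id "
        "WHERE c.realm = ? AND c.name = ? ORDER BY s.fetched_at",
        (realm, name),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
record_snapshot(conn, "ExampleRealm", "Examplehunter", "2007-07-01T00:00:00", 68, "39/20/0")
record_snapshot(conn, "ExampleRealm", "Examplehunter", "2007-08-01T00:00:00", 70, "41/20/0")
print(spec_history(conn, "ExampleRealm", "Examplehunter"))
```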

In the meantime, does anyone have pointers to good tutorials on XSL? Outputting raw HTML is kind of a pain, so being able to do XSL transformations on XML output would probably make things a bit simpler.

Friday, August 3, 2007

2nd Fetch update - 416k and counting

After 3 days or so, I've fetched new data on 416,000 of the 2.5 million characters I have, so I should have new data by the end of August. I need to figure out the best way to multithread this process so I can grab data faster. As a side note, the XML data for those 416,000 characters is taking up close to 4 gigs of drive space. Coming up with a more efficient storage method is next on the list.
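Fetching is network-bound, so a thread pool can overlap the waits. Compressing each XML page before storing it is one cheap answer to the disk usage. The sketch below assumes a hypothetical `fetch_page` stand-in (there is no real HTTP here) and uses `zlib` level 9 compression.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def fetch_page(character):
    """Hypothetical stand-in for an HTTP request to one Armory page."""
    realm, name = character
    return f'<character realm="{realm}" name="{name}" level="70"/>'.encode("utf-8")

def fetch_and_compress(character):
    """Fetch one page and return it zlib-compressed, ready to store as a blob."""
    return character, zlib.compress(fetch_page(character), 9)

def fetch_all(characters, workers=8):
    """Run fetches on a pool of threads; results come back in input order."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for character, blob in pool.map(fetch_and_compress, characters):
            results[character] = blob
    return results

chars = [("ExampleRealm", f"Char{i}") for i in range(100)]
store = fetch_all(chars)
xml = zlib.decompress(store[("ExampleRealm", "Char0")]).decode("utf-8")
print(xml)
```

Keep `workers` modest, since more threads mostly means more load on the Armory. Actual compression ratios depend on the real XML and aren't measured here.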

Wednesday, August 1, 2007

What constitutes being a particular spec versus a hybrid?

There seems to be a difference of opinion on this one. Prior to TBC, it was generally agreed that you were "specced" into a particular talent tree if you had at least 31 points in it, out of the 51 talent points available at level 60. Now that TBC is out, there are an additional 10 points to allocate, so 31 points is just over 50% of your 61 talent points. Because of that, some people say you're a hybrid unless you have at least 41 points in a tree, which is at least 67%. What do you think? Personally, I've stuck with 31 points in a single tree as being specced. I'll run some numbers and update this post shortly.
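Both definitions come down to a threshold test on the point split. The sketch below assumes specs are written `a/b/c` in tree order. With 61 points at level 70, at most one tree can reach 31 (31 + 31 = 62), so the first match is the only match.

```python
def classify(spec, trees, threshold=31):
    """Name the tree holding at least `threshold` points, else 'Hybrid'."""
    points = [int(p) for p in spec.split("/")]
    for tree, count in zip(trees, points):
        if count >= threshold:
            return tree
    return "Hybrid"

HUNTER = ("Beast Mastery", "Marksmanship", "Survival")
for spec in ("41/20/0", "0/21/40", "20/31/10", "20/20/21"):
    # Compare the 31-point and 41-point definitions side by side.
    print(spec, classify(spec, HUNTER), classify(spec, HUNTER, threshold=41))
```

A spec like 0/21/40 counts as Survival under the 31-point rule but as a hybrid under the 41-point rule.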

Update: Here's my quick class breakdown. Each tree has two numbers: the number of characters with 31 or more points in that tree, and the number with 41 or more points in that tree. The remaining characters are hybrids by that definition.

Total number of 70s: 782,427

Druids: 76371
Balance: 11053 (14.47%), 9370 (12.27%)
Feral Combat: 44857 (58.74%), 43897 (57.48%)
Restoration: 20421 (26.74%), 17934 (23.48%)
Hybrids: 40 (0.05%), 5170 (6.77%)

Hunters: 96671
Beast Mastery: 26328 (27.23%), 24733 (25.58%)
Marksmanship: 62945 (65.11%), 59058 (61.09%)
Survival: 7101 (7.35%), 3419 (3.54%)
Hybrids: 297 (0.31%), 9461 (9.79%)

Mages: 104760
Arcane: 41297 (39.42%), 12353 (11.80%)
Fire: 33784 (32.25%), 28360 (27.07%)
Frost: 29506 (28.17%), 27602 (26.35%)
Hybrids: 173 (0.17%), 36445 (34.79%)

Paladins: 69338
Holy: 50493 (72.82%), 45571 (65.72%)
Protection: 9991 (14.41%), 9124 (13.16%)
Retribution: 8526 (12.30%), 7974 (11.50%)
Hybrids: 328 (0.47%), 6669 (9.62%)

Priests: 91094
Discipline: 10577 (11.61%), 4728 (5.20%)
Holy: 44361 (48.70%), 9712 (10.66%)
Shadow: 35781 (39.28%), 35030 (38.45%)
Hybrids: 375 (0.41%), 41624 (45.69%)

Rogues: 94760
Assassination: 24234 (25.57%), 19420 (20.49%)
Combat: 47267 (49.88%), 40052 (42.27%)
Subtlety: 21193 (22.36%), 13901 (14.67%)
Hybrids: 2066 (2.18%), 21567 (22.76%)

Shamans: 51279
Elemental: 13797 (26.90%), 6326 (12.34%)
Enhancement: 16685 (32.54%), 12951 (25.26%)
Restoration: 20743 (40.45%), 20274 (39.54%)
Hybrids: 54 (0.11%), 11728 (22.87%)

Warlocks: 85389
Affliction: 39612 (46.39%), 32815 (38.43%)
Demonology: 27128 (31.77%), 21758 (25.48%)
Destruction: 18074 (21.17%), 13926 (16.31%)
Hybrids: 575 (0.67%), 16890 (19.78%)

Warriors: 112765
Arms: 41118 (36.46%), 15151 (13.44%)
Fury: 16159 (14.33%), 13553 (12.02%)
Protection: 55178 (48.93%), 53906 (47.80%)
Hybrids: 310 (0.27%), 30155 (26.74%)
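Each hybrid count in the breakdown is simply what remains after subtracting the tree counts from the class total. The sketch below shows that arithmetic with the Druid 31-point figures above. The function name is hypothetical.

```python
def breakdown(total, tree_counts):
    """Derive the hybrid count and each row's percentage of the class total."""
    hybrids = total - sum(tree_counts.values())
    rows = dict(tree_counts, Hybrids=hybrids)
    return {tree: (count, round(100 * count / total, 2)) for tree, count in rows.items()}

druid_31 = {"Balance": 11053, "Feral Combat": 44857, "Restoration": 20421}
print(breakdown(76371, druid_31))
```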

Armory Miner Gets Props!

Okoloth just got a mention on WoW Insider! It's good that people are realizing the power of the Armory.

Standardizing Armory data storage

There are a number of projects out there that fetch and operate on Armory data. However, there doesn't seem to be much interaction between them. The simplest way to cache Armory data is to fetch the 5 XML files for each character and store them as blobs in a table. That isn't really conducive to complex queries, though, and searching blobs doesn't scale. I'm working on a database representation of character data (arena teams are next on the table), and I'm wondering if anyone else out there is interested in working on this. The goal is to create a freely available database schema that any Armory-related project could use for reasonably quick offline searching. If you're interested, drop me a line at thebuildmine@gmail.com. Once the schema is done, work can begin on a fetch and management library for various scripting languages (Perl, PHP, Python, Ruby, etc.).
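Here is a minimal sketch of the column-based alternative to pure blobs: parse a few fields out of a character sheet into indexed columns, and keep the raw XML alongside. Everything in it (element names, attributes, table layout) is a hypothetical illustration, not the real Armory format or a finished schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Element and attribute names below are hypothetical, not the real Armory format.
SAMPLE_SHEET = """<characterInfo>
  <character name="Examplehunter" realm="ExampleRealm" level="70" class="Hunter"
             race="Orc" gender="Male" faction="Horde"/>
  <talentSpec treeOne="41" treeTwo="20" treeThree="0"/>
</characterInfo>"""

def sheet_to_row(xml_text):
    """Pull searchable fields out of one character sheet."""
    root = ET.fromstring(xml_text)
    char = root.find("character")
    spec = root.find("talentSpec")
    return (
        char.get("realm"), char.get("name"), int(char.get("level")),
        char.get("class"), char.get("race"), char.get("gender"), char.get("faction"),
        "/".join(spec.get(t) for t in ("treeOne", "treeTwo", "treeThree")),
    )

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE characters (
    realm TEXT, name TEXT, level INTEGER, class TEXT, race TEXT,
    gender TEXT, faction TEXT, spec TEXT, raw_xml BLOB,
    PRIMARY KEY (realm, name))""")
conn.execute("CREATE INDEX idx_class_spec ON characters (class, spec)")
conn.execute(
    "INSERT INTO characters VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    sheet_to_row(SAMPLE_SHEET) + (SAMPLE_SHEET.encode("utf-8"),),
)
print(conn.execute(
    "SELECT class, spec, COUNT(*) FROM characters WHERE level = 70 GROUP BY class, spec"
).fetchall())
```

Keeping `raw_xml` means new columns can be backfilled later without refetching anything.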

Round 2 gathering has started

After going through the first grab of 2.5M characters, I've gone back and made some refinements to data gathering and optimization. I'm currently acquiring the latest data on all the characters I've gotten before, sorted by level descending, since in the short term we're most interested in data on level 70s. This time I'm fetching and storing all of the Armory data instead of just the character sheet, to allow for more complex queries and comparisons. Eventually I want to get things to a point where live searches can be done.