User:GPSLeo/stats-tools
Some small tools for analyzing Commons.
Get edit list
#!/usr/bin/env python3
import json

import pywiki  # my own toolset for MediaWiki API stuff

ua = "DATA-SCRIPT/0.1 (User:GPSLeo)"
url = "https://commons.wikimedia.org/w/api.php"
req = pywiki.setloginBot(ua)  # logged-in session object

out = []
quest = {
    "action": "query",
    "format": "json",
    "list": "recentchanges",
    "utf8": 1,
    "formatversion": "2",
    "rctype": "edit|new",
    "rcprop": "title|timestamp|ids|tags|patrolled|user",
    "rcshow": "anon",
    "rclimit": "max",
    # "rctag": "mobile edit",  # optional: only fetch mobile edits
}

# Request page after page until the API returns no "continue" block
while True:
    data = req.get(url, params=quest).json()
    out.extend(data["query"]["recentchanges"])
    print(len(out), "changes fetched")
    if "continue" not in data:
        break
    # Pass back all continuation values, not only rccontinue
    quest.update(data["continue"])

with open("edits.json", "w") as outfile:
    json.dump(out, outfile, indent=4)
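The continuation handling can be tried out without any network access. The following is only a minimal sketch: `fetch_all`, `get_page` and `fake_get_page` are hypothetical names that are not part of pywiki, and the two canned responses are made up. It merges the whole `continue` object into the next request, as the MediaWiki API documentation recommends.

```python
def fetch_all(get_page, params, list_name="recentchanges"):
    """Collect every item of a MediaWiki list query across all result pages.

    get_page is any callable that takes the request parameters and returns
    the decoded JSON response (hypothetical; e.g. a wrapper around req.get).
    """
    params = dict(params)  # work on a copy so the caller's dict is untouched
    items = []
    while True:
        data = get_page(params)
        items.extend(data["query"][list_name])
        if "continue" not in data:
            return items
        # Merge all continuation values (rccontinue, continue, ...) back in
        params.update(data["continue"])


# Offline demonstration with two canned responses instead of real API calls
def fake_get_page(params):
    if "rccontinue" not in params:
        return {
            "continue": {"rccontinue": "page2", "continue": "-||"},
            "query": {"recentchanges": [{"rcid": 1}, {"rcid": 2}]},
        }
    return {"query": {"recentchanges": [{"rcid": 3}]}}


all_changes = fetch_all(fake_get_page, {"action": "query", "list": "recentchanges"})
print(len(all_changes))
```

With a real session this could presumably be called as `fetch_all(lambda p: req.get(url, params=p).json(), quest)`.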
Analyze
This script does not run as a whole! It is only a collection of code lines.
library(jsonlite)
library(plyr)
library(stringr)
setwd("")  # insert the directory that contains the JSON files
data <- fromJSON("edits.json", simplifyDataFrame = TRUE)
mobile <- fromJSON("editsmobile.json", simplifyDataFrame = TRUE)
data$user <- as.factor(data$user)
users <- count(data$user)  # data frame with the columns x (IP) and freq (edits)
# Users with at least 100 edits
users100 <- users[which(users$freq >= 100), ]
# Users with 2 to 9 edits
users2to9 <- users[which(users$freq >= 2 & users$freq <= 9), ]
# Base stats
nrow(users100)       # Number of IPs in the group
sum(users2to9$freq)  # Number of edits made by the IPs in the group
# Count the patrolled and autopatrolled (reverted) edits of one group
usersVect <- users100$x  # IPs of the group to evaluate, e.g. users100
group <- data[data$user %in% usersVect, ]
patrolled <- sum(group$patrolled %in% TRUE)
autopatrolled <- sum(group$autopatrolled %in% TRUE)
# Smaller variant for all edits
count <- sum(mobile$patrolled %in% TRUE)
#count <- sum(mobile$autopatrolled %in% TRUE)
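For comparison, splitting the IPs into groups by their number of edits could also be sketched in Python. This is an illustration only: the function name `group_ips_by_edit_count`, the group labels and the sample records are made up, and placing IPs with exactly 100 edits in the top group is an assumption taken from the `>= 100` condition in the R lines.

```python
from collections import Counter


def group_ips_by_edit_count(edits):
    """Return {group label: {ip: number of edits}} for a list of edit records."""
    per_ip = Counter(edit["user"] for edit in edits)
    groups = {"100+": {}, "10-99": {}, "2-9": {}, "1": {}}
    for ip, n in per_ip.items():
        if n >= 100:  # assumption: exactly 100 edits belongs to the top group
            label = "100+"
        elif n >= 10:
            label = "10-99"
        elif n >= 2:
            label = "2-9"
        else:
            label = "1"
        groups[label][ip] = n
    return groups


# Made-up sample: one IP with 3 edits, one IP with a single edit
sample = [{"user": "192.0.2.1"}] * 3 + [{"user": "192.0.2.2"}]
groups = group_ips_by_edit_count(sample)
print({label: len(ips) for label, ips in groups.items()})
```

The number of IPs in a group is then `len(groups[label])` and the number of their edits is `sum(groups[label].values())`.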
For IPs, autopatrolled is practically the same as reverted, so we can take the number of autopatrolled edits as the number of reverted edits. The tags are not usable in R without some cleanup first.
For the patrolled edits, we need to subtract the count of autopatrolled edits from the count of edits marked as patrolled.
- Reverted = autopatrolled
- Patrolled = patrolled - autopatrolled
- Checked = patrolled
- All edits = sum of all edits
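The four definitions above can be written down as a small Python sketch. It assumes that every record carries boolean `patrolled` and `autopatrolled` fields and that autopatrolled edits are also flagged as patrolled, which is what the subtraction implies. The function name `summarize` and the sample records are made up, and the "unchecked" count (all edits minus checked edits) is added here because the results table uses it.

```python
def summarize(edits):
    """Count reverted, patrolled, checked and unchecked edits."""
    total = len(edits)
    checked = sum(1 for e in edits if e.get("patrolled"))
    reverted = sum(1 for e in edits if e.get("autopatrolled"))
    return {
        "all": total,
        "reverted": reverted,             # Reverted = autopatrolled
        "patrolled": checked - reverted,  # Patrolled = patrolled - autopatrolled
        "checked": checked,               # Checked = patrolled
        "unchecked": total - checked,
    }


# Made-up sample: one reverted, one manually patrolled, two unchecked edits
sample = [
    {"patrolled": True, "autopatrolled": True},
    {"patrolled": True, "autopatrolled": False},
    {"patrolled": False, "autopatrolled": False},
    {"patrolled": False, "autopatrolled": False},
]
summary = summarize(sample)
print(summary)
```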
Results
Results as of 2021-12-15 13:40, and a bit later for mobile edits.
Metric | All IPs | IPs with >100 edits | IPs with 10-100 edits | IPs with 2-9 edits | IPs with 1 edit | IP edits mobile |
---|---|---|---|---|---|---|
IPs in group | 7378 | 85 | 449 | 2299 | 4545 | 3803 |
Edits | 54983 | 29975 | 13304 | 7359 | 4545 | 7889 |
% of IP edits | 100% | 54.5% | 24.2% | 13.4% | 8.3% | 14.3% |
Unchecked edits | 50874 | 28252 | 12530 | 6858 | 4230 | 5550 |
Unchecked edits % | 92.5% | 94.3% | 94.2% | 93.2% | 93.1% | 70.4% |
Checked edits (Revert + Patrol) | 4109 | 1723 | 774 | 501 | 315 | 2339 |
Checked edits % in group | 7.5% | 5.7% | 5.8% | 6.8% | 6.9% | 29.6% |
Patrolled edits | 1845 | 528 | 211 | 130 | 51 | 673 |
Patrolled edits % in group | 3.4% | 1.8% | 1.6% | 1.8% | 1.1% | 8.5% |
Reverted edits | 2264 | 1195 | 563 | 371 | 264 | 1666 |
Reverted edits % in group | 4.1% | 4.0% | 4.2% | 5.0% | 5.8% | 21.1% |
Reverted edits % of checked in group | 55.1% | 69.4% | 72.7% | 74.1% | 83.8% | 71.2% |
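As a worked example, the derived rows of the "All IPs" column can be recomputed from three counts in the table (edits, patrolled edits, reverted edits). This is only a sketch of the arithmetic; the helper name `pct` is made up, and rounding to one decimal place is assumed from the way the table is printed.

```python
def pct(part, whole):
    """Percentage of part in whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Counts taken from the "All IPs" column of the table
all_edits = 54983
patrolled = 1845
reverted = 2264

checked = patrolled + reverted    # Checked edits (Revert + Patrol)
unchecked = all_edits - checked   # Unchecked edits

print("Checked edits:", checked)
print("Unchecked edits:", unchecked)
print("Unchecked edits %:", pct(unchecked, all_edits))
print("Checked edits % in group:", pct(checked, all_edits))
print("Patrolled edits % in group:", pct(patrolled, all_edits))
print("Reverted edits % in group:", pct(reverted, all_edits))
print("Reverted edits % of checked in group:", pct(reverted, checked))
```

The same arithmetic applies to the other columns with their respective counts.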