User:GPSLeo/stats-tools

From Wikimedia Commons, the free media repository

Some small tools to analyze edits on Commons.

Get edit list

#!/usr/bin/env python3
"""Download all recent IP (anonymous) edits on Commons and save them to edits.json."""

import json

import pywiki  # my own toolset for MediaWiki API stuff

ua = "DATA-SCRIPT/0.1 (User:GPSLeo)"
url = "https://commons.wikimedia.org/w/api.php"

# Logged-in session: the account needs the patrol (or patrolmarks) right,
# otherwise the API refuses to return the patrolled flags.
req = pywiki.setloginBot(ua)

out = []

quest = {
    "action": "query",
    "format": "json",
    "list": "recentchanges",
    "utf8": 1,
    "formatversion": "2",
    "rctype": "edit|new",
    "rcprop": "title|timestamp|ids|tags|patrolled|user",
    "rcshow": "anon",  # only edits by IPs (logged-out users)
    "rclimit": "max",
    # "rctag": "mobile edit",  # uncomment to fetch only edits tagged "mobile edit"
}

while True:
    data = req.get(url, params=quest).json()
    out.extend(data["query"]["recentchanges"])
    print(f"{len(out)} edits fetched")
    if "continue" not in data:
        break
    # Pass all continuation parameters back to the API for the next batch.
    quest.update(data["continue"])

with open("edits.json", "w") as outfile:
    json.dump(out, outfile, indent=4)

Analyze

Not a finished script, only a collection of R code snippets.

library(jsonlite)
library(plyr)
library(stringr)

# setwd("path/to/data")  # hypothetical path: directory containing the JSON files

data <- fromJSON("edits.json", simplifyDataFrame = TRUE)

# Mobile IP edits, fetched separately
mobile <- fromJSON("editsmobile.json", simplifyDataFrame = TRUE)

data$user <- as.factor(data$user)

# Number of edits per IP (columns: x = IP, freq = number of edits)
users <- count(data$user)

# IPs with 100 or more edits
users100 <- users[users$freq >= 100, ]

# IPs with 2 to 9 edits
users2to9 <- users[users$freq >= 2 & users$freq <= 9, ]

# Base stats
nrow(users100)        # number of IPs in the group
sum(users2to9$freq)   # number of edits made by the IPs in the group

# Count patrolled and autopatrolled (reverted) edits for one group of IPs
usersVect <- users100$x   # choose the group to analyze, e.g. users2to9$x
groupEdits <- data[data$user %in% usersVect, ]
patrolled <- sum(groupEdits$patrolled)           # includes autopatrolled edits
autopatrolled <- sum(groupEdits$autopatrolled)

# Smaller variant over all mobile edits
mobilePatrolled <- sum(mobile$patrolled)
mobileAutopatrolled <- sum(mobile$autopatrolled)
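For readers who prefer Python, the grouping can be sketched with the standard library alone. This is not the R code above but a hypothetical equivalent, assuming edits.json has the structure written by the download script (a list of entries with user, patrolled and autopatrolled). The bucket boundaries are an assumption: "100 or more" and "2 to 9" follow the R code, while "10 to 99" and "1 edit" fill the gaps between them. GROUPS, in_bucket and group_stats are illustrative names.

```python
from collections import Counter

# Edit-count buckets per IP as (low, high); high=None means no upper limit.
# Boundaries are an assumption based on the R snippets.
GROUPS = {
    ">=100 edits": (100, None),
    "10-99 edits": (10, 99),
    "2-9 edits": (2, 9),
    "1 edit": (1, 1),
}


def in_bucket(n, low, high):
    """Return True if an IP with n edits falls into [low, high]."""
    return n >= low and (high is None or n <= high)


def group_stats(edits):
    """Count IPs, edits and patrolled/autopatrolled flags per bucket."""
    per_ip = Counter(e["user"] for e in edits)
    stats = {}
    for name, (low, high) in GROUPS.items():
        ips = {ip for ip, n in per_ip.items() if in_bucket(n, low, high)}
        group_edits = [e for e in edits if e["user"] in ips]
        stats[name] = {
            "ips": len(ips),
            "edits": len(group_edits),
            # The patrolled flag is also set on autopatrolled edits.
            "patrolled_flag": sum(1 for e in group_edits if e.get("patrolled")),
            "autopatrolled_flag": sum(1 for e in group_edits if e.get("autopatrolled")),
        }
    return stats
```

Usage: load edits.json with json.load and pass the resulting list to group_stats. The two flag counts per group are the raw inputs for the definitions of reverted and patrolled edits that follow.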

For IP edits, autopatrolled is practically the same as reverted, so the number of autopatrolled edits can be taken as the number of reverted edits. The tags are not usable in R without some cleanup first.
Edits flagged as autopatrolled are also flagged as patrolled, so the number of patrolled edits has to be reduced by the number of autopatrolled edits.
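Outside R the tags need little cleanup: with formatversion=2 the API returns them for each edit as a plain list of tag names. A minimal Python sketch, assuming entries shaped like those in edits.json; count_tags and share_with_tag are hypothetical helpers, and "mobile edit" is the tag already mentioned in the download script.

```python
from collections import Counter


def count_tags(edits):
    """Count how many edits carry each change tag (an edit may have several)."""
    counter = Counter()
    for edit in edits:
        # "tags" is a list of tag names; it may be empty or missing.
        counter.update(edit.get("tags", []))
    return counter


def share_with_tag(edits, tag):
    """Fraction of edits carrying the given tag (0.0 for an empty list)."""
    if not edits:
        return 0.0
    return sum(1 for e in edits if tag in e.get("tags", [])) / len(edits)
```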

  • Reverted = autopatrolled
  • Patrolled = patrolled - autopatrolled
  • Checked = patrolled (= Reverted + Patrolled)
  • All edits = total number of edits in the group
  • Unchecked = All edits - Checked
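The definitions above can be written as a small Python function. This is a sketch assuming the two raw inputs are the numbers of edits with the patrolled and autopatrolled flags set; derive_metrics is a hypothetical name. Percentages are rounded to one decimal, as in the results table.

```python
def derive_metrics(all_edits, patrolled_flag, autopatrolled_flag):
    """Derive the reported counts and percentages from raw flag counts.

    patrolled_flag: edits with patrolled == true (includes autopatrolled ones)
    autopatrolled_flag: edits with autopatrolled == true (treated as reverted)
    """
    reverted = autopatrolled_flag
    patrolled = patrolled_flag - autopatrolled_flag
    checked = patrolled_flag
    unchecked = all_edits - checked

    def pct(part, whole):
        # Percentage rounded to one decimal; 0.0 if the group is empty.
        return round(100 * part / whole, 1) if whole else 0.0

    return {
        "reverted": reverted,
        "patrolled": patrolled,
        "checked": checked,
        "unchecked": unchecked,
        "checked_pct": pct(checked, all_edits),
        "unchecked_pct": pct(unchecked, all_edits),
        "patrolled_pct": pct(patrolled, all_edits),
        "reverted_pct": pct(reverted, all_edits),
        "reverted_of_checked_pct": pct(reverted, checked),
    }
```

With the All IPs figures from the results (54983 edits, 4109 with the patrolled flag, 2264 autopatrolled) it gives 1845 patrolled, 2264 reverted and 50874 unchecked edits, matching that column.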

Results

Results as of 2021-12-15 13:40; the mobile edits were fetched a bit later.

| | All IPs | IPs with >100 edits | IPs with 10-100 edits | IPs with 2-9 edits | IPs with 1 edit | Mobile IP edits |
|---|---|---|---|---|---|---|
| IPs in group | 7378 | 85 | 449 | 2299 | 4545 | 3803 |
| Edits | 54983 | 29975 | 13304 | 7359 | 4545 | 7889 |
| % of IP edits | 100% | 54.5% | 24.2% | 13.4% | 8.3% | 14.3% |
| Unchecked edits | 50874 | 28252 | 12530 | 6858 | 4230 | 5550 |
| Unchecked edits % | 92.5% | 94.3% | 94.2% | 93.2% | 93.1% | 70.4% |
| Checked edits (Revert + Patrol) | 4109 | 1723 | 774 | 501 | 315 | 2339 |
| Checked edits % in group | 7.5% | 5.7% | 5.8% | 6.8% | 6.9% | 29.6% |
| Patrolled edits | 1845 | 528 | 211 | 130 | 51 | 673 |
| Patrolled edits % in group | 3.4% | 1.8% | 1.6% | 1.8% | 1.1% | 8.5% |
| Reverted edits | 2264 | 1195 | 563 | 371 | 264 | 1666 |
| Reverted edits % in group | 4.1% | 4.0% | 4.2% | 5.0% | 5.8% | 21.1% |
| Reverted edits % of checked in group | 55.1% | 69.4% | 72.7% | 74.1% | 83.8% | 71.2% |