This was the inaugural project under a new LLC, LabMix Development. It was a proof-of-concept experiment in using WordPress plugin repository data to look at trends in plugin development.
It used the Visualizer: Charts and Graphs plugin for charting, which allowed the charts to be rendered as SVG. Chart data was read on page load from a database, which was meant to be updated live later on.
Data was gathered monthly from May through December of 2016 by crawling the repository with a Perl script that used the Web::Query library. This is not how it should be done.
The correct approach would have been to gather the data through the WordPress Plugin API. Unfortunately, the API's documentation at the time was on the opaque side, which made a crawler the more expedient option. To avoid abusing server resources, the crawl was kept slow, so a full scrape took about 60 to 72 hours to complete.
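For comparison, here is a minimal sketch of what an API-based collection might look like. It is not the original Perl crawler. The endpoint URL, the `query_plugins` action, and the `request[...]` parameter names are assumptions based on the public WordPress.org plugin API and should be checked against its current documentation. The helper names `build_query_url` and `fetch_page` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; verify against the current WordPress.org API documentation.
API_URL = "https://api.wordpress.org/plugins/info/1.2/"


def build_query_url(page, per_page=100):
    """Build the URL for one page of plugin listings (assumed parameter names)."""
    params = {
        "action": "query_plugins",
        "request[page]": page,
        "request[per_page]": per_page,
    }
    # urlencode percent-encodes the square brackets in the parameter names.
    return API_URL + "?" + urlencode(params)


def fetch_page(page, per_page=100):
    """Fetch and decode one page of results. This performs a network request."""
    with urlopen(build_query_url(page, per_page), timeout=30) as response:
        return json.load(response)
```

Paging through results this way, with a pause between requests, would likely put far less load on the server than crawling every plugin page individually.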
The collected data included:
- Plugin name and URL
- Author name and URL
- The tags used for describing the plugin’s use
- The highest known version of WordPress compatibility
- Date of last update to the plugin
- Download count
- Overall average rating
- Total rating count
- Counts of each rating value (i.e. the number of 1, 2, 3, 4 & 5 star ratings each)
If the plugin was flagged as “old”, its name was also copied to a separate file, while the plugin itself stayed in the main pool. All of these data points seemed potentially worth looking into down the line.
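The fields listed above map naturally onto a simple record type. The following is a minimal sketch in Python rather than the original Perl; the class name, field names, and the helper `average_from_star_counts` are all hypothetical. The helper shows how the overall average rating relates to the per-star counts, which gives a cheap consistency check on scraped data.

```python
from dataclasses import dataclass, field


@dataclass
class PluginRecord:
    """One scraped plugin entry (hypothetical field names)."""
    name: str
    url: str
    author_name: str
    author_url: str
    tags: list
    tested_up_to: str        # highest known compatible WordPress version
    last_updated: str        # date of last update, as scraped
    downloads: int
    average_rating: float
    rating_count: int
    star_counts: dict = field(default_factory=dict)  # e.g. {1: 3, 2: 0, ..., 5: 40}
    flagged_old: bool = False


def average_from_star_counts(star_counts):
    """Return the weighted average star rating, or None if there are no ratings."""
    total = sum(star_counts.values())
    if total == 0:
        return None
    return sum(stars * count for stars, count in star_counts.items()) / total
```

Comparing the value returned by `average_from_star_counts` with the scraped `average_rating` would flag records where the two disagree.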
On Plugin Ratings
The value of analyzing plugin ratings data is up for debate. Regardless, with well over 57,000 plugins (at the time) in the repository, I was compelled to see if I could get an understanding of how they were used and of the market overall.
Initially I thought it might be helpful to look at reviews as a metric of quality (as intended). Especially interesting to me was to examine plugin usage and ratings when filtered by tag. Could looking at individual plugin tags and the ratio of downloads-to-ratings offer a sort of ‘heat’ or ‘controversy’ measure?
This might illuminate areas of opportunity for developing better plugins, or a way for the community to begin managing the size of the repository as a whole. But as even a passing familiarity with consumer reviews, trolling, astroturfing, or “gaming” the system shows, this can prove challenging.
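The per-tag downloads-to-ratings idea can be sketched roughly as follows. This is only an illustration of the measure, not the analysis that was actually run. The function name and the dictionary keys (`tags`, `downloads`, `rating_count`) are hypothetical, and the sample values in the usage lines are made up.

```python
from collections import defaultdict


def downloads_per_rating_by_tag(plugins):
    """Return {tag: total downloads / total ratings} across all plugins with that tag.

    Tags whose plugins have no ratings at all map to None.
    """
    totals = defaultdict(lambda: [0, 0])  # tag -> [downloads, ratings]
    for plugin in plugins:
        for tag in plugin["tags"]:
            totals[tag][0] += plugin["downloads"]
            totals[tag][1] += plugin["rating_count"]
    return {
        tag: (downloads / ratings if ratings else None)
        for tag, (downloads, ratings) in totals.items()
    }


# Made-up sample data, for illustration only.
sample = [
    {"tags": ["seo", "forms"], "downloads": 1000, "rating_count": 10},
    {"tags": ["seo"], "downloads": 500, "rating_count": 5},
    {"tags": ["gallery"], "downloads": 200, "rating_count": 0},
]
ratios = downloads_per_rating_by_tag(sample)
```

A low ratio would mean a tag's plugins attract many ratings relative to their downloads, which is one possible reading of "heat". Whether that reflects quality, controversy, or manipulation is exactly the open question.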
What I found was that most of the most popular plugins were updated every day, and had very consistent download numbers. Draw your own conclusions.
Unfortunately, not long after going live with the alpha version of PluginScore and beginning to promote it within the WordPress developer community, the plugin repository itself was rebuilt, and the data was no longer available. Between this and seemingly little interest from the community, I shelved the project. It was a fun intro to data analytics though, and I learned a bit about Tableau, OpenRefine, SVG, and even some Perl along the way.