Data-Driven Exploration of the R User Community Worldwide

Why a dashboard to explore R user groups globally now?

  • We sought to present an objective representation of R’s popularity that would inform members of the data science community about its growth and activities. By making our solution open-source and data-driven, we maximized transparency and enhanced trustworthiness. This dashboard also helps leaders understand their user groups in a broader context while revealing opportunities for potential leaders to initiate new groups. Finally, the depiction of global distribution contributes to an important and evolving story about trends and opportunities in different parts of the world.
  • For decision making, R-centered organizations may need to measure the geographic presence of R users and groups when planning events, diversity programs, and other activities that could expand the use of the R language in under-represented regions.

The R Consortium

Classifying R User Community Groups

  1. Some R user groups do not list “R” among their topics or areas of focus, or list only “R Project for Statistical Computing”. These include groups whose names are composed of “Location” + “R User Group” (e.g. “Las Vegas R User Group”) or “Location” + “Data Science” or “Analytics” (e.g. “Charleston Data Science”).
  2. Other groups have Python, Julia, KNIME, Hadoop, etc. in their Meetup names yet mention “R Project for Statistical Computing” in their topics, or mention R alongside other languages in both their names and topics. Classifying user groups by searching Meetup names alone, without also searching the topics field, excluded many such groups.
  3. Apart from “R Project for Statistical Computing”, many groups identify with the R language by referencing phrases such as “Programming in R”, “Data Science using R”, “R Programming Language”.
  4. Other unexpected name stylings also turned up, such as useRs, PhillyR, and BelgradeR.

Our Solution

  • Retrieve all data science groups on Meetup (7,700+) and use string matching to select those whose Meetup URL names contain strings like “r user”, “r-user”, “r-lab”, “phillyr”, “rug”, “bioconductor”, and “r-data”. Then run a second round of string matching on the groups’ topics field, searching for strings like “programming-in-r”, “r-programming-”, “-using-r”, “r-language”, and “r-project-for-statistical”.
  • Retrieve all data analysis groups on Meetup (1,190+) and apply the same two rounds of string matching: first on the Meetup URL names (“r user”, “r-user”, “r-lab”, “phillyr”, “rug”, “bioconductor”, “r-data”), then on the topics field (“programming-in-r”, “r-programming-”, “-using-r”, “r-language”, “r-project-for-statistical”).
  • Retrieve all user groups on Meetup that mention “r-project-for-statistical-computing” in their topics separately.
  • Retrieve all R-Ladies groups separately because some were left out by the aforementioned matches.
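The two rounds of string matching described above can be sketched in base R. This is a minimal illustration, not the project's actual code: the sample groups, the column names `urlname` and `topics`, and the helper `matches_any()` are all hypothetical, and only the pattern strings come from the text.

```r
# Hypothetical sample of Meetup groups; the column names are assumptions.
groups <- data.frame(
  urlname = c("las-vegas-r-user-group", "phillyr", "charleston-data-science",
              "pydata-berlin", "rladies-paris"),
  topics  = c("r-project-for-statistical-computing",
              "data-analytics",
              "programming-in-r,machine-learning",
              "python,big-data",
              "women-in-tech"),
  stringsAsFactors = FALSE
)

# Patterns quoted in the text, treated as fixed substrings (not regex).
name_patterns  <- c("r user", "r-user", "r-lab", "phillyr", "rug",
                    "bioconductor", "r-data")
topic_patterns <- c("programming-in-r", "r-programming-", "-using-r",
                    "r-language", "r-project-for-statistical")

# Hypothetical helper: TRUE where `text` contains at least one pattern.
matches_any <- function(text, patterns) {
  hits <- vapply(patterns,
                 function(p) grepl(p, tolower(text), fixed = TRUE),
                 logical(length(text)))
  # Force a rows-by-patterns matrix so single-element input also works.
  rowSums(matrix(hits, nrow = length(text))) > 0
}

# Round 1 checks URL names, round 2 checks topics; either one qualifies.
is_r_group <- matches_any(groups$urlname, name_patterns) |
  matches_any(groups$topics, topic_patterns)

groups$urlname[is_r_group]
```

In this toy sample the R-Ladies group matches neither round, which is consistent with the need to retrieve those groups separately. Short substrings such as “rug” can also match unrelated names (any URL containing “drug”, for instance), so matches from a sketch like this would likely need a manual review.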

What Was Achieved

  1. We used the meetupr package to extract R user groups from Meetup.com.
  2. We improved the existing find_groups() and get_events() functions in meetupr to meet our requirements, and replaced its API-key usage with the recently required OAuth 2.0 authentication system.
  3. We transformed the data retrieved from Meetup via meetupr from data frames to JSON, GeoJSON, and CSV.
  4. We stored the data by committing the JSON/GeoJSON/CSV files to the project’s GitHub repository.
  5. We developed a static HTML dashboard interface based on the Gentelella open-source Bootstrap template and rendered the stored data via dashboard components.
  6. We automated the extraction of R user groups, data transformation, and storage using Travis CI.
  7. We deployed the dashboard via GitHub Pages.
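As a rough illustration of the transformation step, the sketch below writes a small data frame to CSV, JSON, and GeoJSON. It is not the project's code: the project used leafletR for GeoJSON, whereas this example assembles the GeoJSON structure by hand with jsonlite (assumed to be installed). The sample rows, column names, and file names are made up for the example.

```r
library(jsonlite)  # assumption: jsonlite is installed

# Hypothetical groups; names, member counts, and coordinates are illustrative.
groups <- data.frame(
  name    = c("Example R User Group", "Sample useRs"),
  members = c(120L, 45L),
  lat     = c(36.17, 44.80),
  lon     = c(-115.14, 20.47),
  stringsAsFactors = FALSE
)

out_dir <- tempdir()  # write to a temporary directory for the demo

# CSV and JSON: one record per group.
write.csv(groups, file.path(out_dir, "groups.csv"), row.names = FALSE)
write_json(groups, file.path(out_dir, "groups.json"), pretty = TRUE)

# GeoJSON: one Point feature per group. GeoJSON orders coordinates
# as longitude first, then latitude.
features <- lapply(seq_len(nrow(groups)), function(i) {
  list(
    type       = "Feature",
    geometry   = list(type = "Point",
                      coordinates = c(groups$lon[i], groups$lat[i])),
    properties = list(name = groups$name[i], members = groups$members[i])
  )
})
geojson <- list(type = "FeatureCollection", features = features)

# auto_unbox turns length-one vectors into JSON scalars;
# digits = NA keeps full numeric precision for the coordinates.
write_json(geojson, file.path(out_dir, "groups.geojson"),
           auto_unbox = TRUE, digits = NA)
```

Files written this way are plain text, so they can be committed to a repository and diffed between builds like any other source file.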

Switching from Meetup API keys to OAuth 2.0 Authentication System

Some Highlights from the Dashboard

  • A leaflet map with markers and pop-ups filled with information about the user groups’ membership, events, and status (active, inactive, or unbegun).
  • A cluster map that aggregates markers into clusters and shows cluster counts, using the leaflet-markercluster.js JavaScript library.
  • Top destinations for R user groups based on membership across 6 regions.

The Tools

  1. R, RStudio and the following packages:
  • meetupr, curl, jsonlite and leafletR
  2. JavaScript and the following libraries: jquery.js, d3.js, echarts.js, leaflet.js, leaflet-markercluster.js and lodash.js
  3. Gentelella Admin Dashboard Bootstrap HTML template
  4. Travis CI to build the project, execute R scripts and bash commands
  5. Bash commands to call R scripts and commit modified files to GitHub

How We Achieved It

  1. We used the meetupr R package to retrieve R User Groups from meetup.com.
  2. We further analyzed this data by computing several summaries. We used the leafletR package to transform our data frame to GeoJSON, then used this GeoJSON file to create a leaflet map with leaflet.js. In this map, R user groups fall into three status categories, each with its own marker color: Active (blue), Inactive (dark blue), and Unbegun (orange):
  • Active groups have had an event in the past 180 days or have an upcoming event in the future
  • Inactive groups have not had an event in the past 180 days and do not have an upcoming event
  • Unbegun groups have not had an event in the past and none are planned for the future
  3. We persisted all data and our summaries in CSV/JSON files. With each Travis build, the data and our summaries are refreshed directly from the Meetup API.
  4. We wrote bash commands to run our R scripts and commit the updated CSV/JSON files to GitHub after every Travis build.
  5. We set up Travis Cron Jobs to build this project daily and update our data. This update happens around 09:45 UTC.
  6. We then customized the Gentelella Admin Dashboard Bootstrap HTML template to our requirements.
  7. We rendered our summaries via widgets on this dashboard, using JavaScript libraries to produce maps, charts, and tables.
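The Active / Inactive / Unbegun rules listed above can be expressed as a small base R function. This is a sketch under stated assumptions: the function name `classify_status()`, its arguments, and the sample dates are hypothetical, with `NA` standing for "no such event"; only the 180-day window and the three status labels come from the text.

```r
# Hypothetical helper: classify groups from their most recent past event
# and their next scheduled event (NA means there is no such event).
classify_status <- function(last_event, next_event,
                            today = Sys.Date(), window_days = 180) {
  has_upcoming <- !is.na(next_event) & next_event >= today
  has_recent   <- !is.na(last_event) & last_event >= today - window_days
  has_past     <- !is.na(last_event)

  ifelse(has_upcoming | has_recent, "Active",   # recent or upcoming event
         ifelse(has_past, "Inactive",           # only older events
                "Unbegun"))                     # no events at all
}

# Illustrative dates, evaluated against a fixed reference day.
today      <- as.Date("2020-01-01")
last_event <- as.Date(c("2019-12-01", "2018-01-01", NA, NA))
next_event <- as.Date(c(NA, NA, NA, "2020-02-01"))

status <- classify_status(last_event, next_event, today = today)
status
```

Under these rules a group with no past events but one scheduled for the future counts as Active, since an upcoming event alone is enough to qualify.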

The Final Product

Feedback

Acknowledgments

Next
