The end is here! This journey officially started on the 27th of May, but for me personally, it started way back in January, with my first contribution to the CHAOSS project — fixing a mere typo. Several thousand lines of code and about a hundred commits later, here we are!

My project is titled “Implementing CHAOSS Metrics with Perceval” and can be found here.

It involved creating reference implementations for several metrics using jupyter notebooks, pandas and matplotlib with the data fetched by Perceval.

Almost the entirety of my work was done in the implementations folder of the wg-evolution repository.

The GSoC coding period is divided into three phases. The first phase is from the end of May to the end of June. The second one lasts for a month after that, while the third phase ends in August.

What’s been accomplished

  • Reference implementations have been created for a number of metrics.
    Each reference implementation consists of two jupyter notebooks and two python scripts: one notebook and one script for the pandas implementation, and another pair for the plain-python one.
  • Tests for the implementations have been added.
  • A command line script to compute metrics on a repository has been created.
    It can output the computation results as a pdf or a json file, and can even generate charts as .png images.

Below, I have summarized the work I did in each phase of the coding period, listing my pull requests along with my weekly blog posts.

Coding Phase 1 (May 27th to June 28th)

Summary

  • This period involved a lot of experimentation. The initial structure for reference implementations consisted of a single class, which read json data fetched by Perceval and returned different analyses based on the data.

  • We decided on a three-level hierarchy: a root class at the top, then three category classes, each corresponding to one type of data the metrics work on (commits, issues and pull requests), and finally the individual metric modules at the lowest level.
  • Every implementation would have two notebooks, one using pandas, and one with just plain python. Scripts exported from the notebooks would also be added.
  • The directory structure would be as shown in the image above. The “pandas” version of scripts would be in code_df, while notebooks corresponding to the pandas implementations would be located in notebooks_df. Similarly, the plain-python version of notebooks and scripts would be present in the notebooks and the scripts directories respectively.
  • For example, the “pandas” version of the Code Changes metric’s script and notebook can be found here and here respectively.
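
To make the hierarchy above more concrete, here is a minimal plain-python sketch of the three levels. It is only an illustration, not the code in wg-evolution: the class names `Metric`, `Commit` and `CodeChanges`, the `_flatten` and `compute` methods, and the sample items are all hypothetical. The items are assumed to look like the git commit items Perceval produces, with a `category` field and a `data` dict holding `commit`, `Author` and `AuthorDate`.

```python
from datetime import datetime


class Metric:
    """Root class (hypothetical): cleans the raw items once for all metrics."""

    def __init__(self, items):
        flattened = (self._flatten(item) for item in items)
        # Items that a category class cannot read come back as None.
        self.items = [item for item in flattened if item is not None]

    def _flatten(self, item):
        raise NotImplementedError

    def compute(self):
        raise NotImplementedError


class Commit(Metric):
    """Category class (hypothetical): knows how to read commit items."""

    # Assumed date layout, e.g. "Tue Aug 14 14:30:13 2012 -0300".
    DATE_FORMAT = "%a %b %d %H:%M:%S %Y %z"

    def _flatten(self, item):
        if item.get("category") != "commit":
            return None
        data = item["data"]
        return {
            "hash": data["commit"],
            "author": data["Author"],
            "date": datetime.strptime(data["AuthorDate"], self.DATE_FORMAT),
        }


class CodeChanges(Commit):
    """Metric class (hypothetical): counts the distinct commits."""

    def compute(self):
        return len({item["hash"] for item in self.items})


# Hand-written sample items, not real repository data.
sample_items = [
    {"category": "commit",
     "data": {"commit": "a1", "Author": "Ann <ann@example.com>",
              "AuthorDate": "Tue Aug 14 14:30:13 2012 -0300"}},
    {"category": "commit",
     "data": {"commit": "b2", "Author": "Bob <bob@example.com>",
              "AuthorDate": "Wed Aug 15 09:00:00 2012 +0000"}},
    # The same commit fetched twice should only be counted once.
    {"category": "commit",
     "data": {"commit": "a1", "Author": "Ann <ann@example.com>",
              "AuthorDate": "Tue Aug 14 14:30:13 2012 -0300"}},
]

code_changes = CodeChanges(sample_items)
print(code_changes.compute())
```

With this layout, a new commit-based metric only needs to subclass `Commit` and define its own `compute`, while an issue-based metric would hang off a separate category class.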

Relevant posts

List of pull requests

Polaris000/GSoC_19_Perceval_Implementations

  • #1 New_structure for reference implementation
  • #4 Add root class python script and notebook
  • #5 Add modified version of Code_changes-git
  • #7 Second attempt at creating classes
  • #9 Add pure python implementation for CodeChanges

chaoss/wg-evolution

  • #162 Add reference implementation and python script for CodeChanges metric
  • #172 Add reference implementation for new_contributors_of_commits
  • #176 Re-implement compute_timeseries
  • #190 Reference implementation for Code_Changes_Lines

Coding Phase 2 (June 28th to July 26th)

Summary

  • During this period, I started adding a plain python implementation for each metric along with the pandas implementations.
  • A separate directory named scripts houses these plain-python implementations, while notebooks contains the corresponding plain python notebooks.

  • I also started adding tests in this phase.
  • We discussed the idea of a command line script that would run the metrics on user-supplied data.
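
As an illustration of what a plain-python implementation can look like without pandas, the sketch below builds a monthly time series of commit counts using only the standard library. The function name `commits_per_month` and the sample items are hypothetical, and the items are again assumed to follow the shape of Perceval's git commit items (`data["commit"]` and `data["AuthorDate"]`).

```python
from collections import Counter
from datetime import datetime

# Assumed date layout, e.g. "Tue Aug 14 14:30:13 2012 -0300".
DATE_FORMAT = "%a %b %d %H:%M:%S %Y %z"


def commits_per_month(items):
    """Count distinct commits per "YYYY-MM" bucket (hypothetical helper).

    Each commit is bucketed by the month in its own UTC offset; no
    conversion to a common timezone is attempted in this sketch.
    """
    counts = Counter()
    seen = set()
    for item in items:
        data = item["data"]
        if data["commit"] in seen:
            continue  # skip commits that were fetched more than once
        seen.add(data["commit"])
        created = datetime.strptime(data["AuthorDate"], DATE_FORMAT)
        counts[created.strftime("%Y-%m")] += 1
    # Sort by month so the result reads like a time series.
    return dict(sorted(counts.items()))


# Hand-written sample items, not real repository data.
sample_commits = [
    {"data": {"commit": "c3", "AuthorDate": "Mon Sep 03 10:00:00 2012 +0000"}},
    {"data": {"commit": "a1", "AuthorDate": "Tue Aug 14 14:30:13 2012 -0300"}},
    {"data": {"commit": "b2", "AuthorDate": "Wed Aug 15 09:00:00 2012 +0000"}},
    {"data": {"commit": "a1", "AuthorDate": "Tue Aug 14 14:30:13 2012 -0300"}},
]

monthly = commits_per_month(sample_commits)
print(monthly)
```

The pandas flavour of the same idea would typically load the items into a DataFrame and resample by month; the plain-python flavour trades that convenience for having no dependencies.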

Relevant posts

List of pull requests

Polaris000/GSoC_19_Perceval_Implementations

  • #17 Add week6 update, flake8 to script and remove logcleaner notebook

chaoss/wg-evolution

  • #193 Add reference implementation for Reviews
  • #194 Add reference implementation for Review Accepted
  • #200 Add tests for commit-based DataFrame reference implementations
  • #202 Add reference implementation for Issue Response Time
  • #203 Add non-pandas reference implementation for CodeChanges metric
  • #204 Add reference implementation for Reviews_Duration
  • #205 Add reference implementation for Reviews_Declined
  • #206 Add tests for the pull request hierarchy
  • #210 Add non-pandas reference implementation for CodeChangesLines
  • #211 Add tests for the utils and conditions modules (with pandas)
  • #212 Add tests for the non-pandas commit hierarchy of implementations
  • #222 Implement new naming convention and fix imports
  • #223 Add non-pandas reference implementation for Reviews

Coding Phase 3 (July 26th to August 26th)

Summary

  • In this phase, I mainly focused on adding the remaining reference implementations and tests.
  • I also implemented a command line script, currently supporting json, markdown and png images (for charts) as output formats. My mentor Valerio helped me a tonne on the finer points.

  • Finally, I added some documentation, describing the structure of the implementations as well as instructions on adding more metric implementations in the future as they are defined.
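
For readers curious how a command line script can offer several output formats, here is a small sketch built on `argparse`. It is not the actual script from the repository: the `--format` and `--output` flags, the function names and the placeholder results are all assumptions made for illustration, and chart generation is left out.

```python
import argparse
import json


def to_markdown(results):
    """Render a {metric name: value} dict as a markdown table."""
    lines = ["| Metric | Value |", "| --- | --- |"]
    lines += [f"| {name} | {value} |" for name, value in results.items()]
    return "\n".join(lines) + "\n"


def render(results, fmt):
    """Turn the results into text in the requested format."""
    if fmt == "json":
        return json.dumps(results, indent=2) + "\n"
    if fmt == "markdown":
        return to_markdown(results)
    raise ValueError(f"unsupported format: {fmt}")


def build_parser():
    # Hypothetical flags, chosen only for this sketch.
    parser = argparse.ArgumentParser(description="Render metric results.")
    parser.add_argument("--format", choices=["json", "markdown"],
                        default="json")
    parser.add_argument("--output", default="-",
                        help="path of the file to write, or - for stdout")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    # Placeholder values; a real script would compute these from the data.
    results = {"Code Changes": 2, "Reviews": 5}
    text = render(results, args.format)
    if args.output == "-":
        print(text, end="")
    else:
        with open(args.output, "w", encoding="utf-8") as handle:
            handle.write(text)
```

Calling `main(["--format", "markdown"])` prints the placeholder results as a markdown table, and passing `--output results.json` writes the default json rendering to a file instead. Keeping `render` separate from argument parsing makes each output format easy to test on its own.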

Relevant posts

List of pull requests

chaoss/wg-evolution

  • #207 Add execution script
  • #227 Add non-pandas reference implementation for Reviews Accepted
  • #228 Add tests for non-pandas pullrequest series
  • #232 Add reference implementation for Issues New
  • #236 Add pandas reference implementation for Issues Closed
  • #237 Add non-pandas reference implementation for the Reviews Declined metric
  • #239 Add tests for the non-pandas versions of utils and conditions modules
  • #240 Add a few missing tests
  • #241 Add documentation
  • #242 Add the plot_time_series method to implementation modules

What’s left to be done

  • Metric definitions are still a work in progress, and as new metrics are defined, their implementations will have to be added.
  • The command line execution script can also generate a markdown file as output.

I’d like to thank my mentors, Jesus Gonzalez-Barahona, Valerio Cosentino & Pranjal Aswani. This wouldn’t have been possible without their guidance.

I’d also like to thank Georg Link and Matt Germonprez for their regular responses to my weekly blog posts.

Finally, thanks to the entire CHAOSS community. This was my first time contributing to an open source project and I absolutely enjoyed it!