Week 6

This week, we focused on finishing most of the dataframe metric reference implementations. Adding tests was also an important priority.

Tasks for the week

  • Create a script for issue-related metrics
  • Implement pull-request-related metrics
  • Add docstrings and a few more tests in the commits hierarchy
  • Add modules for non-pandas implementations

Summary for the week

  • Create a script for issue-related metrics (#180)
    One point we considered here was that the GitHub API (from which Perceval fetches data) treats all pull requests as issues. For metrics in the issue hierarchy, we had a choice: filter out the pull requests (which the API had included in the issue data) while computing the metric, or filter the data before passing it to the class. We chose the latter, so that the metric computation stays as simple as possible.
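    Filtering before the data reaches the class could look roughly like the sketch below. It relies on the fact that the GitHub issues API marks pull requests with a `pull_request` key; the item shape and function name here are illustrative, not the project's real API.

    ```python
    # Hypothetical sketch: drop pull requests from issue data before the
    # metric class ever sees them. The GitHub issues API includes a
    # "pull_request" key only on items that are really pull requests.

    def filter_issues(items):
        """Keep only true issues, discarding pull requests."""
        return [item for item in items if 'pull_request' not in item['data']]

    raw_items = [
        {'data': {'number': 1, 'state': 'open'}},
        {'data': {'number': 2, 'state': 'closed', 'pull_request': {}}},
    ]

    issues = filter_issues(raw_items)
    print(len(issues))  # 1 — only the real issue remains
    ```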

  • Implement pull-request-related metrics (#204, #205)
    Added reference implementation scripts and notebooks for the Reviews Duration and Reviews Declined metrics.
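    A Reviews Duration style computation can be sketched in a few lines; the column names and the use of the median here are assumptions for illustration, not the exact reference implementation.

    ```python
    import pandas as pd

    # Illustrative sketch: time from pull request creation to merge,
    # summarized by the median (column names are assumptions).

    df = pd.DataFrame({
        'created_at': pd.to_datetime(['2018-06-01', '2018-06-03']),
        'merged_at':  pd.to_datetime(['2018-06-04', '2018-06-10']),
    })

    durations = (df['merged_at'] - df['created_at']).dt.days
    print(durations.median())  # 5.0
    ```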

  • Add docstrings and a few more tests in the commits hierarchy (#200)
    Added tests for the commit hierarchy of metrics. Although this pull request was opened a while ago, a few iterations were needed before it was accepted. Test commit data was also added in this patch.

  • Add tests for the pull request hierarchy (#206)
    Just like the commit hierarchy, the pull request hierarchy now has tests and test data, covering pullrequest_github.py and reviews_github.py.
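    The shape of such a test might look like the following sketch; the metric function, test class, and data are hypothetical stand-ins, not the project's real fixtures.

    ```python
    import unittest
    import pandas as pd

    # Hypothetical metric helper standing in for the real implementation.
    def open_prs(df):
        """Count pull requests whose state is 'open'."""
        return len(df[df['state'] == 'open'])

    class TestPullRequestMetrics(unittest.TestCase):
        def test_open_prs(self):
            df = pd.DataFrame({'state': ['open', 'closed', 'open']})
            self.assertEqual(open_prs(df), 2)

    # Run the suite programmatically instead of unittest.main(),
    # so the script does not call sys.exit().
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestPullRequestMetrics)
    result = unittest.TextTestRunner().run(suite)
    print(result.wasSuccessful())  # True
    ```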

  • Add modules for non-pandas implementations (#203)
    Added modules for the non-pandas branch of the reference implementations. I ran into a problem with the time-series part: the resample() method we used in the pandas dataframe implementations is difficult to reproduce without pandas, and we are looking for a substitute. Jesus has noted the issue and we will work on it later.
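    One possible substitute, sketched below under the assumption that a monthly count is what resample() was used for: bucket timestamps by calendar month with plain standard-library tools instead of `df.resample('M').count()`.

    ```python
    from collections import Counter
    from datetime import datetime

    # Possible non-pandas substitute for df.resample('M').count():
    # group timestamps into (year, month) buckets with a Counter.

    def monthly_counts(dates):
        """Count events per calendar month."""
        counts = Counter((d.year, d.month) for d in dates)
        return dict(sorted(counts.items()))

    dates = [datetime(2018, 6, 1), datetime(2018, 6, 15), datetime(2018, 7, 2)]
    print(monthly_counts(dates))  # {(2018, 6): 2, (2018, 7): 1}
    ```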


Meeting details

The weekly IRC meeting with my mentors, held on Tuesday (the 16th), is summarized below.

Agenda

  • Improving inheritance
  • Regarding the script for metric evaluation
  • Re-implementing new_contributors_of_* metrics

Summary

  • Improving inheritance
    As discussed in previous posts, the goal for now is to move as many commonalities as possible up the hierarchy. On closer inspection, it is clear that each metric needs only a small modification to the otherwise common structure of the dataframe storing the cleaned and filtered Perceval data. A new method, separate from the existing _flatten method, will handle these per-metric modifications.
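    The idea can be sketched as a base class that builds the common dataframe and exposes one small hook for each metric's tweak; the class and method names here (other than _flatten) are illustrative, not the project's real API.

    ```python
    import pandas as pd

    # Hedged sketch of the inheritance plan: the base class does the
    # common work, and subclasses override one small hook.

    class Metric:
        def __init__(self, items):
            flat = self._flatten(items)
            self.df = self._modify(pd.DataFrame(flat))

        def _flatten(self, items):
            # Common cleaning of raw Perceval items (simplified here).
            return [item['data'] for item in items]

        def _modify(self, df):
            # Metric-specific adjustment; the default is a no-op.
            return df

    class OpenIssues(Metric):
        def _modify(self, df):
            return df[df['state'] == 'open']

    items = [{'data': {'number': 1, 'state': 'open'}},
             {'data': {'number': 2, 'state': 'closed'}}]
    print(len(OpenIssues(items).df))  # 1
    ```

    The payoff is that a new metric only overrides the hook, while the flattening and dataframe construction stay in one place.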

  • Regarding the script for metric evaluation
    Jesus suggested adding support for charts to the script that calculates metrics on given data.
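    A minimal sketch of what such charting could look like, assuming matplotlib and a computed monthly time series; the filename and data are made up for illustration.

    ```python
    import os
    import matplotlib
    matplotlib.use('Agg')  # non-interactive backend, safe in a script
    import matplotlib.pyplot as plt

    # Hypothetical example: plot a metric's monthly values and save the
    # chart to a file from the metrics script.

    months = ['2018-05', '2018-06', '2018-07']
    counts = [4, 7, 3]  # e.g. commits per month from a metric class

    fig, ax = plt.subplots()
    ax.bar(months, counts)
    ax.set_title('Commits per month')
    ax.set_xlabel('Month')
    ax.set_ylabel('Count')
    fig.savefig('commits_per_month.png')
    print(os.path.exists('commits_per_month.png'))  # True
    ```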

  • Re-implementing new_contributors_of_* metrics
    We realized that, in our implementation, the metrics counting new contributors required an extra parameter compared with every other metric. The problem stems partly from the _flatten method, which drops data from the dataframe that does not satisfy a given condition, such as a range of dates. For metrics counting new contributors, we need not only the data within a given date range but also the data before it, so that we can tell which contributors are genuinely new. The next patch will fix this issue.
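    The reasoning can be illustrated with a small sketch: a contributor is "new" in a range only if their first appearance falls inside it, which requires the earlier data to still be present (function and column names here are assumptions, not the real implementation).

    ```python
    import pandas as pd

    # Illustrative sketch: finding new contributors requires data from
    # before the date range, so the dataframe must not be pre-trimmed
    # to the range alone.

    def new_contributors(df, start):
        """Authors whose first commit falls on or after `start`."""
        first_seen = df.groupby('author')['date'].min()
        return sorted(first_seen[first_seen >= start].index)

    df = pd.DataFrame({
        'author': ['alice', 'bob', 'alice', 'carol'],
        'date': pd.to_datetime(['2018-04-01', '2018-05-10',
                                '2018-06-02', '2018-06-20']),
    })

    # alice committed in June too, but she first appeared in April,
    # so only carol is new in June.
    print(new_contributors(df, pd.Timestamp('2018-06-01')))  # ['carol']
    ```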


Tasks for the next week

  • Continue with the non-pandas reference implementations
  • Continue with tests
  • Continue with the script to compute all metrics
  • Add charting to the analysis script
  • Continue implementing other metrics still under discussion, if possible

The log for this meeting can be found here.

For older GSoC posts, please click here.