Networkit: R port

From Berkman Klein Google Summer of Code Wiki
Revision as of 12:35, 22 January 2019 by Epopko
This page is for an old project that is not currently part of Google Summer of Code. If you are a student looking for projects to get involved with, we suggest you check out the projects linked from the main page of this wiki.


Project Description

Network analysis is widely used to study many social phenomena. With the advent of social media, many scholars apply network analysis to better understand the nature of interaction in those environments. Efficient network analysis tools are therefore key to such research, particularly given the increasing size of the networks that become available via digital trace data. A handful of network analysis packages are currently in wide use among data scientists, including NetworkX and igraph. NetworKit is another package, designed to process very large networks and to outperform the aforementioned packages.

NetworKit's own description introduces it as “an open-source tool suite for high-performance network analysis. Its aim is to provide tools for the analysis of large networks in the size range from thousands to billions of edges. For this purpose, it implements efficient graph algorithms, many of them parallel to utilize multicore architectures. These are meant to compute standard measures of network analysis. NetworKit is focused on scalability and comprehensiveness. … NetworKit is a Python module. High-performance algorithms are written in C++ and exposed to Python via the Cython toolchain.”

Along with Python, R is one of the most widely used programming environments for data science. Because NetworKit has some performance advantages over existing network analysis packages in R, it is desirable to have an R package that can interface with the C++ core of NetworKit.

In this project, we aim to develop an R interface for NetworKit to make it accessible to the community of R users. Rcpp is an extension package for R that offers an easy-to-use interface between C++ and R. It therefore makes it easy to create packages that draw on the efficiency and performance of C++ while making the algorithms available in R, where they can be used interactively within a rich environment of data analysis tools.

For this project, the [intern] needs to know C++ and be willing to learn how to work in R in order to create an interface for the existing NetworKit C++ core. Rcpp is well documented, and there are many examples and guides for developing packages that interface with C++ code. The project will adhere to the core of the NetworKit project for consistency, and the complete package will be developed for release via the Comprehensive R Archive Network (CRAN) for public use.
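To give a rough idea of the kind of wrapper such a package would contain, here is a minimal sketch in plain C++. It is an illustration under stated assumptions, not NetworKit's actual API: the class SimpleGraph and the function degree_from_edge_list are hypothetical names invented for this example. The sketch shows a common pattern for this sort of binding: a small "flat" function that accepts and returns plain vectors, converts R's 1-based node ids to the 0-based ids used in C++, and delegates the real work to a core class. In an Rcpp package, a function of this shape would typically be marked with the `// [[Rcpp::export]]` attribute, and Rcpp would convert between R integer vectors and `std::vector<int>`.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a C++ core graph class.
// This is NOT NetworKit's API; it only plays that role in the sketch.
class SimpleGraph {
public:
    explicit SimpleGraph(std::size_t n) : adjacency_(n) {}

    // Add an undirected edge between 0-based node ids u and v.
    // A self-loop is stored once in this simplified model.
    void addEdge(std::size_t u, std::size_t v) {
        if (u >= adjacency_.size() || v >= adjacency_.size()) {
            throw std::out_of_range("node id out of range");
        }
        adjacency_[u].push_back(v);
        if (u != v) {
            adjacency_[v].push_back(u);
        }
    }

    std::size_t degree(std::size_t u) const { return adjacency_.at(u).size(); }
    std::size_t numberOfNodes() const { return adjacency_.size(); }

private:
    std::vector<std::vector<std::size_t>> adjacency_;
};

// Hypothetical flat wrapper: plain vectors in, plain vector out.
// Assumption: in an Rcpp package this function would carry the
// "// [[Rcpp::export]]" attribute so that it becomes callable from R.
// `from` and `to` hold 1-based node ids, as an R user would supply them.
std::vector<int> degree_from_edge_list(int n_nodes,
                                       const std::vector<int>& from,
                                       const std::vector<int>& to) {
    if (n_nodes < 0) {
        throw std::invalid_argument("n_nodes must be non-negative");
    }
    if (from.size() != to.size()) {
        throw std::invalid_argument("from and to must have equal length");
    }

    SimpleGraph graph(static_cast<std::size_t>(n_nodes));
    for (std::size_t i = 0; i < from.size(); ++i) {
        if (from[i] < 1 || to[i] < 1) {
            throw std::out_of_range("node ids must be >= 1");
        }
        // Convert R-style 1-based ids to 0-based ids for the C++ core.
        graph.addEdge(static_cast<std::size_t>(from[i] - 1),
                      static_cast<std::size_t>(to[i] - 1));
    }

    // Collect one degree per node, in node order.
    std::vector<int> degrees(graph.numberOfNodes());
    for (std::size_t u = 0; u < degrees.size(); ++u) {
        degrees[u] = static_cast<int>(graph.degree(u));
    }
    return degrees;
}
```

Assuming the function were exported this way, an R user could call it as `degree_from_edge_list(4L, c(1L, 1L, 2L), c(2L, 3L, 3L))` and get back one degree per node. Keeping the wrapper free of graph logic means the R package stays a thin layer over the C++ core, which fits the goal of adhering to the NetworKit core for consistency.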

Ideal project candidate

  • interest in network analysis
  • proficiency in C++
  • proficiency working with R