Parameters and Attributes

Parameter tuning

For a more detailed explanation of how tuning the key parameters affects the results, please see the PARC Supplementary Analysis in our paper.

Parameters

Input parameter

Description

data

(numpy.ndarray) input matrix of shape n_samples x n_features

true_label

(numpy.ndarray) (optional) ground-truth labels, if available

dist_std_local

(optional, default = 2) local pruning threshold: the higher the parameter, the more edges are retained

jac_std_global

(optional, default = 'median') global graph-pruning threshold. It can also be set to a number of standard deviations below the mean of the network's Jaccard-weighted edges. Values of 0.1-1 give reasonable pruning, and a higher value means less pruning. For example, a value of 0.15 retains all edges whose weight is above mean(edge weights) - 0.15*std(edge weights). We find that both 0.15 and 'median' yield good results, pruning away ~50-60% of edges

random_seed

(optional, default = 42) The random seed to pass to Leiden

resolution_parameter

(optional, default = 1) resolution for community detection. Uses ModularityVertexPartition and RBConfigurationVertexPartition

jac_weighted_edges

(optional, default = True) Uses Jaccard weighted pruned graph as input to community detection. For very large datasets set this to False to observe a speed-up with little impact on accuracy
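
As a rough illustration of the jac_std_global rule described in the table above, the sketch below applies it to a plain array of edge weights. It is not PARC's implementation: the helper name global_prune_mask is hypothetical, the use of NumPy's default (population) standard deviation is an assumption, and keeping only edges strictly above the median for the 'median' setting is also an assumption.

```python
import numpy as np


def global_prune_mask(edge_weights, jac_std_global="median"):
    """Return a boolean mask marking the edges that would be retained.

    Hypothetical helper for illustration only.
    """
    w = np.asarray(edge_weights, dtype=float)
    if jac_std_global == "median":
        # Assumption: edges strictly above the median weight are kept.
        threshold = np.median(w)
    else:
        # Threshold = mean - k * std, with k = jac_std_global.
        # np.std defaults to the population standard deviation (assumed here).
        threshold = w.mean() - float(jac_std_global) * w.std()
    return w > threshold


# Toy Jaccard weights for five edges.
weights = [0.1, 0.2, 0.3, 0.4, 0.5]

keep_median = global_prune_mask(weights, "median")
keep_low = global_prune_mask(weights, 0.15)
keep_high = global_prune_mask(weights, 1.0)

print("median:", int(keep_median.sum()), "edges kept")
print("0.15:  ", int(keep_low.sum()), "edges kept")
print("1.0:   ", int(keep_high.sum()), "edges kept")
```

On this toy input a larger value lowers the threshold, so more edges survive, which matches the "higher value means less pruning" behaviour stated above.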

Attributes

Attributes

Description

labels

(list) cluster labels, one per sample (length n_samples)

f1_mean

(list) F1 score (not weighted by population). For details, see the supplementary section of the paper

stats_df

(DataFrame) stores parameter values and performance metrics
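
To make "not weighted by population" concrete, the sketch below contrasts a plain mean of per-class F1 scores with a population-weighted mean. It is a generic illustration, not PARC's own computation (which is described in the paper's supplementary section). The function name per_class_f1 and the toy labels are hypothetical, and predicted labels are assumed to be already mapped onto the true classes.

```python
import numpy as np


def per_class_f1(y_true, y_pred):
    """Return (F1 per class, number of true samples per class).

    Hypothetical helper for illustration only.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores, support = [], []
    for c in np.unique(y_true):
        tp = np.sum((y_true == c) & (y_pred == c))
        fp = np.sum((y_true != c) & (y_pred == c))
        fn = np.sum((y_true == c) & (y_pred != c))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
        support.append(np.sum(y_true == c))
    return np.array(scores, dtype=float), np.array(support)


# Imbalanced toy example: 8 samples of class 0, 2 samples of class 1.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

f1, n = per_class_f1(y_true, y_pred)
unweighted_mean = f1.mean()                 # every class counts equally
weighted_mean = np.average(f1, weights=n)   # large classes dominate

print("unweighted mean F1:", round(float(unweighted_mean), 4))
print("population-weighted mean F1:", round(float(weighted_mean), 4))
```

The unweighted mean gives the small class the same influence as the large one, so a poorly recovered rare population lowers the score more than it would under population weighting.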