diff --git a/.github/workflows/update_wiki.yaml b/.github/workflows/update_wiki.yaml
index 101d95a..1bf3f91 100644
--- a/.github/workflows/update_wiki.yaml
+++ b/.github/workflows/update_wiki.yaml
@@ -27,11 +27,11 @@ jobs:
           make html
         shell: bash
 
-      - name: Convert .rst files to Markdown, including index.rst as Home.md
+      - name: Convert .rst files to Markdown, including index.rst as Home.md, with capitalized names
         run: |
           sudo apt-get install -y pandoc
           mkdir -p converted_md
-          find . -name "*.rst" -exec sh -c 'if [ "$(basename {})" = "index.rst" ]; then pandoc -f rst -t markdown -o converted_md/Home.md "{}"; else pandoc -f rst -t markdown -o converted_md/"$(basename {} .rst).md" "{}"; fi' \;
+          find . -name "*.rst" -exec bash -c 'filename=$(basename "{}" .rst); if [ "$filename" = "index" ]; then pandoc -f rst -t markdown -o converted_md/Home.md "{}"; else capitalized_filename="${filename^}"; pandoc -f rst -t markdown -o converted_md/"${capitalized_filename}.md" "{}"; fi' \;
         shell: bash
 
       - name: Push Documentation to Wiki
diff --git a/docs/coxph/implementation.rst b/docs/coxph/implementation.rst
index 5834dfa..7cb1ca2 100644
--- a/docs/coxph/implementation.rst
+++ b/docs/coxph/implementation.rst
@@ -19,7 +19,7 @@ The central part is responsible for the following tasks:
 - Compute the aggregated model parameters.
 
 ``compute_derivatives``
-~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~
 This function computes the primary and secondary derivatives needed to compute the maximum likelihood estimates of the model parameters.
 The function is called by the central part and is executed on the central aggregator.
 
@@ -30,13 +30,13 @@ to the data that is stored on the node. The partials are executed in parallel on
 node.
 
 ``get_unique_event_times``
-~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~
 This function retrieves unique event times and their counts from the selected database.
 
 ``compute_summed_z``
-~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~
 This function computes the sum of the specified explanatory variables for the outcome events.
 
 ``perform_iteration``
-~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~
 This function performs an iteration of the algorithm, computing the necessary aggregates.
diff --git a/docs/coxph/privacy.rst b/docs/coxph/privacy.rst
index 2754ad4..c7397df 100644
--- a/docs/coxph/privacy.rst
+++ b/docs/coxph/privacy.rst
@@ -5,7 +5,7 @@ Guards
 ------
 
 Sample size threshold
-~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~
 The algorithm has a minimal threshold for the number of rows in the selected database.
 This threshold is set to 10 rows.
 If the number of rows in a given data station is below this threshold, the data station will not be included in the federated learning process and will be marked in the result.
@@ -31,31 +31,35 @@ Vulnerabilities to known attacks
 .. which attacks would be possible in your system.
 
 
-.. list-table::
-   :widths: 25 10 65
-   :header-rows: 1
-
-   * - Attack
-     - Risk eliminated?
-     - Risk analysis
-   * - Reconstruction
-     - ✔
-     - The amount of information shared was considered insufficient to allow reconstruction of the data underlying the model.
-   * - Differencing
-     - ⚠
-     - This is indeed possible in case a data station manager were to change the dataset after performing a task, but data station managers should not be allowed to run tasks to prevent this.
-   * - Deep Leakage from Gradients (DLG)
-     - ✔
-     - This is possible in the central aggregator, but this should be a trusted party and the shared information was considered insufficient to allow for DLG.
-   * - Generative Adversarial Networks (GAN)
-     - ✔
-     - Synthetic can indeed be used to (statistically) reproduce the data that underlies the produced model, but without knowing the sensitive information the adversary will not be able to assess its authenticity.
-   * - Model Inversion
-     - ✔
-     - The model prediction can indeed be used to infer the outcome of an actual individual, but without knowing the sensitive information the adversary will not be able to assess its authenticity.
-   * - Watermark Attack
-     - ⚠
-     - To be determined
+✔ Reconstruction
+~~~~~~~~~~~~~~~
+**Risk analysis**:
+The amount of information shared was considered insufficient to allow reconstruction of the data underlying the model.
+
+⚠ Differencing
+~~~~~~~~~~~~~
+**Risk analysis**:
+This is indeed possible in case a data station manager were to change the dataset after performing a task, but data station managers should not be allowed to run tasks to prevent this.
+
+✔ Deep Leakage from Gradients (DLG)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+**Risk analysis**:
+This is possible in the central aggregator, but this should be a trusted party and the shared information was considered insufficient to allow for DLG.
+
+✔ Generative Adversarial Networks (GAN)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+**Risk analysis**:
+Synthetic data can indeed be used to (statistically) reproduce the data that underlies the produced model, but without knowing the sensitive information the adversary will not be able to assess its authenticity.
+
+✔ Model Inversion
+~~~~~~~~~~~~~~~~~
+**Risk analysis**:
+The model prediction can indeed be used to infer the outcome of an actual individual, but without knowing the sensitive information the adversary will not be able to assess its authenticity.
+
+⚠ Watermark Attack
+~~~~~~~~~~~~~~~~~~
+**Risk analysis**:
+To be determined
 
 .. TODO verify whether these definitions are correct.
 For reference:
diff --git a/docs/coxph/references.rst b/docs/coxph/references.rst
index f484ce2..c7a94bb 100644
--- a/docs/coxph/references.rst
+++ b/docs/coxph/references.rst
@@ -1,12 +1,7 @@
 References
 ==========
+
 The Cox proportional hazards model in this algorithm is based on the following scientific publication:
 
-- Authors
-    Chia-Lun Lu, Shuang Wang, Zhanglong Ji, Yuan Wu, Li Xiong, Xiaoqian Jiang, Lucila Ohno-Machado,
-- Title
-    WebDISCO: a web service for distributed cox model learning without patient-level data sharing,
-- Journal
-    Journal of the American Medical Informatics Association, Volume 22, Issue 6, November 2015, Pages 1212 – 1219,
-- DOI
-    https://doi.org/10.1093/jamia/ocv083
\ No newline at end of file
+Lu, C. L., Wang, S., Ji, Z., Wu, Y., Xiong, L., Jiang, X., & Ohno-Machado, L. (2015). WebDISCO: a web service for distributed cox model learning without patient-level data sharing. Journal of the American Medical Informatics Association : JAMIA, 22(6), 1212–1219. https://doi.org/10.1093/jamia/ocv083
+
\ No newline at end of file
diff --git a/docs/index.rst b/docs/index.rst
index 586cd12..632c3d0 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -9,9 +9,9 @@ This algorithm provides an implementation of the Cox Proportional Hazards model.
 
 Authors
 -------
-B Gottardelli
-V Gouthamchand
-J Hogenboom
+- B. Gottardelli
+- V. Gouthamchand
+- J. Hogenboom
 
 Source code
 -----------
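
The file-naming rule of the "Convert .rst files to Markdown" step can be illustrated with a small bash sketch. `wiki_page_name` is a hypothetical helper introduced only for this illustration; it is not part of the workflow. It mirrors the rule described in the step name: `index.rst` becomes `Home.md`, and every other file keeps its basename with the first letter uppercased. The sketch relies on bash's `${var^}` case-modification expansion, which needs bash 4 or later and is not available in POSIX `sh` (for example `dash`).

```shell
#!/usr/bin/env bash
# Hypothetical helper, for illustration only: maps an .rst path to the
# Markdown file name used for the corresponding GitHub wiki page.
wiki_page_name() {
  local filename
  filename=$(basename "$1" .rst)   # drop the directory and the .rst suffix
  if [ "$filename" = "index" ]; then
    echo "Home.md"                 # index.rst becomes the wiki landing page
  else
    echo "${filename^}.md"         # uppercase the first character (bash >= 4)
  fi
}

# Example paths taken from this change set.
for f in docs/index.rst docs/coxph/implementation.rst docs/coxph/privacy.rst docs/coxph/references.rst; do
  printf '%s -> %s\n' "$f" "$(wiki_page_name "$f")"
done
```

Because only the basename is used, two `.rst` files with the same name in different directories would map to the same wiki page, and whichever is converted last would overwrite the other.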