
Commit

Update docs
Wh1isper committed Jul 27, 2023
1 parent 3d15e5b commit d7efed4
Showing 8 changed files with 253 additions and 3 deletions.
5 changes: 4 additions & 1 deletion config.md
@@ -1 +1,4 @@
TODO
It will be ready as soon as possible after Release 0.1.0. Until then, you can refer to the source code file [sparglim/config/configer.py](sparglim/config/configer.py).


TODO: Generate the available environment variables for configuring the Spark session
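
Until then, a rough, non-authoritative sketch of how the environment-based configuration is used by the examples in this repo; `SPARGLIM_SQL_MODE` is the variable those examples reference, and the full set lives in the source file above:

```python
# A hedged sketch, not the authoritative reference: SPARGLIM_SQL_MODE is the
# variable used by the k8s and connect-client examples in this repo; see
# sparglim/config/configer.py for the complete set of supported variables.
import os

os.environ["SPARGLIM_SQL_MODE"] = "k8s"  # value used in examples/jupyter-sparglim-on-k8s

%load_ext sparglim.sql                   # the %sql magic picks the mode up from the env

from sparglim.config.builder import ConfigBuilder

spark = ConfigBuilder().get_or_create()  # ConfigBuilder is a singleton
```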
11 changes: 11 additions & 0 deletions dev/jupyter-sparglim-on-k8s/README.md
@@ -0,0 +1,11 @@
This is the development verification for [examples/jupyter-sparglim-on-k8s](../../examples/jupyter-sparglim-on-k8s).

Use [docker/Dockerfile.jupyterlab-sparglim](../docker/Dockerfile.jupyterlab-sparglim) to build a dev version of `jupyterlab-sparglim`.

```
# In project root dir
docker build -t wh1isper/jupyterlab-sparglim:dev -f dev/docker/Dockerfile.jupyterlab-sparglim .
# reload by deleting deployment pod
./dev/scripts/reload.sh
```
13 changes: 13 additions & 0 deletions dev/jupyter-sparglim-sc/README.md
@@ -0,0 +1,13 @@
This is the development verification for [examples/jupyter-sparglim-sc](../../examples/jupyter-sparglim-sc).

Use [docker/Dockerfile.jupyterlab-sparglim](../docker/Dockerfile.jupyterlab-sparglim) and [docker/Dockerfile.sparglim-server](../docker/Dockerfile.sparglim-server) to build dev versions of `jupyterlab-sparglim` and `sparglim-server`.

```
# In project root dir
docker build -t wh1isper/jupyterlab-sparglim:dev -f dev/docker/Dockerfile.jupyterlab-sparglim .
docker build -t wh1isper/sparglim-server:dev -f dev/docker/Dockerfile.sparglim-server .
# reload by deleting deployment pod
./dev/scripts/reload.sh
```
11 changes: 11 additions & 0 deletions dev/sparglim-server/README.md
@@ -0,0 +1,11 @@
This is the development verification for [examples/sparglim-server](../../examples/sparglim-server).

Use [docker/Dockerfile.sparglim-server](../docker/Dockerfile.sparglim-server) to build a dev version of `sparglim-server`.

```
# In project root dir
docker build -t wh1isper/sparglim-server:dev -f dev/docker/Dockerfile.sparglim-server .
# reload by deleting deployment pod
./dev/scripts/reload.sh
```
2 changes: 1 addition & 1 deletion dev/sparglim-server/k8s/connect-server-service.yaml
@@ -13,7 +13,7 @@ spec:
- name: spark-ui
protocol: TCP
port: 4040
nodePort: 30042
nodePort: 30040
targetPort: 4040
selector:
app: sparglim-server
89 changes: 89 additions & 0 deletions examples/jupyter-sparglim-on-k8s/README.md
@@ -0,0 +1,89 @@
In [Quick Start](../../README.md#quick-start), we start a `local[*]` PySpark session for data exploration in JupyterLab. This example uses Spark on k8s for the same purpose.

# Prepare

## Namespace: `sparglim`

```
kubectl create ns sparglim
```

## Grant authorization

You need to authorize the pod so that it can create pods (executors).

For a simple test, you can grant administrator privileges to all pods using the following command (**DO NOT do this in a production environment**):

```
kubectl create clusterrolebinding serviceaccounts-cluster-admin \
  --clusterrole=cluster-admin \
  --group=system:serviceaccounts
```

# Apply and access

```
# In project root
kubectl apply -f examples/jupyter-sparglim-on-k8s/k8s
```

Check that the pod is running:

```
$: kubectl get pod -n sparglim
NAME READY STATUS RESTARTS AGE
sparglim-app-5499f54f6b-gk4xv 1/1 Running 0 33m
```

Access JupyterLab and try it out:

`http://<master-ip>:30888`

# Usage

## Code

Use the following code to initialize a Spark-on-k8s session:

```python
from sparglim.config.builder import ConfigBuilder
spark = ConfigBuilder().config_k8s().get_or_create()
```

Once the SparkSession is created, check that the executors are up: `kubectl get pod -n sparglim`

```
NAME READY STATUS RESTARTS AGE
sparglim-825bf989955f3593-exec-1 1/1 Running 0 53m
sparglim-825bf989955f3593-exec-2 1/1 Running 0 53m
sparglim-825bf989955f3593-exec-3 1/1 Running 0 53m
sparglim-app-8495f7b796-2h7sc 1/1 Running 0 53m
```
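
Once the executors are running, a quick sanity check from the notebook (the numbers are illustrative, not part of this example):

```python
# Illustrative sanity check: a tiny job distributed across the executors.
df = spark.range(1000).toDF("n")
print(df.count())  # expected: 1000
```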

## SQL

This will automatically configure the SparkSession in `k8s` mode, via the `SPARGLIM_SQL_MODE` environment variable:

```python
%load_ext sparglim.sql
from sparglim.config.builder import ConfigBuilder
spark = ConfigBuilder().get_or_create() # No need to call config_k8s(); ConfigBuilder is a singleton
```

Test it:

```python
%sql SHOW TABLES;
```
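
For a slightly fuller check, you could register a temporary view and query it with the magic; the `demo` table and its rows below are made up for illustration:

```python
# Illustrative only: register a throwaway view so %sql has something to query.
df = spark.createDataFrame([(1, "alpha"), (2, "beta")], ["id", "name"])
df.createOrReplaceTempView("demo")

%sql SELECT * FROM demo
```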


Once the SparkSession is created, check that the executors are up: `kubectl get pod -n sparglim`

```
NAME READY STATUS RESTARTS AGE
sparglim-825bf989955f3593-exec-1 1/1 Running 0 53m
sparglim-825bf989955f3593-exec-2 1/1 Running 0 53m
sparglim-825bf989955f3593-exec-3 1/1 Running 0 53m
sparglim-app-8495f7b796-2h7sc 1/1 Running 0 53m
```
76 changes: 76 additions & 0 deletions examples/jupyter-sparglim-sc/README.md
@@ -0,0 +1,76 @@
In [Quick Start](../../README.md#quick-start), we start a `local[*]` PySpark session for data exploration in JupyterLab, and a `local[*]` Spark Connect Server. This example combines both of the above on k8s: a PySpark Connect client in JupyterLab on k8s connects to a Spark Connect Server on k8s.

# Prepare

## Namespace: `sparglim`

```
kubectl create ns sparglim
```

## Grant authorization

You need to authorize the pod so that it can create pods (executors).

For a simple test, you can grant administrator privileges to all pods using the following command (**DO NOT do this in a production environment**):

```
kubectl create clusterrolebinding serviceaccounts-cluster-admin \
  --clusterrole=cluster-admin \
  --group=system:serviceaccounts
```

# Apply and access

```
# In project root
kubectl apply -f examples/jupyter-sparglim-sc/k8s/jupyter-sparglim/
kubectl apply -f examples/jupyter-sparglim-sc/k8s/sparglim-server/
```

Check that the pod is running:

```
$: kubectl get pod -n sparglim
NAME READY STATUS RESTARTS AGE
sparglim-app-5499f54f6b-gk4xv 1/1 Running 0 33m
```

Access JupyterLab and try it out:

`http://<master-ip>:30888`

Access the Spark UI:
`http://<master-ip>:30040`

# Usage

## Code

Use the following code to initialize the Spark Connect client:

```python
from sparglim.config.builder import ConfigBuilder
spark = ConfigBuilder().config_connect_client().get_or_create()
```
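
As a quick, illustrative check that the client reaches the server (the DataFrame contents are made up), create a small DataFrame and print it explicitly; see the TIPS below for why `show()` is used:

```python
# Illustrative check over the connect client; call show() explicitly,
# since client-side eagerEval does not take effect (see TIPS below).
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.show()
```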

## SQL

This will automatically configure the SparkSession in `connect_client` mode, via the `SPARGLIM_SQL_MODE` environment variable:

```python
%load_ext sparglim.sql
from sparglim.config.builder import ConfigBuilder
spark = ConfigBuilder().get_or_create() # No need to call config_connect_client(); ConfigBuilder is a singleton
```

Test it:

```python
%sql SHOW TABLES
```

# TIPS

Any configuration set on the client side, such as `spark.sql.repl.eagerEval.enabled=true`, does not take effect, so `%sql` (`%%sql`) cannot display the dataframe inline. Use `df.show()` instead.
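
For example, a minimal pattern that works on the connect client:

```python
# Evaluate and print explicitly instead of relying on notebook auto-display.
spark.sql("SHOW TABLES").show()
```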
49 changes: 48 additions & 1 deletion examples/sparglim-server/README.md
@@ -1,6 +1,53 @@
In [Quick Start](../../README.md#quick-start), we start a `local[*]` Spark Connect Server. This example will show how to start a Spark Connect Server on k8s.

# Prepare

## Namespace: `sparglim`

```
kubectl create ns sparglim
```

## Grant authorization

You need to authorize the pod so that it can create pods (executors).

For a simple test, you can grant administrator privileges to all pods using the following command (**DO NOT do this in a production environment**):

```
kubectl create clusterrolebinding serviceaccounts-cluster-admin \
  --clusterrole=cluster-admin \
  --group=system:serviceaccounts
```

# Apply and access

```
# In project root
kubectl apply -f examples/sparglim-server/k8s
```

Check that the pods are running:

```
$: kubectl get pod -n sparglim
NAME READY STATUS RESTARTS AGE
sparglim-server-5696c9466d-s75bh 1/1 Running 0 86s
spark-connect-server-6c5a798995af404f-exec-1 1/1 Running 0 52s
spark-connect-server-6c5a798995af404f-exec-2 1/1 Running 0 52s
spark-connect-server-6c5a798995af404f-exec-3 1/1 Running 0 52s
```

Access the Spark UI:
`http://<master-ip>:30040`


# Connect it with `sparglim`

```python
import os
os.environ["SPARGLIM_REMOTE"] = "sc://localhost:30052"
os.environ["SPARGLIM_REMOTE"] = "sc://<master-ip>:30052" # Also avaliable `export SPARGLIM_REMOTE=sc://<master-ip>:30052` before start python

from sparglim.config.builder import ConfigBuilder
from datetime import datetime, date
```
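
A non-authoritative sketch of how such a client session might continue, assuming the Spark Connect Server from this example is reachable at the address set in `SPARGLIM_REMOTE` (the DataFrame contents are illustrative):

```python
# Illustrative continuation, not the original example's code: build the client
# session and run a tiny job against the remote Spark Connect Server.
from datetime import date

from sparglim.config.builder import ConfigBuilder

spark = ConfigBuilder().config_connect_client().get_or_create()

df = spark.createDataFrame(
    [(1, date(2023, 7, 27)), (2, date(2023, 7, 28))],
    ["id", "day"],
)
df.show()  # print explicitly on the client; eagerEval does not apply here
```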
