
Commit

Update docs
Wh1isper committed Jul 27, 2023
1 parent 3d15e5b commit d7efed4
Showing 8 changed files with 253 additions and 3 deletions.
5 changes: 4 additions & 1 deletion config.md
@@ -1 +1,4 @@
TODO
It will be ready as soon as possible after Release 0.1.0. Until then, you can refer to the source code file [sparglim/config/configer.py](sparglim/config/configer.py).


TODO: Generate the available environment variables for configuring the Spark session
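
Until then, a rough, non-authoritative sketch of how the environment-based configuration is used by the examples in this repo; `SPARGLIM_SQL_MODE` is the variable those examples reference, and the full set lives in the source file above:

```python
# A hedged sketch, not the authoritative reference: SPARGLIM_SQL_MODE is the
# variable used by the k8s and connect-client examples in this repo; see
# sparglim/config/configer.py for the complete set of supported variables.
import os

os.environ["SPARGLIM_SQL_MODE"] = "k8s"  # value used in examples/jupyter-sparglim-on-k8s

%load_ext sparglim.sql                   # the %sql magic picks the mode up from the env

from sparglim.config.builder import ConfigBuilder

spark = ConfigBuilder().get_or_create()  # ConfigBuilder is a singleton
```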
11 changes: 11 additions & 0 deletions dev/jupyter-sparglim-on-k8s/README.md
@@ -0,0 +1,11 @@
This is the development verification for [examples/jupyter-sparglim-on-k8s](../../examples/jupyter-sparglim-on-k8s).

Use [docker/Dockerfile.jupyterlab-sparglim](../docker/Dockerfile.jupyterlab-sparglim) to build a dev version of `jupyterlab-sparglim`.

```
# In project root dir
docker build -t wh1isper/jupyterlab-sparglim:dev -f dev/docker/Dockerfile.jupyterlab-sparglim .
# reload by deleting deployment pod
./dev/scripts/reload.sh
```
13 changes: 13 additions & 0 deletions dev/jupyter-sparglim-sc/README.md
@@ -0,0 +1,13 @@
This is the development verification for [examples/jupyter-sparglim-sc](../../examples/jupyter-sparglim-sc).

Use [docker/Dockerfile.jupyterlab-sparglim](../docker/Dockerfile.jupyterlab-sparglim) and [docker/Dockerfile.sparglim-server](../docker/Dockerfile.sparglim-server) to build dev versions of `jupyterlab-sparglim` and `sparglim-server`.

```
# In project root dir
docker build -t wh1isper/jupyterlab-sparglim:dev -f dev/docker/Dockerfile.jupyterlab-sparglim .
docker build -t wh1isper/sparglim-server:dev -f dev/docker/Dockerfile.sparglim-server .
# reload by deleting deployment pod
./dev/scripts/reload.sh
```
11 changes: 11 additions & 0 deletions dev/sparglim-server/README.md
@@ -0,0 +1,11 @@
This is the development verification for [examples/sparglim-server](../../examples/sparglim-server).

Use [docker/Dockerfile.sparglim-server](../docker/Dockerfile.sparglim-server) to build a dev version of `sparglim-server`.

```
# In project root dir
docker build -t wh1isper/sparglim-server:dev -f dev/docker/Dockerfile.sparglim-server .
# reload by deleting deployment pod
./dev/scripts/reload.sh
```
2 changes: 1 addition & 1 deletion dev/sparglim-server/k8s/connect-server-service.yaml
@@ -13,7 +13,7 @@ spec:
- name: spark-ui
protocol: TCP
port: 4040
nodePort: 30042
nodePort: 30040
targetPort: 4040
selector:
app: sparglim-server
89 changes: 89 additions & 0 deletions examples/jupyter-sparglim-on-k8s/README.md
@@ -0,0 +1,89 @@
In [Quick Start](../../README.md#quick-start), we start a `local[*]` PySpark session for data exploration in JupyterLab. This example uses Spark on k8s for the same purpose.

# Prepare

## Namespace: `sparglim`

```
kubectl create ns sparglim
```

## Grant authorization

You need to authorize the pod so that it can create pods (executors).

For a simple test, you can grant administrator privileges to all pods using the following command (**DO NOT do this in a production environment**):

```
kubectl create clusterrolebinding serviceaccounts-cluster-admin \
  --clusterrole=cluster-admin \
  --group=system:serviceaccounts
```

# Apply and access

```
# In project root
kubectl apply -f examples/jupyter-sparglim-on-k8s/k8s
```

Check that the pod is running:

```
$: kubectl get pod -n sparglim
NAME READY STATUS RESTARTS AGE
sparglim-app-5499f54f6b-gk4xv 1/1 Running 0 33m
```

Access JupyterLab and try it out:

`http://<master-ip>:30888`

# Usage

## Code

Use the following code to initialize a Spark-on-k8s session:

```python
from sparglim.config.builder import ConfigBuilder
spark = ConfigBuilder().config_k8s().get_or_create()
```

Once the SparkSession is created, check that the executors are up: `kubectl get pod -n sparglim`

```
NAME READY STATUS RESTARTS AGE
sparglim-825bf989955f3593-exec-1 1/1 Running 0 53m
sparglim-825bf989955f3593-exec-2 1/1 Running 0 53m
sparglim-825bf989955f3593-exec-3 1/1 Running 0 53m
sparglim-app-8495f7b796-2h7sc 1/1 Running 0 53m
```
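
Once the executors are running, a quick sanity check from the notebook (the numbers are illustrative, not part of this example):

```python
# Illustrative sanity check: a tiny job distributed across the executors.
df = spark.range(1000).toDF("n")
print(df.count())  # expected: 1000
```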

## SQL

This will automatically configure the SparkSession in `k8s` mode, via the `SPARGLIM_SQL_MODE` environment variable:

```python
%load_ext sparglim.sql
from sparglim.config.builder import ConfigBuilder
spark = ConfigBuilder().get_or_create() # No need to call config_k8s(); ConfigBuilder is a singleton
```

Test it:

```python
%sql SHOW TABLES;
```
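
For a slightly fuller check, you could register a temporary view and query it with the magic; the `demo` table and its rows below are made up for illustration:

```python
# Illustrative only: register a throwaway view so %sql has something to query.
df = spark.createDataFrame([(1, "alpha"), (2, "beta")], ["id", "name"])
df.createOrReplaceTempView("demo")

%sql SELECT * FROM demo
```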


Once the SparkSession is created, check that the executors are up: `kubectl get pod -n sparglim`

```
NAME READY STATUS RESTARTS AGE
sparglim-825bf989955f3593-exec-1 1/1 Running 0 53m
sparglim-825bf989955f3593-exec-2 1/1 Running 0 53m
sparglim-825bf989955f3593-exec-3 1/1 Running 0 53m
sparglim-app-8495f7b796-2h7sc 1/1 Running 0 53m
```
76 changes: 76 additions & 0 deletions examples/jupyter-sparglim-sc/README.md
@@ -0,0 +1,76 @@
In [Quick Start](../../README.md#quick-start), we start a `local[*]` PySpark session for data exploration in JupyterLab, and a `local[*]` Spark Connect Server. This example combines both of the above on k8s: a PySpark Connect client in JupyterLab on k8s connects to a Spark Connect Server on k8s.

# Prepare

## Namespace: `sparglim`

```
kubectl create ns sparglim
```

## Grant authorization

You need to authorize the pod so that it can create pods (executors).

For a simple test, you can grant administrator privileges to all pods using the following command (**DO NOT do this in a production environment**):

```
kubectl create clusterrolebinding serviceaccounts-cluster-admin \
  --clusterrole=cluster-admin \
  --group=system:serviceaccounts
```

# Apply and access

```
# In project root
kubectl apply -f examples/jupyter-sparglim-sc/k8s/jupyter-sparglim/
kubectl apply -f examples/jupyter-sparglim-sc/k8s/sparglim-server/
```

Check that the pod is running:

```
$: kubectl get pod -n sparglim
NAME READY STATUS RESTARTS AGE
sparglim-app-5499f54f6b-gk4xv 1/1 Running 0 33m
```

Access JupyterLab and try it out:

`http://<master-ip>:30888`

Access the Spark UI:
`http://<master-ip>:30040`

# Usage

## Code

Use the following code to initialize the Spark Connect client:

```python
from sparglim.config.builder import ConfigBuilder
spark = ConfigBuilder().config_connect_client().get_or_create()
```
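
As a quick, illustrative check that the client reaches the server (the DataFrame contents are made up), create a small DataFrame and print it explicitly; see the TIPS below for why `show()` is used:

```python
# Illustrative check over the connect client; call show() explicitly,
# since client-side eagerEval does not take effect (see TIPS below).
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.show()
```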

## SQL

This will automatically configure the SparkSession in `connect_client` mode, via the `SPARGLIM_SQL_MODE` environment variable:

```python
%load_ext sparglim.sql
from sparglim.config.builder import ConfigBuilder
spark = ConfigBuilder().get_or_create() # No need to call config_connect_client(); ConfigBuilder is a singleton
```

Test it:

```python
%sql SHOW TABLES
```

# TIPS

Any configuration set on the client side, such as `spark.sql.repl.eagerEval.enabled=true`, does not take effect, so `%sql` (`%%sql`) cannot display the dataframe inline. Use `df.show()` instead.
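
For example, a minimal pattern that works on the connect client:

```python
# Evaluate and print explicitly instead of relying on notebook auto-display.
spark.sql("SHOW TABLES").show()
```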
49 changes: 48 additions & 1 deletion examples/sparglim-server/README.md
@@ -1,6 +1,53 @@
In [Quick Start](../../README.md#quick-start), we start a `local[*]` Spark Connect Server. This example will show how to start a Spark Connect Server on k8s.

# Prepare

## Namespace: `sparglim`

```
kubectl create ns sparglim
```

## Grant authorization

You need to authorize the pod so that it can create pods (executors).

For a simple test, you can grant administrator privileges to all pods using the following command (**DO NOT do this in a production environment**):

```
kubectl create clusterrolebinding serviceaccounts-cluster-admin \
  --clusterrole=cluster-admin \
  --group=system:serviceaccounts
```

# Apply and access

```
# In project root
kubectl apply -f examples/sparglim-server/k8s
```

Check that the pods are running:

```
$: kubectl get pod -n sparglim
NAME READY STATUS RESTARTS AGE
sparglim-server-5696c9466d-s75bh 1/1 Running 0 86s
spark-connect-server-6c5a798995af404f-exec-1 1/1 Running 0 52s
spark-connect-server-6c5a798995af404f-exec-2 1/1 Running 0 52s
spark-connect-server-6c5a798995af404f-exec-3 1/1 Running 0 52s
```

Access the Spark UI:
`http://<master-ip>:30040`


# Connect it with `sparglim`

```python
import os
os.environ["SPARGLIM_REMOTE"] = "sc://localhost:30052"
os.environ["SPARGLIM_REMOTE"] = "sc://<master-ip>:30052" # Also avaliable `export SPARGLIM_REMOTE=sc://<master-ip>:30052` before start python

from sparglim.config.builder import ConfigBuilder
from datetime import datetime, date
```
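
A non-authoritative sketch of how such a client session might continue, assuming the Spark Connect Server from this example is reachable at the address set in `SPARGLIM_REMOTE` (the DataFrame contents are illustrative):

```python
# Illustrative continuation, not the original example's code: build the client
# session and run a tiny job against the remote Spark Connect Server.
from datetime import date

from sparglim.config.builder import ConfigBuilder

spark = ConfigBuilder().config_connect_client().get_or_create()

df = spark.createDataFrame(
    [(1, date(2023, 7, 27)), (2, date(2023, 7, 28))],
    ["id", "day"],
)
df.show()  # print explicitly on the client; eagerEval does not apply here
```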
