Showing 5 changed files with 180 additions and 8 deletions.
**`config-template.md`** (new file)
# Spark Configuration

The exact explanations and defaults for Spark configuration can be found [here](https://spark.apache.org/docs/latest/configuration.html); `None` means the Spark native default is used.

# Config PySpark Session via environment variables

> Generated by [generate_config_docs.py](./generate_config_docs.py)
> Run `python ./generate_config_docs.py` to update this file

{docs}

# TIPS

S3 secret tokens (and other credentials) only need to be configured on the `Driver` or `Connect Server`; configuring them on the `Connect Client` has no effect.
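The `{docs}` placeholder in the template above is filled in by `generate_config_docs.py` using Python's `str.format`. A minimal sketch of that substitution (the template and docs strings here are shortened examples, not the real files):

```python
# Minimal sketch of the substitution the generator performs:
# a template containing "{docs}" is rendered with str.format.
template = "# Config PySpark Session via environment variables\n\n{docs}\n"
docs = "- `SPARGLIM_UI_PORT`: `spark.ui.port`, default: `None`.\n"

# Replaces the {docs} placeholder with the generated variable list.
rendered = template.format(docs=docs)
print(rendered)
```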
**`config.md`**

*(removed)* It will be ready as soon as possible after Release 0.1.0; until then you can refer to the source code file [sparglim/config/configer.py](sparglim/config/configer.py)

# Spark Configuration

The exact explanations and defaults for Spark configuration can be found [here](https://spark.apache.org/docs/latest/configuration.html); `None` means the Spark native default is used.

*(removed)* TODO: Generate available envs for config spark session

# Config PySpark Session via environment variables

> Generated by [generate_config_docs.py](./generate_config_docs.py)
> Run `python ./generate_config_docs.py` to update this file

Source code: sparglim/config/configer.py
Available environment variables for SparkEnvConfiger:

Default config:

- `SPAGLIM_APP_NAME`: `spark.app.name`, default: `Sparglim`.
- `SPAGLIM_DEPLOY_MODE`: `spark.submit.deployMode`, default: `client`.
- `SPARGLIM_SCHEDULER_MODE`: `spark.scheduler.mode`, default: `FAIR`.
- `SPARGLIM_UI_PORT`: `spark.ui.port`, default: `None`.
- `S3_ACCESS_KEY` or `AWS_ACCESS_KEY_ID`: `spark.hadoop.fs.s3a.access.key`, default: `None`.
- `S3_SECRET_KEY` or `AWS_SECRET_ACCESS_KEY`: `spark.hadoop.fs.s3a.secret.key`, default: `None`.
- `S3_ENTRY_POINT`: `spark.hadoop.fs.s3a.endpoint`, default: `None`.
- `S3_ENTRY_POINT_REGION` or `AWS_DEFAULT_REGION`: `spark.hadoop.fs.s3a.endpoint.region`, default: `None`.
- `S3_PATH_STYLE_ACCESS`: `spark.hadoop.fs.s3a.path.style.access`, default: `None`.
- `S3_MAGIC_COMMITTER`: `spark.hadoop.fs.s3a.bucket.all.committer.magic.enabled`, default: `None`.
- `SPARGIM_KERBEROS_KEYTAB`: `spark.kerberos.keytab`, default: `None`.
- `SPARGIM_KERBEROS_PRINCIPAL`: `spark.kerberos.principal`, default: `None`.

`config_basic()` configures the following:

- `SPAGLIM_APP_NAME`: `spark.app.name`, default: `Sparglim`.
- `SPAGLIM_DEPLOY_MODE`: `spark.submit.deployMode`, default: `client`.
- `SPARGLIM_SCHEDULER_MODE`: `spark.scheduler.mode`, default: `FAIR`.
- `SPARGLIM_UI_PORT`: `spark.ui.port`, default: `None`.

`config_s3()` configures the following:

- `S3_ACCESS_KEY` or `AWS_ACCESS_KEY_ID`: `spark.hadoop.fs.s3a.access.key`, default: `None`.
- `S3_SECRET_KEY` or `AWS_SECRET_ACCESS_KEY`: `spark.hadoop.fs.s3a.secret.key`, default: `None`.
- `S3_ENTRY_POINT`: `spark.hadoop.fs.s3a.endpoint`, default: `None`.
- `S3_ENTRY_POINT_REGION` or `AWS_DEFAULT_REGION`: `spark.hadoop.fs.s3a.endpoint.region`, default: `None`.
- `S3_PATH_STYLE_ACCESS`: `spark.hadoop.fs.s3a.path.style.access`, default: `None`.
- `S3_MAGIC_COMMITTER`: `spark.hadoop.fs.s3a.bucket.all.committer.magic.enabled`, default: `None`.

`config_kerberos()` configures the following:

- `SPARGIM_KERBEROS_KEYTAB`: `spark.kerberos.keytab`, default: `None`.
- `SPARGIM_KERBEROS_PRINCIPAL`: `spark.kerberos.principal`, default: `None`.

`config_local()` configures the following:

- `SPARGLIM_MASTER`: `spark.master`, default: `local[*]`.
- `SPARGLIM_LOCAL_MEMORY`: `spark.driver.memory`, default: `512m`.

`config_connect_client()` configures the following:

- `SPARGLIM_REMOTE`: `spark.remote`, default: `sc://localhost:15002`.

`config_connect_server()` configures the following:

- `SPARGLIM_CONNECT_SERVER_PORT`: `spark.connect.grpc.binding.port`, default: `None`.
- `SPARGLIM_CONNECT_GRPC_ARROW_MAXBS`: `spark.connect.grpc.arrow.maxBatchSize`, default: `None`.
- `SPARGLIM_CONNECT_GRPC_MAXIM`: `spark.connect.grpc.maxInboundMessageSize`, default: `None`.

`config_k8s()` configures the following:

- `SPARGLIM_MASTER`: `spark.master`, default: `k8s://https://kubernetes.default.svc`.
- `SPARGLIM_K8S_NAMESPACE`: `spark.kubernetes.namespace`, default: `None`.
- `SPARGLIM_K8S_IMAGE`: `spark.kubernetes.container.image`, default: `wh1isper/spark-executor:3.4.1`.
- `SPARGLIM_K8S_IMAGE_PULL_SECRETS`: `spark.kubernetes.container.image.pullSecrets`, default: `None`.
- `SPARGLIM_K8S_IMAGE_PULL_POLICY`: `spark.kubernetes.container.image.pullPolicy`, default: `IfNotPresent`.
- `SPARK_EXECUTOR_NUMS`: `spark.executor.instances`, default: `3`.
- `SPARGLIM_K8S_EXECUTOR_LABEL_LIST`: `spark.kubernetes.executor.label.*`, default: `sparglim-executor`. A comma-separated string will be converted into multiple entries.
- `SPARGLIM_K8S_EXECUTOR_ANNOTATION_LIST`: `spark.kubernetes.executor.annotation.*`, default: `sparglim-executor`. A comma-separated string will be converted into multiple entries.
- `SPARGLIM_DRIVER_HOST`: `spark.driver.host`, default: `None`.
- `SPARGLIM_DRIVER_BINDADDRESS`: `spark.driver.bindAddress`, default: `0.0.0.0`.
- `SPARGLIM_DRIVER_POD_NAME`: `spark.kubernetes.driver.pod.name`, default: `None`.
- `SPARGLIM_K8S_EXECUTOR_REQUEST_CORES`: `spark.kubernetes.executor.cores`, default: `None`.
- `SPARGLIM_K8S_EXECUTOR_LIMIT_CORES`: `spark.kubernetes.executor.limit.cores`, default: `None`.
- `SPARGLIM_EXECUTOR_REQUEST_MEMORY`: `spark.executor.memory`, default: `512m`.
- `SPARGLIM_EXECUTOR_LIMIT_MEMORY`: `spark.executor.memoryOverhead`, default: `None`.
- `SPARGLIM_K8S_GPU_VENDOR`: `spark.executor.resource.gpu.vendor`, default: `nvidia.com`.
- `SPARGLIM_K8S_GPU_DISCOVERY_SCRIPT`: `spark.executor.resource.gpu.discoveryScript`, default: `/opt/spark/examples/src/main/scripts/getGpusResources.sh`.
- `SPARGLIM_K8S_GPU_AMOUNT`: `spark.executor.resource.gpu.amount`, default: `None`.
- `SPARGLIM_RAPIDS_SQL_ENABLED`: `spark.rapids.sql.enabled`, default: `None`.
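The `*_LIST` variables above expand into per-key Spark settings (e.g. `spark.kubernetes.executor.label.*`). The real conversion lives in `sparglim/config/configer.py` and may differ; the sketch below only illustrates the idea, with a hypothetical `expand_list_env` helper and made-up label values:

```python
# Hypothetical sketch of how a comma-separated *_LIST variable could expand
# into per-key Spark settings. This helper is illustrative, not sparglim's API.
def expand_list_env(spark_config_prefix: str, raw: str) -> dict:
    # "app=sparglim,tier=compute" -> {"<prefix>app": "sparglim", "<prefix>tier": "compute"}
    out = {}
    for item in raw.split(","):
        key, _, value = item.partition("=")
        out[f"{spark_config_prefix}{key.strip()}"] = value.strip()
    return out

print(expand_list_env("spark.kubernetes.executor.label.", "app=sparglim,tier=compute"))
```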
# TIPS

S3 secret tokens (and other credentials) only need to be configured on the `Driver` or `Connect Server`; configuring them on the `Connect Client` has no effect.
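Since all of these are plain environment variables, they can be exported in the shell or set from Python before the session is created. A minimal stdlib-only sketch (the values are illustrative, and the session-building call itself is omitted since it depends on sparglim's API):

```python
import os

# Illustrative values only; variable names are taken verbatim from the list above.
# They must be set before the configer builds the SparkSession.
os.environ["SPAGLIM_APP_NAME"] = "my-app"           # -> spark.app.name
os.environ["SPARGLIM_SCHEDULER_MODE"] = "FIFO"      # -> spark.scheduler.mode
os.environ["S3_ACCESS_KEY"] = "example-access-key"  # -> spark.hadoop.fs.s3a.access.key

print(os.environ["SPAGLIM_APP_NAME"])
```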
**`generate_config_docs.py`** (new file)
```python
#!/usr/bin/env python
import os
from pathlib import Path

from sparglim.config.configer import SparkEnvConfiger

_HERE = Path(os.path.abspath(__file__)).parent
TEMPLATE_PATH = _HERE / "config-template.md"
OUTPUT_PATH = _HERE / "config.md"


def _generate_env_config_docs(config: dict) -> str:
    docs = ""
    for spark_config, (env, default) in config.items():
        annotations = ""
        if isinstance(env, list):
            env = "` or `".join(env)
        if env.endswith("_LIST"):
            spark_config = spark_config.replace("list", "*")
            annotations = " A comma-separated string will be converted into multiple entries."
        docs += f"- `{env}`: `{spark_config}`, default: `{default}`.{annotations}\n"
    return docs


def generate_docs(target_configer_cls) -> str:
    docs = ""

    source_code_path = target_configer_cls.__module__.replace(".", "/") + ".py"
    docs += f"Source code: {source_code_path}\n\n"

    docs += f"Available environment variables for {target_configer_cls.__name__}:\n\n"

    docs += "Default config:\n\n"
    docs += _generate_env_config_docs(target_configer_cls.default_config_mapper)

    # Collect class-level dict attributes such as `_basic`, `_s3`, `_k8s`, ...
    # each of which documents one `config_*()` method.
    items = target_configer_cls.__dict__.items()
    config_map = {k: v for k, v in items if k.startswith("_") and isinstance(v, dict)}
    for config_suffix, config in config_map.items():
        docs += f"\n`config{config_suffix}()` configures the following:\n\n"
        docs += _generate_env_config_docs(config)

    return docs


def generate_from_template(docs: str):
    print(f"Generating docs... {TEMPLATE_PATH.as_posix()} -> {OUTPUT_PATH.as_posix()}")
    print(docs)
    template = TEMPLATE_PATH.read_text()
    template = template.format(docs=docs)
    OUTPUT_PATH.write_text(template)


if __name__ == "__main__":
    docs = generate_docs(SparkEnvConfiger)
    generate_from_template(docs)
```