[ADAP-1085] [Bug] When using iceberg format, dbt docs generate is unable to populate the columns information #968

Open
shishircc opened this issue Jan 3, 2024 · 1 comment
Labels: bug, help_wanted, iceberg

shishircc commented Jan 3, 2024

Is this a new bug in dbt-spark?

  • I believe this is a new bug in dbt-spark
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

When using the Iceberg table format, dbt docs generate creates an empty catalog.json, so the generated documentation contains no column information.
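For reference, a minimal model configuration that exhibits this behavior (the model and source names here are illustrative, not taken from the project in the logs below):

```sql
-- Hypothetical staging model, e.g. models/staging/stg_example.sql.
-- file_format='iceberg' is the dbt-spark model config that makes dbt
-- create the table as an Iceberg (v2) table.
{{ config(
    materialized='table',
    file_format='iceberg'
) }}

select *
from {{ source('raw', 'example') }}
```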

Expected Behavior

dbt docs generate should produce a properly populated catalog.json.

Steps To Reproduce

  1. Configure EMR to work with Iceberg and the Glue catalog (see the configuration sketch after this list)
  2. Set up the Thrift server
  3. Run the dbt project on EMR via the Thrift server
  4. Run dbt docs generate
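Step 1 typically amounts to Spark properties along these lines. This is only a sketch with a placeholder warehouse bucket; it uses org.apache.iceberg.spark.SparkCatalog because that is the catalog class visible in the logs below, and the exact way these are supplied on EMR (classifications vs. spark-defaults) may differ:

```properties
# Illustrative EMR spark-defaults for Iceberg + Glue (values are placeholders).
# Register the Iceberg SQL extensions.
spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
# Back the session catalog with Iceberg's SparkCatalog over AWS Glue.
spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.spark_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog
spark.sql.catalog.spark_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO
spark.sql.catalog.spark_catalog.warehouse=s3://<warehouse-bucket>/
```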

Relevant log output

[0m02:50:24.631713 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fa619c86bb0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fa617c572b0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fa617c57a60>]}


============================== 02:50:24.638185 | c74f1dd7-ae7c-4cbe-aa70-cdad60a70d35 ==============================
[0m02:50:24.638185 [info ] [MainThread]: Running with dbt=1.7.4
[0m02:50:24.639539 [debug] [MainThread]: running dbt with arguments {'printer_width': '80', 'indirect_selection': 'eager', 'write_json': 'True', 'log_cache_events': 'False', 'partial_parse': 'True', 'cache_selected_only': 'False', 'warn_error': 'None', 'debug': 'False', 'fail_fast': 'False', 'log_path': '/home/ec2-user/environment/dbtproject/dags/dbt_blueprint/c360-datalake/logs', 'version_check': 'True', 'profiles_dir': '/home/ec2-user/.dbt', 'use_colors': 'True', 'use_experimental_parser': 'False', 'no_print': 'None', 'quiet': 'False', 'log_format': 'default', 'invocation_command': 'dbt docs generate --vars {"day": "31","hour": "0","month": "12","raw_bucket":"c360-raw-data-*****-us-east-1","ts": "2023-12-31T00:00:00+00:00","year": "2023"}', 'introspect': 'True', 'warn_error_options': 'WarnErrorOptions(include=[], exclude=[])', 'target_path': 'None', 'static_parser': 'True', 'send_anonymous_usage_stats': 'True'}
[0m02:50:24.958174 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'project_id', 'label': 'c74f1dd7-ae7c-4cbe-aa70-cdad60a70d35', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fa617bece50>]}
[0m02:50:25.203736 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'adapter_info', 'label': 'c74f1dd7-ae7c-4cbe-aa70-cdad60a70d35', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fa6177c2af0>]}
[0m02:50:25.204633 [info ] [MainThread]: Registered adapter: spark=1.7.0
[0m02:50:25.223511 [debug] [MainThread]: checksum: 577537e0073da8fb99e9f3abffc643b153c4ab719d0d0e1e2dce7637653d4e74, vars: {'day': '31',
 'hour': '0',
 'month': '12',
 'raw_bucket': 'c360-raw-data-********-us-east-1',
 'ts': '2023-12-31T00:00:00+00:00',
 'year': '2023'}, profile: , target: , version: 1.7.4
[0m02:50:25.260540 [debug] [MainThread]: Partial parsing enabled: 0 files deleted, 0 files added, 0 files changed.
[0m02:50:25.261176 [debug] [MainThread]: Partial parsing enabled, no changes found, skipping parsing
[0m02:50:25.269396 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'load_project', 'label': 'c74f1dd7-ae7c-4cbe-aa70-cdad60a70d35', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fa6175a6f10>]}
[0m02:50:25.272216 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'resource_counts', 'label': 'c74f1dd7-ae7c-4cbe-aa70-cdad60a70d35', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fa6176cb2b0>]}
[0m02:50:25.272952 [info ] [MainThread]: Found 7 models, 6 sources, 0 exposures, 0 metrics, 439 macros, 0 groups, 0 semantic models
[0m02:50:25.273827 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': 'c74f1dd7-ae7c-4cbe-aa70-cdad60a70d35', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fa6176cb2e0>]}
[0m02:50:25.276742 [info ] [MainThread]: 
[0m02:50:25.278117 [debug] [MainThread]: Acquiring new spark connection 'master'
[0m02:50:25.280561 [debug] [ThreadPool]: Acquiring new spark connection 'list_None_c360bronze'
[0m02:50:25.295222 [debug] [ThreadPool]: Spark adapter: NotImplemented: add_begin_query
[0m02:50:25.295972 [debug] [ThreadPool]: Using spark connection "list_None_c360bronze"
[0m02:50:25.296507 [debug] [ThreadPool]: On list_None_c360bronze: /* {"app": "dbt", "dbt_version": "1.7.4", "profile_name": "c360", "target_name": "dev", "connection_name": "list_None_c360bronze"} */
show table extended in c360bronze like '*'
  
[0m02:50:25.296985 [debug] [ThreadPool]: Opening a new connection, currently in state init
[0m02:50:25.447602 [debug] [ThreadPool]: Spark adapter: Poll response: TGetOperationStatusResp(status=TStatus(statusCode=0, infoMessages=None, sqlState=None, errorCode=None, errorMessage=None), operationState=5, sqlState=None, errorCode=0, errorMessage='org.apache.hive.service.cli.HiveSQLException: Error running query: [_LEGACY_ERROR_TEMP_1200] org.apache.spark.sql.AnalysisException: SHOW TABLE EXTENDED is not supported for v2 tables.;\nShowTableExtended *, [namespace#9839, tableName#9840, isTemporary#9841, information#9842]\n+- ResolvedNamespace org.apache.iceberg.spark.SparkCatalog@6b21d3da, [c360bronze]\n\n\tat org.apache.spark.sql.hive.thriftserver.HiveThriftServerErrors$.runningQueryError(HiveThriftServerErrors.scala:43)\n\tat 
org.apache.spark.sql.SQLContext.sql(SQLContext.scala:651)\n\tat org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:226)\n\t... 16 more\n', taskStatus=None, operationStarted=None, operationCompleted=None, hasResultSet=None, progressUpdateResponse=None)
[0m02:50:25.448538 [debug] [ThreadPool]: Spark adapter: Poll status: 5
[0m02:50:25.449121 [debug] [ThreadPool]: Spark adapter: Error while running:
/* {"app": "dbt", "dbt_version": "1.7.4", "profile_name": "c360", "target_name": "dev", "connection_name": "list_None_c360bronze"} */
show table extended in c360bronze like '*'
  
[0m02:50:25.449863 [debug] [ThreadPool]: Spark adapter: Database Error
  org.apache.hive.service.cli.HiveSQLException: Error running query: [_LEGACY_ERROR_TEMP_1200] org.apache.spark.sql.AnalysisException: SHOW TABLE EXTENDED is not supported for v2 tables.;
  ShowTableExtended *, [namespace#9839, tableName#9840, isTemporary#9841, information#9842]
  +- ResolvedNamespace org.apache.iceberg.spark.SparkCatalog@6b21d3da, [c360bronze]
  
  	at org.apache.spark.sql.hive.thriftserver.HiveThriftServerErrors$.runningQueryError(HiveThriftServerErrors.scala:43)
  	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:261)

  Caused by: org.apache.spark.sql.AnalysisException: SHOW TABLE EXTENDED is not supported for v2 tables.;
  ShowTableExtended *, [namespace#9839, tableName#9840, isTemporary#9841, information#9842]
  +- ResolvedNamespace org.apache.iceberg.spark.SparkCatalog@6b21d3da, [c360bronze]
  
  	at org.apache.spark.sql.errors.QueryCompilationErrors$.commandUnsupportedInV2TableError(QueryCompilationErrors.scala:2040)
  	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$1(CheckAnalysis.scala:224)
  	... 16 more
  
[0m02:50:25.450777 [debug] [ThreadPool]: Spark adapter: Error while running:
macro list_relations_without_caching
[0m02:50:25.451525 [debug] [ThreadPool]: Spark adapter: Runtime Error
  Database Error
    org.apache.hive.service.cli.HiveSQLException: Error running query: [_LEGACY_ERROR_TEMP_1200] org.apache.spark.sql.AnalysisException: SHOW TABLE EXTENDED is not supported for v2 tables.;
    ShowTableExtended *, [namespace#9839, tableName#9840, isTemporary#9841, information#9842]
    +- ResolvedNamespace org.apache.iceberg.spark.SparkCatalog@6b21d3da, [c360bronze]
    
    	at java.lang.Thread.run(Thread.java:750)
    Caused by: org.apache.spark.sql.AnalysisException: SHOW TABLE EXTENDED is not supported for v2 tables.;
    ShowTableExtended *, [namespace#9839, tableName#9840, isTemporary#9841, information#9842]
    +- ResolvedNamespace org.apache.iceberg.spark.SparkCatalog@6b21d3da, [c360bronze]
    
    	at org.apache.spark.sql.errors.QueryCompilationErrors$.commandUnsupportedInV2TableError(QueryCompilationErrors.scala:2040)
    	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$1(CheckAnalysis.scala:224)
    	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$1$adapted(CheckAnalysis.scala:163)
    	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:338)
    	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:226)
    	... 16 more
    
[0m02:50:25.457505 [debug] [ThreadPool]: Using spark connection "list_None_c360bronze"
[0m02:50:25.458084 [debug] [ThreadPool]: On list_None_c360bronze: /* {"app": "dbt", "dbt_version": "1.7.4", "profile_name": "c360", "target_name": "dev", "connection_name": "list_None_c360bronze"} */
show tables in c360bronze like '*'
  
[0m02:50:25.697421 [debug] [ThreadPool]: Spark adapter: Poll status: 2, query complete
[0m02:50:25.698186 [debug] [ThreadPool]: SQL status: OK in 0.0 seconds
[0m02:50:25.708726 [debug] [ThreadPool]: Using spark connection "list_None_c360bronze"
[0m02:50:25.709422 [debug] [ThreadPool]: On list_None_c360bronze: /* {"app": "dbt", "dbt_version": "1.7.4", "profile_name": "c360", "target_name": "dev", "connection_name": "list_None_c360bronze"} */
describe extended c360bronze.stg_clickstream
  
[0m02:50:25.895379 [debug] [ThreadPool]: Spark adapter: Poll status: 2, query complete
[0m02:50:25.896238 [debug] [ThreadPool]: SQL status: OK in 0.0 seconds
[0m02:50:25.905061 [debug] [ThreadPool]: Using spark connection "list_None_c360bronze"
[0m02:50:25.905728 [debug] [ThreadPool]: On list_None_c360bronze: /* {"app": "dbt", "dbt_version": "1.7.4", "profile_name": "c360", "target_name": "dev", "connection_name": "list_None_c360bronze"} */
describe extended c360bronze.stg_clickstream2
  
[0m02:50:26.128223 [debug] [ThreadPool]: Spark adapter: Poll status: 2, query complete
[0m02:50:26.128935 [debug] [ThreadPool]: SQL status: OK in 0.0 seconds
[0m02:50:26.139356 [debug] [ThreadPool]: Using spark connection "list_None_c360bronze"
[0m02:50:26.140353 [debug] [ThreadPool]: On list_None_c360bronze: /* {"app": "dbt", "dbt_version": "1.7.4", "profile_name": "c360", "target_name": "dev", "connection_name": "list_None_c360bronze"} */
describe extended c360bronze.stg_salesdb__cart_items
  
[0m02:50:26.370865 [debug] [ThreadPool]: Spark adapter: Poll status: 2, query complete
[0m02:50:26.371741 [debug] [ThreadPool]: SQL status: OK in 0.0 seconds
[0m02:50:26.381025 [debug] [ThreadPool]: Using spark connection "list_None_c360bronze"
[0m02:50:26.381722 [debug] [ThreadPool]: On list_None_c360bronze: /* {"app": "dbt", "dbt_version": "1.7.4", "profile_name": "c360", "target_name": "dev", "connection_name": "list_None_c360bronze"} */
describe extended c360bronze.stg_salesdb__customer
  
[0m02:50:26.572163 [debug] [ThreadPool]: Spark adapter: Poll status: 2, query complete
[0m02:50:26.573853 [debug] [ThreadPool]: SQL status: OK in 0.0 seconds
[0m02:50:26.584021 [debug] [ThreadPool]: Using spark connection "list_None_c360bronze"
[0m02:50:26.584680 [debug] [ThreadPool]: On list_None_c360bronze: /* {"app": "dbt", "dbt_version": "1.7.4", "profile_name": "c360", "target_name": "dev", "connection_name": "list_None_c360bronze"} */
describe extended c360bronze.stg_salesdb__order_items
  
[0m02:50:26.783624 [debug] [ThreadPool]: Spark adapter: Poll status: 2, query complete
[0m02:50:26.784357 [debug] [ThreadPool]: SQL status: OK in 0.0 seconds
[0m02:50:26.795421 [debug] [ThreadPool]: Using spark connection "list_None_c360bronze"
[0m02:50:26.796071 [debug] [ThreadPool]: On list_None_c360bronze: /* {"app": "dbt", "dbt_version": "1.7.4", "profile_name": "c360", "target_name": "dev", "connection_name": "list_None_c360bronze"} */
describe extended c360bronze.stg_salesdb__product
  
[0m02:50:26.987528 [debug] [ThreadPool]: Spark adapter: Poll status: 2, query complete
[0m02:50:26.988230 [debug] [ThreadPool]: SQL status: OK in 0.0 seconds
[0m02:50:26.996066 [debug] [ThreadPool]: Using spark connection "list_None_c360bronze"
[0m02:50:26.996669 [debug] [ThreadPool]: On list_None_c360bronze: /* {"app": "dbt", "dbt_version": "1.7.4", "profile_name": "c360", "target_name": "dev", "connection_name": "list_None_c360bronze"} */
describe extended c360bronze.stg_salesdb__product_rating
  
[0m02:50:27.228290 [debug] [ThreadPool]: Spark adapter: Poll status: 2, query complete
[0m02:50:27.229005 [debug] [ThreadPool]: SQL status: OK in 0.0 seconds
[0m02:50:27.237495 [debug] [ThreadPool]: Using spark connection "list_None_c360bronze"
[0m02:50:27.238161 [debug] [ThreadPool]: On list_None_c360bronze: /* {"app": "dbt", "dbt_version": "1.7.4", "profile_name": "c360", "target_name": "dev", "connection_name": "list_None_c360bronze"} */
describe extended c360bronze.stg_supportdb__support_chat
  
[0m02:50:27.439212 [debug] [ThreadPool]: Spark adapter: Poll status: 2, query complete
[0m02:50:27.439901 [debug] [ThreadPool]: SQL status: OK in 0.0 seconds
[0m02:50:27.445604 [debug] [ThreadPool]: On list_None_c360bronze: ROLLBACK
[0m02:50:27.446268 [debug] [ThreadPool]: Spark adapter: NotImplemented: rollback
[0m02:50:27.446799 [debug] [ThreadPool]: On list_None_c360bronze: Close
[0m02:50:27.570900 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': 'c74f1dd7-ae7c-4cbe-aa70-cdad60a70d35', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fa617832d60>]}
[0m02:50:27.572181 [info ] [MainThread]: Concurrency: 1 threads (target='dev')
[0m02:50:27.573155 [info ] [MainThread]: 
[0m02:50:27.576111 [debug] [Thread-1  ]: Began running node model.c360.stg_clickstream
[0m02:50:27.578024 [debug] [Thread-1  ]: Re-using an available connection from the pool (formerly list_None_c360bronze, now model.c360.stg_clickstream)
[0m02:50:27.578766 [debug] [Thread-1  ]: Began compiling node model.c360.stg_clickstream
[0m02:50:27.603421 [debug] [Thread-1  ]: Writing injected SQL for node "model.c360.stg_clickstream"
[0m02:50:27.604594 [debug] [Thread-1  ]: Timing info for model.c360.stg_clickstream (compile): 02:50:27.579148 => 02:50:27.604210
[0m02:50:27.605277 [debug] [Thread-1  ]: Began executing node model.c360.stg_clickstream
[0m02:50:27.606030 [debug] [Thread-1  ]: Timing info for model.c360.stg_clickstream (execute): 02:50:27.605629 => 02:50:27.605653
[0m02:50:27.607610 [debug] [Thread-1  ]: Finished running node model.c360.stg_clickstream
[0m02:50:27.608426 [debug] [Thread-1  ]: Began running node model.c360.stg_salesdb__cart_items
[0m02:50:27.609975 [debug] [Thread-1  ]: Re-using an available connection from the pool (formerly model.c360.stg_clickstream, now model.c360.stg_salesdb__cart_items)
[0m02:50:27.610700 [debug] [Thread-1  ]: Began compiling node model.c360.stg_salesdb__cart_items
[0m02:50:27.619178 [debug] [Thread-1  ]: Writing injected SQL for node "model.c360.stg_salesdb__cart_items"
[0m02:50:27.620232 [debug] [Thread-1  ]: Timing info for model.c360.stg_salesdb__cart_items (compile): 02:50:27.611076 => 02:50:27.619876
[0m02:50:27.620860 [debug] [Thread-1  ]: Began executing node model.c360.stg_salesdb__cart_items
[0m02:50:27.621648 [debug] [Thread-1  ]: Timing info for model.c360.stg_salesdb__cart_items (execute): 02:50:27.621216 => 02:50:27.621229
[0m02:50:27.624942 [debug] [Thread-1  ]: Finished running node model.c360.stg_salesdb__cart_items
[0m02:50:27.625894 [debug] [Thread-1  ]: Began running node model.c360.stg_salesdb__customer
[0m02:50:27.627310 [debug] [Thread-1  ]: Re-using an available connection from the pool (formerly model.c360.stg_salesdb__cart_items, now model.c360.stg_salesdb__customer)
[0m02:50:27.628163 [debug] [Thread-1  ]: Began compiling node model.c360.stg_salesdb__customer
[0m02:50:27.635786 [debug] [Thread-1  ]: Writing injected SQL for node "model.c360.stg_salesdb__customer"
[0m02:50:27.636779 [debug] [Thread-1  ]: Timing info for model.c360.stg_salesdb__customer (compile): 02:50:27.628650 => 02:50:27.636440
[0m02:50:27.637526 [debug] [Thread-1  ]: Began executing node model.c360.stg_salesdb__customer
[0m02:50:27.638335 [debug] [Thread-1  ]: Timing info for model.c360.stg_salesdb__customer (execute): 02:50:27.637949 => 02:50:27.637961
[0m02:50:27.639622 [debug] [Thread-1  ]: Finished running node model.c360.stg_salesdb__customer
[0m02:50:27.640276 [debug] [Thread-1  ]: Began running node model.c360.stg_salesdb__order_items
[0m02:50:27.641442 [debug] [Thread-1  ]: Re-using an available connection from the pool (formerly model.c360.stg_salesdb__customer, now model.c360.stg_salesdb__order_items)
[0m02:50:27.642196 [debug] [Thread-1  ]: Began compiling node model.c360.stg_salesdb__order_items
[0m02:50:27.650906 [debug] [Thread-1  ]: Writing injected SQL for node "model.c360.stg_salesdb__order_items"
[0m02:50:27.652276 [debug] [Thread-1  ]: Timing info for model.c360.stg_salesdb__order_items (compile): 02:50:27.642683 => 02:50:27.651808
[0m02:50:27.653043 [debug] [Thread-1  ]: Began executing node model.c360.stg_salesdb__order_items
[0m02:50:27.653692 [debug] [Thread-1  ]: Timing info for model.c360.stg_salesdb__order_items (execute): 02:50:27.653397 => 02:50:27.653410
[0m02:50:27.655082 [debug] [Thread-1  ]: Finished running node model.c360.stg_salesdb__order_items
[0m02:50:27.655742 [debug] [Thread-1  ]: Began running node model.c360.stg_salesdb__product
[0m02:50:27.656697 [debug] [Thread-1  ]: Re-using an available connection from the pool (formerly model.c360.stg_salesdb__order_items, now model.c360.stg_salesdb__product)
[0m02:50:27.657630 [debug] [Thread-1  ]: Began compiling node model.c360.stg_salesdb__product
[0m02:50:27.744326 [debug] [Thread-1  ]: Writing injected SQL for node "model.c360.stg_salesdb__product"
[0m02:50:27.745610 [debug] [Thread-1  ]: Timing info for model.c360.stg_salesdb__product (compile): 02:50:27.658152 => 02:50:27.745075
[0m02:50:27.746830 [debug] [Thread-1  ]: Began executing node model.c360.stg_salesdb__product
[0m02:50:27.747641 [debug] [Thread-1  ]: Timing info for model.c360.stg_salesdb__product (execute): 02:50:27.747323 => 02:50:27.747337
[0m02:50:27.749546 [debug] [Thread-1  ]: Finished running node model.c360.stg_salesdb__product
[0m02:50:27.750300 [debug] [Thread-1  ]: Began running node model.c360.stg_salesdb__product_rating
[0m02:50:27.751857 [debug] [Thread-1  ]: Re-using an available connection from the pool (formerly model.c360.stg_salesdb__product, now model.c360.stg_salesdb__product_rating)
[0m02:50:27.752630 [debug] [Thread-1  ]: Began compiling node model.c360.stg_salesdb__product_rating
[0m02:50:27.760353 [debug] [Thread-1  ]: Writing injected SQL for node "model.c360.stg_salesdb__product_rating"
[0m02:50:27.761355 [debug] [Thread-1  ]: Timing info for model.c360.stg_salesdb__product_rating (compile): 02:50:27.753148 => 02:50:27.761013
[0m02:50:27.762149 [debug] [Thread-1  ]: Began executing node model.c360.stg_salesdb__product_rating
[0m02:50:27.762897 [debug] [Thread-1  ]: Timing info for model.c360.stg_salesdb__product_rating (execute): 02:50:27.762503 => 02:50:27.762526
[0m02:50:27.764336 [debug] [Thread-1  ]: Finished running node model.c360.stg_salesdb__product_rating
[0m02:50:27.765582 [debug] [Thread-1  ]: Began running node model.c360.stg_supportdb__support_chat
[0m02:50:27.768105 [debug] [Thread-1  ]: Re-using an available connection from the pool (formerly model.c360.stg_salesdb__product_rating, now model.c360.stg_supportdb__support_chat)
[0m02:50:27.768876 [debug] [Thread-1  ]: Began compiling node model.c360.stg_supportdb__support_chat
[0m02:50:27.776378 [debug] [Thread-1  ]: Writing injected SQL for node "model.c360.stg_supportdb__support_chat"
[0m02:50:27.777509 [debug] [Thread-1  ]: Timing info for model.c360.stg_supportdb__support_chat (compile): 02:50:27.769323 => 02:50:27.777059
[0m02:50:27.778705 [debug] [Thread-1  ]: Began executing node model.c360.stg_supportdb__support_chat
[0m02:50:27.779701 [debug] [Thread-1  ]: Timing info for model.c360.stg_supportdb__support_chat (execute): 02:50:27.779171 => 02:50:27.779367
[0m02:50:27.781293 [debug] [Thread-1  ]: Finished running node model.c360.stg_supportdb__support_chat
[0m02:50:27.782634 [debug] [MainThread]: Connection 'master' was properly closed.
[0m02:50:27.783134 [debug] [MainThread]: Connection 'model.c360.stg_supportdb__support_chat' was properly closed.
[0m02:50:27.785129 [debug] [MainThread]: Command end result
[0m02:50:27.800011 [debug] [MainThread]: Acquiring new spark connection 'generate_catalog'
[0m02:50:27.800584 [info ] [MainThread]: Building catalog
[0m02:50:27.804828 [debug] [ThreadPool]: Acquiring new spark connection 'spark_catalog.c360raw'
[0m02:50:27.805619 [debug] [ThreadPool]: On "spark_catalog.c360raw": cache miss for schema ".spark_catalog.c360raw", this is inefficient
[0m02:50:27.811565 [debug] [ThreadPool]: Spark adapter: NotImplemented: add_begin_query
[0m02:50:27.812110 [debug] [ThreadPool]: Using spark connection "spark_catalog.c360raw"
[0m02:50:27.812600 [debug] [ThreadPool]: On spark_catalog.c360raw: /* {"app": "dbt", "dbt_version": "1.7.4", "profile_name": "c360", "target_name": "dev", "connection_name": "spark_catalog.c360raw"} */
show table extended in spark_catalog.c360raw like '*'
  
[0m02:50:27.813343 [debug] [ThreadPool]: Opening a new connection, currently in state init
[0m02:50:30.996262 [debug] [ThreadPool]: Spark adapter: Poll status: 2, query complete
[0m02:50:30.997081 [debug] [ThreadPool]: SQL status: OK in 3.0 seconds
[0m02:50:31.017468 [debug] [ThreadPool]: While listing relations in database=, schema=spark_catalog.c360raw, found: cart_items, customer, order_items, product, product_rating, simulation, support_chat
[0m02:50:31.018414 [debug] [ThreadPool]: Spark adapter: Getting table schema for relation c360raw.cart_items
[0m02:50:31.019253 [debug] [ThreadPool]: Spark adapter: Getting table schema for relation c360raw.customer
[0m02:50:31.020171 [debug] [ThreadPool]: Spark adapter: Getting table schema for relation c360raw.order_items
[0m02:50:31.020980 [debug] [ThreadPool]: Spark adapter: Getting table schema for relation c360raw.product
[0m02:50:31.021837 [debug] [ThreadPool]: Spark adapter: Getting table schema for relation c360raw.product_rating
[0m02:50:31.022754 [debug] [ThreadPool]: Spark adapter: Getting table schema for relation c360raw.simulation
[0m02:50:31.023464 [debug] [ThreadPool]: Spark adapter: Getting table schema for relation c360raw.support_chat
[0m02:50:31.030727 [debug] [ThreadPool]: On spark_catalog.c360raw: ROLLBACK
[0m02:50:31.032244 [debug] [ThreadPool]: Spark adapter: NotImplemented: rollback
[0m02:50:31.034169 [debug] [ThreadPool]: On spark_catalog.c360raw: Close
[0m02:50:31.153712 [debug] [ThreadPool]: Re-using an available connection from the pool (formerly spark_catalog.c360raw, now c360bronze)
[0m02:50:31.154930 [debug] [ThreadPool]: Spark adapter: Getting table schema for relation c360bronze.stg_clickstream
[0m02:50:31.156753 [debug] [ThreadPool]: Spark adapter: Getting table schema for relation c360bronze.stg_clickstream2
[0m02:50:31.159402 [debug] [ThreadPool]: Spark adapter: Getting table schema for relation c360bronze.stg_salesdb__cart_items
[0m02:50:31.160427 [debug] [ThreadPool]: Spark adapter: Getting table schema for relation c360bronze.stg_salesdb__customer
[0m02:50:31.161208 [debug] [ThreadPool]: Spark adapter: Getting table schema for relation c360bronze.stg_salesdb__order_items
[0m02:50:31.161775 [debug] [ThreadPool]: Spark adapter: Getting table schema for relation c360bronze.stg_salesdb__product
[0m02:50:31.162315 [debug] [ThreadPool]: Spark adapter: Getting table schema for relation c360bronze.stg_salesdb__product_rating
[0m02:50:31.162877 [debug] [ThreadPool]: Spark adapter: Getting table schema for relation c360bronze.stg_supportdb__support_chat
[0m02:50:31.191842 [info ] [MainThread]: Catalog written to /home/ec2-user/environment/dbtproject/dags/dbt_blueprint/c360-datalake/target/catalog.json
[0m02:50:31.195884 [debug] [MainThread]: Resource report: {"command_name": "generate", "command_success": true, "command_wall_clock_time": 6.6277905, "process_user_time": 3.394287, "process_kernel_time": 0.149442, "process_mem_max_rss": "104176", "process_out_blocks": "4960", "process_in_blocks": "0"}
[0m02:50:31.198437 [debug] [MainThread]: Command `dbt docs generate` succeeded at 02:50:31.198171 after 6.63 seconds
[0m02:50:31.199040 [debug] [MainThread]: Connection 'generate_catalog' was properly closed.
[0m02:50:31.199666 [debug] [MainThread]: Connection 'c360bronze' was properly closed.
[0m02:50:31.200187 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fa619c86bb0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fa6179c4370>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fa6177c2af0>]}
[0m02:50:31.201812 [debug] [MainThread]: Flushing usage events

Environment

- OS: Amazon Linux
- Python: 3.10
- dbt-core: 1.7.4
- dbt-spark: 1.7.0

Additional Context

This is the catalog.json generated by dbt docs generate; note that nodes and sources are empty:
{"metadata": {"dbt_schema_version": "https://schemas.getdbt.com/dbt/catalog/v1.json", "dbt_version": "1.7.4", "generated_at": "2024-01-02T02:39:55.359210Z", "invocation_id": "4f9b9ed4-e962-49bf-8329-df43b335419a", "env": {}}, "nodes": {}, "sources": {}, "errors": null}

shishircc added the bug and triage labels on Jan 3, 2024
github-actions bot changed the title from "[Bug] When using iceberg format doc generate misses the columns information" to "[ADAP-1085] [Bug] When using iceberg format doc generate misses the columns information" on Jan 3, 2024
shishircc changed the title to "[ADAP-1085] [Bug] When using iceberg format docs generate is unable to populate the columns information" on Jan 3, 2024
shishircc changed the title to "[ADAP-1085] [Bug] When using iceberg format, dbt docs generate is unable to populate the columns information" on Jan 3, 2024

shishircc commented Jan 4, 2024

The attached DAG, emr_dag_automation_blueprint.py.txt, can be used to create the EMR cluster with the above configuration.

Here is the requirements.txt:

--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.6.3/constraints-3.10.txt"

apache-airflow==2.6.3
apache-airflow-providers-salesforce
apache-airflow-providers-apache-spark
apache-airflow-providers-amazon
apache-airflow-providers-postgres
apache-airflow-providers-mongo
apache-airflow-providers-ssh
apache-airflow-providers-common-sql
astronomer-cosmos
boto3
simplejson
pymongo
pymssql
smart-open
psycopg2==2.9.5
simple-salesforce

dbeatty10 added the iceberg label on Feb 7, 2024
Fleid added the help_wanted label and removed the triage label on Feb 22, 2024