Python 2.7 and GCP Google BigQuery: extracts - compression not working
I'm using Python 2.7 (which I can't change right now) and v0.28 of the Google Python client library google.cloud.bigquery, and the compression="GZIP" or "NONE" parameter/setting doesn't seem to work for me. Could someone else try this and tell me whether it works for them?
In the code below you can see what I've been using, but every time the file lands on GCS it appears to be uncompressed, no matter what I pass for compression (a quick way to verify this is sketched after the code).
Note: my imports are from a larger code base; not all of them are needed for this snippet.
```python
from pandas.io import gbq
import google.auth
from google.cloud import bigquery
from google.cloud.exceptions import NotFound
from google.cloud.bigquery import LoadJobConfig
from google.cloud.bigquery import Table
import json
import re
from google.cloud import storage

bigquery_client = bigquery.Client(project=project)
dataset_ref = bigquery_client.dataset(dataset_name)
table_ref = dataset_ref.table(table_name)
job_id_prefix = "bqTools_export_job"
job_config = bigquery.LoadJobConfig()

# default is ","
if field_delimiter:
    job_config.field_delimiter = field_delimiter

# default is true
if print_header:
    job_config.print_header = print_header

# CSV, NEWLINE_DELIMITED_JSON, or AVRO
if destination_format:
    job_config.destination_format = destination_format

# GZIP or NONE
if compression:
    job_config.compression = compression

# hard-coded attempts while debugging -- neither had any effect on the output
job_config.Compression = "GZIP"
job_config.compression = "GZIP"

job = bigquery_client.extract_table(table_ref, destination,
                                    job_config=job_config,
                                    job_id_prefix=job_id_prefix)

# job.begin()
job.result()  # Wait for job to complete

returnMsg = 'Exported {}:{} to {}'.format(dataset_name, table_name, destination)
```
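A quick way to confirm whether the extract actually produced gzip output is to read the object back from GCS and check for the gzip magic bytes (0x1f 0x8b). This is only a verification sketch, assuming the google.cloud.storage client; the bucket and object names are placeholders for your own destination:

```python
from google.cloud import storage


def looks_gzipped(bucket_name, blob_name, project=None):
    """Return True if the GCS object starts with the gzip magic bytes."""
    storage_client = storage.Client(project=project)
    blob = storage_client.bucket(bucket_name).blob(blob_name)
    # downloads the whole object and checks the first two bytes; fine for a spot check
    return blob.download_as_string()[:2] == b"\x1f\x8b"


# hypothetical names -- substitute your own bucket and exported object
print(looks_gzipped("my-bucket", "my_table_export.csv.gz", project="my-project"))
```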
Related links:
https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.extract.compression
https://googlecloudplatform.github.io/google-cloud-python/latest/_modules/google/cloud/bigquery/job.html
https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/bigquery/api/export_data_to_cloud_storage.py
I'm sure I'm doing something silly. Thanks for your help... Rich
EDIT below
To share, I think our final code will be... Rich
```python
from google.cloud import bigquery


# export a table from bq into a file on gcs,
# the destination should look like the following, with no brackets {}
# gs://{bucket-name-here}/{file-name-here}
def export_data_to_gcs(dataset_name, table_name, destination,
                       field_delimiter=",", print_header=None,
                       destination_format="CSV", compression="GZIP",
                       project=None):
    try:
        bigquery_client = bigquery.Client(project=project)
        dataset_ref = bigquery_client.dataset(dataset_name)
        table_ref = dataset_ref.table(table_name)
        job_id_prefix = "bqTools_export_job"
        job_config = bigquery.ExtractJobConfig()

        # default is ","
        if field_delimiter:
            job_config.field_delimiter = field_delimiter

        # default is true
        if print_header:
            job_config.print_header = print_header

        # CSV, NEWLINE_DELIMITED_JSON, or AVRO
        if destination_format:
            job_config.destination_format = destination_format

        # GZIP or NONE
        if compression:
            job_config.compression = compression

        # if it should be compressed, make sure there is a .gz on the filename, add if needed
        if compression == "GZIP":
            if destination.lower()[-3:] != ".gz":
                destination = str(destination) + ".gz"

        job = bigquery_client.extract_table(table_ref, destination,
                                            job_config=job_config,
                                            job_id_prefix=job_id_prefix)

        # job.begin()
        job.result()  # Wait for job to complete

        returnMsg = 'Exported {}:{} to {}'.format(dataset_name, table_name, destination)

        return returnMsg

    except Exception as e:
        errorStr = 'ERROR (export_data_to_gcs): ' + str(e)
        print(errorStr)
        raise
```
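For reference, a usage sketch; the project, dataset, table, and bucket names are placeholders:

```python
# hypothetical identifiers -- replace with your own project/dataset/table/bucket
msg = export_data_to_gcs(
    dataset_name="my_dataset",
    table_name="my_table",
    destination="gs://my-bucket/my_table_export.csv",  # ".gz" is appended automatically for GZIP
    print_header=True,
    compression="GZIP",
    project="my-project",
)
print(msg)
```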
For table extracts you should use ExtractJobConfig.
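LoadJobConfig configures load jobs (GCS into BigQuery), so the compression set on it doesn't make it into the extract configuration, which is why the original code fails silently instead of raising (as the question shows). A minimal sketch of an extract using ExtractJobConfig, assuming the v0.28 client; the project, dataset, table, and bucket names are placeholders:

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # hypothetical project id
table_ref = client.dataset("my_dataset").table("my_table")  # hypothetical names

# ExtractJobConfig is the config class for BigQuery -> GCS extract jobs;
# compression, destination_format, field_delimiter and print_header live here.
job_config = bigquery.ExtractJobConfig()
job_config.compression = "GZIP"
job_config.destination_format = "CSV"

job = client.extract_table(
    table_ref,
    "gs://my-bucket/my_table.csv.gz",  # hypothetical destination URI
    job_config=job_config,
)
job.result()  # wait for the extract to finish
```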