How to copy from CSV file to PostgreSQL table with headers in CSV file?
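I want to copy a CSV file to a Postgres table. The table has about 100 columns, so I do not want to rewrite them if I don't have to.
I am using
Is there a way to import a table with headers included, as I am trying to do?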
This works. The first row contains the column names.

    COPY wheat FROM 'wheat_crop_data.csv' DELIMITER ';' CSV HEADER;
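Note that `COPY` loads into a table that already exists, so with about 100 columns you may still want to avoid typing the column list by hand. The following is only a minimal sketch, assuming every column can initially be loaded as `text` and that the header row holds the names you want. The function name `create_table_sql` and the sample data are hypothetical, made up for illustration.

```python
import csv
import io


def create_table_sql(csv_file, table_name, delimiter=";"):
    """Build a CREATE TABLE statement from the header row of an open CSV file.

    Assumption: every column is declared as text; cast to real types later.
    """
    header = next(csv.reader(csv_file, delimiter=delimiter))
    # Double any embedded quotes so each header is a valid quoted identifier.
    columns = ['"{}" text'.format(name.strip().replace('"', '""')) for name in header]
    quoted_table = '"{}"'.format(table_name.replace('"', '""'))
    return "CREATE TABLE {} (\n    {}\n);".format(quoted_table, ",\n    ".join(columns))


if __name__ == "__main__":
    # Hypothetical sample standing in for the first lines of the real file.
    sample = io.StringIO("year;region;yield\n2019;north;3.2\n")
    print(create_table_sql(sample, "wheat"))
```

Run the generated statement once, then use the `COPY ... CSV HEADER` command. Because the names are emitted as quoted identifiers, any header containing uppercase letters or spaces has to be quoted in later queries as well.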
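Using a Python library (pandas):

    from sqlalchemy import create_engine
    import pandas as pd

    # Connection URL format: postgresql://user:password@host/database
    engine = create_engine('postgresql://user:pass@localhost/db_name')
    # read_csv takes the column names from the first row and infers column types
    df = pd.read_csv('/path/to/csv_file')
    # to_sql creates the table 'pandas_db' from the DataFrame
    df.to_sql('pandas_db', engine)

You can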
Alternative from the terminal, without permissions
The pg documentation, under NOTES, says
The path will be interpreted relative to the working directory of the server process (normally the cluster's data directory), not the client's working directory.
So, literally, using
The only way to express a relative path with client permissions is to use STDIN,
When STDIN or STDOUT is specified, data is transmitted via the connection between the client and the server.
as remembered here:

    psql -h remotehost -d remote_mydb -U myuser -c \
        "copy mytable (column1, column2) from STDIN with delimiter as ','" \
        < ./relative_path/file.csv
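The same STDIN approach can be driven from a client program instead of the shell. This is a rough sketch, assuming the third-party `psycopg2` driver is installed and using its `copy_expert` cursor method. The helper names `build_copy_sql` and `copy_csv_from_client` are hypothetical, and unlike the `psql` command above the statement adds `HEADER true` so the first row of the file is skipped.

```python
def build_copy_sql(table, columns=None, delimiter=","):
    """Return a COPY ... FROM STDIN statement for a CSV file with a header row."""
    def quote(identifier):
        # Quote identifiers so unusual table or column names stay valid.
        return '"{}"'.format(identifier.replace('"', '""'))

    if len(delimiter) != 1 or delimiter in "'\\":
        raise ValueError("delimiter must be a single plain character")
    column_list = ""
    if columns:
        column_list = " ({})".format(", ".join(quote(c) for c in columns))
    return "COPY {}{} FROM STDIN WITH (FORMAT csv, HEADER true, DELIMITER '{}')".format(
        quote(table), column_list, delimiter
    )


def copy_csv_from_client(dsn, table, csv_path, columns=None, delimiter=","):
    """Stream a file readable by the client to the server through the connection."""
    import psycopg2  # third-party driver, imported lazily; assumed to be installed

    sql = build_copy_sql(table, columns, delimiter)
    conn = psycopg2.connect(dsn)
    try:
        # "with conn" commits on success and rolls back on error.
        with conn, conn.cursor() as cur, open(csv_path, newline="") as f:
            cur.copy_expert(sql, f)
    finally:
        conn.close()
```

A call would look something like `copy_csv_from_client("host=remotehost dbname=remote_mydb user=myuser", "mytable", "./relative_path/file.csv", ["column1", "column2"])`. The relative path is resolved on the client, because the file contents travel over the connection rather than being opened by the server process.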
I have been using this function for a while with no problems. You only need to provide the number of columns in the CSV file; it takes the header names from the first row and creates the table for you:

    CREATE OR REPLACE FUNCTION data.load_csv_file
    (
        target_table text,   -- name of the table that will be created
        csv_file_path text,
        col_count integer
    )
    RETURNS void AS $$
    DECLARE
        iter integer;     -- dummy integer to iterate columns with
        col text;         -- to keep column names in each iteration
        col_first text;   -- first column name, e.g., top left corner on a csv file or spreadsheet
    BEGIN
        SET schema 'data';

        CREATE TABLE temp_table ();

        -- add just enough number of columns
        FOR iter IN 1..col_count
        LOOP
            EXECUTE format('alter table temp_table add column col_%s text;', iter);
        END LOOP;

        -- copy the data from csv file
        EXECUTE format('copy temp_table from %L with delimiter '','' quote ''"'' csv ', csv_file_path);

        iter := 1;
        col_first := (SELECT col_1 FROM temp_table LIMIT 1);

        -- update the column names based on the first row which has the column names
        FOR col IN EXECUTE format('select unnest(string_to_array(trim(temp_table::text, ''()''), '','')) from temp_table where col_1 = %L', col_first)
        LOOP
            EXECUTE format('alter table temp_table rename column col_%s to %s', iter, col);
            iter := iter + 1;
        END LOOP;

        -- delete the columns row // using quote_ident or %I does not work here!?
        EXECUTE format('delete from temp_table where %s = %L', col_first, col_first);

        -- change the temp table name to the name given as parameter, if not blank
        IF length(target_table) > 0 THEN
            EXECUTE format('alter table temp_table rename to %I', target_table);
        END IF;
    END;
    $$ LANGUAGE plpgsql;
You can use d6tstack, which creates the table for you and is faster than pd.to_sql() because it uses native DB import commands. It supports Postgres as well as MySQL and MS SQL.

    import pandas as pd
    import d6tstack.utils

    df = pd.read_csv('table.csv')
    uri_psql = 'postgresql+psycopg2://usr:pwd@localhost/db'
    d6tstack.utils.pd_to_psql(df, uri_psql, 'table')
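It is also useful for importing multiple CSVs, resolving data schema changes, and/or preprocessing with pandas (e.g. for dates) before writing to the db; see the examples notebook for more. Here `apply_fun` stands for your own preprocessing function.

    import glob
    import d6tstack.combine_csv
    d6tstack.combine_csv.CombinerCSV(glob.glob('*.csv'),
        apply_after_read=apply_fun).to_psql_combine(uri_psql, 'table')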