How to convert SQL Query result to PANDAS Data Structure?
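Any help with this problem would be greatly appreciated.
Basically, I want to run a query against my SQL database and store the returned data as a Pandas data structure.
I have attached the code for the query.
I am reading the Pandas documentation, but I am having trouble identifying the return type of my query.
I tried printing the query result, but it does not give any useful information.
Thanks!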
```python
from sqlalchemy import create_engine

engine2 = create_engine('mysql://THE DATABASE I AM ACCESSING')
connection2 = engine2.connect()
dataid = 1022
# Triple quotes let the SQL span several lines
resoverall = connection2.execute("""
    SELECT
        sum(BLABLA) AS BLA,
        sum(BLABLABLA2) AS BLABLABLA2,
        sum(SOME_INT) AS SOME_INT,
        sum(SOME_INT2) AS SOME_INT2,
        100*sum(SOME_INT2)/sum(SOME_INT) AS ctr,
        sum(SOME_INT2)/sum(SOME_INT) AS cpc
    FROM daily_report_cooked
    WHERE campaign_id = '%s'""" % dataid)
```
So I would like to understand what the format/data type of my variable "resoverall" is, and how to put it into a Pandas data structure.
Edit: March 2015
As noted below, pandas now uses SQLAlchemy both to read from (read_sql) and to insert into (to_sql) a database. The following should work:
```python
import pandas as pd

df = pd.read_sql(sql, cnxn)
```
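As a minimal, self-contained sketch of the read_sql call, the snippet below uses an in-memory SQLite database in place of the MySQL server from the question. The rows are made up for illustration; only the names daily_report_cooked, campaign_id, SOME_INT and SOME_INT2 are borrowed from the question. It also passes the campaign id through params rather than formatting it into the SQL string.

```python
import sqlite3

import pandas as pd

# Assumption: an in-memory SQLite database stands in for the real server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_report_cooked "
    "(campaign_id INTEGER, SOME_INT INTEGER, SOME_INT2 INTEGER)"
)
# Made-up sample rows, for illustration only.
conn.executemany(
    "INSERT INTO daily_report_cooked VALUES (?, ?, ?)",
    [(1022, 100, 5), (1022, 300, 15), (7, 50, 1)],
)

sql = """
    SELECT sum(SOME_INT) AS SOME_INT,
           sum(SOME_INT2) AS SOME_INT2
    FROM daily_report_cooked
    WHERE campaign_id = ?
"""
# "?" is sqlite3's placeholder style; other drivers use a different one.
df = pd.read_sql(sql, conn, params=(1022,))
conn.close()

print(type(df))  # the result is a pandas DataFrame
print(df)
```

The column names of the DataFrame come from the AS aliases in the query, so no separate step is needed to label them.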
Previous answer:
Via mikebmassey, from a similar question:
```python
import pyodbc
import pandas.io.sql as psql

cnxn = pyodbc.connect(connection_info)
cursor = cnxn.cursor()
sql = "SELECT * FROM TABLE"

df = psql.frame_query(sql, cnxn)
cnxn.close()
```
Here is the shortest code that will do the job:
```python
from pandas import DataFrame

df = DataFrame(resoverall.fetchall())
df.columns = resoverall.keys()
```
You can go further and parse the types, as in Paul's answer.
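If you are using SQLAlchemy's ORM rather than the expression language, you may find yourself wanting to convert a query object into a pandas DataFrame.
The cleanest approach is to take the generated SQL from the query's statement attribute and then run it with pandas's read_sql:
```python
df = pd.read_sql(query.statement, query.session.bind)
```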
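Edit 2014-09-30:
pandas now has a read_sql function.
Original answer:
I can't help you with SQLAlchemy; I always use pyodbc, MySQLdb, or psycopg2 as needed. But when doing so, a function as simple as the one below tends to suit my needs: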
```python
import datetime
import decimal

import numpy as np
import pandas
import pyodbc


def __processCursor(cur, dataframe=False, index=None):
    '''
    Processes a database cursor with data on it into either
    a structured numpy array or a pandas dataframe.

    input:
    cur - a pyodbc cursor that has just received data
    dataframe - bool. if false, a numpy record array is returned
                if true, return a pandas dataframe
    index - list of column(s) to use as index in a pandas dataframe
    '''
    datatypes = []
    colinfo = cur.description
    for col in colinfo:
        # col[0] is the column name, col[1] the Python type, col[3] the internal size
        if col[1] == str:  # text (the Python 2 original tested `unicode`)
            datatypes.append((col[0], 'U%d' % col[3]))
        elif col[1] == bytes:  # raw bytes (the Python 2 original tested `str`)
            datatypes.append((col[0], 'S%d' % col[3]))
        elif col[1] in [float, decimal.Decimal]:
            datatypes.append((col[0], 'f4'))
        elif col[1] == datetime.datetime:
            datatypes.append((col[0], 'O'))  # generic object dtype
        elif col[1] == int:
            datatypes.append((col[0], 'i4'))

    data = []
    for row in cur:
        data.append(tuple(row))

    array = np.array(data, dtype=datatypes)
    if dataframe:
        output = pandas.DataFrame.from_records(array)

        if index is not None:
            output = output.set_index(index)

    else:
        output = array

    return output


# myConnectToDBfunction is the author's own helper (not shown)
cnn, cur = myConnectToDBfunction()
cmd = "SELECT * FROM myTable"
cur.execute(cmd)
dataframe = __processCursor(cur, dataframe=True)
```
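The core of that function is building a NumPy structured array from an explicit list of (name, dtype) pairs and handing it to DataFrame.from_records. Here is a stripped-down sketch of just that step, with no database involved. The rows and column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Assumption: these rows stand in for what a cursor would yield.
rows = [(1, 2.5, "a"), (2, 3.5, "b")]

# One (column name, NumPy dtype) pair per column, as the function builds
# from cur.description. The names here are hypothetical.
datatypes = [("id", "i4"), ("value", "f4"), ("label", "U8")]

# Each tuple becomes one record of the structured array.
array = np.array(rows, dtype=datatypes)

# from_records picks up the field names as column names.
df = pd.DataFrame.from_records(array)

# Equivalent to passing index="id" to the function above.
indexed = df.set_index("id")

print(df.dtypes)
print(indexed)
```

Choosing the dtypes up front is what the longer function buys you over a bare DataFrame(cur.fetchall()), where pandas infers the types on its own.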
MySQL Connector
For those who work with the MySQL connector, you can use this code as a starting point. (Thanks to @Daniel Velkov)
References used:
- Querying Data Using Connector/Python
- Connecting to MYSQL with Python in 3 steps
```python
import pandas as pd
import mysql.connector

# Setup MySQL connection
db = mysql.connector.connect(
    host="<IP>",              # your host, usually localhost
    user="<USER>",            # your username
    password="<PASS>",        # your password
    database="<DATABASE>"     # name of the database
)

# You must create a Cursor object. It will let you execute all the queries you need
cur = db.cursor()

# Use all the SQL you like
cur.execute("SELECT * FROM <TABLE>")

# Put it all into a data frame
sql_data = pd.DataFrame(cur.fetchall())
sql_data.columns = cur.column_names

# Close the session
db.close()

# Show the data
print(sql_data.head())
```
This is the code I use. Hope this helps.
```python
import pandas as pd
from sqlalchemy import create_engine


def getData():
    # Parameters
    ServerName = "my_server"
    Database = "my_db"
    UserPwd = "user:pwd"
    Driver = "driver=SQL Server Native Client 11.0"

    # Create the connection
    engine = create_engine('mssql+pyodbc://' + UserPwd + '@' + ServerName + '/' + Database + "?" + Driver)

    sql = "select * from mytable"
    df = pd.read_sql(sql, engine)
    return df


df2 = getData()
print(df2)
```
Here is a short and clear answer to your problem:
```python
from __future__ import print_function

import MySQLdb
import pandas as pd

# Connecting to MySQL Database
connection = MySQLdb.connect(
    host="hostname",
    port=0000,  # placeholder: replace with your port (MySQL defaults to 3306)
    user="userID",
    passwd="password",
    db="table_documents",
    charset='utf8'
)
print(connection)

# Getting data from the database into a DataFrame
sql_for_df = 'select * from tabledata'
df_from_database = pd.read_sql(sql_for_df, connection)
```
Simply use pandas and pyodbc together:
```python
import pyodbc
import pandas as pd

# MSSQL Connection String Example
connstr = "Server=myServerAddress;Database=myDB;User Id=myUsername;Password=myPass;"

# Query Database and Create DataFrame Using Results
df = pd.read_sql("select * from myTable", pyodbc.connect(connstr))
```
I have already ...
Pandas likes to build its data structures from dict-like objects; see the online docs.
Good luck with sqlalchemy and pandas.
Like Nathan, I often want to dump the results of a sqlalchemy or sqlsoup query into a Pandas data frame. My own solution is:
```python
query = session.query(tbl.Field1, tbl.Field2)
df = DataFrame(query.all(), columns=[column['name'] for column in query.column_descriptions])
```
This question is old, but I wanted to add my two cents. I read the question as "I want to run a query against my [my]SQL database and store the returned data as a Pandas data structure [DataFrame]."
From the code it looks like you mean a MySQL database, and I assume you mean a pandas DataFrame.
```python
import MySQLdb as mdb
import pandas.io.sql as sql

conn = mdb.connect('<server>', '<user>', '<pass>', '<db>')
df = sql.read_frame('<query>', conn)
```
For example,
```python
conn = mdb.connect('localhost', 'myname', 'mypass', 'testdb')
df = sql.read_frame('select * from testTable', conn)
```
This imports all rows of testTable into a DataFrame.
Here is mine, in case you are using "pymysql":
```python
import pymysql
from pandas import DataFrame

host = 'localhost'
port = 3306
user = 'yourUserName'
passwd = 'yourPassword'
db = 'yourDatabase'

cnx = pymysql.connect(host=host, port=port, user=user, passwd=passwd, db=db)
cur = cnx.cursor()

query = """ SELECT * FROM yourTable LIMIT 10"""
cur.execute(query)

# The first item of each description entry is the column name
field_names = [i[0] for i in cur.description]
get_data = [xx for xx in cur]

cur.close()
cnx.close()

df = DataFrame(get_data)
df.columns = field_names
```
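The same cursor-based pattern can be tried without a MySQL server. The sketch below swaps in Python's built-in sqlite3 module, which follows the same DB-API conventions (cursor.description, fetchall). The table contents are made up, and yourTable is reused only as a placeholder name.

```python
import sqlite3

from pandas import DataFrame

# Assumption: in-memory SQLite replaces the pymysql connection.
cnx = sqlite3.connect(":memory:")
cur = cnx.cursor()
cur.execute("CREATE TABLE yourTable (id INTEGER, name TEXT)")
cur.executemany("INSERT INTO yourTable VALUES (?, ?)", [(1, "a"), (2, "b")])

cur.execute("SELECT * FROM yourTable LIMIT 10")

# Column names are the first element of each description entry.
field_names = [col[0] for col in cur.description]
rows = cur.fetchall()

cur.close()
cnx.close()

# Passing columns= to the constructor labels the frame in one step.
df = DataFrame(rows, columns=field_names)
print(df)
```

One reason to pass columns= to the constructor rather than assigning df.columns afterwards: if the query returns no rows, DataFrame([]) has zero columns, and assigning a list of names to it raises a length-mismatch error, whereas the constructor form yields an empty frame with the right columns.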
The way I find best is this:
```python
import pandas as pd

db = db_class()  # db_class is the author's own database class (not shown)
db.execute(query)
mydata = [x for x in db.fetchall()]
df = pd.DataFrame(data=mydata)
```
pandas.io.sql.write_frame is deprecated.
https://pandas.pydata.org/pandas-docs/version/0.15.2/generated/pandas.io.sql.write_frame.html
Use pandas.DataFrame.to_sql instead.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html
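For reference, a minimal round trip with DataFrame.to_sql might look like the sketch below. It assumes an in-memory SQLite connection, which pandas accepts directly; the table name test_table and the data are hypothetical.

```python
import sqlite3

import pandas as pd

# Assumption: in-memory SQLite database, used only for illustration.
conn = sqlite3.connect(":memory:")

df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})

# index=False keeps the DataFrame index from being written as a column.
df.to_sql("test_table", conn, index=False)

# Read the table back to confirm what was written.
round_trip = pd.read_sql("SELECT * FROM test_table ORDER BY id", conn)
conn.close()

print(round_trip)
```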
There is also another solution:
PYODBC to Pandas - DataFrame not working - Shape of passed values is (x,y), indices imply (w,z)
As of Pandas 0.12 (I believe) you can do:
```python
import pandas
import pyodbc

sql = 'select * from table'
cnn = pyodbc.connect(...)

data = pandas.read_sql(sql, cnn)
```
Prior to 0.12, you could do:
```python
import pandas
from pandas.io.sql import read_frame
import pyodbc

sql = 'select * from table'
cnn = pyodbc.connect(...)

data = read_frame(sql, cnn)
```
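If the result type is ResultSet, you should convert it to dictionaries first. The DataFrame columns will then be picked up automatically.
This works in my case:
```python
df = pd.DataFrame([dict(r) for r in resoverall])
```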
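To see why the dictionary conversion labels the columns, here is a small sketch. It is an assumption-laden stand-in rather than the SQLAlchemy case itself: it uses sqlite3.Row from the standard library, whose rows are mapping-like in the same way, and a made-up table t.

```python
import sqlite3

import pandas as pd

# Assumption: sqlite3.Row rows stand in for the SQLAlchemy result rows.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows now support access by column name

conn.execute("CREATE TABLE t (BLA INTEGER, ctr REAL)")
conn.execute("INSERT INTO t VALUES (10, 2.5)")

result = conn.execute("SELECT BLA, ctr FROM t")

# Each row becomes a dict; pandas uses the dict keys as column names.
df = pd.DataFrame([dict(r) for r in result])
conn.close()

print(df)
```

Whether dict(r) works on a SQLAlchemy row depends on the installed version. In recent releases rows behave like named tuples and expose the mapping view as r._mapping, so it is worth checking which form your version supports.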
1. Using mysql-connector-python
```python
# pip install mysql-connector-python

import mysql.connector
import pandas as pd

mydb = mysql.connector.connect(
    host='host',
    user='username',
    passwd='pass',
    database='db_name'
)
query = 'select * from table_name'
df = pd.read_sql(query, con=mydb)
print(df)
```
2. Using SQLAlchemy
```python
# pip install pymysql
# pip install sqlalchemy

import pandas as pd
import sqlalchemy

engine = sqlalchemy.create_engine('mysql+pymysql://username:password@localhost:3306/db_name')

query = '''
select * from table_name
'''
df = pd.read_sql_query(query, engine)
print(df)
```
It has been a long time since the last post, but maybe this helps someone.
A shorter way than Paul H's:
```python
my_dic = [row._asdict() for row in query.all()]  # one dict per result row
my_df = pandas.DataFrame(my_dic)
```