Skip to content

Latest commit

 

History

History
79 lines (69 loc) · 2.27 KB

File metadata and controls

79 lines (69 loc) · 2.27 KB

sqldf

Library for quering DataFrames using SQL.

Overview

SQL DataFrame is a library used for querying multiple types of DataFrame using ANSI SQL syntax. DataFrames supported are:

  1. RAW: usually used so the user doesn’t need to have pandas installed. Example of a RAW "DataFrame":
list_of_dictionaries = [{'name': 'Rigo', 'age': 33}, {'name': 'Bruno', 'age': 33}]
  1. pandas DataFrame.

  2. pyspark DataFrame: sqldf uses pyspark DataFrame in the background to process big data. Note, that the DataFrame return will be a pyspark DataFrame

How to use sqldf

from sqldf import sqldf

# RAW DataFrame
inventory = [{'item': 'Banana', 'quantity': 33}, {'item': 'Apple', 'quantity': 2}]
orders = [{'order_number': 1, 'item': 'Banana', 'quantity': 10}, {'order_number': 2, 'item': 'Apple', 'quantity': 10}]

To select data from a DataFrame and also register a table in memory do the following:

print('Inventory:')
inventory_pyspark_df = sqldf.sql(
	"""
	SELECT item,
           quantity AS quantity_available
      FROM inventory_table
	""",
	inventory,
	table='inventory_table')
inventory_pyspark_df.show()

print('Orders:')
orders_pyspark_df = sqldf.sql(
	"""
	SELECT order_number,
           item,
           quantity AS quantity_ordered
      FROM order_table
	""",
	orders,
	table='order_table')
orders_pyspark_df.show()

Since the table has been specified above, the table will be registered in memory. The next time you want to select data from the table jut do the following (Note that we don't specify table or DataFrame):

# Get inventory below quantity of 10 so we can order more of these items.
print('Items low in quantity:')
inventory_low = sqldf.sql(
	"""
	SELECT item,
           quantity AS quantity_low
      FROM inventory_table
     WHERE quantity < {{ quantity }}
	""",
	quantity=10)
inventory_low.show()

Now we are going to get a list of orders that will be able to be fulfilled. You can specify the table name if you want to register the results of this query in memory so it can be used later.

print('Orders with inventory: ')
orders_with_inventory = sqldf.sql(
	"""
	SELECT ot.*
	  FROM inventory_table it
	  JOIN order_table ot
	    ON it.item = ot.item
	 WHERE it.quantity >= ot.quantity
	"""
	)
orders_with_inventory.show()