Relational Algebra

Lab Materials

This lab will use the following data:

Relation 1

Name Age Gender
Lindsay 16 Female
Tom 21 Male
Eric 33 Male
Marco 13 Male
Kristopher 45 Male
Kelly 21 Female
Mitchell 24 Male
Tammy 30 Female
Brad 18 Male

Relation 2

Name Make
Lindsay Honda
Lindsay Yamaha
Tom Honda
Eric Honda
Eric Yamaha
Marco Honda
Marco Yamaha
Marco Kawasaki
Marco Suzuki
Marco KTM
Mitchell Honda
Mitchell Yamaha
Kelly Honda
Kristopher Honda
Kristopher Yamaha
Kristopher Kawasaki
Tammy Honda
Tammy Yamaha
Brad Honda
Brad Yamaha

Relation 3

Track Type Bike
Carters Racetrack dirt Honda
Carters Racetrack dirt Yamaha
RaceTrack Empire asphalt Honda
RaceTrack Empire asphalt Kawasaki
Dirt Roads dirt Honda
Dirt Roads dirt Kawasaki
Dirt Roads dirt KTM
Dirt Roads dirt Suzuki
Speedy Pavement asphalt Honda
Speedy Pavement asphalt KTM
Speedy Pavement asphalt Yamaha
Expressways asphalt Honda
Expressways asphalt Yamaha
Expressways asphalt Kawasaki
Expressways asphalt KTM
Expressways asphalt Suzuki
Georgia ATV Trails dirt Honda
Georgia ATV Trails dirt KTM
Georgia ATV Trails dirt Suzuki

Lab Content

Objective

Students will implement a basic relational algebra system using Python.

Procedure

Using Built-In Python Data Types to Create Relations

  1. We can represent a relation using Python built-in data structures such as lists and tuples. For example:

    people = set([('Lindsay', 16, 'female'),
                  ('Tom', 21, 'male'),
                  ('Eric', 33, 'male'),
                  ('Marco', 13, 'male'),
                  ('Kristopher', 45, 'male'),
                  ('Kelly', 21,     'female'),
                  ('Mitchell', 24, 'male'),
                  ('Tammy', 30, 'female'),
                  ('Brad', 18, 'male '),
                 ])
    
Create a representation each of the relations provided using a set of tuples.

Implementing Relational Algebra Operators - Set operations

  1. In addition to the three relations you have already created make a fourth relation called other_people:

    other_people = set([('Lindsay', 16, 'female'),
                        ('Tom', 21, 'male'),
                        ('Eric', 33, 'male'),
                        ('Marco', 13, 'male'),
                        ('Daniel', 36, 'male'),
                       ])
    
  2. Because we have created our relation data structure from the Python built in set type we can take advantage of the existing set operations for our implementation of relational algebra. We can easily take the intersection, union and difference of our relations:

    print(people.intersection(other_people))
    print(people.union(other_people))
    print(people - other_people)
    print(other_people - people)
    
  3. It is important to note that the set operations have no knowledge of the meaning behind our data. We must ensure that the operations we perform make sense. For example what is the meaning of the following operation:

    print(people.union(bikes))
    

    This operation returns all of the rows of people and bikes unioned in to a single set. This result is meaningless and set operations are typically performed only when the relations being acted on have the same columns.

Implementing Relational Algebra Operators - Selection

  1. Once we have created our relations we can use a loop to perform selections on our data. For example we can create a list of all of the riders under 22 using the following:

    under_22 = []
    for rider in people:
        if rider[1] < 22:
            under_22.append(rider)
    

    We can create queries such as these more succinctly using List Comprehensions:

    under_22 = [rider for rider in people if rider[1] < 22]
    
  2. Create a list comprehension that provide the following:

    1. Tuples from relation 1 that include females over the age of 17
    2. Tuples from relation 2 that include riders who ride a Yamaha?
    3. Tuples from relation 3 that include race tracks allowing KTM bikes?

Implementing Relational Algebra Operators - Cross Product

  1. We can implement other relational algebra operations using the set of tuples relation data structure. Lets implement the cross product next. The cross product pairs each row of a relation with every row of another relation to create a new relation that contains every possible combination of the input relations tuples.

    We can create a new function that implements the cross product using a nested for loop:

    def crossproduct(a, b):
        newrelation = []
        for row in a:
            for r in b:
                newrelation.append(tuple([item for item in row] + [item for item in r]))
        return set(newrelation)
    
  2. To test our cross product function take the cross product using the following example:

    result = crossproduct(people, bike)
    
  3. Print the result and see if it is what you expected. How many tuples (rows) are in the relation that results from the cross product of people and bike? (Hint: It may be difficulty to count... use the built in len function!)

  4. How many rows did the relation have? How many rows should we have expected? How can we determine the number of rows that will results from a cross product without performing the cross product?

  5. We can use the cross product to solve more complex queries than we could with only selection. We can now combine tables using the cross product and then select the tuples that we want. Lets find all of the riders who ride Honda:

    names = set([row for row in crossproduct(people, bike) if row[0] == row[3] and row[4] == 'Honda'])
    

    Print names and review the results. What do you notice about the length or names and items it contains? Could we get just a list of the names?

  6. Write a query that will find the answers to the following questions:

    1. Find the make of the bikes driven by females over twenty.
    2. Find all of the males younger than twenty-five who drive a Suzuki or a Kawasaki.

Implementing Relational Algebra Operators - Derived Operators

  1. In step five of the previous section we performed the following query that involved the cross product:

    names = set([row for row in crossproduct(people, bike) if row[0] == row[3] and row[4] == 'Honda'])
    

    Doing this query inadvertently utilized a derived relational algebra operation. Which derived operation was it? What other derived operations could we utilize with just our implementation of the cross product and the list comprehension as selection?

Implementing Relational Algebra Operators - Project

  1. Some of our operations, especially cross product, result in new relations that can have many rows and columns. It will become difficult to write list comprehension based queries the larger the relations become. We can use a slightly more complex version of the tuple called a namedtuple provided by the collections standard library module to begin to create relations that offer a better interface and easier management of columns. The named tuple will also allow us to implement the project relational operation.

  2. Use the following example as a guide to help you create a namedtuple for relation 2 and relation 3:

    from collections import namedtuple
    Person = namedtuple('Person', 'name age gender')
    people = set([Person('Lindsay', 16, 'female'),
                  Person('Tom', 21, 'male'),
                  Person('Eric', 33, 'male'),
                  Person('Marco', 13, 'male'),
                  Person('Kristopher', 45, 'male'),
                  Person('Kelly', 21,       'female'),
                  Person('Mitchell', 24, 'male'),
                  Person('Tammy', 30, 'female'),
                  Person('Brad', 18, 'male '),
                 ])
    
  3. After your tables are created implement the project operation:

    def project(relation, columns):
        """Relational algebra project operator
    
        >>> from collections import namedtuple
        >>> Person = namedtuple('Person', 'name age gender')
        >>> people = set([Person('Lindsay', 16, 'female'),
             Person('Tom', 21, 'male'), Person('Eric', 33, 'male')])
        >>> project(people, 'name')
        set([('Lindsay',), ('Tom',), ('Eric',)])
    
        """
        return set([tuple(getattr(row, column) for column in columns.split())
                    for row in relation])
    
  4. Perform a projection on relation 1. Project only the name column:

    print(project(people, 'name'))
    

    Examine the results. Take not of your observations. What type of object is returned by our project function? Is getting this type of object returned a good or a bad thing for our relational algebra implementation?

  5. Write a query that will find the answers to the following questions:

    1. Find the names of female riders who race Honda and Yamaha and are over 21.
    2. Find the names of riders who race Honda but not Yamaha or KTM.
    3. Find the names of tracks that have asphalt surfaces and allow KTM.

References

Update soon.

Metadata

Title:Relational Algebra
CreationDate:12/16/2013
SoftwareUsed:None
SoftwareVersion: None
Contributors: Zac Miller, Crystal Lyliston