Please keep track of the total time you spend on this assignment, and make a note of that on what you turn in. This way we can tune the assignments for future classes.
Now that you are working with data definition and larger SQL commands, you should get used to saving your work in a .sql file, and running the file with the MySQL client program. It will get old typing long DDL and DML statements into the client directly.
This week's assignment draws from specific portions of Chapter 4. The reading sections for Lectures 7 through 10 indicate the relevant sections: 4 - 4.2.5, 4.3, and 4.6 - 4.6.2.
(OPTIONAL) None of this week's assignment focuses on Large Object (BLOB, CLOB, etc.) data types, but if you are interested in learning more about when it's a good or bad idea to store large objects in a database, here is an interesting research paper discussing the topic:
Do these problems from the "Practice Exercises" section of Chapter 4. These problems require you to write SQL data-definition commands. Follow good SQL coding style.
Note that since these are in the "Practice Exercises" section of the book, solutions are available at the book's website. DO NOT CONSULT THESE SOLUTIONS UNTIL AFTER YOU HAVE SUBMITTED YOUR FINAL WORK. Failure to do so will be considered a violation of the Honor Code.
Problem 4.2 (8 points). Make sure your DDL imports successfully into MySQL.
Problem 4.3 (2 parts; 3 points per part).
Problem 4.4 (4 points). In addition to explaining what happens, come up with an alternative approach, including a sequence of DML operations, for removing a manager, that will work around the behavior specified in the DDL. (Note that you aren't allowed to change the schema definition in this case, even if that is the best idea...)
This week's exercises focus on the SQL date/time types, and on data definition. One problem also gives you an opportunity to specify a stored function in MySQL. As mentioned earlier, these operations will require more than one or two lines of SQL, so you may want to create different files for the larger exercises, and run the contents of the files from the MySQL client. That way it's easier to edit and rerun the operation.
Also, note that MySQL has a subtle variation in syntax for the EXTRACT function. The standard says that this should work:
EXTRACT (MONTH FROM sale_time)However, MySQL requires that there be no space between EXTRACT and the opening paren. In other words, it is like a function call. (This is true for the MySQL CAST syntax as well.)
This week's exercises use a new version of the grade schema. Download and import the file make-grades2.sql. The main difference is that fileset records now include the size of each file-set in bytes. (All of your queries and views from last week will probably still work, but they might have an extra column, depending on how you wrote them.)
Here are some simple SQL queries to write, that involve date and time values. Note that the database schema supports multiple submissions for a given assignment. This is not reflected in the submission table where you might expect it; every student has one record in the submission table for each assignment in the assignment table. Rather, there will be multiple entries in the fileset relation, all with the same submission ID.
For these queries, don't worry about handling multiple submissions in any special way; just write the queries as if this little detail doesn't matter.
One example of how to use the CAST operation is to break down values into groups of some particular size. For this problem, you will write a SQL query that breaks down the last four hours of final exam submissions into 30 minute intervals, reporting the number of submissions for each 30 minute interval.
NOTE: MySQL CAST rounds values to the nearest whole number, instead of truncating like I thought it would. This messes up the query results. So, instead of CAST, use either TRUNCATE or FLOOR to chop off the fractional part of the segment value. You may also notice that some segments have no submissions; this is fine. Don't worry about representing those segments with a 0 value in the results, unless you truly are starved for anything better to do...
A good place to start would be to write a query that generates a relation containing the difference between each final exam submission's submit time, and the final's due date, in minutes. There is no really obvious way to do this, but you can use the unix_timestamp(date) function to convert a date into a UNIX timestamp. This timestamp is the number of seconds since January 1, 1970. So, you can compute:
(unix_timestamp(submit_date) - unix_timestamp(due_date) / 60
This query should give you a list of how many minutes each final was submitted before the due-date. Your minute values will probably contain decimal values; this is fine.
Once you have that query working properly, it is reasonably straightforward to construct the rest of the query. You need to take the number of minutes before the due-date, and divide it by the interval size. (In this case the interval size is 30 minutes.) Casting that value to an integer will give you the segment that the submission falls into. Once you have this segment value, you can use grouping and aggregation to generate the final result.
A few additional specifications for the result:
For the next two exercises, you will take into account the fact that students may have submitted work for an assignment multiple times. (You can reread the description at the beginning of Exercise 1 for more details about this somewhat confusing schema layout.)
As a warm-up exercise, write an SQL query that reports the top 10 submit counts over all (student, assignment) combinations. List the student's username and the assignment's short name, along with the total number of submissions for that combination.
Although it would certainly be simple to do, you should definitely not write this as a correlated subquery in the SELECT clause! This turns out to be much slower to compute.
To find the "top N" of anything, you can order your results by the appropriate column, then append "LIMIT N" to your SQL query. The database will only report the first N results of your query, which will be exactly what you want. (This is not standard SQL syntax, but it is widely supported.)
The result to generate for this exercise is a report containing every assignment's short name, and the following values, computed only over the last filesets submitted by each student for that assignment:
Hint: A good place to start would be to generate the actual set of filesets that actually are the last filesets submitted by students. Once you have this, the rest of the query is quite straightforward.
Collab: Feel free to collaborate with other students in the design of this function, but write the code yourself. If you get stuck on database errors, feel free to get debugging help from other students. Sometimes simple syntactic issues can be very frustrating...
Write a stored function named sizediff that computes the difference in size between the last fileset for a given submission, and the first fileset for the submission. "Last" and "first" are relative to the date of submission.
The function will take a single INTEGER argument which is the sub_id value to look at the filesets for. Note that each unique value for sub_id corresponds to one unique (student, assignment) pair, so you can simply use this ID to select all fileset records corresponding to that submission.
The return-value should be an INTEGER value reporting the difference between the last fileset's size and the first fileset's size.
The body of the function can work something like this:
Start with the stored function code given in class, since this function will operate in a very similar manner. Note that MySQL has an interesting peculiarity - the semicolons in the function's body will confuse the SQL parser, since it will think you have reached the end of the statement! So, you must change the delimiter temporarily while you are defining the stored function. This is also shown:
-- Set the "end of statement" character to ! so that the semicolons in the -- function body won't confuse MySQL. DELIMITER ! CREATE FUNCTION acct_total(cust_name VARCHAR(20)) RETURNS NUMERIC(12,2) BEGIN -- Variables to accumulate into DECLARE bal NUMERIC(12,2); DECLARE total NUMERIC(12,2) DEFAULT 0; -- Cursor, and flag for when fetching is done DECLARE done INT DEFAULT 0; DECLARE cur CURSOR FOR SELECT balance FROM account NATURAL JOIN depositor WHERE depositor.customer_name = cust_name; -- When fetch is complete, handler sets done flag DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET done = 1; OPEN cur; REPEAT FETCH cur INTO bal; IF NOT done THEN SET total = total + bal; END IF; UNTIL done END REPEAT; CLOSE cur; RETURN total; END! -- Now back to your regularly scheduled programming... DELIMITER ;
Once you have this stored function working, use it to find the sub_id values for the submissions with the top 10 largest size differences. Include the actual size-differences themselves. Also, make sure that you don't have sub_id values appearing multiple times in the query result; that would be a bug in your query.
OPTIONAL: If you want to make your result nicer, try writing another query that shows the username and assignment short-name values for the top 10 largest size differences.