Tuesday, January 17. 2006
There are 4 main schools of thought on how to teach Data Structures in Computer Science courses, and this post quickly explains them, and asks you which one you belong in.
The categories presented here are the results of a phenomonographic (Fe Nom Oh No Graphic) study carried out by an Australian named Raymond Lister who works in the area of CS Education. Having interviewed a selection of CS educators around the world, and analysing the interview transcripts, Ray found that there are 4 attitudes toward teaching data structures, but only 2 outcomes. There are 2 ways to teach data structures courses, the first one is to teach the students how implement data structures. This is the more common of the two styles, and is used here in Maynooth. Typically a second year data structures course will show students how to implement Linked lists, Binary Trees, Hash Tables, and the like. The students gain a good deal of exposure to concepts such as recursion, pointers, and also gain a lot of experience in writing code. The main arguments for this are are as follows.
The other, more modern method of teaching data structures is simple to teach the API surrouding them. The students learn how to use the data structures provided by the STL, or JDK, or whichever libraries come with their chosen language. Instead of writing their own buggy data structures the students use real world data structures and learn how to work them. The two main arguments for this are as follows
So there you have it, convincing arguments from all sides. In picture form, here is the scenario, and you must pick one box to put yourself in. So the reason I am writing this post, is cause I want your opinions, as students/academics/ graduates/ graduate employer. What do you think is best? Please let me know what way you think it should be taught, by leaving a comment.
Display comments as (Linear | Threaded)
I was always under the impression that teaching the students how to implement data structures could also cover topics such as component reuse if progressive assignments built upon work previously done.
Interesting stuff as always Des. My (entirely unqualified) take on the topic;
I'd suggest a hybrid approach, with the emphasis on the software engineering end. Off the top of my head, I'd approach the teaching of data structures along the following lines;
1) Explain the data structures. Theoretical, big picture stuff; what it is (block diagrams, high level behaviours, etc) and why it is (uses for these data structures, real world applications, benefits and debenefits of different ones).
2) Hands on use. This is the API stuff mentioned above. Preferably more than one API/library - say, start them on your own, simplified version, then lead on to using a real world tool such as the STL or Java/.NET equivalent.
3) Implementation. As the more advanced/final part of the teaching, have the students implement their own version of at least a couple of the data structures.
1) This is me again with the big pictures and all I guess. IMHO, there's no point in teaching anybody anything like this if they don't know, in broad terms, what you're teaching and some of the why you're teaching it. Regardless of whether you take the implementation or library approach, or my combined version, you need to start out by giving a thorough high-level grounding in what the data structures are, how they work, how they behave, how they're structured, and what they're used for. If you don't do this, and do it well, the usage/library approach is just going to look like a bunch of random function calls for the sake of it, and the implementation approach is going to be some very confusing snatches of code with no connection or heirarchy.
2) I go with the usage/libraries next beacuse, IMHO, that's the more important part - if the student comes away with only a little of the course, at least let it be some idea of the nature and practical use of data structures. If they have that, without much implementation ability, they've at least got something useful to build on. If they come away with some idea of the implementation, but no handle on how to use the things in practise, they've got even less. The other reason behind emphasising the use of the things is the real world approach - as you point out, in practice, you're much more often going to be using data structures than implementing them. I'm always worried about that software engineering, re-use end of things with courses like this. You don't want to be giving the students the idea that they should actually be implementing their own version of something that already exists, unless there's a good reason for it. This seems to happen too much in a lot of college courses - students come out with the idea that writing their own version of function XYZ is somehow the better approach to using an existing version that's been developed, refined and tested by many experts over many years. Not good. The reason I suggest using more than one library, or starting with a special simplified version, is to avoid tying the student too closely to a given library - ie, that they think a binary tree is something you find and use in the STL, as opposed to a more general concept. There's also the fact that starting with a custom teaching version of a library, particularly with beginner programming skills, might give a better curve than a more complex "real-world" library.
3) That said, I'd still go for teaching the implementation of at least a few of the data structures, partly to deepen the students' understanding of them, and also as a good programming excercise. Data structures make a good practical opportunity to teach concepts like pointers, code-reuse, recursion, OO design, interfaces/abstraction, code structure, library implementation, etc. There's also the reward/undertanding aspect - once you've already used a given data structure in library form, there's a certain reward/"cool factor" to seeing your own (limited) version of it in action, probably more so than if you'd taken it from scratch.
Well, that was a shocking long "comment", hopefully it was of some use.
An interesting post...as always
In my response I'd consider the totality of first year programming, second year data structures and second year programming.
Even though first year programming is an introduction I should be addressing the "Programming skills" learning outcome. This leaves second year data strctures free to concentrate on "Transferable thinking" and possibly "Component thinking". Where second year programming re-emphasises "Programming skills" and (in our case) second year software engineering re-emphasises "Component thinking".
I'm generally skeptical of the "Knowledge of libraries" learning outcome on anything but a training module. And I do think there should be a training module directed at Java or C# etc... Most people roll this into first/second year teaching.
I feel (ATM) that the main problem in CS education is that: to write a single simple program we're implicitly assessing each of the four learning outcomes you've mentioned. We've got to learn that when we're assessing "Knowledge of libraries" that we shouldn't worry about "Component thinking". Unfortunatly, I don't (yet) know how to seperate these concerns.
What Dave said. Aidan, I can see why you are sceptical of the "knowledge of libraries" approach, I think the main thing is that people are aware of them, and have experience with libraries, and in general, a lot of the concrete, instead of the abstract, by the time they leave college. I think the hybrid approach is the only viable one, but this discussion about approach assumes plenty of time is assigned to teaching programming/software engineering/whatever, but as we all know, this time is decreasing rapidly, in maynooth anyway.
i had seen this message and i am very happy because i HATE this
I think that C should be taught in frist year, as all modern languages have evolved from C. Once you know C it only takes a little while to learn any other language as you already have the basics.
That was the way with me, I did two years of C in Tallaght,and after 3-5 months I had grasped Java. I'm looking forward to doing it in Experimental Physics this semester
Paul, I would agree with you. It can include those concept, but in my experience it rarely does. Typically the course staggers and ends up stuck on stuff like Red Black trees.
Dave, I like your idea, and your comment. The only concern I have with it, is that sometimes when you try to do 2 things properly, you end up not covering either sufficiently. But I have to agree, most courses have at least 2 data structures modules, and what you propose should be possible. I'd invite you to expand on your point, after more consideration. If I told you that you were teaching D.S next year for 2 modules, what would/wouldn't you do?
Aidan, I agree that it is dangerous to concern yourself primarily with teaching libraries as chances are they will all change, leaving the student kerfucked when he/she goes job hunting. But I think its just as dangerous to be unfamiliar with libraries as it is to be only familiar with libraries.
Mach, nice to know you like C. I'll be sure to tell Kernighan and Ritchie that you're impressed with their work.
See, the fundamental problem here is that Dave, Aidan, and Phil have all just thought more about how to teach data structures, than any of the lecturers who actualy teach it. There are very few jobs were you can get away with literally winging it, and robbing people of an education, but my oh my Computer science lecturing certainly seems to be one of them.
Thanks for the comments, please keep them coming, its beneficial to me.
Yeh I even got my hands on a copy The C programming Lanuage, though i'ts in an ebook so It will fit nicely on a CF card and will be able to read it on my PDA:).
Well, as I've just experienced the implementation level of the programming, I find that it is nice to know how they work under the hood, and from that I don't really see the point in teaching how to use them (but then I do know how to use the API, and I'm sure that alot of the people doing CS with me still don't know that).
The only thing that I do dislike about the implementation is that sometimes the stuff they ask you to do is, for lack of a better word, slow. An example would be that they asked us to write a Linked List in a lab, so me understanding it went ahead and implemented a doubly linked list just to actually challenge myself. The next weeks lab turned out to be: Modify the previous weeks Linked List class so that it's double linked.
And the same happened to me in the exam. Write a Stack using a static array. So there I go being smart and write the Stack using a circular array. Next question: Explain how to modify the code from part (previous) so that it uses a circular array. Now I probably should have read the question. But still pants.
But then I like knowing how things work under the bonnet so that I can when using them later on appreciate what is actually happening.
Cheers for an infomative entry Des
I would guess that the Raymond Lister you mentioned is the same Ray Lister who taught me 2nd year Computer Science at Queensland University of Technology in Brisbane, Australia in 1997. I would say that Raymond focussed more on understanding how the data structures worked, rather than how to use an API, although there certainly wasn't a heavy focus on students having to write the code to implement data structures.
I remember thinking at the time that his class was a whole lot easier than first year, and was probably a little too easy. I'm sure Raymond was aware that many people thought his class was too easy, but he assured us that about 30% of people still scored less than the required 50% pass mark in his class.
As an example of how easy it was, the entire assessment (including assignments and final exam) of data structures involved filling in 2 or 3 missing lines of C code in each of the binary search algorithm, and inserting a new item into a linked list.
So the linked list insert code needed to be something like:
item_to_insert->next = current->next
current->next = item_to_insert
From my own experience, although I wasn't explicitly trained in the APIs of data structures, it certainly wasn't hard to use and even guess what the APIs of the Java Vector and Hashtable were when I started using them in my first job.
Also, I've never once had to implement any of the basic data structures or algorithms such as binary search or linked lists in the 10 or so years that I've been coding. So even though I wasn't forced to write the full code to implement data structures at university, I don't think I've missed out because of it.
I think that students neither need to be able to implement data structures fully themselves, nor commit the Java collections API to memory to be able to make the best use of data structures.
The most important thing is for students to be able to make the correct choice of which data structure to use when faced with a problem. Neither knowing how to write data structures, nor knowing the API to use them necessarily gives you this information.
To be able to make the right choice, I would say that the ability to draw boxes and arrows showing what each data structure looks like and give some O(n) information about each would be a fair indication.
My name is Des, I'm the UX Lead and COO of Intercom, a fantastic CRM & messaging tool for web sites and web software.