Thursday, January 31, 2013

TED Talks on Education

TED Talks are short talks "Spreading ideas that matter". They cover a broad range of topics, but in general, they have excellent speakers who really have something to say. Along the lines of my new-found interest in Education, I offer up a couple of related TED talks that I recently stumbled into.

I suggest that you take 20 minutes to watch this TED talk by Brit Sir Ken Robinson: Do Schools Kill Creativity?. I liked it enough to add it to my Words of Wisdom Pearltree.

If you enjoy it too, there's a sequel from a few years later that is also worth viewing: Bring on the Learning Revolution!

At least one school teacher who I shared these videos with complained that the script leaned too much on humor and didn't have enough hard facts. He asked why any school teacher would be motivated to kill creativity in their students? Good question, but I'm quite sure I've encountered such in my own limited experience as a student. The video's speaker was convincing enough for me, but maybe that just shows my own biases.

Here's a short 3rd talk from the same speaker, but in a different, more visual format: Changing Education Paradigms.

Anyone got a convincing argument that it isn't so?

Saturday, January 19, 2013

"How Children Succeed"...

If you've been a regular reader of this blog in the past couple of months, you're aware that I've volunteered to teach a couple of computer courses at the North Hempstead, NY "Yes We Can" community center, which opened in my neighborhood here last September. Related links:

Approval for CS101 at the Community Center.

Keeping Busy.

I'm writing this on the evening of 1/14/2013, and the Computers-100 class that I thought was going to start in early December got pushed to 1/15/2013 3:30PM (tomorrow afternoon) when the town decided more paperwork was needed. The town's lawyers came up with a waiver for me to sign that I swear looked more appropriate for rock climbing then for teaching a computer class. It's times like that when I wish I was wealthy enough to have a lawyer on retainer to get a second opinion on what I presume to be a "contract of adhesion". Of course, that would likely have resulted in the town and I being in continuing discussions instead of trying to get this ship launched. Still, I've got to envy those scenes in Woody Allen's movies where he gets into a discussion about something, steps off screen for a second and comes back with some famous authority in tow to settle the argument.

The second class is a mentored version of Udacity.com's CS101. I'm still hoping to get that one rolling on 2/4/2013, 7-9PM on school nights.

One unexpected side effect of my getting excited about teaching these classes is that suddenly I have a previously unknown interest in "Education" as a topic. My prior education experience has been limited to my once-upon-a-time having been a student (K-12 in Union, NJ public schools [13 years], BS in IE/OR at Cornell U. Engineering [4 years], and MSE in "Computer, Information and Control Engineering" [CICE] at U. of Michigan [1 year] - 18 years of formal education), and a decade or so ago, I did seriously date an elementary school teacher. Oh, and there were all those years when my 2 children were in school. They have each now completed their undergraduate college degrees and moved on to the exciting world of work. So, with my new-found interest in "education", this month I discovered and watched the videos available on the web of last September's New York Times "Schools for Tomorrow" conference.

Several books got mentioned in the conference, among them:

"How Children Succeed: Grit, Curiosity and the Hidden Power of Character", by Paul Tough.

and

"The Shallows: How the Internet is Changing the way we Think, Read and Remember"", by Nicholas G. Carr.

So I put in requests for those books to the Westbury Public Library. Hurrah for inter-library loans. Even though my reserve request for "How Children Succeed" was 2nd in the queue for that book, the library managed to get a copy for me from a library in a nearby town. The book is new enough that they only loaned it to me for 14 days, so I had to dig into it quickly. It's due back soon. I enjoyed the book enormously and recommend it if you have children or any interest in education (and you certainly should have interest in education).

The book

The premise of "How Children Succeed" is fairly well telegraphed to you by the book's title and sub-title. The book opens with a visit to a pre-K class in Red Bank, NJ. The book's introduction (pp. i-xxiv), should not be skipped when you read the book. He introduces James Heckman, an economist at the University of Chicago. In 2000 Heckman won the Nobel prize in Economics for a statistical analysis technique he developed in the 1970's. In the late 1990's he turned his attention to a study of the GED program (high school equivalency certificate). Way back on p. 203 of "How Children Succeed", Paul Tough gets around to presenting detailed formal references published by Heckman and others about Heckman's study of the GED. I won't take the time to reproduce those citations here, particularly since I haven't chased them down and looked at them myself (and this article is shaping up to be plenty long already...).

The idea of the GED is that, given what schools develop is cognitive skill, then by testing for those skills, a high school drop-out can show he has the knowledge and smarts to graduate from high school without wasting the time it takes to actually finish high school. (My now deceased Mother dropped out of high school during the Great Depression to take a job as a telephone operator. Later, in the 1960's, as I was in high school, she pursued and received a GED degree because she wanted to finish her high school degree before I did). I hadn't realized how much growth there'd been in the GED program since it began in the 1950's. The book reports that in 2001, nearly one in five new high school "graduates" was actually a GED holder.

Heckman wanted to examine more closely the idea that young people with GEDs were just as well prepared for further academic pursuits as high-school graduates. He analyzed a few large national databases, and he found that in many important ways, the premise was entirely valid. According to their scores on achievement tests, which correlate closely with IQ, GED recipients are every bit as smart as high-school graduates. But when Heckman looked at their path through higher education, he discovered that [statistically] GED recipients weren't anything like high school graduates. At age twenty two, Heckman found just 3 percent of GED recipients were enrolled in a four-year university or had completed some kind of post-secondary degree, compared to 46% of high school graduates. In fact, Heckman discovered that when you consider all kinds of important future outcomes -- annual income, unemployment rate, divorce rate, use of illegal drugs -- GED recipients look exactly like high-school dropouts, despite the fact that they have earned this supposedly valuable extra credential, and despite the fact that they are, on average, considerably more intelligent than high-school dropouts

[...]

What was missing from the equation, Heckman concluded, were the psychological traits that had allowed the high-school graduates to make it through school. Those traits -- an inclintation to persist at a boring and often unrewarding task, the ability to delay gratification; the tendency to follow through on a plan -- also turned out to be valuable in college, in the workplace, and in life generally. As Heckman explained in one paper: "Inadvertently, the GED has become a test that separates bright but nonpersistant and undisciplined dropouts from other dropouts." GED holders, he wrote, "are 'wise guys' who lack the ability to think ahead, persist in tasks, or to adapt to their environments."

So "cognitive skills" is only part of what successful children have when they finish high school. The "non-cognitive skills" aren't tested by the GED tests, and are only indirectly gauged by completion of a high school degree. Goodness knows, I don't recall my high school paying much attention to teaching personal behavior, curiosity, self-control and social fluidity. But Heckman concludes those are all important components of being successful in life. Seems that a stable home-life is where those non-cognitive skills are developed in most children who have them.

Much of the rest of "How Children Succeed" describes visits to the schools in the slums of the South Side of Chicago and to various schools in New York City, ranging from a very upper-class high school in Riverdale, to the KIPP Infinity Middle School (a "charter" school) in Harlem.

From example after example, it is quite clear that growing up in poverty is not conducive to development of those non-cognitive skills. So the question asked is "Can schools successfully teach those non-cognitive skills?" At KIPP schools they explicitly teach SLANT (Sit up, Listen, Ask questions, Nod and Track the speaker with your eyes).

That KIPP school and the Riverdale Country School worked together to develop a grading system for evaluating students on:

  • Grit
  • Self-Control
  • Zest
  • Social Intelligence
  • Gratitude
  • Optimism
  • Curiosity

Among the unsettled differences between the 2 schools was the question of how public should the student's non-cognitive skill grades be? At the KIPP school, I gather they organize their parent-teacher conferences around discussing how a student is doing on the grades for those skills. At Riverdale, the grades are private. Riverdale's worry is that if the grades are released, that will merely foment tutoring courses to help student's boost their non-cognitive skill grades.

Not mentioned by Tough in the book, but the significance of a deficit of non-cognitive skills was certainly noted in the 1961 song "Officer Krupke" in West-Side Story. Sad that all these decades later we're still not sure what to do about it.

One of Tough's observations is that a key to success in life is the ability to recover from failure. He notes that in poverty, the few students who do master recovering from failures may actually be better prepared for succeeding as enterpreneurs than are the comparatively sheltered upper class kids of Riverdale Country School. Chapter 3 of the book spends many pages discussing a chess club at a New York City public middle school (Brooklyn's IS 318). When you play chess, it is expected that often you will lose. The coach of that team apparently has gotten very good at teaching her students not to let the losses weigh down their enthusiasm for doing better in the next game. Chess teams, of course, can do the same coaching at upper class schools, but coaches with the talent of the one at IS 318 are few and far between.

In Chapter 5, Tough quotes Jame Kwak, author of the blog post "Why do Harvard Kids Head to Wall Street?"

The typical contemporary Harvard undergraduate, Kwak wrote, "is driven more by fear of not being a success than by a concrete desire to do anything in particular." The post-college choices of Ivy League students, he explained "are motivated by two main decision rules: (1) close down as few options as possible; and (2) only do things that increase the possibility of future over-achievement." Recruiters for investment banks and consulting firms understand this psychology, and they exploit it perfectly: the jobs are competitive and high status, but the process of applying and being accepted is regimented and predictable. [...] "For people who don't know how to get a job in the open economy", Kwak wrote, "and have ended each phase of their lives by taking the test to do the most prestigious thing possible in the next phase, all of this comes naturally."

As I read that, I couldn't help but think of my own daughter, now a Columbia U. BA holder (double major in math and economics) and working in a major Manhattan bank as a business analyst. She'd likely question the reported ease of the recruiting process, having initially landed an offer from Lehman Brothers while she was still in her Junior year, but that firm managed to pass out of business during her senior year without even bothering to send her a letter formally withdrawing the job offer. She did scramble to find employment in her chosen industry despite the shambles that industry was in the year she was graduating.

Closing remarks

To close, I strongly recommend that you read "How Children Succeed". A couple of blockquotes from the book hardly do it justice. In my opinion, the book lives up to the favorable reviews printed on the back cover of the dust-jacket of the copy that I borrowed. The book's author, a college dropout himself, does a good job of pulling together a convincing argument for his point that "non cognitive skills" and the ability to robustly recover from failure are an important part of what children need to learn to succeed in life. He convincingly argues that we seem to be finally making headway in figuring out how to educate children toward a path they can follow to get out of poverty after decades of failed attempts in the ongoing "war on poverty".

As this is one of my few postings on a topic far out of my field of expertise, I look forward to seeing what I hope to be lots of comments from folks with more experience in the field of "Education" than I have. I moderate the comments to keep out the spammers, but I hope you won't be shy about posting your comments down below. If you have a blog of your own, feel free to post a link in the comments to whatever opinion piece that this article has perhaps motivated you to write, please.

1/28/2013 - Fixed a few typos.

Monday, January 14, 2013

Advice to undergraduates...

An overseas college sophomore majoring in Computer Science has been asking me for advice on what to study. "How to become a good computer programmer". I have no magic wand to wave to confer great programming skills, but I did find some web pages that look like sound advice.

One of my favorites is:

http://www.paulgraham.com/college.html

and I don't think that I only like it for its mention of Cornell U and my old professor Conway. The author of that piece seems to offer good advice.

A somewhat orthogonal point of view, but also interesting reading, is this very recent article from the New York Times:

study-in-science-shows-end-of-history-illusion

More on-topic for computer programming, is an article that I recently linked to in my blog here, but it's good enough to be worth another mention:

financial_software_disasters

So, I e-mailed these links along to that student, and got a nice thank-you note. That inspired me to think that maybe other folks will find this advice helpful so I'm sharing the links on my blog here. If you like this stuff, please leave some comments and/or sign up as a "follower" of this blog. Goodness knows, it should be clear by now that I'm not a spammer who's going to contact you about Ukrainian brides or mail-order pharmaceuticals. This blog has been running (on an irregular schedule) for several years now and I'm thrilled to see that the cumulative pageviews has finally passed 2000. Hardly a level where I would think about monetizing the clicks. Clicking on the +1 button when I manage to provide an article that you like, is a way to make me smile and it helps me decide what topics I should find more to say about.

Thanks for reading. Now hurry and pass the URL for this article to your 10 closest friends. :-)

Sunday, January 13, 2013

Udacity Unit 4 - Index data structure

In Udacity.com CS101, Unit 4, the notion of data structures is expanded beyond simple strings and simple lists. In Lesson 4-7, the web crawler is expanded to be a function:

index=web_crawl(seed)

where index is the data structure that is passed to the web page finder to find pages based on keywords and "seed" is a URL where the web crawler should start its crawl.

One student asked in the forum "What is index to be?". Fair 'nuff question, and they specifically added they didn't want an answer that solved the quiz of unit 4-7, just to understand what is wanted. To see the original forum thread, see: http://forums.udacity.com/questions/2024862/unit-4-7-i-dont-understand-what-index-should-be#cs101

Here's what I replied:

The crawl_web function is supposed to construct the index data structure that your search engine will use to find web pages. Truth is that between 4-7 and the end of the course, there will be evolution of the details of that structure. I suggest you re-watch the earlier lessons in unit-4, especially 4-2.

I'm not sure I understand your notation in this question, but as a broad hint I'll tell you that the whole content of a web page doesn't belong in the index. crawl_web's job is to extract from the web pages the information that you'll need to again find the pages that are relevant to some search word. Think about the index of a book. Say you want to find mentions of eggplant in the book. The book's index is an alphabetized list of important words in the book and for each word gives you a list of page numbers where that word was mentioned. Web pages don't have page numbers, but do have URL's. So what you want from the index data structure is something that your search function can use to tell at a glance what the URL's are for web pages that mention the search term you are looking up. (eggplant, perhaps).

At this point in the course, there are only a small number of data structures to consider. There are strings and lists, and you can compose grander things (lists of strings, lists of lists, ....). You've seen that the content of a web page can be regarded as a big string. With a bunch of effort by your program, you can rummage through that string to find keywords and links (URL's). You don't want to have to go thru all that work each time you just want to look up eggplant, so the idea is to do the work once, and capture a summary in the index data structure of what you saw. Think ahead to looking things up. What kind of index would make that job easiest?

I hope that helps. A marvelous thing about Python is that a function is able to return as its result a large complicated data structure. Other languages would insist that a function can only return simple stuff, such as a number. If you find yourself forced to work in a more old-fashioned programming language some day, you'll have a deeper appreciation of Python's generous approach.

Monday, January 7, 2013

Zero, One, Many Rule of Thumb

In Udacity CS101, Unit 2 introduces functions, parameter passing and return values. Lesson 2-17 has us define a function bigger(a,b) that returns the bigger of the 2 values passed in. As in most of the lessons in the course, they pose a question, have the student work the problem and then explain an answer.

One student noted that her answer was slightly different code than the answer that the instructor gave and asked on the forum if there was any reason one was better than the other. To see the original question and answer thread, see: http://forums.udacity.com/questions/2009058/unit-2-lesson-17-alternative-solution-not-longer#cs101

Now if there's one thing I can't resist, it's expressing opinions on a couple of little scraps of code, and defending my choice. Here's what I said:

I liked his "if a>b: r=a; else: r=b; return r" version better than your "if b>a: a=b; return a" version because even though the procedure doesn't change the passed in value of a, and isn't meant to change a out in the calling procedure, the code as you wrote it can easily be mis-read as doing something to the first parameter. Better to introduce local variable r so there's no need to waste a neuron on wondering if the procedure is changing anything in the caller's variables. Better, in my opinion, to leave the parameters passed in as read-only unless there's real reason to alter them. But my habits were formed working with languages that have different parameter passing rules than Python has. Only clear reason to stick with my habits that I can see is what if you had occasion later to add more code to the procedure and wanted access to the passed in value of a? If the code treated a as "read only", there's no chance of a bad surprise.

I actually preferred the "return a; else return b" version because it dodged the need for a short lived local variable. It's one drawback that I can see is that it isn't so obvious how to extend the code to handle more values such as a, b, c, and d. But at some point you probably should think of having a list of values and not a whole bunch of variable names. I was reading recently that code should be written to handle 0 cases, exactly one case or "many cases". Comparing a & b is one case, comparing a, b, c (and d) is clearly more than one case. Instead of coding for 2 cases and then for 3 cases, look for a way to generalize it to handle the "many" case, such as a loop to pick the biggest element out of a list of arbitrary length. But only do that if you have to handle more than one case. Generalizing the code before there's an actual requirement to do so is likely not time well spent. Zero, one, infinity rule is a decent summary, though where I saw this recently was Tim Ottinger's "Agile in a Flash" blog (Simplify Design with Zero, One, Many Rules). I can still find the article in Google reader, but as for rustling up a URL to it to share here, I'm having no success at that whatsoever. It's like the blog has lost track of that article.

Addendum

Well, better late then never. Google searching again today for that blog article found it. I also stumbled into a Wikipedia article that indicates that design rule predates that blog post by years. Alas, it's one of those "[citation needed]" Wikipedia articles that doesn't back up its information with a reference.

Saturday, January 5, 2013

Judging Code Quality

Unit 3 of Udacity.com CS101 introduces functions. One of the more challenging homework problems at the end of that unit was problem 3-8. Given a purported solution to a sudoku puzzle, represented as a list of lists, the assignment is to write a function that determines if the offered solution is valid. The function is to return True if the puzzle solution is valid, and return False otherwise.

One of the students was less than delighted with his solution. His function worked, but he wondered if it was Good Code or Bad Code? Some folks responded that the offered routine was too long, and that made it Bad Code. To see the original question and answer thread, follow this link: http://forums.udacity.com/questions/2024970/is-this-considered-bad-code#cs101

I tried to give a more nuanced answer to the opinion question...

Good code is largely a matter of opinion, which is not to say I don’t have opinions. John’s point about the length of the procedure is a valid point. I’d also argue that clarity could be improved by better choice of variable names, or at least comments explaining what each variable represents.

len(length) set off alarm bells in my head. Looking at the routine, I gather “length” is the list of lists that makes up the sudoku puzzle. Again, a better variable name or a comment would make things easier to read.

In the end, the acid test is “does your program work?”. If I can make it fail with a test case, surely, you’d have to accept that the code has a problem. The ease or difficulty of correcting things is then a valid basis for judging if it’s Good Code or not. That isinstance test to screen out floating point numbers in the data worries me. What does it do if you hand it a different data type, like, say, a complex number? If you eventually arrive at a program that handles every test case I can think of, but the code bristles with odd tests to avoid problem X, Y and Z that we discovered, that still is probably not Good Code.

My version is already posted to the forum with some other sudoku question, so I won’t repost it here. My approach was to generate a list of all the integers that would be valid in a row. (e.g. for a 9x9 puzzle, any valid value should be in the list [1 2 3 4 5 6 7 8 9]. As I matched a value to the list, I’d delete that value from the list so if it came up again in the same row or same column, I’d reject the duplicate as invalid. For each row and each column, I freshly repopulate the list of valid integers.

One aspect of my code that I wasn’t completely happy with was that my code for checking of rows largely duplicated the code for checking columns. “Don’t Repeat Yourself” (DRY) is good guidance for writing Good Code.

One other area where I think my code could use improvement is I had no explicit test to make sure the puzzle data was square (length of each row exactly matching the number of rows). Would be easy to patch that in.

http://www.pearltrees.com/#/N-u=1_391422&N-p=61027473&N-fa=3462050&N-f=1_5113795&N-s=1_5113795 is my collection of links to web pages pertaining to the topic of writing good code. The “Principles of Good Programming” pearl there is probably a good place to start reading.

If the wit and wisdom of the xkcd comic strip appeals to you, you might enjoy: http://xkcd.com/844/. If you’ve read this far and are enjoying my answer, you might try a Google search for:

code smell

to find tons more to read to help you better discern the answer to the code evaluation variant of the question posed to Dorothy soon after she arrived in the land of Oz: “Are you a Good Witch or a Bad Witch?”

And if I still haven’t convinced you that your code could be better: Consider how terse the results from your code are. Suppose your code became widely popular and you get a support call from some distant timezone at an inconvenient hour. The user says “I asked your program if my sudoku puzzle was valid and it said False. What’s wrong?”. Good Code would have precluded the need for that support call, or at least have told the user exactly where to look to find the problem. They might call for support anyway, but such is life.

Want to enrich your book shelf? Well, an Amazon search for books with Brian Kernighan as an author is a good start. The Elements of Programming Style by Kernighan and Plauger and The Practice of Programming by Kernighan and Pike are ones I highly recommend, though neither is specific to Python. Alas, putting the books on your shelf without reading them doesn’t do much good. (Full disclosure: I had the enormous pleasure of working with Kernighan and Pike a while ago). NB. there’s no shame in picking up a used book to save some money. Someone else’s highlighting and marginal notes might even add value.

Best wishes to you on your journey to learning “How to Write Good”. Exposing your code for review by others is generally a Great Idea. Pay attention to the comments and suggestions that come back, even the comments that you don’t like. The comments and suggestions may turn out to be Bad, but do take your time maturely considering the feedback before you arrive at that conclusion.

Addendum

Since posting that answer, I've established a Pearltree about "Code Reviews". I believe code reviews are the best way to check for quality of the code that goes into a project. Sadly, management of developers often is unwilling to invest the resources needed to take the time to review the code their team is grinding out. An article in that Pearltree that I especially enjoyed is this lament over the sorry level of code quality that is so typical in the commercial world.

I confess that, so far as I know, no one else has expressed serious code-review suggestions regarding my sudoku routine. Specifically, nobody ever pointed out that my procedure suffered the same code-smell of being too long for one Python procedure. And, sadly, nobody has explained to me how I can cleanly reuse the same code to check rows and check columns. Transposing the matrix data seems like too much work, so that isn't the solution to my DRY worries. Please share in the "comments" here if you see a way to eliminate the repetition or have other suggestions for improving my version of how to solve this homework problem.

Why don't we use a bucket for every keyword?

As I explained in my first 2013 blog posting, Udacity.com's CS101 course has a project running through it where you learn to program by building a web crawler and a search engine. The web crawler builds a data structure that the search engine then uses to find web pages. The initial implementation does sequential searches to find keyword matches in the data structure. Around unit 5 of the course, the scalability problem of that approach is pointed out, and hash tables are introduced as a more time efficient way to search for keyword matches. Hash tables have N "buckets" and a hash function that assigns any given keyword to a bucket. Sometimes, more than one keyword will hash to the same bucket so the hash table implementation has to know what to do with such collisions.

One student reasoned that if N buckets cuts down the search time to find a keyword match, that it would be even faster to allocate a separate bucket for each keyword. That question appears here: http://forums.udacity.com/questions/2025239/why-dont-we-make-a-new-bucket-for-every-keyword#cs101

So I explained why in my forum response:

Underneath it all, the computer memory is addressed numerically - the first storage location is numbered 0, the next is numbered 1 and so forth. Each storage location in the computer's memory typically can hold one character, so a string is a sequence of storage locations. If you set up a series of entries, each with a fixed length, and want to access the Nth entry, given the starting address of the series in memory, the length of the entry and N, it is a simple matter of arithmetic to compute the address of the Nth entry. But if you want the address of the entry for keyword "Eggplant", how do you map that string to a particular address? A hashtable is one technique. We treat the underlying representation of the letters in the string as numbers and run them through a function that computes a number that falls into a range (the number of buckets). The exact details of the computation can vary from one implementation to another, so long as the same result comes out for the same keyword each time for the same hash table.

Just now, for fun, I wrote a little program to tell me what 'Eggplant' works out to if you look at its underlying bits as a single number. In decimal that works out to: 5,139,324,068,239,476,530. 'Eggplant' isn't even a big word. How many buckets are you prepared to bring to the party? So we need a technique that'll reproducibly map that down to a number in a more convenient range and that's what we call a hash function.

There are other possible approaches than hashtables. b-trees, for instance. But that would take more explaining then I think would fit comfortably in this forum. In the end it all comes down to determining a numeric address that corresponds to the entry you seek.

There are programs that given a fixed list of words will create a "perfect hash function" - one with no collisions for the given words. Not applicable to the web search engine project, because the list of words isn't know-able in advance, but suppose you were writing a program to process Python programs. There is a fixed set of language keywords (if, else, print, def, ...) and an open-ended realm of possibilities for variable names. So a perfect hash to make it easy to look up the keywords might be useful, but the variable names will have to be handled separately.

So there are many ways to organize and store a collection of information. The amount of work to add an entry to the collection, the work to look up an entry and the space to store the collection vary with different techniques. Requirements vary, so good programmers need a large bag to hold all the possibilities they might pull out to apply to whatever problem. There are books where such things can be looked up and library modules often save you from having to reinvent things from scratch. Python embodies a tremendous long history of techniques and makes it look easy. Hurrah for progress and not needing to do pointer arithmetic. Indeed, even lists of variable length strings are pretty amazing when you think about what that involves behind the scenes.

Some day you may learn some less evolved programming languages such as C or assembler language where the underlying hardware is more visible to you. Nice place to visit, but keep your ruby slippers handy to take you back to the ethereal world of Python when you're ready to leave the registers and instruction codes behind. Layers and layers of abstraction, all of it know-able, but if you try to think at the machine code level all the time, you won't be very productive when working on some higher level problem.

The user gave my answer a "thumbs up" vote and "accepted" the answer as resolving his question. Valuable karma points. At least it made me feel I'd helped the guy out.

Friday, January 4, 2013

Approval for CS101 at the Community Center.

As I explained in my late November, 2012, blog posting, I liked Udacity CS101 enough to propose a mentored edition of the course for the new North Hempstead "Yes We Can" community center, here in my neighborhood. The idea is to offer the free CS101 course with the added value that the community center provides the PC's and Internet that folks need for working on the course, and the students get my presence, so they have someone to turn to for help if they get stuck on any points of the course. The challenge for me will be to recruit and retain students. I don't just want to get people to start CS101, I want them to complete the course and earn the certificate that Udacity offers to folks who pass the final exam. I've drafted up a brief course description. Professor Evan's video (linked earlier in this paragraph) may be better marketing than plain old text on paper, but a snazzy video is hard to leave in the hands of a school administrator as a souvenir of a meeting with me.

For more about what I hope people will get out of CS101, see my early December blog posting about CS101. I really hope that the course will influence some kids into taking the prospect of college more seriously, and to maybe even consider a possible major in computer science. Of course, it is easiest to remember your goal is to drain the swamp when you are not waist deep in alligators, and the approval for offering CS101 at the community center only came on 12/21/2012, the day that the schools let out for Christmas vacation. (It was last July that I proposed this program. It just took a while to win consent from Parks and Recreation, the folks who run the center). So, as of this writing, I have not met with the high school administration to discuss how to recruit students for the class and can sit around with high hopes. "Nothing but blue skies...". Even though I know reality is dead ahead and I'm closing in fast.

My plan is to have once/week class meetings of the whole team (Monday evenings). I can slip a little of my own material in to those days. We'll start each of those sessions with a Scrum-inspired stand-up meeting where we'll ask each person to briefly state:

  1. Who they are.
  2. What did they do last week (lessons, and forum activity).
  3. What do they plan to do this week.
  4. What obstacles are hindering their progress?

The stand-up meeting is short and we don't try to solve the obstacles during the stand-up meeting, but I hope the meeting will help people connect if someone has a problem and someone else has a solution. If there are consistent problems heard, that'll be my clue to perhaps try to explain something in class. The other "win" from this meeting is it gets people to commit to what they aim to do that week. Scrum-methodology calls for a daily stand-up meeting, but I'm uncertain we could get everyone to show up at the community center every evening. So, I am aiming to do weekly meetings on Mondays.

For the first session, I've prepared a set of slides to give people an overview of what to expect from the course. I'm hoping that first session will get people to commit to sticking with the class for as long as it takes (and as it's nominally an 8 week class, it isn't a particularly long term commitment).

I've given some thought to what we might want to cover in class on other weeks. I hope the CS101 videos convey the technical material well enough on their own, so the sorts of things I'm thinking I might need to add are:

  1. Goal setting.
  2. Time management.
  3. Good work habits. Note taking, minimize distractions. Maintaining a balance between work and play.
  4. The value of participating in the forum. Why take the time to help other people out?

Other school nights, I plan to be available in one of the tutoring rooms in the community center from 7PM-9PM. If anyone wants to see me on weekends, they'll have to make an appointment with me. The center is across the street from my home, so it's not going to be a hassle commuting to there.

If you have ideas on ways to improve the wrappings I'm suggesting for Udacity's CS101 course or thoughts on how to recruit students for that course, I'd sure like to hear them. I'm expecting we'll need January to work out a few technical issues (like the community center's overly restrictive firewall) and to get out the word about the class. I hope we'll be able to start the mentored course offering in early February.

The course is free, but membership in the community center is required to use the center's facilities. If you live in North Hempstead, NY, please consider signing up to take CS101 at the Yes We Can Community center on Garden St. in New Cassel.

Thursday, January 3, 2013

Dictionary Implementation

In Udacity CS101, the unifying thread of the course is construction of a web crawler and search engine. Without making much of a point of it, the initial implementation involves a simple list and linear search. Then, the scaling problem of that approach is pointed out, and the course explains hash tables as a more time efficient way to store searchable data. After we'd rolled some of our own hash tables, the course goes on to introduce Python's dictionary data type - Hash tables as an inherent part of the language.

One of the students asked in the forum how dictionaries were actually implemented. How does Python know how many buckets to allocate for a dictionary? To see the original question see: http://forums.udacity.com/questions/2015065/dictionary-and-hash-table. I looked into the matter and posted this reply:

There's a book from O'Reilly called "Beautiful Code". Chapter 18 of that book describes the Python dict implementation in detail and the web has positive things to say about that chapter. Alas, the book is not freely available, unless you have access to a good technical library from which you can borrow the book. My local library isn't particularly technical, but they can do interlibrary loans from distant university libraries to get what I need, so long as I'm patient. Ask your librarian for help.

Or, if you are feeling ambitious, take advantage of the fact that Python is open source, so you can look at its innards to your heart's content. Python dictionary implementation (in C). As George Lucas had Obiwan suggest: Use the source, Luke.

The most common Python implementation is Cpython, an interpreter implemented in the C language. But there's a somewhat more experimental implementation called Pypy, an interpreter implemented in Python. I speculated that with a little rummaging around, I ought to be able to find Pypy's implementation of dictionary's, but close as I found was some documentation on Pypy's dictionary optimizations. There are tons of bitbucket.org/pypy pages for Pypy source, but I didn't actually find the Pypy code that you are asking about. Feel free to show me up on locating the routine of interest to you.