Showing posts with label hardware. Show all posts

Saturday, August 2, 2014

Python and Parallel Processing

(Image taken from http://www.ibm.com/support/knowledgecenter/SSEPGG_9.5.0/com.ibm.db2.luw.admin.partition.doc/doc/c0004569.html without explicit permission, but with acknowledgement of the source of the work).


I've written about Parallel Processing here before, e.g. see "What's the fuss about Parallel Processing?". I've been worried by reading that seems to predict that imperative languages such as Python will never be able to safely cope with the massively multi-processor architectures of the future. "The Downfall of Imperative Programming Languages" basically says that functional programming languages (such as Haskell and Erlang) are going to be the way of the future. I've been trying to get my head wrapped around Haskell, but so far without much success. So I was happy to hear a clear call from the Python camp of "I'm not dead yet!".


The article that gives me new hope is "Parallelism in One Line", the gist of which is: sure, you can use the threading or multiprocessing libraries to manage pools of threads (or of processes), but that takes a lot of lines of code and is very error prone; there's another way to do it. The other way is to use a parallel version of the map function. I, for one, didn't know that there's more than one version of "map" available in the Python library.
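For the curious, here's a minimal sketch of my own (not code from the article) of what that one-line parallel map looks like. The standard library's multiprocessing.Pool provides a map with the same shape as the builtin, but backed by worker processes; the square function here is just an illustration.

```python
# Illustrative sketch of the "parallel map" idea.
# multiprocessing.Pool.map behaves like the builtin map(),
# but farms the calls out to a pool of worker processes.
from multiprocessing import Pool

def square(n):
    # A "pure" function: the result depends only on the argument.
    return n * n

if __name__ == '__main__':
    with Pool(4) as pool:                      # 4 worker processes
        results = pool.map(square, range(10))  # parallel, order preserved
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Swapping `from multiprocessing import Pool` for `from multiprocessing.dummy import Pool` gives you the identical map interface backed by a pool of threads instead of processes, which is the variant the "Parallelism in One Line" article leans on.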


If you do a Google search for:

Python map

you will readily find documentation for Python's standard (iterative) map routine. e.g. "Python 2.7.8 builtin functions documentation - map".


If you want to readily find documentation of the parallel version of map that the "Parallelism in One Line" article talks about, you need to modify your Google search to:

Python map multithreaded

which will take you to a subset of the first search. I was amused to find among the search results a 2010 ActiveState Recipe for building your own "concurrent map" function, which drew a comment from one reader asking why not just use "multiprocessing Pool.map". The recipe's author admitted not knowing about that one.


From that 2nd search I found "Multiprocessing - process based threading documentation". It does worry me that the documentation seems to have more complexity and gotchas than the "Parallelism in One Line" article owned up to. Arguably I shouldn't be blogging about this at all until I've actually given it a try myself (but I still don't have a multi-core processor here at home).

It's a trick?


If you've been reading this carefully, you might rightfully object that it's all a bit of a trick. The map function is an element lifted from the world of functional programming and provided in Python. The parallel version is only safe in Python if the function that you are mapping is "pure". If the function has side-effects, then you will have race conditions and potentially suffer horribly at the hands of multiple cores. The language isn't going to inherently protect you, so you have to be careful out there.
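To make the purity point concrete, here's a small sketch of my own (not from any of the cited articles): the first mapped function is pure and safe to parallelize, while the second mutates shared state, so the order in which its results arrive depends on scheduling rather than on your code.

```python
# Illustrative sketch: pure vs. impure functions under a parallel map.
from multiprocessing.dummy import Pool  # thread-backed Pool, same API

def pure_double(n):
    return 2 * n                # safe: touches no shared state

results = []
def impure_double(n):
    results.append(2 * n)       # risky in general: mutates shared state,
                                # so arrival order depends on scheduling

with Pool(4) as pool:
    safe = pool.map(pure_double, range(8))   # map preserves input order
    pool.map(impure_double, range(8))        # append order is unspecified

print(safe)             # [0, 2, 4, 6, 8, 10, 12, 14]
print(sorted(results))  # all values arrive, but not necessarily in order
```

The lesson: keep the mapped function a pure transformation of its argument and let map itself handle collecting the results in order, rather than writing into shared variables from inside the workers.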


If you want to read more about the challenges of Python vs. multi-core architectures, see "Python's Hardest Problem, Revisited" by Jeff Knupp. The piece parts to roll your own multi-process or multi-threaded Python program remain available, but be sure you have plenty of iodine and bandages on hand if you cavalierly venture into the world of multi-cores in a language that makes no promises of (much) concurrent processing safety. Design your code carefully and watch out!

Friday, June 27, 2014

Home Networking with FIOS - Don't cross the streams.

As you may remember from my article Adventures in Home Networking, we very badly wanted to get out of using Cablevision. They'd agreed to a price to provide service to our home, but then consistently billed for significantly more per month than the agreed-to price. We got the NY Public Service Commission involved, but every month's bill turned into a new argument with Cablevision, with the PSC insisting that they accept payment of the agreed-to amount. Cablevision would accept the payment and leave the service connected, but rebilled the unpaid difference the next month as an amount past due. There were some months they'd insist they hadn't received the payment and we'd have to get a 3-way call going between the bank, Cablevision and the PSC to establish that Cablevision indeed had collected the previous month's money from the account and then misplaced it.

We got tired of that monthly argument and looked for alternatives. The hard part was finding Internet service. Given Internet, it is easy to get phone service (e.g. Vonage or MagicJack), and there are non-cable alternatives to television service (e.g. DIRECTV and Dish satellite service). We had tried Hughes satellite data service as an Internet Service Provider, but found it to be expensive (limited data transfer allowed per month), high latency and noticeably prone to dropping packets. We concluded that Hughes would not be a good substitute as an Internet Service Provider for our home.

As I described in that January, 2014 article, we also worked out how to put our home on AT&T's Mifi service (cellular LTE data service). The problem we had with it was again limited data transfer allowed per month, at $10/GB for overages. It was fast enough and seemed reliable enough, but at $52/month base it is somewhat expensive. It also posed the irresistible temptation to take the access point along when traveling out of the house, but that would leave the house without service while the Mifi was on the road. In the end, we kept the Mifi box and grudgingly pay the $52/month just for the convenience of having Internet access while away from the house. (Well, the other rationalization of the $52/month is that it ensures that I have Internet service even if my home's ISP is out of commission. It's not just that I'm addicted to blogging. I use the Internet for work too.)

Cablevision, of course, sings a song about having Wifi hot spots in many places. But our experience with those is that often the signal is too weak to be usable. e.g. there is no Cablevision Wifi at our house. If I go down the block, there is Cablevision Wifi in the nearby park. At our local supermarket, there is a signal if you walk out to the sidewalk near the street, but inside the store you are out of reach. Similarly, at some of my doctors' offices, there is a signal out at the street, but inside at the reception desk the signal isn't strong enough to be reliable. The Mifi box is more secure (it uses your own password for encrypted Wifi) than Cablevision's unencrypted Wifi, and AT&T Mifi seems to work well even inside low-rise buildings such as are common here in the suburbs.

So, after running out of alternatives, we decided that we'd drop Cablevision and go with Verizon FIOS even though it wasn't going to save us any money. The one advantage seems to be that FIOS includes Jimmy Swaggart's Son Life Network, which my wife enjoys. If you want to hear a disgraced minister sing hymns, maybe you'll like it too. It's probably easier to enjoy if you simply reject those repeated charges of cavorting with prostitutes as "obviously" false (New Orleans and Los Angeles). If I were providing a sound track to go with this paragraph, I think I'd opt for the Beach Boys' "California Girls". I'm just not creative enough to produce a mix of that and some suitably religious ditty. Not that it is in any way a religious ditty, but how come it seems so rare that the radio ever pairs up "California Girls" with "Back in the USSR"?

It probably helped minimize any shock from the bill that the price negotiation with FIOS was done by me, not my wife, so we came away with more realistic expectations of what the bill was really going to be. Verizon loves to tack unpleasant surprises onto the bill. e.g. unless you ask, the prices they quote don't include things like $27.98 "Taxes, Governmental Surcharges and Fees" and $25.55 "Verizon Surcharges and Other Charges", but the bill includes those. The first 3 months of service also carry an extra $49.99/month "Activation Fee". There were still surprises to be found by careful inspection of the bill. My wife is flummoxed by the notion of keeping an address book organized by name so you can find people's contact information. Her habit is to write the info on scrap paper, which she then loses, so she repeatedly calls 411 to get the number again. In one month that was 30 calls to 411 at $2.49/call. I keep pleading with her to learn how to use Google, but that just turns into my having to do it for her and getting peeved when I recognize that I'm looking up the same information as I'd given to her before. I wrote some of the contacts into a bright orange address book and she acts surprised every time I show her that she already had the information there.

Video problems with FIOS

The other bad surprise, in our 2nd month bill, was $156.03 for a service call we'd placed with Verizon. Multi-room DVR playback to rooms remote from the actual DVR was unreliable. They sent a guy out to troubleshoot the problem, but he didn't fix it and he didn't write it up as a Verizon problem, so they billed us for an hour of labor. I protested and they did deduct the labor and the tax on the labor from that month's bill. We found more ways to demonstrate the video problem and called for yet another repair visit. The 2nd repair guy was unable to fix the problem too, but we learned a little from chatting with him while he was here. In the world of Cablevision, so far as I can discern, data service and television service are quite separate, sharing the same physical cable but traveling in separate channels. But with FIOS, each television set-top box gets an IP address. Some television services depend on being able to talk via the Internet router to get video to the set-top boxes. Remarkably, the Verizon service folks don't have any kind of protocol analyzer or even a simple bit-error-rate monitor to give them visibility into what is going on in the home network.

The first repairman's visit wasn't without value. He noticed that our HP printer was assigned IP address 192.168.1.100. This had been our household's long-standing convention, so any PC could know how to reach the printer without needing DNS or something complicated to find it. But 192.168.1.100 thru 192.168.1.150 are "reserved" by Verizon for use by their set-top boxes. Oops. So I reassigned the printer to be 192.168.1.99 and adjusted the configurations of each of our PCs accordingly. This did make access to the printer more dependable, without having to fuss with power cycling it after a period of disuse, which had been the state of things since we switched to FIOS. But it didn't get rid of the problem with multi-room DVR playback. We'd also found that the FIOS video tutorials, which are short video-on-demand clips from some central server of theirs, also had similar problems of occasional stalls and pixelization in playback. The 2nd repairman saw the problem happening and, after examining the router configuration, opined that we just had too many devices on our home network for the level of service we'd subscribed to. (Our service is their 50/25 speed. For more money, we can have more bandwidth, but since I'd seen the whole house [excluding FIOS video, of course] doing fine with LTE cellular data service via AT&T Mifi, I didn't think that was a good explanation of what was wrong.) To the fellow's credit, he did take the time to disconnect portions of our home network from the router until the problem went away. At that point I had to accept that the problem somehow lay with our home network, though further trial and error would be needed to find the real problem.

So, at this point we had as our only instrumentation that when the problem wasn't present, we could watch Verizon FIOS training videos without a problem, and that when the problem was present, a few minutes of playing a training video would result in pixelization and stalled playback. I set about, through trial and error, to find which of the wires he disconnected mattered. My first suspect was the long cat-5 cable to the router in the garage apartment. The dog had occasionally chewed on that cable and we'd taped it up and it seemed to work well enough, but I certainly wasn't prepared to swear the cable still met all cat-5 specifications. I plugged it back in to the FIOS router and the picture stayed clear, so it wasn't that cable. The only other cat-5 cable I knew about was one that runs through the attic to get to the rear bedroom where the printer resides. The puzzler was that while doing my wall crawl to reconnect things, I found 2 cables running up to the attic. What was the additional cable? It took some more crawling in the rear bedroom to figure it out. It turns out we had 2 cat-5 cables running from the front bedroom to the rear bedroom. Once upon a time, one was for the PC in the rear bedroom and the other was for the printer, but now the PC and the printer are connected to an inexpensive 100-base-T switch, so the 2nd cat-5 cable to the room is unneeded. Mistakenly, both cables were plugged into the Verizon router in the front bedroom, and both were plugged into the switch in the rear bedroom. That's just plain wrong: two parallel links between a router and a switch form an ethernet loop, and unless something like spanning tree is running, looped broadcast traffic can swamp the network. I don't think it ever bothered the Linksys router that used to be in the front bedroom, but the FIOS router showed video playback problems unless you disconnected the erroneous 2nd cable from the router to that 100-base-T switch. Oops. Mixing video playback in with the mostly non-realtime Internet data traffic apparently got fouled up by the wiring mistake.
What a pity that Verizon didn't have appropriate tools for their field support people to really find this problem. Anyhow, now that the extra wire is disconnected, things have been much better with the multi-room DVR.

More FIOS Woes - You Own the House Wiring

But we did have other FIOS problems. Cablevision ran a coax cable to a "cable modem" in the front bedroom. There were 2 phone jacks on the cable modem that provided our house phone line and our fax phone line. FIOS doesn't work that way. FIOS mounted a big white box on the outside of the house and ran the fiber to that box. They also installed a large UPS on the wall inside the house and ran power from the UPS to the white box on the outside wall. They then re-used Cablevision's coax cables to connect the FIOS box to the various TVs via a splitter box on the outside wall just below the FIOS box. There is no "cable modem" box as we had with Cablevision, but the Verizon Internet router has a coax connection of its own that it uses to talk to the FIOS network, including the set-top boxes. The outside FIOS box also has about 8 phone jacks in it. So, Verizon ran 2 phone lines from those jacks to ancient NY Telephone demarc boxes on the outside wall, to tie the 2 phone lines to the ancient inside house wiring. The deal as I understood it was that as long as Verizon provided dial tone at the demarcation box, any remaining phone problem was a "house wiring" problem and is only covered by Verizon if you pay them a ridiculous monthly surcharge.

After a particularly rainy Spring night here, the Fax machine lost its dial tone. I know I had a corded phone around the house, but darn if I could find it. So we traipsed off to Walmart where for $6 I bought a nearly featureless corded phone. It looks sort of like an old Trimline phone, but I found the base is just a place to set down the handset. The wire from the wall jack runs straight through the base and into the handset. The touchtone keypad, ringer, etc. are all in the handset.

Armed with my corded phone, I could verify that things were funky at the demarc boxes. For starters, the boxes weren't labeled with current phone numbers, but worse, the installer had hacked around some problem in at least one box so the modular jack for test access wasn't working at all. So we arranged for a service visit for the coming Monday for Verizon to look at lack of a dial tone on our 2nd line.

More FIOS Woes - the Fragility of the Fiber from the Pole to the House

But Friday, before we ever got to that Monday service visit, something happened to our FIOS service. We lost everything - TV, phone and Internet. Verizon, contacted via cell phone, agreed to send a repair guy out on Saturday. The FIOS box has an LED in it to tell the repairman if the fiber is delivering a signal to the FIOS box. It immediately told the repairman that there was no signal. Alas, he arrived on a truck with ladders, but the fiber drop cable to our house is connected to the distribution fiber mid-span, between 2 telephone poles, so the junction is only accessible with a "cherry picker" truck.

From the ground, something looked dangly at that junction, but nothing was actually lying on the ground. So we waited for them to dispatch a cherry picker truck with a buddy of the repairman's to help him. Once that arrived, they determined that something had severed the junction from the distribution fiber to the drop fiber. So they had to replace the drop fiber. This was surprising to me, as this is now my 4th drop fiber in the short time I've had FIOS service. The first drop fiber was installed much like this one, but drooped a little low over the street. One day a large cement mixer truck drove down our street and managed to knock down our fiber, and the one to the house next door too. So Verizon sent a repair crew to replace the fibers to our homes. They sent 2 trucks, one for each house, but they decided to work together. The lady with the ladders worked the house end of the job and the guy with the cherry picker truck worked out by the poles on the street. I was impressed with the result. At the house, the fiber came off a board sticking up from the peak of the house's roof. It then flew super high across the street direct to the pole at the corner. It was well clear of any trees and certainly too high to ever get struck by a passing truck on the street. Beautiful looking installation. Alas, it was no match for Hurricane Sandy. Fiber #2 ended up on my lawn after Sandy passed through. Annoyingly, Cablevision's coax to our home survived the storm just fine. Since we were off of FIOS by the time of Sandy, we only had Verizon out for a non-emergency clean-up call to get their fiber off of my lawn. Of course, when we resumed FIOS service this Spring, the installer had been told this was a simple re-connect of a previous customer. There was lots more to the job than he'd expected. He installed drop-fiber #3. It ran from the board at the roof peak through the trees to the mid-span point where drop fiber #1 had been attached to the distribution plant.
I think the board at the peak added enough height that #3 was going to be safe from passing trucks. But the community center opened up and installed a driveway under the distribution span. The guy who installed drop fiber #4, opined that maybe a school bus turning into the community center disturbed the distribution fiber enough to jostle our drop-cable into failing. He put in a request for the outside plant people to come around to raise the distribution cable a smidge where it crosses the driveway, but I didn't see them come to do that. We'll just have to wait and see how long drop fiber #4 lasts.

The Internet tells me that I'm not the only FIOS customer having repeated service calls for fiber repair. e.g. Hungry Ants Knock Out FIOS Service ... Again. I can only assume that these problems are evidence that fiber to the home is new technology and it will be some time before they gain enough experience to work out the Field Engineering kinks and make it reliable. The price of repeatedly replacing the drop cable presumably is enough motivation to encourage Verizon to eventually get it right. The Mifi box and my cell phone at least make it tolerable when we have an occasional day of no FIOS service. Of course, there is the annoying whining sound of my wife complaining when she misses an episode of General Hospital, but so it goes.

When the fiber connection to the outside FIOS box was restored, I asked the Saturday repairman to look at the problem of no dial tone to our Fax machine. He advised that the white FIOS box is now the point of demarcation. He verified that there was dial tone available on lines 1 & 2 at the FIOS box, but that means the crappy old NY Telephone demarc boxes are now part of the house wiring that is my responsibility.

So I looked and found on the house a disused demarc box that had a phone line running back to the rear bedroom. My wife reminded me that some years ago her Aunt Dolly had lived in that rear bedroom and had a phone line of her own. Aunt Dolly has long since moved out and is now deceased; we attended her funeral this Spring. So, I snipped the wire off the old NY Telephone demarc box and ran it into line #2 inside the FIOS box. I added a new modular jack to the spot where that line comes into the rear bedroom and then ran a phone cable from that modular jack to the Fax machine. Voila, the Fax machine has dial tone again and we're using less of the ancient house wiring than we'd been using before.

The future of FIOS??

There's some evidence that Verizon is unhappy with their return on investment in converting their distribution network from copper to fiber. e.g. Wall Street Journal: Verizon to End Rollout of FIOS and DSL Reports: Verizon Again Confirms No Future FIOS Expansion. Will the day come when they try to close down their fiber business? Seems unlikely to me, but we do live in "interesting" times.

I've got to wonder where the regulators are in the process of Verizon making non-aggression pacts with the cable television companies. e.g. Verizon’s Anti-Aggression Treaty With Big Cable May Be the End of FiOS. I really think it is better for the economy when the cable companies have at least one competitor in every neighborhood, preferably one with a different technology and the fire in the belly to want to rewire the country to fiber. Verizon once upon a time had that fire, but now, like AT&T, seems to have lost its sense of direction. Too bad!

Cross the Streams?

I keep forgetting that Ghostbusters was an increasingly long time ago (1984). When our FIOS video problems turned out to be from the mixing of Internet traffic and FIOS video packets, Egon's warning about the dangers of "crossing the streams" immediately sprang to mind. If you haven't seen Ghostbusters recently, you really should rush out and rent it. But thanks to Youtube, we can give you the relevant clips from the movie. First, the scene where Egon warns "Don't cross the streams!" And, second, the climactic scene where the team makes a slight change of plan and deliberately crosses the streams. A bit of a spoiler, but you still really should see the whole movie.

Message In a Bottle

If you know someone who maybe knows someone in Verizon who'd be in a position to officially react to this note, please pass along the link to this article. I'd be delighted if my woes actually got to the ears of someone who could be properly embarrassed at the lack of network diagnostic equipment in the hands of the Verizon field support staff, or the need for improved field engineering of drop fiber installation and could maybe nudge Verizon in the direction of actually doing something about it.

Sunday, June 22, 2014

How Secure is Your part of the Internet?

Problems with Internet security have been much in the news lately as our homes have been getting more and more connected to the Internet with the growth of the Internet of Things. Please note that I'm one of those people who make a distinction between the terms hacker and cracker.

Image borrowed, without permission, from http://ariefuchulz.blogspot.com/2012/02/hacker-vs-cracker.html, apparently the blog of Arief ucHulz.

A specific recent report that caught my eye:

Until We Fix Our Connected Homes, Crackers Will Keep Screaming At Babies

The simple conclusions offered by that article are that if your home is connected to the Internet, to be secure you should:

  1. Take care to set secure passwords on all your devices. Leaving the manufacturer's defaults is just asking for intruders to come by.
  2. Register your devices with their manufacturer so the manufacturer can get in touch with you about security updates. Unsaid is that such registration will open up your e-mail inbox to a likely flood of promotional e-mail (a.k.a. spam).
  3. Keep your device's firmware/software up to date. Unsaid is that not all devices can accommodate updates, and not all manufacturers put much effort into providing updates on old products. Maybe the manufacturer no longer makes that product. Maybe the manufacturer no longer supports that product. Maybe the manufacturer has gone out of business.

The article mentioned briefly that the homeowner had "secured" their home's router that connected their home devices to the Internet. I wished that article had explored that statement in a little more depth. "Securing" a router is quite an essay topic in itself. If you have a router connecting your home to the Internet, please stop and consider how secure is it?

  1. Have you set a secure password so only you have administrator access to your router?
  2. Assuming your router provides you with Wifi (wireless Internet connectivity), have you configured the router to have a serious, non-default password protecting your Wifi network from intruders? There's more than one choice available for Wifi encryption. Which Wifi encryption option have you picked? If you picked WEP, you'd be well-advised to switch to WPA2. There's freely available software that anyone with a notebook PC within range of your Wifi signal can run (e.g. Automatically crack Wifi with only 30 seconds' work. I offer the link as an example, but haven't actually tried to follow that page's instructions myself). The software needs only to listen to your WEP-encrypted traffic for a short time and it will then reveal the password that your Wifi network is using. In other words, WEP encryption isn't at all secure if faced with anyone who wants to intrude on your wireless network.
  3. And now we get to the hard question: What is your router configured to do with incoming traffic from the Internet? If the router rejects all packets coming from the outside world, it isn't going to be much use. If you use your browser to visit a web page, you send out packets requesting the web page and web page's server sends back packets that tell your browser what the web page says. If you configured your router to reject those outside packets, you'd likely be most unhappy with your router's behavior.

    Most routers will let you accept only traffic that comes in reply to packets that you sent out to the Internet and that covers most cases. But there are ugly cases where, for example, you initiate an FTP connection and the remote FTP server replies using a different port than the one that you used to initiate the connection. If your router is configured to reject packets that don't look like replies to traffic that you initiated, you're likely going to have trouble doing FTP file transfers.

    A stickier problem is do you ever want to access your home Internet from elsewhere? For example, some folks have home security systems (or perhaps a baby monitor) and want to be able to check in on it from their travel PC while away from home. Almost certainly that requires the router to be configured to allow outside Internet traffic, traffic that isn't in reply to inside-traffic, to come into your home. How secure is your home network to outside traffic coming from a wannabe intruder?

You can try to secure your home, device by device.

Consider, for example, the HP Inkjet printer in my home, an HP 7410 all-in-one printer/scanner/copier with built in Wifi capability. When we first set it up, we put it in a room that didn't have wired ethernet available so we configured it to use that built in wireless capability. Worked wonderfully. Then as I became aware of how easy it was to break WEP encryption, I decided to reconfigure our home router to use WPA2 encryption. Surprise! The HP 7410 printer's built in Wifi support only knew how to handle WEP, not WPA2. I looked into how to upgrade the firmware of the printer, but so far as I can see, the firmware of that printer is permanent, not updateable. Now, for all I know, I could go out and get a new printer that has WPA2 support, and probably has other advantages like, perhaps a faster printer speed and just maybe less expensive ink. (well, I can dream, can't I?). But this printer still works fine, so I'd feel guilty throwing it away just because it doesn't support secure Wifi. So, instead, I grumbled and bought a long Cat-5 cable so I could have a wired ethernet connection from the router to the room with the printer. For less than $10 we also added a wired 8-port 100-base-T ethernet "switch" to that room so we could easily connect all the devices in that room to the wired network. I talked a bit more about my home network back in January. See: Adventures in Home Networking

Barry Shein, one of the early pioneers of the Internet as we know it today, recently posted this note to Facebook:

Internet security is so bad because it was never particularly designed to be secure.

I've been on the net since the 1970s, involved in its technology and politics. I don't remember anyone ever saying or promising "and it has to be secure, right?" until the 1990s.

What happened is in the 1990s a bunch of people figured out how to make A LOT OF MONEY off the net, or save a lot of money, same thing.

But most of their plans required the net to be secure.

So security became a BIG ISSUE. Ok.

It's like coming to a river in your car and thinking hmm, maybe I can just slap some wings on this thing and fly across.

The power of the net is that it enables everyone to share information very easily and very widely.

Now, re-read that sentence with security in mind.

If you aren't uncomfortable yet, I've got more for you to read. Shortly before Barry posted that cautionary note on Facebook quoted up above, he posted on Facebook:

If you try to engage me in a conversation about computer and network security and I don't know for a fact you're an expert I'm going to check whether you read this article. And if you haven't I will politely ice you out.

Everything is Broken

Now there are many different opinions as to what you should do. I don't have the energy or time this morning to track down exact references for what Richard Stallman suggests, but at the risk of mis-reporting what he has in mind, I'll tell you what I think he has said:

  1. Don't trust software that you can't examine and modify on your own.
  2. Don't allow untrusted 3rd parties to have control of the software on your devices. e.g. allowing auto-updates of your PC by Microsoft, Apple, Adobe, Oracle (Java) and Google (Chrome, Android) is imprudent. Even if you trusted the software after careful evaluation of it yesterday, how do you know what the software you are running today will do?

The trouble with "trust no one!" is that you are cutting yourself off from much of the world. And even if you insist on only running software where you can examine the source code, you likely are only fooling yourself. There's too much software in layers and layers for you to have any hope of being able to detect security problems. Security problems can be quite subtle and hard to recognize. Consider for example the recent brouhaha over the security of OpenSSL in its Debian implementation. The source code was all open and freely available, but it took years for anyone to notice that a security bug had been introduced into the code. The xkcd comic had some good jibes at the security of other open-source systems: http://xkcd.com/424/.

I'll go so far as to suggest that if you refuse to allow auto-update of the software on your devices, you are doomed to never being able to keep anywhere close to current on the latest security updates. There are just too many of them and they come out too often to try to track them by hand. And you'll have a hard time convincing me that the reason you insist on tracking them by hand is you want to research what each one is about before you install it. Good luck with that!

And then there's the problem of web services like Dropbox, Gmail, Google Drive, Google Docs, Facebook, ... Pearltrees, and the list goes on. Generally, you don't get to see the source code that implements those services, and often you have no control over when that service implementation is updated. At some point you have to decide which suppliers you are willing to trust. Stallman will tell you that Facebook surely doesn't belong on that trusted list. My wife has no Facebook account and insists that no one should share her picture or name there. Her children don't buy into that "no Facebook" policy because it would cut them off from keeping in touch with their friends.

I could go on and on, suggesting that you look into "Virtual Private Networks" for securely allowing connections from the outside Internet into your home. But you pretty much have to trust somebody to do the right things to protect you.

But who can you trust? 12 biggest baddest [known] software backdoors of all time. "all time" in that title underestimates what the future could hold. And the article isn't very keen to point out that it is only talking about known backdoors. Goodness knows what unknown backdoors are lurking out there.

In closing, here's a 17 minute TED talk that defends "hackers" as a necessary part of the Internet eco-system. The talk doesn't draw a distinction between hackers and crackers, but so it goes.

Hackers: The Internet's Immune System.

Saturday, September 14, 2013

Bicycles for the mind... A Steve Jobs talk from long ago.

1980

In the Introduction talk to Udacity CS101, Professor Evans mentions Steve Jobs having compared computers to "bicycles for the mind".
I think I've now stumbled across the talk where Jobs more or less said that. Run time for the Jobs talk is about 20 minutes. The bicycle reference is around the 6 minute mark in the Jobs video. Evans attributed the quote to circa 1990, but the talk is said to be from 1980, and given Steve's youthful appearance in the video, I believe the 1980 date.


Speaking of small errors, one of the annoying things in Jobs' talk is he speaks of meeting with some 4th and 5th graders, but he then calls them 4-5 year-olds. I think 9-11 year-olds would be a lot closer to correct. 20 kids and 6 Apple computers? What a depressing student/machine ratio that would be these days!


Jobs mentions Visicalc in his talk as though it were something that everybody in the audience knows of. But here we are some 30+ years later and I'm no longer sure that you all know what Visicalc was. Happily, Wikipedia remembers. Even if you remember Visicalc, I recommend that you visit the Wikipedia page. It has some wonderfully interesting links, including one that lets you download a free copy of Visicalc for your x86 Windows PC. That version was written for DOS 1.0 so it only works within the current directory. DOS 1.0 didn't have directories, just disk drives. Most folks back then didn't have disks with any more capacity than a 5.25" floppy disk. In 1980, that would have maybe been 140KB of storage space.


One other link to particularly take note of is the one that asks "What if Visicalc had been patented?". If you haven't been paying attention to the arguments about software patents and why they are not good for the economy, you really should Google up some background reading for yourself, maybe sit through a Richard Stallman talk or 2 about "intellectual property". Be forewarned that Stallman's talk is a 2 hour talk, so take a bathroom break and get yourself a fresh mug of coffee, tea, or whatever before you fire up the Stallman talk.


If I'm going to mention Steve Jobs and Richard Stallman in the same blog post, it is probably appropriate for me to also point you to this short video where Stallman contrasts his own accomplishments vs. those of Jobs and of Bill Gates.

Time marches on, but progress?

Listening to Jobs' 1980 predictions for what the heck we'd do with even more computing power, I can't help but be disappointed with how little real progress we've made on that front. The computing power has, of course, materialized as predicted, and I suppose the graphical user interfaces of Windows, MacOS and web browsers are something of a usability improvement compared to DOS 1.0, but I was sending e-mail and posting netnews items aplenty back in 1980 and it isn't like that process is hugely different today. To keep things in perspective, the Macintosh computer was introduced in 1984. Here's an early 1984 video of Steve Jobs giving a timeline leading up to the Macintosh. It's only about 7 minutes long and includes the famous "1984" teaser ad for the Macintosh; an ad still worth watching, in my opinion. Here's a 10 minute video of Steve Jobs actually introducing and demonstrating the Macintosh.


Still, if you have a problem that you want to solve with a computer, are the barriers to solving your problem significantly lower today or about the same despite the powerful GUI computers that are now available today? If there's an existing product that fits your needs ("there's an app for that"), your path is easy, but if you really need custom software, perhaps a custom database, I expect you still have a rough road ahead. "Cobol is to Visicalc as Y is to Z", but what are Y and Z?


Comparing the Python language to the programming languages of 1980 (C, PL/I, Cobol, Fortran) I guess there's some evidence of our having learned to apply plentiful compute power to making the programming job a little easier, but there's still a steep hill to climb to bring computers to bear on your problem, whatever your problem might be. The Internet, the World Wide Web and search engines seem to be the most evident signs of progress in the computing world since 1980. I do wish the world had more progress to show on reducing the barriers to applying computers to solve problems given the passage of 30+ years since that Jobs talk. Are there specific improvements in the computing world that I'm overlooking here and not giving proper credit to? Should smartphones get a mention, or are they just small battery-powered computers with scaled-down screens?


If I were better at HTML, maybe I could rig this article to provide background music as you read the previous paragraph. Or am I being too sentimental about the lack of technological progress?


If you are completely unfamiliar with Stallman's contributions to the notions of "free software", you might give a quick read of a past blog post of my own as a way to get started at understanding software licensing and Stallman's GPL in particular: See "Copied Code or Unfortunate Coincidence".

06/11/2014 - Updated: The Steve Jobs talk link went bad! Why didn't anyone tell me with a comment so I'd know? Anyhow, I found a link that works (today).

Thursday, August 22, 2013

What's the fuss about parallel programming?


A young friend of mine, now a 2nd year computer engineering student, asked me:

What is parallel programming? Why is parallel programming regarded as important in the future?

I don't have any idea about parallel programming and try to learn  by Googling. Yet,  it is difficult to understand. Why mutable data has been considered as inefficient in programming recently? How it creates problem and in what way functional programming avoids this mess? Will functional programming increase the performance of multicore systems?
Also, to which OS books should I refer? As I am starting my study on my own, and I want to get good at OS and basically able to understand the difference between linux and windows, which book should I follow? Earlier, you said that you are interested in operating systems and also best at it. Please, just suggest me some books which would able to justify the differences between linux and windows technically.
In which language is OS programming done?

Image of multiple processors taken somewhat out of context with a thank-you to Tom's Hardware, a web site where people try to keep up with this stuff


This is my reply to that e-mail...
You ask "What is parallel programming?"   That's a very similar topic to another topic you recently asked about:   concurrent programming.   Both concern how to write programs that do more than one thing at once so that overall performance is improved.   e.g. if the time to run the program depends on "n" (perhaps n is the amount of input data to be processed), then what a parallel program wants to do is apply more than one processor to the problem so the job can be completed sooner than one processor would be able to.

For example, if the job is to sort n items, you might divide the list up into a separate list per processor so each processor needs only sort a shorter list of items.   Of course, before the job is finished, the multiple shorter lists need to be merged together to make the final result.

Distributing the items across the processors is work, merging the lists back together again is work.   Whether the overhead of those extra steps is worth it or not depends on things like how much memory each of the processors has good access to.   If the items divided make a small enough list to fit in the RAM of each processor, then things are probably going to go very fast.    But if the sub-problems are still big enough that you need to spill things out to intermediate work files, and if the extra processors don't have good access to the disk space used to store the spill files, then the dividing up of things might turn out to be a net loss in performance.
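To make that divide-sort-merge idea concrete, here's a minimal sketch using Python's multiprocessing.Pool. The worker count and chunk size are arbitrary illustrative choices, not tuned recommendations:

```python
from heapq import merge
from multiprocessing import Pool

def sort_chunk(chunk):
    # Each worker sorts its own shorter list independently.
    return sorted(chunk)

def parallel_sort(items, workers=4):
    # Distributing the items across the processors is work...
    size = max(1, len(items) // workers)
    chunks = [items[i:i + size] for i in range(0, len(items), size)]
    with Pool(workers) as pool:
        sorted_chunks = pool.map(sort_chunk, chunks)
    # ...and merging the shorter sorted lists back together is work too.
    return list(merge(*sorted_chunks))

if __name__ == "__main__":
    print(parallel_sort([5, 3, 8, 1, 9, 2, 7, 4, 6, 0]))
    # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Notice that heapq.merge does the final combining step serially; for small inputs the process start-up and data shipping costs will swamp any gain, which is exactly the overhead trade-off described above.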

http://en.wikipedia.org/wiki/Parallel_programming_model

Moore's Law

You also ask "Why is parallel programming regarded as important for the future?".   Well, if you go way back to the early days of integrated circuits, Gordon Moore predicted in 1965 that the number of transistors on an integrated circuit would double every 2 years.   He thought that observation would hold true for 10 more years or so.   We actually have gotten through a lot more doublings than that and aren't done yet (though folks are starting to fret that they can see ultimate limits ahead - so it won't go on forever).
His prediction was more and more transistors and it isn't entirely obvious that that translates to mean faster computers. But, in fact, what folks have done with those transistors is figure out ways to apply them to make faster computers. If you look back to the earliest IBM PCs, the processor chip didn't even do floating point arithmetic. If you needed faster floating point, you'd have to add a math co-processor onto the motherboard (there was a socket for that additional chip).

I confess to liking that idea of having separate useful pieces that you can custom integrate to create a tailored computer with exactly the strengths that you want.   Alas, the expense of having multiple chips connected together at the circuit board level argues powerfully against that piece-part model of the chip business.   The trend instead has been to absorb more and more functionality into a single chip - whole systems on a chip - just to be rid of the sockets and pins and propagation delays of getting off-chip and on-chip and back again.

So where did all the transistors get spent to speed things up?   Some of it is obvious.   Computers today have amounts of memory that were unthinkable just a few years ago.   Along with more memory, you certainly have more cache and more layers of cache to speed up access to that memory.   There's much to be learned in contemplating why there are more layers of cache instead of just bigger cache.   But that's a more hardware-centric topic than I'm comfortable explaining here as a software guy.

Besides more memory and more cache, the paths and registers have gotten wider. Where there were 8 bits in the beginning, there are often 64 bits today. You can try cranking that up in the future to 128 bits, but at some point you get into diminishing returns. Slinging around 128-bit pointers in a program that could be happy dealing with only 32-bit pointers may not be optimal. Maybe the problem is just that we need a little more time for programs to comfortably exploit gigantic memory spaces. My PC today only has 2GB of real RAM. 32 bits is more than enough to directly address that much memory. 2^32 in fact is enough to directly address 4GB of RAM. So the line of needing more than 32 bits isn't super far away. But 64 bits is enough to directly address 16 exabytes of RAM. I can't even afford a Terabyte of RAM yet, so needing more than 64 bits is surely a long way away. (1 Terabyte=1024 Gigabytes. 1 Petabyte=1024 Terabytes. And 1 Exabyte=1024 Petabytes).
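Those address-space figures are easy to double-check with a couple of lines of Python:

```python
# How much memory can 32-bit and 64-bit addresses reach?
GB = 2 ** 30            # 1 gigabyte
EB = 2 ** 60            # 1 exabyte = 1024 petabytes
print(2 ** 32 // GB)    # 4  -> 32 bits cover 4 GB
print(2 ** 64 // EB)    # 16 -> 64 bits cover 16 exabytes
```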

http://highscalability.com/blog/2012/9/11/how-big-is-a-petabyte-exabyte-zettabyte-or-a-yottabyte.html

Those are really big numbers.   Bigger than even Doc Brown is likely ready to contemplate:

http://www.youtube.com/watch?v=I5cYgRnfFDA

But it isn't always obvious how best to spend the many transistors that the progress predicted by Moore has provided to us.   I see a certain amount of oscillation in design approaches as things get wide and then get back to serial again.   Look at ATA vs. SATA, for example.

http://en.wikipedia.org/wiki/Serial_ATA

One way to spend transistors is to make more complex circuitry to make the time for each instruction be shorter - do faster multiplication or division, but there's only so far that you can push things in that direction. Current consensus seems to be that making faster and faster processors is getting to be very difficult.   As clock speeds go up, the chip's thirst for electrical power goes up too and with that the amount of heat that has to be taken away from the chip to avoid reducing it to a puddle or a puff of smoke.   So, the industry's current direction is toward spending the transistors on having more processors with moderate speed per processor.   The aggregate instruction rate of such an array of processors multiplies out to nice high numbers of instructions per second, but the challenge is how to effectively apply all those processors to solve a problem faster than an older uniprocessor computer would be able to. Hence the anticipated growing importance of parallel computing in the future.

I think so far I've answered the questions in your subject line.   I hope you have the patience for me to try answering the questions in the body of your mail too.

A Day at the Races

I see your next question is "Why the fuss about mutable data?"   Well, as I understand it, the concern is that if your data is mutable, you need to worry about inter-processor synchronization and locking so that when a processor updates a stored value, it doesn't interfere with some other processor.
The processing of read-only (immutable) data doesn't have to worry about locking and synchronization. But consider something as simple as A=A+1, where A is a mutable value. Underneath it all, your processor needs to figure out where the value of A is stored, fetch the value into some arithmetic register, add 1 to the value and store the value back into the location for A. If A is accessible only to your one processor, there's little to sweat about, but if A is accessible to multiple processors there's a potential for a race. What if both processors have fetched the value of A and both have incremented their copy? Only one of them has the right answer. If they both store their new values for A back to the shared location, the final result is one less than it ought to be.

One solution is to have specialized hardware that makes the A=A+1 operation be atomic, indivisible, so there's no chance of one processor seeing the old value when it should be using a new value.
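In a language like Python, where you don't get direct access to such atomic instructions, the usual stand-in is a lock around the whole fetch-add-store sequence. Here's a minimal sketch (the thread and iteration counts are arbitrary):

```python
import threading

counter = 0                      # our shared, mutable "A"
lock = threading.Lock()

def safe_increment(n):
    global counter
    for _ in range(n):
        # The lock makes fetch, add, and store indivisible, so two
        # threads can never both act on the same stale value of A.
        with lock:
            counter += 1

threads = [threading.Thread(target=safe_increment, args=(100_000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000 -- no updates are lost
```

Drop the `with lock:` line and the final count can come up short, which is exactly the lost-update race described above.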

There's the challenge of figuring out exactly which atomic instructions are most useful additions to your instruction set design. IBM mainframes had an interesting, though complicated, instruction called compare-and-swap. As I remember it, the instruction took 2 registers and a memory location. If the first register matched the value in the memory location, then the 2nd register would be stored into the memory location. If they didn't match, then the memory location would be loaded into the 1st register. And the whole operation was indivisible. So a processor could do it without having to worry about whether some other processor was operating on the same memory location. So, you could use compare-and-swap to do our A=A+1 operation safely. You fetch the value of A into a register. You copy that register to a 2nd register. Add 1 to the 2nd register. Now do a compare-and-swap to store the result back to memory. If the compare-and-swap sets the condition code that says the 1st register didn't match, then sorry, but you have to repeat your computation. Copy the newer value from the first register to the 2nd register. Add 1 to the (new) value to get a newer value and try the compare-and-swap again. Of course, if there are many processors in hot contention for the value of A, then you might have to spin for a while in that loop trying to compute the right value and get it back before it becomes stale.
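Here's that retry loop sketched in Python, with a tiny made-up Cell class standing in for the memory location. The lock inside compare_and_swap merely simulates the hardware's indivisibility, and this toy version returns a flag instead of reloading the register on a mismatch:

```python
import threading

class Cell:
    """A memory location with a simulated compare-and-swap."""
    def __init__(self, value):
        self.value = value
        self._lock = threading.Lock()   # stands in for hardware atomicity

    def compare_and_swap(self, expected, new):
        with self._lock:
            if self.value == expected:
                self.value = new
                return True    # matched: new value stored
            return False       # didn't match: caller must retry

def increment(cell):
    while True:
        old = cell.value                         # fetch A into a "register"
        if cell.compare_and_swap(old, old + 1):  # try to store old+1
            return
        # Lost the race: another thread changed A first. Recompute and retry.

a = Cell(0)
increment(a)
increment(a)
print(a.value)  # 2
```

Under hot contention, many threads can spin in that while loop, each recomputing old+1 until its compare_and_swap finally wins.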

The compare-and-swap instruction can be used for more than A=A+1 kinds of computations.    For instance consider a linked list of items, perhaps the run-able thread list in your operating system kernel.   You want to be able to remove an item from that list.   That involves fetching the link to the next item, fetching the link to the item after that and then storing the link to the next next item into the location where the link to the item you are removing from the list came from.

    A  ----> B ----> C becomes A ----> C

As with the A=A+1 case, there's the potential for a race if there are multiple processors that are contending to pick B off the list. Compare-and-swap can at least make it safe from races, but again, if there is hot contention among many processors, there can be much wasted spinning before a processor succeeds in grabbing B off the list.
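Removing a node from the middle of a concurrent list is notoriously trickier than it looks, so here's a hedged sketch of just the simplest case: popping the first node, where the head pointer plays the role of A's link to B. As before, a lock simulates the indivisible instruction; these class names are my own invention for illustration:

```python
import threading

class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

class ListHead:
    def __init__(self, first):
        self.first = first
        self._lock = threading.Lock()  # simulates hardware indivisibility

    def compare_and_swap(self, expected, new):
        with self._lock:
            if self.first is expected:
                self.first = new
                return True
            return False

def pop(head):
    while True:
        b = head.first                        # the node we hope to remove
        if b is None:
            return None                       # list is empty
        if head.compare_and_swap(b, b.next):
            return b.value                    # head -> B -> C became head -> C
        # Lost the race: another thread grabbed B first. Retry.

head = ListHead(Node("B", Node("C", None)))
print(pop(head), pop(head), pop(head))  # B C None
```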

So, if you have careful control at the machine instruction level, the problem is practically solved. But that sort of implies that you drop down into assembler language from time to time or you have a compiler that generates incredibly clever object code that knows where to use these specialized multi-processing instructions. What if you are using a garbage-collected language like Java or Python? Maybe your problem is worse than the value of A becoming stale between your fetch and your store back to memory. Maybe the location of A has changed entirely and your store operation is smashing something entirely different from the variable A. Big trouble ahead... In fact, if you think in terms of Python, maybe by the time you are trying to store the new value, A isn't even an integer any more. "Gee, it was an integer value when I fetched it. Who the heck changed it to be a floating point number in the meanwhile?". Could be subtler: Python will happily and silently promote an int to a long if the value gets too big to fit into an int, so you need to be very careful that the value you fetched still makes sense before you store the result back to memory.

The article I pointed you to the other day, "The Downfall of Imperative Programming", asserts that "Imperative programs will always be vulnerable to race conditions because they have mutable variables". So functional programming languages, by avoiding mutable variables, dodge a major bullet in the multiprocessing world. The thing that I don't know is how to be sufficiently productive in a functional programming language like Haskell for it to be worth the trouble to learn. The Downfall article predicts that the race conditions are an insoluble problem for imperative programming language implementations. I'll happily accept that there's trouble ahead to watch out for, but I do have a bit of difficulty accepting that the races absolutely can't be resolved.

Python's Global Interpreter Lock

Python worries about the possibility of races among threads interpreting Python code, so it has a "Global Interpreter Lock" (GIL) to assure that one interpreter thread won't change a value in use by another interpreter thread. Folks worry that this coarse level of locking will keep Python programs from being able to scale up with increasing numbers of processors.
I've seen some clever dodges of the GIL in Python programs, mainly by spreading the program across separate address spaces (multiple Python interpreters, each with their own GIL) and limiting interprocess interaction to some carefully controlled set of places in the code with appropriate locking protections.  On the one hand, this doesn't give transparent scaling up from a uniprocessor to M processors all running in parallel, but on the other hand, it does get the job done.
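The simplest version of that dodge is the standard library's multiprocessing.Pool: each worker is a separate interpreter process with its own GIL, so CPU-bound work really does run in parallel across cores. The prime-counting workload below is just an arbitrary illustration:

```python
from multiprocessing import Pool

def count_primes(limit):
    # Deliberately CPU-bound: with threads, the GIL would serialize this.
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

if __name__ == "__main__":
    # Four separate interpreter processes, each with its own GIL.
    with Pool(4) as pool:
        print(pool.map(count_primes, [10_000] * 4))
        # [1229, 1229, 1229, 1229]
```

Of course, the processes share nothing by default, so any interprocess interaction has to go through explicit channels (queues, pipes, shared memory) with its own locking, which is exactly the carefully controlled interaction described above.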

My (weak) excuse for not having more first hand experience with this...

My home PC doesn't bring multiprocessors to the party.   Some day I hope to replace it with an i5-ish based computer with 64-bit addressing and >4GB of memory.   As a retiree with a rather modest pension, that's a discretionary expense that I've been postponing into the future.  Maybe in the meanwhile my target will shift to something with way more processors than an i5.   What I have in mind is something with enough oomph to be able to run Linux and Windows both in virtual machines (Based on Xen, VMWare, something else?  I don't know...). Heck, Microsoft isn't even making it easy to buy such a configuration without paying for a Windows license twice (once bundled into the PC's base price and then again for an installable copy that can be installed into a VM).  I'm assuming that a re-install CD that wants to reload Windows onto a bare PC isn't going to be able to install into a VM environment.   I'm expecting that multi-processor race conditions and their associated problems will come along naturally to my world once I have a rich enough configuration and that encountering those problems on more than just paper will motivate me into doing something about them.
Maybe I'm just old-fashioned in thinking that what I need is a richer computing environment here at home.   Maybe the right thing to do is to venture out into things like Amazon's Cloud Computing service and see what kind of trouble I can get into using other people's multi-processors via the Internet.  One of my worries about that is maybe the underlying MP nature of their cloud services is too deeply wrapped for me to really see the problems I'd be up against from MP.  And, "look, dear, here's the marvelous new computer I just bought" is a much easier conversation to anticipate having with my wife than "Just let me pay this bill for cloud services.   It isn't so much money and I did really learn from having tried their services."

Comparative Operating Systems

You ask me to recommend an OS book to better understand Windows vs. Linux.  I don't know which book is the right choice.  Certainly an Amazon or Google search will turn up a large number of candidate titles. Perhaps your school's library has some of those titles so you can look them over, or perhaps they can arrange inter-library loans for you to be able to look over some of the candidate titles.  "Which of these is best" is always a tricky question because the answer depends so much on your particular criteria for "best".
So let me turn this around and ask you for a summary of your findings from digging into the too-long list of candidate titles and your recommendation.   You might want to ask your question of your school's professor for the OS classes too.   Maybe he's got a more formed opinion on this topic than I have.

Linux Weekly News

Meanwhile, I stand by my suggestion that you should make an effort to keep up with lwn.net (free of charge at the price of having to lag back a week from the most current articles) to see what is going on in the Linux world. Don't feel obligated to have the newest and most experimental kernel on your home PC. But if you spend some time watching the evolution and planning of kernels, you'll have a better idea of Linux's strengths and weaknesses and what "they" are doing about the weaknesses. Unlike Windows, if you are sufficiently motivated to want Linux to be different than it is today, you can make that happen.

Kernel programming languages?

What programming languages show up in OS programming?  Well, at this time, I expect the correct answer to that is C.  Other languages (e.g. Java and Python) do show up in supporting roles, but generally don't make it into kernel code.  Even C++ tends to need too demanding an environment to be a good candidate for kernel code.  Maybe as time goes on the kernel will sprout suitable layers of capability to make higher level languages more attractive for implementing functionality within the kernel, but right now if someone tells you a kernel is written in C++, ask them more questions to confirm that.  It wasn't all that long ago that the likely choice for programming an OS kernel was surely assembler language.  Unix introduced the C language and the then radical idea of using a higher level language in the kernel and even having kernel code that is somewhat portable across computing system architectures. (To calm the historians in the audience, I'll concede here that I may be under-crediting the Multics operating system, portions of which were written in PL/I. And the Multics site gives credit to Burroughs for having done a kernel in Algol, but that's way before even my time.)
Stack overflow article on the languages of the Android OS:

http://stackoverflow.com/questions/12544360/on-what-programming-language-is-android-os-and-its-kernel-written

Stack overflow article on the languages of MacOS, Windows and Linux:

http://stackoverflow.com/questions/580292/what-languages-are-windows-mac-os-x-and-linux-written-in

Not every answer on Stack Overflow can be trusted to be correct...

One sub-link of that article that I followed and that does look interesting and credible:

http://www.lextrait.com/vincent/implementations.html

lwn.net article on what's new in the Linux 3.11 kernel expected to become available in September 2013...
http://lwn.net/Articles/558940/
This is a particularly interesting link from one of the many comments on that lwn.net article about 3.11:
http://www.softpanorama.org/People/Torvalds/Finland_period/xenix_microsoft_shortlived_love_affair_with_unix.shtml

In Closing...

You quote me as saying that I'm best at operating systems.   I tried rummaging in old mail to you to put that statement in context, but didn't succeed in tracking down what I said.   I will concede that I'm especially interested in operating systems, and given a list of computer science topics, I'm probably more interested in operating systems than in most of the others, but claiming I'm best in operating systems sounds like it surely needs some context.


I confess that except for command line pipelines, I've never actually written a multi-threaded program of my own. So don't assume more expertise here than I actually have.