Assignment 1: Write a simple web scraper that grabs text from a page or series of pages and saves it to a file.
Since we are using Beautiful Soup to scrape sites, which rules out pages that build their content with JavaScript, and since we are just collecting the results into a list, I thought the best sources would be retail sites.
While Mortuary Mall was intriguing just out of personal interest, I decided to go with Hammacher Schlemmer, because nothing says absurdist capitalism like a company that sells a machine to make “The perfect Ice Cube” for $800. One. Ice. Cube… Going for the most bang for my buck, I scraped “The Unexpected” section.
The code is super simple and Beautiful Soup worked… beautifully. I had some annoying formatting issues, which I probably dealt with inefficiently, but I did get the desired output. However, when I tried to write the results to a text file, I got the error below. I understand what it’s getting at (the curly apostrophe in the scraped text is not an ASCII character), but I need to ask you how to deal with it in this specific program: 'ascii' codec can't encode character u'\u2019' in position 999: ordinal not in range(128)
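For reference, here is a minimal sketch of one common way around this kind of error. It assumes the scraped results are Unicode strings collected in a list, which is what Beautiful Soup normally returns. The function name `save_lines`, the file name, and the sample strings are all placeholders, not my actual scraper code. The idea is to open the output file with an explicit UTF-8 encoding instead of letting Python fall back to ASCII.

```python
import io
import os
import tempfile


def save_lines(lines, path):
    """Write a list of Unicode strings to a UTF-8 text file, one per line."""
    # io.open accepts an explicit encoding on both Python 2 and Python 3,
    # so non-ASCII characters are written as UTF-8 rather than
    # being pushed through the ASCII codec.
    with io.open(path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + u"\n")


# Placeholder strings standing in for scraped product text (assumption).
# The second one contains U+2019, the curly apostrophe from the error.
items = [u"The Perfect Ice Cube Maker", u"It\u2019s a nap pod"]

# Written to the temp directory only so the sketch runs anywhere.
out_path = os.path.join(tempfile.gettempdir(), "unexpected.txt")
save_lines(items, out_path)
```

Encoding at the point of writing keeps everything as Unicode inside the program, which is usually easier than calling `.encode("utf-8")` on each string as it is scraped.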
Maybe we can petition for The Productivity Boosting Nap Pod ($16,000.00) at ITP?