Wednesday, May 20, 2015

Installing Scrapy + Scrapy Tutorial baby steps

Installing Scrapy was somewhat of a challenge for me. *Shakes fist at the gods*

I couldn't get it to work through pip install Scrapy

It kept saying that Twisted was not installed.

But I did get it to work with sudo easy_install Scrapy

Go figure.

After Scrapy was finally installed, when I tried to start the tutorial project, it gave me the following error message:


UserWarning: You do not have a working installation of the service_identity module: 'No module named service_identity'.  Please install it from <https://pypi.python.org/pypi/service_identity> and make sure all of its dependencies are satisfied.  Without the service_identity module and a recent enough pyOpenSSL to support it, Twisted can perform only rudimentary TLS client hostname verification.  Many valid certificate/hostname mappings may be rejected.

Do not visit the link. Just enter this into your command line:
pip install service_identity and you should be good to go!

Of course, after that, I hit another roadblock right away. I couldn't get my spider to crawl because I was in the wrong directory. Your "project directory" refers to the outer "tutorial" folder (highlighted below), NOT the folder you created to store your project.
tutorial/
    scrapy.cfg
    tutorial/
        __init__.py
        items.py
        pipelines.py
        settings.py
        spiders/
            __init__.py
            ...
More notes from a newbie, just in case this helps someone out there:
-To uninstall something (replace something with the name of whatever it is you want to uninstall), simply pip uninstall something

-If that doesn't work, try inserting sudo in front of pip.

-To see what packages you have installed on your computer already, enter pip freeze and your Terminal will spit out a list of packages along with versions. Very helpful.

-To exit a Scrapy shell, ctrl+D

SQL challenge: Customer's Orders

Challenge: customer's orders (solution in white below)

SELECT customers.name, customers.email, sum(orders.price) as total_purchase FROM customers LEFT OUTER JOIN orders ON customers.id = orders.customer_id GROUP BY customer_id ORDER BY total_purchase DESC

Takeaway notes from a newbie:
To sort in descending order, use DESC

LEFT OUTER JOIN refers to LEFT as customers in this case (the first one listed). You need to put the table with more info on the left, and join it with the right. In this case we have customers who may not have placed any orders yet. So all orders have customers who placed them, but not all customers have placed orders. So you need orders to go on the left.

SQL - Joining tables to themselves

It's strange what kinds of things you stow away from all those years of school. I remember the day I first learned in biology class that most bacteria reproduce asexually. No couples therapy for them, I guess. No nights in the doghouse or roses on Valentine's Day or any of that nonsense that humans have to deal with just to please their significant others.

My abnormal(?) fascination with this concept probably explains why I immediately thought of bacteria when I got to the lecture on self-joining tables in the Khan Academy SQL course.

This is a rather twisted topic, so I thought I would write out the concepts, just to help myself remember and understand what's going on when you try to join a table to itself.

We'll use the example from the course. We have a table of students' first names, last names, email addresses and buddy_ids. Each student has a different buddy. If we want to generate a table that shows Student 1's first name & last name alongside Student 1's buddy's email, then we would issue the following command:
SELECT students.first_name, students.last_name, buddies.email as buddy_email FROM students JOIN students buddies ON students.buddy_id = buddies.id;

In a normal JOIN command you have two separate tables, but since in this case, you're joining a table to itself, you need to create an alias for the table name. In this case, the alias for students is buddies, indicated by "students buddies" above.

The ON portion is what ties the fields together (what info to join the data on).

"buddies.email as buddy_email" -- the "as buddy_email" is not required, it's just what you want the column name to be in the data you're selecting.

Tuesday, May 12, 2015

New SQL course on Khan Academy

When it comes to a book or movie, the sequel is usually not as good as the original.

But that doesn't mean you shouldn't give SQL a chance! This course seems pretty great so far.