Thursday, September 17, 2015

Committing code to both Heroku and GitHub

I found it a bit confusing that Heroku uses git: whenever I entered git push heroku master, I wasn't updating my GitHub repository...I was only updating the Heroku remote.

To make sure you update both locations correctly, from your master branch, enter

git remote -v
in the command line to see what remotes you currently have set up.

If you are in the midst of working on a Heroku deployment, but have also committed code to a GitHub repository in the past, you will get a list of both the Heroku & GitHub remotes when you enter the command above (with each remote's name listed clearly before its URL).
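For instance, the output might look something like this (the app and repository names here are just placeholders):

heroku  https://git.heroku.com/your-app.git (fetch)
heroku  https://git.heroku.com/your-app.git (push)
origin  https://github.com/yourname/your-repo.git (fetch)
origin  https://github.com/yourname/your-repo.git (push)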

This is useful if you want to double-check that you have the correct GitHub repository set as "origin" before pushing code by typing:
git push origin master
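So a typical deploy-and-back-up sequence looks like this (assuming your remotes are named origin and heroku, as above):
$ git push origin master
$ git push heroku master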

Hooray! 

Wednesday, August 12, 2015

Handling Django secret key & other sensitive info

I have been rather confused over the whole process of keeping sensitive info out of my Django settings.py file.

This is unavoidable if you ever plan on uploading your code to Github...which is pretty much what you have to do if you ever want to deploy & share your project with the world! I wrestled with the idea of uploading to a private BitBucket repository but decided not to take any shortcuts. As this page demonstrates, there are many ways to go about doing this. I just found this way to make the most sense for my simple application.

I chose to go with Marina Mele's method:

Add this get_env_variable() function to your settings.py (along with the two imports it needs at the top of the file):

import os
from django.core.exceptions import ImproperlyConfigured

def get_env_variable(var_name):
    """ Get the environment variable or raise an exception """
    try:
        return os.environ[var_name]
    except KeyError:
        error_msg = "Set the %s environment variable" % var_name
        raise ImproperlyConfigured(error_msg)

Then copy your actual secret key from the SECRET_KEY = 'YOURSECRETKEYHERE' line (the one automatically generated when you started your Django project) to your clipboard...you will be using it very soon. Replace that line in settings.py with:
SECRET_KEY = get_env_variable('SECRET_KEY')

And then in your command line, activate your virtual environment. (Type everything after the $ on each line)

$ workon (name of your virtual env here)
$ cd $VIRTUAL_ENV/bin
$ subl postactivate

This should open up your postactivate file in Sublime Text. If you don't have the subl shortcut set up in your bash profile just yet, add this line to your bash_profile*:
alias subl="/Applications/Sublime\ Text\ 2.app/Contents/SharedSupport/bin/subl"

Paste this into the file (leave the quotes):
export SECRET_KEY='YOURSECRETKEYHERE'

Save the file and close it. Back in your command line, type:
$ subl predeactivate
and add the line:
unset SECRET_KEY

Save and close. 

Now we are ready to test that it worked! In the command line, activate your virtual environment and type:
$ echo $SECRET_KEY 
It should spit out the value you just added to the postactivate file.

Deactivate your virtual environment and press the up arrow to run echo $SECRET_KEY again. A blank line should appear. Your command line's lips are sealed because your virtual environment is no longer active! Once you deactivate your virtual env, "predeactivate" runs, which unsets SECRET_KEY.

Do the same to store other sensitive data, like secret API keys and such.
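For example, to stash a hypothetical AWS key the same way, you'd add
export AWS_SECRET_KEY='YOURAWSKEYHERE'
to postactivate, add
unset AWS_SECRET_KEY
to predeactivate, and then read it in settings.py with:
AWS_SECRET_KEY = get_env_variable('AWS_SECRET_KEY')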

*To add something to your bash profile, type this into the command line: 
open -e .bash_profile
Add the line to the file in TextEdit and save and close. To make these changes take effect without closing and reopening Terminal, type one last command:
source ~/.bash_profile

Thursday, June 4, 2015

My First Slurp of Beautiful Soup

I have seen the movie Mary Poppins many times in my life. Her magical bag held lamps and all sorts of ginormous objects. Her snapping fingers cleaned a room. Her umbrella helped her fly. She was magical.

And you know what else is magical? Beautiful Soup!

As Mary Poppins might say, a spoonful of Beautiful Soup makes the medicine go down...

The medicine, in this case, is the task of parsing HTML. Beautiful Soup will rescue you from the horrors of regex and help you navigate the DOM like a classy person.

I used Goodreads' Most Read Books in the U.S. as a platform for my first soupy adventure.

I wanted to scrape the book urls and book titles from all 50 books on the list.

A quick look at the source code reveals that each book has a link tag with a class of "bookTitle".

Recipe for Beautiful Soup
1. To use Beautiful Soup, simply install with pip install beautifulsoup4 and then import it at the top of your Python file: from bs4 import BeautifulSoup

2. Create the soup (this assumes you've already fetched the page, e.g. with page = requests.get(url) from the requests library).
book_soup = BeautifulSoup(page.content)

3. Generate a list of all segments of the DOM that start with: <a class="bookTitle"
book_links = book_soup.find_all("a", class_="bookTitle")

4. Use list comprehension to compose a list of book titles from the aforementioned list.
popular_book_titles = [book.find('span').get_text() for book in book_links]

5. Use list comprehension to compose a list of book links.
popular_book_urls = [url + link.get('href') for link in book_links]

6. Ladle out the results.
return (popular_book_titles, popular_book_urls)
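Here is the whole recipe as one runnable sketch. The Goodreads URLs are my best guesses at the page described above, so adjust them to whatever page you're scraping:

import requests
from bs4 import BeautifulSoup

def get_popular_books():
    url = "https://www.goodreads.com"
    # fetch the Most Read Books page and make the soup
    page = requests.get(url + "/book/most_read")
    book_soup = BeautifulSoup(page.content)
    # every book on the page has a link tag with a class of "bookTitle"
    book_links = book_soup.find_all("a", class_="bookTitle")
    popular_book_titles = [book.find('span').get_text() for book in book_links]
    # the hrefs are relative, so prepend the site root
    popular_book_urls = [url + link.get('href') for link in book_links]
    return (popular_book_titles, popular_book_urls)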

Tuesday, June 2, 2015

Goodreads API + Python = An Adventure

I've been playing around with the Goodreads API for about a week. Here are some things I learned the hard way:

-Use rauth. I tried using oauth2 but kept getting an "Invalid OAuth Request" message. Then I realized it was a problem with the library. I somehow missed this sentence in the API documentation: We recommend using rauth instead of oauth2 since oauth2 hasn't been updated in quite a while.

-According to a developer message board on Goodreads, the Goodreads API only supports OAuth1 (for the indefinite future), so don't accidentally start using/reading the OAuth2Service or OAuth2Session sections of the rauth docs.

-Look at this. Very, very carefully. It will help you set up your request token so that your app's user can grant you access to make changes to the user's Goodreads account.

The examples in the aforementioned resource covered setting up an OAuth1Session and adding a book to a user's shelf. Both are very important, but there was nothing about how to GET information from a user's account rather than POST information to it.

Getting the user ID proved to be a challenge. Once a user has granted you access through rauth, the Goodreads API documentation explains that the auth_user endpoint returns an xml response with the Goodreads user_id. However, the response will consist of more than just the user_id...you will have to do a bit of digging to find that exact information.

I created a function called get_user_id() that makes a GET request to fetch the xml response:
user_stuff = session.get('/api/auth_user.xml')

Then use parseString (make sure you include this import statement at the top: from xml.dom.minidom import parseString) to parse the xml.

The important part is that you need to parse the xml content, not the xml object. 
xml_stuff = parseString(user_stuff.content) 

Now you're ready to getElementsByTagName. Thanks to Steve Kertes's code on GitHub, I was able to figure out that user_id = xml_stuff.getElementsByTagName('user')[0].attributes['id'].value

Now just return the str(user_id) and you're done! High five yourself. One small step for API experts, one giant leap for me. 
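Putting it all together, here's a minimal sketch of get_user_id(). It assumes session is an already-authorized rauth OAuth1Session for Goodreads, and it uses the full endpoint URL (the relative path above should also work if your OAuth1Service was created with a base_url):

from xml.dom.minidom import parseString

def get_user_id(session):
    user_stuff = session.get('https://www.goodreads.com/api/auth_user')
    # parse the xml content, not the response object itself
    xml_stuff = parseString(user_stuff.content)
    user_id = xml_stuff.getElementsByTagName('user')[0].attributes['id'].value
    return str(user_id)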

I owe a million thanks to this code, this, and this. Where would I be without the Internet???

Wednesday, May 20, 2015

Installing Scrapy + Scrapy Tutorial baby steps

Installing Scrapy was somewhat of a challenge for me. *Shakes fist at the gods*

I couldn't get it to work through pip install Scrapy

It kept saying that Twisted was not installed.

But I did get it to work with sudo easy_install Scrapy

Go figure.

After Scrapy was finally installed, when I tried to start the tutorial project, it gave me the following error message:


UserWarning: You do not have a working installation of the service_identity module: 'No module named service_identity'.  Please install it from <https://pypi.python.org/pypi/service_identity> and make sure all of its dependencies are satisfied.  Without the service_identity module and a recent enough pyOpenSSL to support it, Twisted can perform only rudimentary TLS client hostname verification.  Many valid certificate/hostname mappings may be rejected.

Do not visit the link. Just enter this into your command line:
pip install service_identity
and you should be good to go!

Of course, after that, I hit another roadblock right away. I couldn't get my spider to crawl because I was in the wrong directory. Your "project directory" refers to the outer "tutorial" folder (the top-level folder in the tree below), NOT the folder you created to store your project.
tutorial/
    scrapy.cfg
    tutorial/
        __init__.py
        items.py
        pipelines.py
        settings.py
        spiders/
            __init__.py
            ...
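The crawl command will only work from that outer folder, the one containing scrapy.cfg:
$ cd tutorial
$ scrapy crawl dmoz
(dmoz is the example spider name from the official tutorial; substitute whatever you named yours.)
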
More notes from a newbie, just in case this helps someone out there:
-To uninstall something (replace something with the name of whatever it is you want to uninstall), simply pip uninstall something

-If that doesn't work, try inserting sudo in front of pip.

-To see what packages you have installed on your computer already, enter pip freeze and your Terminal will spit out a list of packages along with versions. Very helpful.

-To exit a Scrapy shell, ctrl+D

SQL challenge: Customer's Orders

Challenge: customer's orders (solution below)

SELECT customers.name, customers.email, SUM(orders.price) AS total_purchase
FROM customers
LEFT OUTER JOIN orders ON customers.id = orders.customer_id
GROUP BY customer_id
ORDER BY total_purchase DESC;

Takeaway notes from a newbie:
To sort in descending order, use DESC

LEFT OUTER JOIN treats the first table listed (customers, in this case) as the LEFT table. You need to put the table whose rows you want to keep, even without a match, on the left, and join it with the right. In this case we have customers who may not have placed any orders yet. Every order has a customer who placed it, but not every customer has placed an order. So customers needs to go on the left.
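If you want to see that behavior for yourself, here's a quick sandbox sketch using Python's sqlite3 module (the table contents are made up):

import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, price REAL);
    INSERT INTO customers VALUES (1, 'Ada', 'ada@example.com'), (2, 'Bob', 'bob@example.com');
    INSERT INTO orders VALUES (1, 1, 9.99), (2, 1, 5.00);
""")
rows = conn.execute("""
    SELECT customers.name, customers.email, SUM(orders.price) AS total_purchase
    FROM customers
    LEFT OUTER JOIN orders ON customers.id = orders.customer_id
    GROUP BY customers.id
    ORDER BY total_purchase DESC
""").fetchall()
print(rows)
# Bob has no orders, but the LEFT OUTER JOIN keeps him in the results
# with a total_purchase of None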

SQL - Joining tables to themselves

It's strange what kinds of things you stow away from all those years of school. I remember the day I first learned in biology class that most bacteria reproduce asexually. No couples therapy for them, I guess. No nights in the doghouse or roses on Valentine's Day or any of that nonsense that humans have to deal with just to please their significant others.

My abnormal(?) fascination with this concept probably explains why I immediately thought of bacteria when I got to the lecture on self-joining tables in the Khan Academy SQL course.

This is a rather twisted topic, so I thought I would write out the concepts, just to help myself remember and understand what's going on when you try to join a table to itself.

We'll use the example from the course. We have a table of students' first names, last names, email addresses and buddy_ids. Each student has a different buddy. If we want to generate a table that shows Student 1's first name & last name alongside Student 1's buddy's email, then we would issue the following command:
SELECT students.first_name, students.last_name, buddies.email AS buddy_email
FROM students
JOIN students buddies ON students.buddy_id = buddies.id;

In a normal JOIN command you have two separate tables, but since here you're joining a table to itself, you need to create an alias for the table name. The alias for students is buddies, indicated by "students buddies" above.

The ON portion is what ties the fields together (what info to join the data on).

"buddies.email AS buddy_email" -- the "AS buddy_email" part is not required; it just sets the column name in the data you're selecting.