Cookies and Chips: 2015

Friday, November 6, 2015

How to Use virtualenvwrapper: A Basic Overview

Once you have virtualenvwrapper installed...here's how to:

CREATE A NEW VIRTUAL ENVIRONMENT

mkvirtualenv env1

will create a virtual environment by the name of env1.

OPEN UP A VIRTUAL ENVIRONMENT
First things first, go reward yourself with a bag of Cheetos. That was hard work. Once you've wiped all the neon orange, dangerously cheesy residue off your fingers, and you're ready to work, enter

workon env1

into your Terminal, and you're good to go.

In your command line, you know you're successfully in a virtual environment when you see the name of the environment in parentheses before the $ symbol, like this:

(env1)$
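If you ever want to double-check from inside Python instead of squinting at your prompt, here's a small sketch. It relies on standard-library details (sys.prefix, sys.base_prefix, and the VIRTUAL_ENV variable that the activate script sets when workon runs it), not on anything specific to virtualenvwrapper. The function name in_virtualenv is my own invention.

```python
import os
import sys


def in_virtualenv():
    """Best-effort check for whether this interpreter runs inside a virtual environment."""
    # Old versions of virtualenv (before 20) set sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv make sys.prefix differ from sys.base_prefix.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("Inside a virtual environment:", in_virtualenv())
    # The activate script also exports VIRTUAL_ENV (None if nothing is active).
    print("VIRTUAL_ENV =", os.environ.get("VIRTUAL_ENV"))
```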

EXIT YOUR VIRTUAL ENVIRONMENT
Simply enter:

deactivate

The parentheses should be gone and the next line in your Terminal should look normal.

FORGOT YOUR VIRTUAL ENVIRONMENTS?
To see a list of all the virtual environments on your system, fire up your Terminal and run

lsvirtualenv

To see what packages you have installed in a particular virtual environment, activate the virtual environment with:

workon name_of_your_virtual_environment

and then enter:

pip list

Easy as pie. Or sloth snacks.
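pip list is the easy way. If you're curious how to get a similar list from inside Python, a rough sketch using the standard library's importlib.metadata (Python 3.8 and later) might look like this. The function name installed_packages is hypothetical, and the output won't match pip list's formatting exactly.

```python
from importlib.metadata import distributions


def installed_packages():
    """Return (name, version) pairs for packages visible to the current interpreter."""
    found = set()
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip broken installs that have no name in their metadata
            found.add((name, dist.version))
    # Sort case-insensitively by package name, roughly like pip list
    return sorted(found, key=lambda pair: pair[0].lower())


if __name__ == "__main__":
    for name, version in installed_packages():
        print(name, version)
```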

Saturday, October 24, 2015

Using Selenium + Python for Scraping Sites with Usernames

I've been playing around with scraping data from a website that requires a username and password to view information related to my profile.

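Until recently, I was able to log in with code like this (updated to use find_element(By.ID, ...), because newer Selenium versions removed the older find_element_by_id shortcut):
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

myuser = "your_username"  # placeholder: your login name
mypass = "your_password"  # placeholder: your password
link = "https://example.com/login"  # placeholder: the site's login page URL

driver = webdriver.Chrome()
# log in
driver.get(link)
userbox = driver.find_element(By.ID, "Username")
userbox.send_keys(myuser)
passbox = driver.find_element(By.ID, "Password")
passbox.send_keys(mypass)
passbox.send_keys(Keys.RETURN)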

However, the site recently implemented a CAPTCHA hurdle into its login process, which means that the above code no longer works.

Firefox Profiles to the rescue!

In the course of my research, I learned that Selenium pros tend to prefer using custom profiles for faster page loads anyway, so maybe this was a blessing in disguise. Plus, I learned something new!

How to Bypass a CAPTCHA/Log-in Page With Selenium WebDriver

First, create a Firefox profile.
What's a Firefox profile, you ask? Mozilla says: "Firefox saves your personal information such as bookmarks, passwords, and user preferences in a set of files called your profile, which is stored in a separate location from the Firefox program files."

You have a default profile already, but let's create one just in case you want to test Selenium with different settings than you normally use.

In Terminal, run /Applications/Firefox.app/Contents/MacOS/firefox-bin -P

This opens Firefox's Profile Manager, where you can create a new profile and start a clean instance of Firefox with it. This instance may show up as a separate Firefox icon at the far end of your dock. Now log in to the site you want to test on using this instance. The site will store its login cookie in this profile, so the next time you visit using Selenium you should be able to skip the login page and the CAPTCHA test.

Reminder: Make sure to log in on Firefox using this special profile* (by running the Terminal command mentioned above) because cookies expire and your Firefox profile won't be able to access the page if you haven't logged in recently enough.

*You can also check the box that says "use the selected profile without asking at startup" if you want to just use this profile all the time, not just for Selenium stuff.

Update Your Code

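It's time to update your code to include this Firefox profile. Instead of a relative path (which depends on the folder you run your script from), the version below builds the path from your home folder with os.path.expanduser. Don't worry, the space between "Application" and "Support" is not a problem. The part you need to personalize is yourprofilenamehere: replace it with the name of your profile's folder. This uses the Options class from newer Selenium versions; older versions took webdriver.Firefox(firefox_profile=profile) instead.

import os
from selenium import webdriver
from selenium.webdriver.firefox.options import Options

# Replace yourprofilenamehere with your profile's folder name
profile_path = os.path.expanduser('~/Library/Application Support/Firefox/Profiles/yourprofilenamehere')

options = Options()
options.profile = profile_path  # start Firefox with your logged-in profile
driver = webdriver.Firefox(options=options)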

driver.get(link)
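Not sure what to type in place of yourprofilenamehere? The Profile Manager names each profile folder with a random prefix, a dot, and the profile name (something like abcd1234.selenium). Here's a small sketch that lists matching folders. It assumes the default macOS profiles location, and find_profiles is a hypothetical helper name.

```python
from pathlib import Path

# Default macOS location for Firefox profiles (adjust on Linux or Windows)
PROFILES_DIR = Path.home() / "Library" / "Application Support" / "Firefox" / "Profiles"


def find_profiles(name, profiles_dir=PROFILES_DIR):
    """Return profile folders whose names end with '.<name>', e.g. 'abcd1234.selenium'."""
    if not profiles_dir.is_dir():
        return []
    return sorted(
        p for p in profiles_dir.iterdir()
        if p.is_dir() and p.name.endswith("." + name)
    )


if __name__ == "__main__":
    for profile in find_profiles("selenium"):
        print(profile.name)  # paste this in place of yourprofilenamehere
```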

Happy scraping!

Thursday, September 17, 2015

Error handling

The worst feeling in the world is seeing that ugly default page whenever you stumble across an error in your Django app. Make lemonade out of those lemons by customizing your error pages like so:

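Add this to your project's root urls.py:
handler404 = 'eat_decisive.views.handler404'
handler500 = 'eat_decisive.views.handler500'

Add this to views.py. Django only uses these handlers when DEBUG = False in settings.py, so you won't see your custom pages while debugging:
from django.shortcuts import render

def handler404(request, exception):
    # Django 2.0 and later pass the exception that triggered the 404
    return render(request, '404.html', status=404)

def handler500(request):
    return render(request, '500.html', status=500)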

---

Create your HTML files and place them in the top level of your templates folder. (Newer Django versions use {% load static %}; the old {% load staticfiles %} was removed in Django 3.0.)

404.html

{% extends 'eat_decisive/base.html' %}
{% load static %}

{% block title %}Page not found{% endblock title %}

{% block body_block %}
<h1>YOUR ERROR HEADLINE HERE.</h1>
<p>YOUR ERROR MESSAGE HERE.</p>
{% endblock body_block %}

Now create a similar file for 500.html.

I added a Jane Austen quote to my 404 page, just for kicks. And because I love the phrase "intolerably stupid." Gotta remember to use that more often.

Pushing code to both Heroku and GitHub

Heroku deploys with git, which confused me at first: whenever I entered git push heroku master, I was only updating the Heroku git remote, not my GitHub repository.

To make sure you update both locations correctly, enter

git remote -v

in the command line to see which remotes (the places your repository can push to) you currently have set up.

If you are in the midst of working on a Heroku deployment, but have also pushed code to a GitHub repository in the past, you will see both the Heroku and GitHub URLs when you enter the command above. Each line starts with the remote's name (such as heroku or origin) and ends with (fetch) or (push).

This is useful if you want to double-check that the correct GitHub repository is set as "origin" before pushing your code to it with:

git push origin master

Hooray!

Wednesday, August 12, 2015

Handling Django secret key & other sensitive info

I have been rather confused over the whole process of keeping sensitive info out of my Django settings.py file.

Dealing with this is unavoidable if you ever plan on uploading your code to GitHub, which is pretty much what you have to do if you ever want to deploy and share your project with the world! I wrestled with the idea of uploading to a private Bitbucket repository but decided not to take any shortcuts. As this page demonstrates, there are many ways to go about doing this. I just found this way to make the most sense for my simple application.

I chose to go with Marina Mele's method:

Add this get_env_variable() function to your settings.py:

import os
from django.core.exceptions import ImproperlyConfigured

def get_env_variable(var_name):
    """ Get the environment variable or raise an exception """
    try:
        return os.environ[var_name]
    except KeyError:
        error_msg = "Set the %s environment variable" % var_name
        raise ImproperlyConfigured(error_msg)

Then copy your secret key (the 'YOURSECRETKEYHERE' value in the SECRET_KEY line that Django generated when you started your project) to your clipboard. You will be using it very soon. In settings.py, replace that line with:
SECRET_KEY = get_env_variable('SECRET_KEY')

And then in your command line, activate your virtual environment. (Type everything after the $ on each line)

$ workon (name of your virtual env here)

$ cd $VIRTUAL_ENV/bin

$ subl postactivate

This should open up your postactivate file in Sublime Text. If the subl command isn't set up yet, add this line to your .bash_profile*:

alias subl="/Applications/Sublime\ Text\ 2.app/Contents/SharedSupport/bin/subl"

Paste this into the file, replacing YOURSECRETKEYHERE with your actual key (keep the quotes):
export SECRET_KEY='YOURSECRETKEYHERE'

Save the file and close it. Back in your command line, type:

$ subl predeactivate

and add the line:

unset SECRET_KEY

Save and close.

Now we are ready to test that it worked! In the command line, activate your virtual environment and type:

$ echo $SECRET_KEY

It should spit out the value you just added to the postactivate file.

Deactivate your virtual environment and press the up arrow to echo $SECRET_KEY again. A blank line should appear. Your command line's lips are sealed because your virtual environment is no longer active! When you deactivate your virtual env, "predeactivate" runs first, so SECRET_KEY gets unset.
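You can also run this check from Python, which is closer to how settings.py will see the variable. Here's a minimal sketch (secret_key_is_set is a hypothetical helper name). It deliberately reports only whether the key is set, so the secret itself never gets printed.

```python
import os


def secret_key_is_set(var_name="SECRET_KEY"):
    """Return True if var_name exists and is non-empty in this process's environment."""
    return bool(os.environ.get(var_name))


if __name__ == "__main__":
    # Only report whether the key is set; never print the secret itself.
    if secret_key_is_set():
        print("SECRET_KEY is set")
    else:
        print("SECRET_KEY is not set")
```

Run it once with your virtual environment active and once after deactivate; the answer should flip.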

Do the same to store other sensitive data, like secret API keys and such.

*To add something to your bash profile, type these into the command line (touch creates the file if it doesn't exist yet):

touch ~/.bash_profile
open -e ~/.bash_profile

Add the line to the file in TextEdit and save and close. To make these changes take effect without closing and reopening Terminal, type one last command:

source ~/.bash_profile

Thursday, June 4, 2015

My First Slurp of Beautiful Soup

I have seen the movie Mary Poppins many times in my life. Her magical bag held lamps and all sorts of ginormous objects. Her snapping fingers cleaned a room. Her umbrella helped her fly. She was magical.

And you know what else is magical? Beautiful Soup!

As Mary Poppins might say, a spoonful of Beautiful Soup makes the medicine go down...

The medicine, in this case, is the task of parsing HTML. Beautiful Soup will rescue you from the horrors of regex and help you navigate the DOM like a classy person.

I used Goodreads' Most Read Books in the U.S. as a platform for my first soupy adventure.

I wanted to scrape the book URLs and book titles from all 50 books on the list.

A quick look at the source code reveals that each book has a link tag with a class of "bookTitle".

Recipe for Beautiful Soup
1. To use Beautiful Soup, simply install with pip install beautifulsoup4 and then import it at the top of your python file: from bs4 import BeautifulSoup

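2. Create the soup. Here, page is the response you got from requests.get() on the list's URL, and naming a parser explicitly avoids Beautiful Soup's "no parser specified" warning.
book_soup = BeautifulSoup(page.content, "html.parser")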

3. Generate a list of all segments of the DOM that start with: <a class="bookTitle"
book_links = book_soup.find_all("a", class_="bookTitle")

4. Use list comprehension to compose a list of book titles from the aforementioned list
popular_book_titles = [book.find('span').get_text() for book in book_links]

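5. Use list comprehension to compose a list of book links. The href values are relative links, so url here is the site's base URL (e.g. "https://www.goodreads.com").

popular_book_urls = [url + link.get('href') for link in book_links]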

6. Ladle out the results by returning both lists from your function.

return (popular_book_titles, popular_book_urls)
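Putting the recipe together, here's a sketch of the whole thing as one function. To keep it runnable without hitting Goodreads (whose markup may well have changed since I wrote this), it parses a small, made-up HTML string that mimics the structure described above. The function name get_popular_books and the BASE_URL value are my assumptions, and it uses urllib.parse.urljoin instead of plain string concatenation so relative links are joined cleanly.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.goodreads.com"  # assumed base for the relative hrefs


def get_popular_books(html, base_url=BASE_URL):
    """Return (titles, urls) for every <a class="bookTitle"> link in the HTML."""
    book_soup = BeautifulSoup(html, "html.parser")
    book_links = book_soup.find_all("a", class_="bookTitle")
    popular_book_titles = [book.find("span").get_text(strip=True) for book in book_links]
    popular_book_urls = [urljoin(base_url, link.get("href")) for link in book_links]
    return (popular_book_titles, popular_book_urls)


# Hypothetical markup shaped like the list page: only bookTitle links count
sample_html = """
<a class="bookTitle" href="/book/show/1"><span>Book One</span></a>
<a class="bookTitle" href="/book/show/2"><span>Book Two</span></a>
<a class="authorName" href="/author/show/9"><span>Not a book</span></a>
"""

titles, urls = get_popular_books(sample_html)
print(titles)  # ['Book One', 'Book Two']
print(urls)    # ['https://www.goodreads.com/book/show/1', 'https://www.goodreads.com/book/show/2']
```

With a live page you'd pass page.content from requests instead of sample_html.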

Tuesday, June 2, 2015

Goodreads API + Python = An Adventure

I've been playing around with the Goodreads API for about a week. Here are some things I learned the hard way:

-Use rauth. I tried using oauth2 but kept getting an "Invalid OAuth Request" message. Then I realized the problem was the library itself. I somehow missed this sentence in the API documentation: We recommend using rauth instead of oauth2 since oauth2 hasn't been updated in quite a while.

-According to a developer message board on Goodreads, the Goodreads API only supports OAuth1 (for the indefinite future), so don't accidentally start using/reading the OAuth2Service or OAuth2Session sections of the rauth docs.

-Look at this. Very, very carefully. It will help you set up your request token so that your app's user can grant you access to make changes to the user's Goodreads account.

The examples in the aforementioned resource covered setting up an OAuth1Session and adding a book to a user's shelf. Both are very important, but there was nothing about how to GET information from a user's account rather than POST information to it.

Getting the user ID proved to be a challenge. Once a user has granted you access through rauth, the Goodreads API documentation says:

Basically, this means it returns an XML response with the Goodreads user_id. However, the response contains more than just the user_id, so you will have to do a bit of digging to find that exact piece of information.

I created a function called get_user_id() that uses the GET url to get the xml object.

user_stuff = session.get('/api/auth_user.xml')

Then use parseString (make sure you include this import statement at the top: from xml.dom.minidom import parseString) to parse the xml.

The important part is that you need to parse the response's content (user_stuff.content), not the response object itself.

xml_stuff = parseString(user_stuff.content)

Now you're ready to use getElementsByTagName. Thanks to Steve Kertes's code on GitHub, I was able to figure out that user_id = xml_stuff.getElementsByTagName('user')[0].attributes['id'].value

Now just return the str(user_id) and you're done! High five yourself. One small step for API experts, one giant leap for me.
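To see the digging in action without an OAuth session, here's a sketch that runs the same parseString and getElementsByTagName steps on a hard-coded XML string. The sample response is a simplified, hypothetical stand-in: the only thing I'm assuming about the real one is that it has a user element with an id attribute, which is what the lookup above relies on. The function name get_user_id_from_xml is also my own.

```python
from xml.dom.minidom import parseString


def get_user_id_from_xml(xml_bytes):
    """Pull the id attribute off the first <user> element of an XML response body."""
    xml_stuff = parseString(xml_bytes)  # parse the content, not the response object
    user = xml_stuff.getElementsByTagName('user')[0]
    return str(user.attributes['id'].value)


# Hypothetical, simplified response body
sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<GoodreadsResponse>
  <Request><authentication>true</authentication></Request>
  <user id="12345">
    <name>Jane Reader</name>
  </user>
</GoodreadsResponse>"""

print(get_user_id_from_xml(sample))  # prints 12345
```

In the real function you'd pass user_stuff.content instead of sample.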

I owe a million thanks to this code, this, and this. Where would I be without the Internet???

Wednesday, May 20, 2015

Installing Scrapy + Scrapy Tutorial baby steps

Installing Scrapy was somewhat of a challenge for me. *Shakes fist at the gods*

I couldn't get it to work through pip install Scrapy

It kept saying that Twisted was not installed.

But I did get it to work with sudo easy_install Scrapy

Go figure.

After Scrapy was finally installed, when I tried to start the tutorial project, it gave me the following error message:

UserWarning: You do not have a working installation of the service_identity module: 'No module named service_identity'. Please install it from <https://pypi.python.org/pypi/service_identity> and make sure all of its dependencies are satisfied. Without the service_identity module and a recent enough pyOpenSSL to support it, Twisted can perform only rudimentary TLS client hostname verification. Many valid certificate/hostname mappings may be rejected.

Do not visit the link. Just enter this into your command line:
pip install service_identity and you should be good to go!

Of course, after that, I hit another roadblock right away. I couldn't get my spider to crawl because I was in the wrong directory. Your "project directory" refers to the outer "tutorial" folder (the one that contains scrapy.cfg), NOT the folder you created to store your project.

tutorial/
    scrapy.cfg
    tutorial/
        __init__.py
        items.py
        pipelines.py
        settings.py
        spiders/
            __init__.py
            ...
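As far as I can tell, Scrapy finds your project by looking for scrapy.cfg in the current directory and then in its parent directories, which is why running commands from the folder above the outer tutorial/ fails. Here's a small sketch of that search in plain Python that you can use to check where you are. The function name is hypothetical, and this is not Scrapy's own code.

```python
from pathlib import Path


def find_scrapy_project_root(start="."):
    """Walk up from start; return the first directory containing scrapy.cfg, or None."""
    current = Path(start).resolve()
    for directory in (current, *current.parents):
        if (directory / "scrapy.cfg").is_file():
            return directory
    return None


if __name__ == "__main__":
    root = find_scrapy_project_root()
    print(root if root else "No scrapy.cfg here or above: cd into your project directory")
```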

More notes from a newbie, just in case this helps someone out there:
-To uninstall something (replace something with the name of whatever it is you want to uninstall), simply pip uninstall something

-If that doesn't work, try inserting sudo in front of pip.

-To see what packages you have installed on your computer already, enter pip freeze and your Terminal will spit out a list of packages along with versions. Very helpful.

-To exit a Scrapy shell, ctrl+D

SQL challenge: Customer's Orders

Challenge: customer's orders. My solution (it groups by customers.id, so each customer without any orders still gets a row of their own):

SELECT customers.name, customers.email, sum(orders.price) as total_purchase FROM customers LEFT OUTER JOIN orders ON customers.id = orders.customer_id GROUP BY customers.id ORDER BY total_purchase DESC

Takeaway notes from a newbie:
To sort in descending order, use DESC

LEFT OUTER JOIN keeps every row from the LEFT table, which is customers in this case (the first one listed), even when there's no matching row in the right table. Here, every order has a customer who placed it, but not every customer has placed an order. Since we want all customers in the result, including those who haven't ordered anything yet, customers needs to go on the left. Those customers get a total_purchase of NULL.
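Here's a quick way to try the query yourself with Python's built-in sqlite3 module and an in-memory database. The customers and orders are made up. This version groups by customers.id: grouping by orders.customer_id instead would lump every customer without orders into a single NULL group. A customer with no orders shows up with a total of None (SQL's NULL), which SQLite sorts last in DESC order.

```python
import sqlite3


def total_purchases(conn):
    """Total spent per customer, including customers with no orders (total is NULL)."""
    query = """
        SELECT customers.name, customers.email, SUM(orders.price) AS total_purchase
        FROM customers
        LEFT OUTER JOIN orders ON customers.id = orders.customer_id
        GROUP BY customers.id
        ORDER BY total_purchase DESC
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, price REAL);
    INSERT INTO customers VALUES (1, 'Ann', 'ann@example.com'),
                                 (2, 'Bob', 'bob@example.com'),
                                 (3, 'Cy', 'cy@example.com');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5);
""")

for row in total_purchases(conn):
    print(row)
# ('Ann', 'ann@example.com', 15.0)
# ('Bob', 'bob@example.com', 7.5)
# ('Cy', 'cy@example.com', None)
```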

SQL - Joining tables to themselves

It's strange what kinds of things you stow away from all those years of school. I remember the day I first learned in biology class that most bacteria reproduce asexually. No couples therapy for them, I guess. No nights in the doghouse or roses on Valentine's Day or any of that nonsense that humans have to deal with just to please their significant others.

My abnormal(?) fascination with this concept probably explains why I immediately thought of bacteria when I got to the lecture on self-joining tables in the Khan Academy SQL course.

This is a rather twisted topic, so I thought I would write out the concepts, just to help myself remember and understand what's going on when you try to join a table to itself.

We'll use the example from the course. We have a table of students' first names, last names, email addresses and buddy_ids. Each student has a different buddy. If we want to generate a table that shows Student 1's first name & last name alongside Student 1's buddy's email, then we would issue the following command:
SELECT students.first_name, students.last_name, buddies.email as buddy_email FROM students JOIN students buddies ON students.buddy_id = buddies.id;

In a normal JOIN command you have two separate tables, but since in this case, you're joining a table to itself, you need to create an alias for the table name. In this case, the alias for students is buddies, indicated by "students buddies" above.

The ON portion is what ties the fields together (what info to join the data on).

"buddies.email as buddy_email" -- the "as buddy_email" is not required, it's just what you want the column name to be in the data you're selecting.

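Here's a sketch that runs the self-join against a tiny in-memory table using Python's built-in sqlite3 module. The three students are made up; like in the course example, each one has a different buddy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT,
                           email TEXT, buddy_id INTEGER);
    INSERT INTO students VALUES
        (1, 'Ann', 'Lee', 'ann@example.com', 2),
        (2, 'Ben', 'Ortiz', 'ben@example.com', 3),
        (3, 'Cal', 'Price', 'cal@example.com', 1);
""")


def buddy_emails(conn):
    """Each student's name next to their buddy's email, via a self-join on students."""
    query = """
        SELECT students.first_name, students.last_name, buddies.email AS buddy_email
        FROM students
        JOIN students buddies ON students.buddy_id = buddies.id
        ORDER BY students.id
    """
    return conn.execute(query).fetchall()


for row in buddy_emails(conn):
    print(row)
# ('Ann', 'Lee', 'ben@example.com')
# ('Ben', 'Ortiz', 'cal@example.com')
# ('Cal', 'Price', 'ann@example.com')
```

The same table plays both roles: "students" is the person, "buddies" is the alias standing in for their buddy's row.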
Tuesday, May 12, 2015

New SQL course on Khan Academy

When it comes to a book or movie, the sequel is usually not as good as the original.

But that doesn't mean you shouldn't give SQL a chance! This course seems pretty great so far.

Sunday, April 26, 2015

Frustration of the Week: How to Overwrite Apple Git (old version) with Homebrew Git (newer version)

So you've got two versions of Git on your Apple machine: the Apple-installed version and the Homebrew version. Unfortunately, the Apple version is outdated, and your computer won't stop using that one instead of your shiny new Homebrew version. You know because when you type git --version into the command line, you see the old version showing up instead of the new one. Rats!

Entering:

export PATH=/usr/local/bin:$PATH

and then

git --version

will temporarily get your console to show the updated version. But once you exit out of the Terminal and reopen the Terminal, your computer will go back to its old ways again.

That means that

which git

tells you

/usr/bin/git

instead of

/usr/local/bin/git

You want /usr/local/bin/git because that is where the newer version is installed.

Why, computer gods, why??

It's really quite simple. Your computer does not hate you. It's just following the instructions located in its .bash_profile.

To tell it what you want it to do, you need to edit your bash profile.

In Terminal, type the following commands:

cd
touch .bash_profile

open -a "TextEdit" .bash_profile

Your .bash_profile will open in TextEdit. Add this line to the file:

export PATH="/usr/local/bin:$PATH"

Save it and enter this in the Terminal to finalize the change:

source .bash_profile

Next, link your Homebrew git by entering

brew link git

Now (FINALLY), which git should say /usr/local/bin/git

...which is where Homebrew's updated git version should be located. Success!
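Why does the order in export PATH="/usr/local/bin:$PATH" matter? The shell searches the folders listed in PATH from left to right and runs the first match it finds. The sketch below shows the same rule with Python's shutil.which, which accepts a custom search path. The two temporary folders are stand-ins for /usr/bin and /usr/local/bin, the fake git files are hypothetical, and it assumes macOS or Linux (Windows resolves executables differently).

```python
import os
import shutil
import stat
import tempfile
from pathlib import Path


def first_on_path(command, directories):
    """Return what shutil.which finds when only these directories are searched, in order."""
    return shutil.which(command, path=os.pathsep.join(str(d) for d in directories))


with tempfile.TemporaryDirectory() as tmp:
    # Stand-ins for /usr/bin (Apple's git) and /usr/local/bin (Homebrew's git)
    old_bin = Path(tmp) / "usr_bin"
    new_bin = Path(tmp) / "usr_local_bin"
    for folder in (old_bin, new_bin):
        folder.mkdir()
        fake_git = folder / "git"
        fake_git.write_text("#!/bin/sh\n")
        # Mark the fake git as executable so shutil.which will accept it
        fake_git.chmod(fake_git.stat().st_mode | stat.S_IXUSR)
    before = first_on_path("git", [old_bin, new_bin])  # like PATH before the fix
    after = first_on_path("git", [new_bin, old_bin])   # like PATH="/usr/local/bin:$PATH"

print(before)  # ends with usr_bin/git
print(after)   # ends with usr_local_bin/git
```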

Now every time you need to update your git, you just say:

brew update

brew upgrade git

That's it! Awesome.

A million thanks to the kind helpers who shared their wisdom via the following threads:

http://superuser.com/questions/409501/edit-bash-profile-in-os-x

http://superuser.com/questions/708601/homebrew-cant-link-git

Sunday, March 22, 2015

Chapter 12 - User Authentication with Django Registration Redux

A couple of notes for this chapter. In 12.1, make sure you are updating the correct urls.py file. There are two: tango_with_django_project/tango_with_django_project/urls.py and tango_with_django_project/rango/urls.py.

You want to update the first one with (r'^accounts/', include('registration.backends.simple.urls')),

In 12.3.6 you are asked to update the base.html template to account for the new links shown on the page http://127.0.0.1:8000/accounts/. Don't panic if you see a weird-looking 404 error page there. Just look at the URL patterns it lists and see why it makes sense that the tutorial is asking you to replace 'register' with 'registration_register', and 'login' with 'auth_login', etc.

Also important to read this sentence: Notice that for the logout, we have included a ?next=/rango/. This is so when the user logs out, it will redirect them to the index page of rango. If we exclude it, then they will be directed to the log out page (but that would not be very nice).

The sentence refers to this direction:

logout to point to <a href="{% url 'auth_logout' %}?next=/rango/">

In general, you can add "?next=<link path here>" to the end of a link if you want to redirect the user somewhere else afterward.

However, to show a custom page after a user's password has been changed, create a file called password_change_done.html and save it in templates/registration. See this documentation page for more details. You can base it off your base.html template and make it something like:

{% extends "rango/base.html" %}

{% block body_block %}
<h1>Password Changed</h1>
<p>Good news! Your password has been changed.</p>
{% endblock %}

At first, I attempted to achieve this by adding a function in views.py that redirected to a template file in the regular templates/rango folder. The correct way is to create a custom password_change_done.html file and save it in the registration template folder, so that it shows you that page instead of Django's default one.

This is a problematic page. Sure, it tells you that your password change was successful, but what if, after changing my password, I want to continue to browse the site? There's no link to the homepage. Clicking on "home" takes you to an admin login page, so if you're not an administrator, you've reached a dead end. It makes much more sense to direct the user to a custom page that tells the user that his/her password has been changed successfully, along with a link to the homepage.

This took me a long time to figure out, but I'm glad I managed to cobble together a solution. Phew. Onto the next chapter...

Wednesday, March 18, 2015

Chapter 11 - Cookies and Sessions

Chapter 11 is all about cookies. Unfortunately, not of the edible kind. I just spent 5 minutes pointlessly looking for the phrase ">>>> TEST COOKIE WORKED!" on the registration page, when in fact, it's supposed to print in your console window. Oops. Don't make the same mistake I did. Read the directions closely.

Note to self: output from print statements in your views is not printed to the rendered HTML. It is printed in the console window where the development server is running. It's like a private message to yourself, allowing you to test new features!

Also, if you want to implement a grammatically correct site counter ("you have visited this site __ time(s)") without the (s), put this in your about.html file:

{% if visits > 1 %}
<p>You have visited this site {{ visits }} times.</p>
{% else %}
<p>You have visited this site {{ visits }} time.</p>
{% endif %}

An if/else statement to the rescue! Yay, grammar!

Thursday, March 12, 2015

Ch. 9 User Authentication

If you are tempted to change your urls.py to match the snippet provided in this chapter, don't do it!

urlpatterns = patterns('',
    url(r'^$', views.index, name='index'),
    url(r'^about/$', views.about, name='about'),
    url(r'^category/(?P<category_name_slug>\w+)$', views.category, name='category'),
    url(r'^add_category/$', views.add_category, name='add_category'),
    url(r'^category/(?P<category_name_slug>\w+)/add_page/$', views.add_page, name='add_page'),
    url(r'^register/$', views.register, name='register'),
    url(r'^login/$', views.user_login, name='login'),
    )

I am struggling with regular expressions, but I kind of hacked my way through chapter 8 and got the forms to work. In this chapter, however, I noticed that

url(r'^category/(?P<category_name_slug>[\w\-]+)/$', views.category, name='category'),

changed to:

url(r'^category/(?P<category_name_slug>\w+)$', views.category, name='category')

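Do not change it! Leave it as url(r'^category/(?P<category_name_slug>[\w\-]+)/$', views.category, name='category') or your application will complain. The [\w\-]+ version matches the hyphens that slugify puts into multi-word slugs (like other-frameworks), and it expects the trailing slash; \w+ does neither.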
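To convince myself, here's a small sketch comparing the two patterns with Python's re module. The paths are what's left after the rango/ prefix, assuming your app's urls.py is included under ^rango/ as in the tutorial; slug_from is a hypothetical helper.

```python
import re

OLD_PATTERN = r'^category/(?P<category_name_slug>[\w\-]+)/$'
NEW_PATTERN = r'^category/(?P<category_name_slug>\w+)$'


def slug_from(pattern, path):
    """Return the captured slug if the pattern matches the path, else None."""
    match = re.match(pattern, path)
    return match.group("category_name_slug") if match else None


for path in ["category/python/", "category/other-frameworks/"]:
    print(path, slug_from(OLD_PATTERN, path), slug_from(NEW_PATTERN, path))
# category/python/ python None
# category/other-frameworks/ other-frameworks None
```

The \w+ version fails on both: it rejects the trailing slash your links include, and it can't match the hyphen even if you drop the slash.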

Sunday, March 1, 2015

6.9 Exercises - How to Resolve an IntegrityError

One of the exercises in 6.9 is:

Update your population script so that the Python category has 128 views and 64 likes, the Django category has 64 views and 32 likes, and the Other Frameworks category has 32 views and 16 likes.

I kept getting a long error that ended with:

django.db.utils.IntegrityError: column name is not unique

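This response on Stack Overflow helped a lot. The key is that in models.py, the Category name is set to unique=True. If your script passes views and likes to get_or_create along with the name, Django looks for a category matching all of those values. Since the existing Python category still had the old numbers, no match was found, so Django tried to create a second category named Python, which the unique constraint doesn't allow.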

After migrating the changes to models.py, and before running populate_rango.py, go into the admin page and delete the three categories. Alternatively, this thread taught me that I could also delete the db.sqlite3 file and then run this command in Terminal: python manage.py syncdb (syncdb is gone in Django 1.9 and later; use python manage.py migrate instead).

Then create the superuser again, apply migrations and run populate_rango.py. You should now see views and likes updated in each category. Success! (This also comes in handy in the next chapter, when you create slugs).
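To see why the unique constraint bites, here's a plain sqlite3 analogy. It uses Python's built-in module, not Django's ORM or its actual get_or_create code, and both helper functions are hypothetical. Looking a row up by every field misses the existing Python row and tries to insert a duplicate name; looking it up by the unique name alone and then updating the counts works. In Django terms, the second approach corresponds to calling get_or_create(name=name), then setting views and likes on the returned category and calling save().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT UNIQUE, "
             "views INTEGER, likes INTEGER)")
conn.execute("INSERT INTO category (name, views, likes) VALUES ('Python', 0, 0)")


def get_or_create_all_fields(name, views, likes):
    """Look up by name AND counts, like get_or_create(name=..., views=..., likes=...)."""
    row = conn.execute(
        "SELECT id FROM category WHERE name = ? AND views = ? AND likes = ?",
        (name, views, likes),
    ).fetchone()
    if row is None:
        # No exact match, so insert a new row, which clashes with the UNIQUE name
        conn.execute("INSERT INTO category (name, views, likes) VALUES (?, ?, ?)",
                     (name, views, likes))


def get_or_create_by_name(name, views, likes):
    """Look up by the unique name only, then update the counts on that row."""
    conn.execute("INSERT OR IGNORE INTO category (name, views, likes) VALUES (?, 0, 0)",
                 (name,))
    conn.execute("UPDATE category SET views = ?, likes = ? WHERE name = ?",
                 (views, likes, name))


try:
    get_or_create_all_fields("Python", 128, 64)
except sqlite3.IntegrityError as error:
    print("IntegrityError:", error)

get_or_create_by_name("Python", 128, 64)
print(conn.execute("SELECT name, views, likes FROM category").fetchall())  # [('Python', 128, 64)]
```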

Chapter 6 - Models and Databases - populate_rango.py

In the text for populate_rango.py, watch the indentation in the add_page function. p.url, p.views and p.save() should line up with the line above them, not be indented one level further. Here's how it should look:

def add_page(cat, title, url, views=0):
    p = Page.objects.get_or_create(category=cat, title=title)[0]
    p.url = url
    p.views = views
    p.save()
    return p

Chapter 6 - Models and Databases

Ooh, models. Unfortunately we're not talking about extremely attractive people. According to the official Django tutorial, A model is the single, definitive source of information about your data. It contains the essential fields and behaviors of the data you’re storing. Generally, each model maps to a single database table.

Well, ok, then.

The coolest thing about this chapter is creating your superuser. Woohoo! Kind of makes you feel like you're on your way to becoming a superhero.

Note to self, this seemed important for future use:

Thought I'd put it here because it seems like something I will forget later.

Another thing about this chapter to note:

If it bothers you (like it bothered me) that the admin interface slaps an S on Category to make it Categorys instead of Categories, the tutorial helpfully points out that you can add a nested Meta class within your Category class in models.py, to alert the admin to a special plural spelling.

It's simply a matter of inserting this right under the __unicode__ function in your Category class:

    class Meta:
        verbose_name_plural = "Categories"

...and voila. Your admin is now grammatically correct. Yay!

Chapter 5 - Templates

Hit a roadblock today where I accidentally moved all the files to the wrong folder, so I had to start chapter 4 over from scratch.

The instructions say: In your Django project’s directory (e.g. <workspace>/tango_with_django_project/), create a new directory called templates.

In other words, <workspace>/tango_with_django_project/ refers to code/tango_with_django_project, the outer folder that contains manage.py

(NOT code/tango_with_django_project/tango_with_django_project).

So the templates folder should sit in that outer folder, right alongside manage.py.

Lesson learned: add as many print statements to settings.py as you need to check your paths.
I added print("base directory is " + str(BASE_DIR)) to mine, so that I could spot "base directory is" quickly among all the mumbo jumbo in the command line.

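If you want to double-check the path logic outside Django, the sketch below mimics how Django's generated settings.py computes BASE_DIR (two folders up from settings.py itself) and where a templates folder at that level would live. The function name project_dirs and the example path are hypothetical.

```python
from pathlib import Path


def project_dirs(settings_file):
    """Return (BASE_DIR, templates dir) for a given settings.py path.

    BASE_DIR is two levels above settings.py, i.e. the outer project folder.
    """
    base_dir = Path(settings_file).resolve().parent.parent
    return base_dir, base_dir / "templates"


base, templates = project_dirs(
    "code/tango_with_django_project/tango_with_django_project/settings.py"
)
print("base directory is " + str(base))
print("templates directory is " + str(templates))
```

The base directory it prints should end in code/tango_with_django_project, the outer folder, which is where templates belongs.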
In which this English major gets with the program.

Greetings, Earthlings.

I created this blog for the non-computer-science people who are currently learning how to code.

I studied English in college. Unlike most of my peers, I didn't have any plans to become a lawyer or teacher. I just wanted to get paid to read and write. After graduation, I moved to New York and managed to find a job that would let me do just that. Sounds like a happy ending, no? Not quite.

About a year and a half ago, I was laid off from my copywriting gig at what I thought was my dream job/company. I found another job in less than a month, but it awakened a formerly dormant part of my mind that grew and grew until it became a looming question: Wasn't it time I took control of my future?

Since I've always wondered what makes the Internet so magically addictive, I decided to learn how to program. Thanks to the plethora of online courses out there (most of them free!) it's easy for anyone with a laptop and an Internet connection to get started.

Python was my first introduction to the world of programming. Coursera offers a series of great Rice University courses: Intro to Interactive Programming in Python, Principles of Computing and Algorithmic Thinking. Warning: the third course is really challenging! But it's definitely worth trying...even if it makes you want to bang your head against the wall at times.

After that was over, I signed up for a Udemy course and learned some JavaScript and PHP. That's when I really began to appreciate Python. It just makes more sense to me than other languages.

I created this blog to document my attempt to learn how to recreate Eat Decisive using Django and deploy it on Heroku. I am using this Django tutorial to get started.

Throughout my journey, I've read and will continue to read countless tutorials and Stack Overflow answers that have helped me immensely. Some of the tutorials I came across were written for people who were more experienced than little ol' me. It was frustrating at times, kind of like when you look a word up in the dictionary and the definition contains another word you don't know. With that in mind, some of my blog posts will be way too basic for many people. But those same basic posts will probably help other people who are in the same boat as me. So with that latter group in mind, I will try my best to share whatever information I acquire along the way, without assuming much prior knowledge.

Join me as I navigate the world of programming with a side of gifs and puns (I can't ignore my inner humanities major too much). Lots of fun times ahead.