Inserting geometry with PostGIS and psycopg2

This is here mostly to remind myself when I forget later this week, but here’s a look at some problems I had trying to insert geometry into a PostGIS database with python and psycopg2.

Basically in trying to update Matt Waite’s TweetsWithR repo I kept running into the same errors, first with spatialite and then with PostGIS. My goal was to take a pair of tweet coordinates from the Twitter api and turn them into point geometry in the database.

My initial (read wrong) plan was to store the coordinates wrapped in the sql call to ST_GeomFromText:

coordinates = "ST_GeomFromText(POINT(%s %s))" % (x,y)

then pass that to the database in an insert along with some other data from the tweet I’d gathered earlier:

c.execute('INSERT INTO tweets(tweet_id, created_date, text, screen_name, geom) VALUES (?,?,?,?,?)', (long(tweet.id), tweet_date, tweet_text, user, coordinates))

Unfortunately that greeted me with geometry error:

pysqlite2.dbapi2.IntegrityError: tweets.geom violates Geometry constraint [geom-type or SRID not allowed]

I tried this first in spatialite and then got the same in PostGIS, which I already had running, so I knew it wasn’t an issue with pyspatialite or the like.

Eventually I found this stackoverflow thread, which lays out the correct way to geometry inserts from psycopg2. The key is to have the PostGIS function in the actual insert and not as part of the python variable you pass to the insert:

coordinates = "POINT(%s %s)" % (x,y)
c.execute('''INSERT INTO tweets(tweet_id, created_date, text, screen_name, geom) VALUES (%s,%s,%s,%s,ST_GeomFromText(%s,4326))''', (long(tweet.id), tweet_date, tweet_text, user, coordinates))

Unfortunately no one on the internet has ever had this issue with SQLite3 or spatialite apparently, so it wasn’t until I started searching for PostGIS results that I found this.

Quick web scraper in Python

Quick little project this weekend that will hopefully grow into a larger one soon.

I wrote a scraper that cycles through and downloads all the voter registration statistics on the Oregon Secretary of State’s website. These are total numbers of all voters registered by parties in counties and state legislative districts.

The problem is that not only are the tables all formated slightly differently (the Oregon SOS at one point changed what parties they report so the columns don’t match up) but they’re all safely locked away in PDFs, some that are simply scans of printed reports.

That means the next step of this project is some OCR and then formatting/cleaning the text to fit in a database. Looking into some ways to do that in Python as well, but we’ll see.

Messing around with leaflet maps

So I decided to try out mapping with leaflet this weekend. Been meaning to a) get back into mapping and b) stop using Google FusionTables. As much as I enjoyed the features and simplicity of Google, changes (such GoogleApps for small businesses going paid) have made me nervous about trusting them with anything and expecting it to be there next time I need it.

My finished product is Salem, Ore., city council wards with a simple rollover adding the names to a legend at the top and a click listener that focuses the map on a particular ward. Enjoy.

Tracking Oregon newspaper endorsements

Another quick post for a quick project.

Putting together my ballot for the November election, I looked up a few Oregon newspapers to check on their endorsements on certain issues. I don’t put a whole lot of stock in these (the main value is a reality check on smaller ballot measures to make sure they’re not full of it), but having worked at a paper and knowing some of the smart, thoughtful people who write these endorsements, I always like to see who went for what.

Of course that ended up in a spreadsheet, which I saw as a great opportunity to practice using Twitter Bootstrap and Tabletop.js. Thus Oregon Newspaper Endorsements was born.

I know this has zero real-world value for anyone, but it was a good exercise for me to learn some new tools and keep up on jquery, using data I had lying around basically. It was a lot of fun, with the added bonus of having the election (and Oregon’s early vote by mail) give me a tangible deadline.

 

Using the Twitter and Face.com APIs to put googly eyes on people

UPDATE: So obviously with the Face.com API changes this project no longer works, but if a project of mine had to die I’m glad it was something like this and not, say, an open government tool. The world is not worse off for not being able to put googly eyes on twitter avatars.

This year I’ve decided to really make an effort to learn coding, in all its glorious and torturous forms.

That’s led me to look for more excuses to do anything somewhat code related, from joining CodeYear to building more random things for this site.

I’ve been doing more with APIs at work recently (mostly OpenStates) and wanted to give myself a project to force some practice. Thus was born TweetEyes.

It’s a very juvenile application which allows a user to search for a Twitter account, load the avatar and then — if there’s a face in the photo — put googly eyes on it. World Peace will surely follow.

The page relies on the Twitter and Face.com APIs. While Twitter is recognizable to most, Face.com does most of the heavy lifting. The application takes a photo and then parses it for faces, returning data such as gender, whether or not the person is smiling or is wearing glasses and most importantly for me, position of features such as eyes, nose and mouth.

I first saw it used in Dan Nguyen’s Congressmiles, which rates the smiles of congresspeople based on tags returned by Face.com.

One of the first things that popped into my head as a possible use of the API was the continue one of the running jokes in the newsroom, which is putting googly eyes on anything and everything, from trees to rocks and even kittens. So here we are.

The HTML itself is pretty basic. I load some CSS, Google fonts, JQuery, and some custom JavaScript we’ll get to in a moment. Below there’s a <div> called image which hold the three key components, an <img> which will hold the Twitter avatar and two which will hold a transparent png with the googly eye picture. Below that there’s the search button users will add Twitter handles to, some text which acts first as directions and will later have update the status of a search and then Flickr credit for the frame photo serving as the background. The JavaScript holds the actual meat of the application, calling the two APIs and then manipulating the display elements.

A function tweetFace starts by attaching the PNGs to #eyel and #eyer, storing the user’s search term and updating the text to let him or her know the process has begun. After that the Twitter API get user url is stored as twitter_json.

In sections of the code I use query’s built in $.getJson() method, but for the Twitter call I used $.ajax because the default doesn’t have a good way of inserting a failure callback. On success the json is parsed, storing the twitter user name (which isn’t used at the moment, but my plan is to insert it into the text eventually) and the path to the twitter account’s image. Since twitter API user call returns the normal size image, I replace that with “bigger” to get the larger image option. Error handling is done with complete.

I added two helper functions, one to hide both eyes and another which will be used later to set the position once the eye coordinates are found (I’ll come back to this in a minute).

getPhoto does most of the work of the application, beginning with sending the twitter image url to the Face.com API’s faces.detect and storing the json.

Here I use the default $.getJson(). After pulling the width and height of the image, I check that the json comes with tags, which indicates whether or not a face was detected. From there I used $.each() to start looping through the tags on the photo.

Eye_size resizes the eye based on the size of the face. The formula was a little trial and error, but seems to work OK so far. Since the photo is placed based on its top left edge, I need to move it slightly up and to the left to get the center on the actual eye position, so I use a modified Pythagorean setup to get the position correct.

Originally I had the eye functions combined, but after testing I found out faces can come back with only one eye if the subject is in profile, so now I test for the presence each eye specifically in addition to checking for a face. If the eye isn’t found its visibility is set to none.

Eye position is returned as a pair of x and y coordinates set to a percentage of the photo width and height. To get pixels I reversed that and then subtracted the Pythagorean outcome from before.

Those positions are then sent to the eyeSet function, which sets the CSS positioning of each eye, makes it visible and adjusts the height and width based on the face.

At the bottom is the else case of all this, which alerts the user that no face was found and calls hide_eyes().

If everything goes well the avatar image is updated with the new Twitter user image.

At the end of the script I add the code which will actually call all the code when the user click or hits enter (thanks to this tutorial for help on that):

    $('#search').click(tweetFace);
    $('#query').keyup(function(event){
       if(event.keyCode == 13){
           tweetFace();
       }

That’s it. Definitely not something which will make the world a better place, but it was a good exercise without a lot riding on it. If it collapsed around me, no harm.

958 words on weirdness in the AL MVP ballot

Edit: Why not to base your title on word count: Found I missed 2007, so now we’re up to 958.

The annual Baseball Writers Association of America awards are bringers of great joy this time of year, not for celebrating the best the game had to offer in the past season but creating fantastic opportunities for snark and studies of human nature.

I’ve spent the past half an hour starring at the AL MVP Ballot, which gave Justin Verlander the MVP, the first time a pitcher has won the award since 1992.

While pitchers on the ballot is a whole other discussion, what fascinated me was the wide range of opinions demonstrated up and down the ballot, leading me to conclude that the 28 writers selecting the award have almost no cohesive idea how to value performance by baseball players.

Before I get into it, let me say that on its face, that’s not a bad thing. Having many different voices is a great thing and should be applauded most of the time. There was definitely no “right” choice this year in the AL, and it was expected to be a tight race with lots of candidates at the top.

Still, there seems to be almost no consensus of what any given player meant to his team, which you would expect to see (at least a few times) when 28 people who do nothing but watch baseball get together to decide which one of those players was the best.

I start with Michael Young of the Texas Rangers.

Yes, he had a good year, hitting .330/.380/.474 on the year, good for 3.8 fWAR. He spent the majority of the year as a DH.

While that was a good year, it wasn’t even the best on his own team. Ian Kinsler player good defense at second base and had a .370 wOBA, earning 7.7. fWAR.

Despite that, Young got nearly four times the number of MVP points as Kinsler. But that’s not what jumped out to me.

Young got a vote in all 10 positions, from first place to 10th. Someone thought he was the best player in baseball, someone else found nine other guys. Both writers cover American League baseball for a living.

Since Young was a DH you can take fielding out. That means someone thought Young was a better hitter this year than Jose Bautista, or more likely, that Bautista didn’t count becuase the Blue Jays didn’t make the playoffs (which still wouldn’t explain away Miguel Cabreara …)

There has been lots of discussion about whether or not this was “right,” as in correct, but I don’t mind that. Writers have made awful choices in these awards all the time, with mostly harmless results. You cover the Rangers and think Young is a gerat guy and that counts more than talent at hitting a baseball? Knock yourself out.

If Young got his one first place vote and the rest down ballot, whatever. It’s the range that seems odd. If you got all the writers in a room, there would be almost no agreement how much Young actually contributed this year.

Young wasn’t even the only player to get a first and 10th place vote, with Jacoby Ellsbury of the Red Sox also pulling off the feat.

Of the top eight finishers (using Young as our floor), no player had a range of less than seven, with Yankee Curtis Granderson getting first though seventh place votes.

After looking through some past ballots, though, this isn’t so surprising.

Jose Bautista got first and 1oth place votes last season, though everyone agreed winner Josh Hamilton was at least one of the top four players in the league.

In 2009 Miguel Cabrera got first and tenth, with that one first place vote the only thing keeping Joe Maurer from an unanimous victory. Cabrera’s supporter put Maurer second.

In 2008 Francisco Rodriguez got a first and 10th for setting the single season saves record. While Dustin Pedroia only received votes in first through fourth, he was left off a ballot.

In 2007 ARod ran away with the vote, with no first-10th players.

In 2006 Johan Santana was first and 10th, while teammate Justin Morneau won with no voted lower than fourth.

In 2005 there were no first-10th players, with everyone agreeing winner Alex Rodriguez and second-place finisher David Ortiz were two of the top three players in the league.

In 2004 Vlad Guerrero ran away with the award, no votes below fourth, no first-10th.

Aw, but 2003, that was a vote.

It had the most recent total for a winner (242 for Alex Rodriguez) lower than Verlander’s 280, saw eight players get at least one first place vote and six first-10th players.

There were also two players, Manny Ramirez and Shannon Stewart, who pulled Michael-Young’s with votes in every position.

What’s really surprising is none of it had to do with pitchers, who seem to mess up the ballot. A’s reliever Keith Foulke was the top-finishing pitcher at 15th.

While this wasn’t a scientific study in any way, it appears the BBWAA freaks out every once in a while. There also seems to be deep divides in how to measure the value of a pitcher and someone who doesn’t play much in the field, with those two types more likely to get a wide range of votes.

What’s holding me back from digging into this more is that lack of ballots from pre-2003. BBWAA has those up on their website, but Baseball-Reference only has order of finish, not full ballots, before then.

As I was writing this up Joe Sheehan tweeted something I thought put this into a better light. We have to remember that award voting is a side-benefit to being a baseball reporter, not something integral to the job. Reporters are not hired based on their voting behavior.

The ability to report and the ability to evaluate are completely disparate skills. The voting pool pulls from reporters, not evaluators.

@joe_sheehan

Joe Sheehan

 

 

My favorite albums, 2000-09

I’ve found that the further away I’ve gotten from high school the less my musical tastes match up with those of my friends.

In high school it was almost a given that everyone in a given group liked the same music. Sure, there was always a weird person who kept trying to force country or hip hop on everyone, and individual tastes diverged at the edges, but we all agreed on around 83 percent of everything we were exposed to.

Now it’s a rare treat when I can toss band names at a party and get some nods of recognition. Over time musical tastes become more of a personality quirk than a passion for many of us, with previously insignificant traits such as one’s opinion on heirloom tomatoes, hydraulic fracturing or professions emerge as important signals.

This week we had dinner with friends of our expecting a kid soon. Eventually music came up, with a very specific intention.

“Basically I’m asking to talk about old punk bands,” our friend Sarah dropped onto the table.

The four of us are the rare meet up, people who met and became friends before music came up, only to find we shared many of the same tastes.

We spent the next half hour going through our CD collection, discussing the finer points of the career arc of the Suicide Machines and whether or not that guy from Bright Eyes has a weird voice.

So thanks to Sarah and William (and some inspiration from this Keith Law post) I started to curate the CDs and my iTunes library, looking for my favorite albums over the past 10 years.

I came in thinking I’d actually rank them, but I like it better as an alphabetical list. Seems cleaner. I came in thinking I’d add an explainer for each, but as the list grew that became more unlikely. I may still do it, but I wanted to get this up before it got buried behind other projects.

Also decided to go with 2000-09, an actual 10-year period and a very interesting period in my life. In that time I started dating my (eventual) wife, graduated from high school, moved 500 miles from my childhood home, graduated from college, got my first real job, got married and bought a house. I grew the fuck up (kind of).

So this really became a spreadsheet that tells part of the story of my life. From righteous high school anger (AFI), to righteous college anger (Against Me!), to subdued adult contemplation (Avett Brothers).

If you’re so inclined, here’s a spotify playlist with most of the albums represented.

Words about baseball

It’s taken me about 24 hours to process everything that happened last night.

I sat on my couch, three Major League Baseball games running on my TV at once through my Xbox, my wife patiently waiting on the other end for everything to stop so we could make dinner.

Then 10 minutes later I had watched live the most overwhelming sequence of sporting events I’ve ever seen. Hunter Pence, Robert Andino and Evan Longoria hit balls on the infield, outfield and over fences. Exponentially the night grew from exciting to unbelievable to legendary.

It’s hard for me to take seriously anyone who calls anything the “most” or “greatest” anything. I’m sure there has been a more exciting night of baseball somewhere, sometime, but I wasn’t there for it.

Throughout the night I watched games on a split screen. I had the Rays-Yankees on the left and Red Sox-Orioles on the right. I could still see the Orioles celebrating on one side while Longoria’s homer snuck over the tiny left field fence at Tropicana Field.

I had followed the game from my work, on the car ride home, our vets office and finally back in front of the TV. It all led up to those 10 minutes.

In those 10 minutes the Red Sox playoff odds went from 81 percent to zero.

I takes something remarkable to connect a large group of people and even more so for that event to be something good and not a tragedy (not taking into account the feelings in Boston and Atlanta).

Walking around the Oregon Capitol this afternoon I listened to Jim Bowden talk about Wednesday night and completely connected with his sentiment, his emotion and passion. It was the first time I had ever agreed with anything Jim Bowden had thought or expressed.

The past few weeks have filled the baseball with a renewal of the antagonism between those that analyze and those that feel, as the release of the Moneyball movie brought up all of the same arguments as the publication of the book.

But last night was a common ground. It is the foundation of all sports, not just baseball. The moments that cannot be predicted and unite people without a common background beyond a love of the game.

Every outlet and corner of the internet had the same message for one night and one morning.

Baseball is awesome.