Using ChatGPT to build automated OpenStreetMap changeset metadata retrieval and storage

# Let's try ChatGPT

I asked ChatGPT the following question earlier:

can you write me python code that retrieves changesets from openstreetmap?

It came up with a decent first try! I had to feed it some more questions to iron out some kinks and add some more features, which it did.

# Asking follow-up questions to fix issues and add functionality

In its first try it misunderstood my intent and used the Overpass API to retrieve newly created nodes, ways, and relations. So I asked it to retrieve changeset metadata instead:

I was looking for code that retrieves changeset metadata from https://planet.osm.org/replication/changesets/ - could you rewrite to retrieve from there please?
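For orientation, here is a minimal sketch of how the files under that URL can be addressed. It is not the code ChatGPT produced. It assumes (the prompts above do not spell this out) that `state.yaml` in that directory contains a `sequence:` line, and that each file sits at a path built from the nine-digit, zero-padded sequence number split into three groups. The function names `parse_sequence` and `sequence_url` are made up for illustration.

```python
BASE_URL = "https://planet.osm.org/replication/changesets"


def parse_sequence(state_text: str) -> int:
    """Pull the sequence number out of the text of state.yaml.

    Assumes a line of the form 'sequence: 1234567'.
    """
    for line in state_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "sequence":
            return int(value.strip())
    raise ValueError("no 'sequence:' line found in state file")


def sequence_url(sequence: int) -> str:
    """Build the URL of the gzipped changeset file for a sequence number.

    Assumes the path layout AAA/BBB/CCC.osm.gz for sequence AAABBBCCC.
    """
    padded = f"{sequence:09d}"
    return f"{BASE_URL}/{padded[0:3]}/{padded[3:6]}/{padded[6:9]}.osm.gz"
```

Keeping these two steps as pure functions means they can be checked without touching the network; the actual download would be a separate call.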

Then I asked it to store the results in a sqlite database:

can you rewrite that to store the changeset metadata in a sqlite database?

Next, I thought I'd add scheduling:

can you add code that schedules retrieving the most recent changesets every hour?

Then I had it add a check for the most recent sequence number already in the database:

can you add a check for the last sequence number already stored in the database?

The last piece of functionality I wanted to add was creating the database:

can you add code that checks if the database exists and create it if it doesn't?
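The three database requests so far (store the metadata, look up the last stored sequence number, create the database if it is missing) could be sketched roughly as below. This is an illustration, not the generated code: the table name, column names, and function names are all assumptions, and the real script stores whatever columns the author chose.

```python
import sqlite3

# Hypothetical schema; the column set is an assumption for illustration.
SCHEMA = """
CREATE TABLE IF NOT EXISTS changesets (
    id INTEGER PRIMARY KEY,
    user_name TEXT,
    created_at TEXT,
    closed_at TEXT,
    bbox TEXT,
    sequence INTEGER
)
"""


def open_database(path):
    """Open the database, creating the file and table if they do not exist."""
    conn = sqlite3.connect(path)  # sqlite3 creates the file when it is missing
    conn.execute(SCHEMA)          # IF NOT EXISTS makes this safe to repeat
    conn.commit()
    return conn


def last_sequence(conn):
    """Return the highest sequence number stored, or None for an empty table."""
    row = conn.execute("SELECT MAX(sequence) FROM changesets").fetchone()
    return row[0]
```

Because `sqlite3.connect` creates a missing file and the table is created with `IF NOT EXISTS`, no separate "does the database exist" check is needed in this sketch.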

Finally, I asked it to use the Python scheduler module and add a main entry point:

can you rewrite this to use the python scheduler module?

can you add an if __name__ == '__main__:' section?
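The post does not say which scheduler module ended up in the code, so the following is only a sketch of the idea using the standard library's `sched` module. The helper name `run_periodically` and its parameters are hypothetical, and it runs a fixed number of times rather than forever to keep the example easy to reason about.

```python
import sched
import time


def run_periodically(job, interval_seconds, runs,
                     timefunc=time.monotonic, delayfunc=time.sleep):
    """Call job() `runs` times, `interval_seconds` apart, then return.

    timefunc and delayfunc are passed through to sched.scheduler so that
    a fake clock can be substituted when trying this out.
    """
    scheduler = sched.scheduler(timefunc, delayfunc)
    for i in range(runs):
        # Queue every run up front, each one interval later than the last.
        scheduler.enter(i * interval_seconds, 1, job)
    scheduler.run()  # blocks until the queue is empty
```

An hourly retrieval would then be something like `run_periodically(retrieve, 3600, runs=24)`, where `retrieve` stands in for the function that fetches and stores changesets, called from inside an `if __name__ == '__main__':` block.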

# Result

The final result was this almost-working code.

# From ChatGPT output to a working script

I spent the next 30 minutes or so fixing the code by hand and adding a bit of useful output. The final code looks like this.

You can see the diff here. (thanks diff2html-cli!)

A summary of the fixes and improvements I made, referring to the destination file line numbers in the diff:

- line 10: add BytesIO import to deal with gzip file handling (see below)
- line 23: add missing bbox column
- lines 50-51, 57: log messages
- lines 63-71: correct parsing of the changeset sequence number
- lines 75-81: fix gzip response processing
- lines 86-88: variables for log messages
- line 89: fix XML DOM error
- lines 91-93: add check for still-open changesets
- line 109: add try/except for the database insert code
- line 121: log messages
- line 140: change scheduling frequency to 1 minute
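To give a feel for the gzip, XML, and open-changeset fixes listed above, here is a minimal sketch of that step. It is not the author's final code. It assumes the downloaded file is gzipped XML with one `<changeset>` element per changeset, carrying attributes such as `id`, `user`, `open`, `closed_at`, and `min_lon`/`min_lat`/`max_lon`/`max_lat`; the function name and the returned dictionary keys are made up for illustration.

```python
import gzip
import xml.etree.ElementTree as ET
from io import BytesIO


def parse_closed_changesets(gz_bytes):
    """Decompress a replication file and return metadata for closed changesets."""
    # Wrap the raw response bytes in BytesIO so gzip can read them like a file.
    with gzip.GzipFile(fileobj=BytesIO(gz_bytes)) as f:
        root = ET.parse(f).getroot()

    rows = []
    for cs in root.iter("changeset"):
        if cs.get("open") == "true":
            continue  # still open: its metadata may change, so skip it for now
        bbox = ",".join(
            cs.get(key, "")
            for key in ("min_lon", "min_lat", "max_lon", "max_lat")
        )
        rows.append({
            "id": int(cs.get("id")),
            "user": cs.get("user"),
            "closed_at": cs.get("closed_at"),
            "bbox": bbox,
        })
    return rows
```

The returned rows could then be inserted into the database inside a `try`/`except sqlite3.Error` block, so that one bad row does not stop the whole run.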

Overall, these are not major changes, and they took me no more than 30 minutes to implement. (I asked ChatGPT for a fix for the error in the original gzip response parsing, which it helpfully provided.)

It's still not perfect code by any stretch. Most notably it will not backfill missed sequences, something that would not be too hard to add.
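As a rough idea of what backfilling could look like, the sketch below works out which sequence numbers still need fetching, given the last one stored and the latest one published. The function name is hypothetical, and starting from only the latest sequence when the database is empty is an arbitrary choice for this example.

```python
def missing_sequences(last_stored, latest):
    """List the sequence numbers that still need to be fetched, oldest first.

    last_stored is None when the database is empty; in that case only the
    latest sequence is returned instead of the full history.
    """
    if last_stored is None:
        return [latest]
    return list(range(last_stored + 1, latest + 1))
```

Each scheduled run would loop over this list rather than fetching only the newest file, so a missed run is caught up on the next one.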

# To conclude

I'm pretty impressed that ChatGPT was able to produce mostly correct code to do something very specific.

Some people express fear that software engineering skills will become obsolete. In the link, John Carmack rightly points out that "product skills" will become more important: understanding what problem you are solving rather than focusing on the specific tools. I agree, and this is bad news for a few different types of (aspiring) software engineers:

- Lazy / bad SWEs who rely mostly on Stack Overflow to copy/paste bits of code.
- SWEs who have trouble communicating clearly, for example because they don't speak English¹ well.
- SWEs who stick their heads in the sand and pretend tools like ChatGPT / GitHub Copilot have no value or no place in software.

I don't think I've seen such an interesting inflection point in technology since I started messing with the internet in my university's computer lab in 1992.

¹ Or whatever language people speak where you work.