28 9 / 2011
node.js FTP server
Got to use node.js for a work project recently. We needed an FTP server with special user authentication that would run custom code after a file was uploaded. There was one node.js FTP server implementation on github, so I forked it and started rounding out the basic functionality. My fork is here.
The first significant change I made was to encapsulate the data connection logic. File lists and file contents are transferred over the data connection, while FTP commands and responses travel over the control connection. I quickly found that some clients are super eager to send you data and will do so as soon as a passive data connection is made, even before the FTP server tells them it’s ok to do so. This was especially problematic for file uploads over passive data connections. Without a workaround for these aggressive clients, the flow looked something like this:
- Receive PASV command from client
- Start listening on a port and tell client which port to connect to
- Receive STOR command from client, stating it’s going to upload a file
- Attach data listener to data connection that saves incoming data to file
- On data connection end event, make sure file data has been written
- Close the file
Due to the asynchronous nature of node.js, data often arrived between steps 3 and 4, before a data event listener had been attached. Some of the file chunks were falling through the cracks and the saved file was incomplete.
Also regarding the above flow, ensuring that all data had been written to disk before we attempted to close the file was … well … convoluted. Maybe a solution could be found using fs.createWriteStream, but I’m still happier with what I’ll outline next.
Once I found out that data was falling through the cracks I did some searching and found that setting up a buffering data handler was a common solution to a common problem. Good to know! With a persistent data buffer in place things became much easier, like so:
- Receive PASV command from client
- Start listening on a port and tell client which port to connect to
- Once we get a hint of a connection (socket connect event), listen for data events and push each data chunk onto an array, in arrival order
- Receive STOR command from client, stating it’s going to upload a file
- On data connection end, open file
- Loop over buffered data
- Save each buffered chunk before moving to next chunk
- Close the file
It’s less than ideal to buffer each uploaded file in memory first, but it works and is simple. Success!
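Here is a minimal sketch of that buffering flow. It is not the code from my fork: the function names (bufferDataSocket, whenUploadComplete, saveChunks) are made up for illustration, and a plain EventEmitter stands in for the real data socket so the example runs without an FTP client.

```javascript
const { EventEmitter } = require('events');
const fs = require('fs');

// Start collecting chunks the moment the data socket exists, so nothing is
// lost if the client sends before STOR has been handled.
function bufferDataSocket(socket) {
  const state = { chunks: [], ended: false, onEnd: null };
  socket.on('data', (chunk) => state.chunks.push(chunk));
  socket.on('end', () => {
    state.ended = true;
    if (state.onEnd) state.onEnd(state.chunks);
  });
  return state;
}

// Called when STOR arrives: run the callback once the upload has ended,
// whether that already happened or is still to come.
function whenUploadComplete(state, callback) {
  if (state.ended) callback(state.chunks);
  else state.onEnd = callback;
}

// Open the file, write one chunk at a time (each write starts only after
// the previous one finished), then close the file.
function saveChunks(filePath, chunks, done) {
  fs.open(filePath, 'w', (openErr, fd) => {
    if (openErr) return done(openErr);
    let i = 0;
    (function next() {
      if (i === chunks.length) return fs.close(fd, done);
      const chunk = chunks[i++];
      fs.write(fd, chunk, 0, chunk.length, null, (writeErr) => {
        if (writeErr) return fs.close(fd, () => done(writeErr));
        next();
      });
    })();
  });
}

// Demo: an eager client sends data before STOR is processed.
const socket = new EventEmitter(); // stand-in for the passive data socket
const upload = bufferDataSocket(socket);
socket.emit('data', Buffer.from('hello ')); // arrives "too early"
whenUploadComplete(upload, (chunks) => {    // STOR handler
  console.log(Buffer.concat(chunks).toString()); // prints "hello world"
});
socket.emit('data', Buffer.from('world'));
socket.emit('end');
```

In a real server the STOR handler would pass the chunks to saveChunks with the upload path instead of logging them.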
In addition to the above, I’ve encapsulated things further in my forked repo. The FTP server object itself emits some additional events which you can listen for. My goal was to encapsulate the basic FTP functionality and provide a way to take special actions when:
- A user connects
- A user attempts to log in (listen for these events and handle authentication yourself)
- User uploads a file
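To give a feel for the idea, here is a rough sketch of hooking into those three moments. The event names ('client:connected', 'login:attempt', 'file:uploaded'), the SketchFtpServer class and simulateSession are all hypothetical, chosen for this example; check the repo for the real names and signatures.

```javascript
const { EventEmitter } = require('events');

// Hypothetical stand-in for the FTP server object. It only replays the three
// moments described above so that listeners can be demonstrated.
class SketchFtpServer extends EventEmitter {
  simulateSession(username, password, filename) {
    this.emit('client:connected', { remoteAddress: '127.0.0.1' });
    let accepted = false;
    // Listeners decide the outcome by calling accept() or reject().
    this.emit('login:attempt', username, password,
      () => { accepted = true; },
      () => { accepted = false; });
    if (accepted) this.emit('file:uploaded', filename);
    return accepted;
  }
}

const server = new SketchFtpServer();

server.on('client:connected', (info) => {
  console.log('connection from ' + info.remoteAddress);
});

// Custom authentication lives entirely in the listener.
server.on('login:attempt', (username, password, accept, reject) => {
  if (username === 'alice' && password === 'secret') accept();
  else reject();
});

// Custom code that runs after an upload.
server.on('file:uploaded', (filename) => {
  console.log('received ' + filename);
});

server.simulateSession('alice', 'secret', 'report.csv');
```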
Hopefully someone with a slightly different use-case will come along and take things the rest of the way.
15 10 / 2010
Node Explorations
In my quest to learn new things, specifically the following things, I’ve been both excited and drained:
By “new things” I really mean “some cutting edge stuff that seems pretty cool”, the “cutting edge” portion being the source of most of my struggle. New tools, languages and software have sharp edges.
Since I love fast and light software, node.js is my current casual focus. Also helps that I’ve been learning more JavaScript lately (through work and various side projects).
I’d heard of Sinatra for Ruby and figured I’d like to try a similar micro-framework for node. Stumbled across Picard on github a few months ago, but forgot about it … Last week I noticed Express on the node github page, installed it, and proceeded to hit brick walls.
What, I have to install npm, a package manager? Makes sense, but had a somewhat awkward installation process. Can’t stand dependencies that I’m required to install manually (it’s not in apt). It’s just not ethical.
I eventually got npm installed, then installed Express using it. Fine. When I go to dive into Express, I see it likes to use jade for templating. Sounds nice, but jade won’t install via npm. (Think I tried using haml and that failed.) And “npm update” doesn’t work. When I try to re-install npm, it doesn’t change a thing. So npm seems hosed … might as well avoid having to use it at all. I can handle that!
Luckily I stumbled back upon Picard. So I downloaded the source, copied the sample app code into my own, pointed my app at the picard libs. Done! Heeeeaaaawwww! And picard uses haml, out of the box, nothing extra to install. Yes! Fists shall forever be raised at dependencies (ok, not really).
So now how about database access? Came across a few ORMs listed on the node wiki page, because I really don’t want to start from scratch with sqlite3 on node. Nor did I want to interact with Redis directly from node.
biggie-orm seemed promising. Had a nice API, but I couldn’t get it to save. Looking back on it, maybe it was trying to use MULTI and the version of redis for ubuntu doesn’t support that command. Damn.
Found a few other redis ORMs … redis2json, which is read-only. It reads into a json data structure. Nice, but I need to save too, and I don’t want to do it by hand. redis-node-client kept bombing on me, outputting dirty characters. redis-node works, but when I tried to use transactions (which seems to be the right thing to do), I ran across the fact that MULTI isn’t supported. Shit.
Later that day I found mention of node_redis, specifically that it’s FAST, and has transaction support. So I manually installed a newer version of redis in my home directory, tried biggie-orm again and it still wouldn’t save all the data. But that’s where I stopped.
What can you learn from all this?
That redis looks fun, and even seems to model some patterns better than relational databases do. Specifically looking forward to using this to get back ids of objects that have all the specified tags:
sinter tag:1:objects tag:2:objects tag:10:objects tag:27:objects
sinter stands for “set intersect”. Each additional parameter is a key in redis whose value is a set. So it fetches the values for those keys and intersects them. The elements in the “tag:1:objects” set are ids of objects that have tag 1.
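To make the semantics concrete, here is a small in-memory imitation of what sinter computes. It doesn’t talk to redis at all: the store is a plain Map, the object ids are made up, and the sinter function here is my own stand-in, not a client library call.

```javascript
// In-memory stand-in for redis: each key maps to a set of object ids.
const store = new Map([
  ['tag:1:objects', new Set(['7', '12', '31'])],
  ['tag:2:objects', new Set(['12', '31', '40'])],
  ['tag:10:objects', new Set(['5', '12', '31'])],
  ['tag:27:objects', new Set(['12', '31', '99'])],
]);

// Return the members present in every named set. A missing key behaves
// like an empty set, so the intersection is then empty as well.
function sinter(db, ...keys) {
  if (keys.length === 0) return [];
  const sets = keys.map((key) => db.get(key) || new Set());
  const [first, ...rest] = sets;
  return [...first].filter((id) => rest.every((set) => set.has(id)));
}

console.log(sinter(store, 'tag:1:objects', 'tag:2:objects',
  'tag:10:objects', 'tag:27:objects'));
// → [ '12', '31' ]
```

Objects 12 and 31 are the only ones carrying all four tags, so those are the ids that come back.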
Node also looks fun, requiring a new way of thinking about code, especially code that involves any sort of IO. You have to write it in continuation-passing style: instead of returning a result, a function hands it to a callback you pass in.
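A quick sketch of the difference, using file reads as the IO. The helper names (countBytesSync, countBytes, totalBytes) and the temp file are just for illustration.

```javascript
const fs = require('fs');

// Direct style (blocking): the result comes back as a return value.
function countBytesSync(filePath) {
  return fs.readFileSync(filePath).length;
}

// Continuation-passing style: the result is handed to a callback
// (the "continuation") once the IO has finished.
function countBytes(filePath, callback) {
  fs.readFile(filePath, (err, data) => {
    if (err) return callback(err);
    callback(null, data.length);
  });
}

// Sequencing two async steps means nesting one continuation in another.
function totalBytes(pathA, pathB, callback) {
  countBytes(pathA, (errA, a) => {
    if (errA) return callback(errA);
    countBytes(pathB, (errB, b) => {
      if (errB) return callback(errB);
      callback(null, a + b);
    });
  });
}

// Demo against a small temp file.
const demoFile = require('path').join(require('os').tmpdir(), 'cps-demo-' + process.pid + '.txt');
fs.writeFileSync(demoFile, 'hello');
console.log(countBytesSync(demoFile)); // prints 5
totalBytes(demoFile, demoFile, (err, total) => {
  fs.unlinkSync(demoFile);
  if (err) throw err;
  console.log(total); // prints 10
});
```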
Whew. Too many links.
07 10 / 2010
IE bugs: link tags, href, jquery
Trying to do a brain-dump of things I’ve encountered at my new job, since I’ve been doing work in several new areas.
My employer’s website was crashing on IE7. When I tried to re-create the issue on my IE8 machine, it crashed for me as well. Also crashed when I put IE8 in IE7 mode (the Developer Tools are quite nice).
Took me a while to get used to the IE Developer Tools, and eventually I found the JavaScript debugger. Problem was, I couldn’t figure out how to debug a site that crashes the browser once it loads. I thought maybe I could load the page and start the debugger before it crashed. But that didn’t work because what I really needed to do was start the debugger and also tell it to break immediately. If I remember correctly, it defaults to breaking on errors. You have to de-select that button and then click the one next to it so it breaks immediately. Needless to say I didn’t have enough time to do both.
After some fighting and many browser restarts, I did the following:
- Set my home page to about:blank
- Start the debugger on the homepage with the options to break immediately
- Go to the URL that I need to debug
Simple, right? It took me a while to consider that the debugger would actually stay ON as I traveled away from about:blank to a new URL, rather than turning itself off.
Now that I was able to step through the JavaScript, I traced the issue to a link tag that was programmatically being inserted into the head tag. The empty link tag was being inserted using jQuery just fine, but when the href attribute was set to the correct stylesheet URL, the browser would crash (IE8 in both IE7 and IE8 modes). The workaround is simple: insert a fully complete link tag at once … don’t modify the href attribute.
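Roughly, the workaround looks like the sketch below. The helper names (escapeAttr, buildStylesheetTag, insertStylesheet) are my own for this example, and I’m assuming `$` is jQuery on a page; the string-building part runs anywhere.

```javascript
// Escape characters that would break out of a double-quoted attribute.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Build the complete tag up front, href included.
function buildStylesheetTag(href) {
  return '<link rel="stylesheet" type="text/css" href="' + escapeAttr(href) + '">';
}

// Browser-only: append the finished tag in one step. What crashed IE for me
// was inserting an empty link tag first and setting its href afterwards.
function insertStylesheet($, href) {
  $('head').append(buildStylesheetTag(href));
}

console.log(buildStylesheetTag('/css/site.css'));
// → <link rel="stylesheet" type="text/css" href="/css/site.css">
```

On a page you would call insertStylesheet(jQuery, '/css/site.css').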
20 7 / 2010
Szelector
A month ago I got excited about node.js. Watched two presentations, installed it, ran a few tests, and then ran out of steam for lack of a real project idea. Hate when that happens.
At the same time I also got excited about learning advanced JavaScript concepts since it would no doubt help me with node.js, so I decided to investigate jQuery and see what makes it tick. And what better way to really investigate jQuery than to try to build something similar on my own? Actually, that’s only part of the story: I wanted to use jQuery on an XML document, but discovered that the namespaces render it useless. I saw no reason why it shouldn’t work. If I couldn’t fix jQuery, I’d have to build something similar myself.
I couldn’t make sense of jQuery enough to fix it, so Szelector was born. And really that’s all it is at the moment: a selector. In other words, it parses a CSS3-style selector string and returns elements from the DOM that match the string. I haven’t finished attribute matching, but hierarchy-related fetching mostly works.
My code changed a lot as I learned about optimization (work from references whenever possible, reduce the number of function calls, optimize loops as appropriate, etc.) and got a better understanding of the problem space. I even used the sizzle benchmarking setup to test my project against sizzle, jQuery, dojo and mootools. In some cases my project is faster, but mostly sizzle is blazing. And here’s why:
Gecko 1.9.1 (Firefox 3.5) brought querySelectorAll() to the DOM API, which sizzle uses whenever it can. On startup, if it can, sizzle swaps out its parser + selector logic with querySelectorAll() and friends. Cheater! The browser’s selector code will always be much faster, so it makes sense. But still, I was surprised to find that going on under the hood.
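The general shape of that trick is below. This is not sizzle’s actual code, just a sketch of the pattern: makeSelect and the fallback function are hypothetical names, and the fake document objects exist only so the example runs outside a browser.

```javascript
// Decide once, at startup, which selector implementation to use.
function makeSelect(doc, fallbackSelect) {
  if (typeof doc.querySelectorAll === 'function') {
    return function (selector) {
      try {
        // Native path: let the browser's own selector engine do the work.
        return Array.from(doc.querySelectorAll(selector));
      } catch (e) {
        // The browser rejected the selector; use the hand-written engine.
        return fallbackSelect(doc, selector);
      }
    };
  }
  // No native support at all: always use the hand-written engine.
  return function (selector) {
    return fallbackSelect(doc, selector);
  };
}

// Demo with fake documents (in a browser you would pass `document`).
const modernDoc = { querySelectorAll: (selector) => ['native:' + selector] };
const oldDoc = {};
const slowSelect = (doc, selector) => ['fallback:' + selector];

console.log(makeSelect(modernDoc, slowSelect)('div p')); // → [ 'native:div p' ]
console.log(makeSelect(oldDoc, slowSelect)('div p'));    // → [ 'fallback:div p' ]
```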
Trying to keep this high level … what else have I gained from this project?
- I got familiar with Chrome’s JavaScript debugger
- Can now recognize namespace pollution avoidance in JavaScript
- Came to appreciate once again just how much jQuery does for me
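On the namespace pollution point, the pattern I mean is the immediately-invoked function wrapper that libraries use to keep their internals out of the global scope. A tiny sketch, with MyLib, describe and callCount being made-up names:

```javascript
// Everything inside the wrapper is private unless it is returned.
var MyLib = (function () {
  var callCount = 0; // private: not visible outside the wrapper

  function describe(selector) {
    callCount += 1;
    return { selector: selector, call: callCount };
  }

  // Only this one object is exposed, under a single global name.
  return { describe: describe };
})();

console.log(typeof MyLib.describe); // prints "function"
console.log(typeof callCount);      // prints "undefined"
```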
At this point it’s been a month since I’ve touched the project. Not what I intended to happen, but there’s a lot of mental energy necessary for something like this, and my brain is needed elsewhere for the time being.
If you have any questions, feel free to ask.