A research-motivated hackathon in web development
Recently, I spent a 12-hour workday to build a full-stack website that timely support a project in my research lab.
Context
There is something called the Machine Learning pipeline, including 3 “steps”: data collection, modelling, and evaluation. Much has been talked about the later two steps. However, the first step is talked about much less despite being no less important.
One of the tasks related to data collection is to build annotation interfaces. An annotation interface is usually a web application that presents inputs to human annotators, and collect their outputs. The better the interface looks, the happier the annotators are, the higher the quality of the collected data is. Therefore, building a good annotation interface is important. But eventually, the “interface” must sit on a highly reliable database so that all the data being collected are intact.
Then, suddenly, my lab needs a new annotation interface in 24 hours. I was tasked with looking for any working solution given that amount of time. At that time, I know a bit of web development before. With that, the hackathon started.
(My labmate said that I looked apparently excited to start the project.)
Starting with a trap
At 11pm, I started with the question of high-level “stack” choice. There were 3 solutions available:
- prebuilt “packages” for annotation interfaces such as LabelStudio, PyBossa, and Doccana
- building a website on my own: a) using a quick prototyping framework, such as Flask and Django (both in Python!); b) using the only stack I am comfortable with, which is React + some database solution + deployment on Github Pages or our campus-only server
- (Backup plan) asking annotators to write things into text files, using a parser to check their syntax, and using GitHub to sync the files.
I spent 3 hours in the middle of the night to try option 1 (PyBossa), 2a, and 2b. Option 1 fails because the doc turned out to be bad, so I was afraid I could not get help if I got stuck. Option 2a also fails because the default database of Django (and probably also Flask) is SQLite, which I saw people said to have bad concurrency affordance. Therefore, I went for 2b.
However, I immediately got into a “trap”. I was looking up for how to create a boilerplate for my React app:
The trap
The React doc is actually not encouraging to use the vanilla React! Instead, it nudged me to NextJS. I bitten it. And the next hour was spent for me to learn the new concepts of NextJS that I didn’t know, like client component vs server components. To me at that moment, those things are unnecessary complicated given such a little time budget I was having and a whole full-stack app to implement.
Luckily, I realized that I was in a trap and I should go to bed, so I dropped the NextJS idea, decided to go with vanilla React, and went to bed (at 2am).
At that time, the only concrete milestone I had achieved was making 1000 images in our dataset publicly accessible via individual URLs (ref1, ref2). I texted Jeongsik (my labmate): “No worries. I got this.”
8 hours of translating requirements into code
I had to wake up the next day at 7AM for the birthday of a close friend in Vietnam. But something happened, I was free at 7AM, so I resumed the project.
I first started with testing database connections. I chose MongoDB as the database because I have it’s the db used in the only web-dev course I have learned. However, I quickly realized that I need a separate backend hosted somewhere in order to connect to my MongoDB database. That’s too much hassle. I switched to Firebase, which is a Google’s service that can act like a combination of a backend server + database, for FREE. I have developed a Trivia Night scoreboard using React + Firebase before, so why not?
After that, everything is pretty “straight-forward” – creating a good amount of React components, linking them together, styling them, separating utility functions and constants, designing the database using an ER diagram, adapting it to Firebase, etc.
However, there was one single technical challenge that took me 2-3 hours to think and search for solutions. It was like this:
Given two React states A and B in the component App. B is sent as a prop to Form, a child component of App. B can be mutated by 3 types of events: whenever A changes, when App first renders and data is fetched from cloud, and when user interacts with Form to change it. What would be the flow to update B to avoid infinite rendering loops?
I guess this is easy for professional React devs, but I am not, so I had to experiment with quite a few ideas before reaching some working solution, even though it requires a redundant data fetching :)
By 5PM, the app basically works and deployed on Vercel. Mission completed. I was cheering with myself happily in my lab.
What did I learn from that?
Firstly, stack choice is situational. There is no “best stack” for everyone, at every time. Everyone is comfortable with a certain stack, and should use that when they need to get things done. I used to somewhat “shame” people for always staying in their comfortable stack. Even though this has some legitimacy to it (people should explore new knowledge), it is mostly wrong. Learning any new language/tool is an investment in time. Sometimes it’s huge. And the annoying thing is you don’t know how much time it’ll take you to understand the stack enough to build the product. But eventually, there is only one thing that matters – whether the product is built and users are happy with it. Almost no users care about the programming language behind a software. Almost no users care if you implemented some clever algorithm, using some fancy stack, or not paying any money (paying a bit of money for an important app is fine!)
Secondly, I learned that I am capable of building and deploying into production a full-stack web app. There is something really safe and exciting to know that, similar to the experience when I cut my hair for the first time.
Finally, web development is still so attractive as a skill to learn. Yes, people pay well for web developers. However, the mere satisfaction in being able to build a website to do almost anything we want, usually for free, is absolutely empowering. Therefore, once in a while, I want to expand my stack, building something meaningful, as a hobby, similar to this website I built for my college.
Next step
- do some project that uses a traditional database (postgresql or mysql), probably via django. Purpose: to see exactly why people say django is good for fast prototyping (once you know it?)
- do some project that uses NextJS. Luckily, the last annotation interface doesn’t require routing. Otherwise, it would have been tedious to set up ReactRouter. I already saw how NextJS makes it very easy to route (via the very folder structure of the codebase).