You've heard it from the boss: "How much work would it take to make a program that does [this]?" Then there's the client looking for a maintainer for their 'legacy' system. Or maybe you're just looking to hone your skills through the open-source community. Regardless of what you're facing, you'll need to size up the task on sight and know what you're stepping into (and what side-stepping excuses to pull out). No single problem makes a project 'advanced'.
Yes, there is such a thing as a beginner-tier kernel patch.
No, a million lines of code is not inaccessible for a college freshman.
Rather than seek the silver bullet, we'll weigh several aspects of the difficulty problem to understand where the real complications lie.
Existing Code Base
Picking up someone else's code is a necessary evil. The best we can hope for is that they left it in good condition, with a familiar, sensible structure and enough comments to preclude research stoppage. Be prepared for lines that were touched exactly once, or worse: code that was 'refactored for efficiency'. Read: High-density artwork from the local coding genius. (By the way, he doesn't work here anymore).
|Code Quality||Difficulty Adjustment|
|No existing code||+0|
|Publishing or peer review quality code||+0|
|First pass code||+1|
|Super genius obfuscation||+1|
Lines of Code
Size matters, even in software engineering. Small projects are usually quick to pick up, extend and rebuild. However, the difficulty climbs with size. Projects breaking 10,000 LOC (should!) have structural patterns that the dev team has (un?)consciously committed to along the road. Namespaces have come to life and newcomers must study the code closely to ease into the project dialect. Strange things happen after 100,000 LOC. The original developers have taken their vision and moved on to greener pastures. The refactor buzzsaw has cut across the code base multiple times, leaving the 2nd string to stitch everything back together. The project may be several years old and showing subtle signs of bit rot.
|Code Base Size||Difficulty Adjustment|
|Very Small  (fewer than 100 LOC)||+0|
|Small  (under 1,000 LOC)||+0|
|Medium  (1,000 to 10,000 LOC)||+1|
|Large  (10,000 to 100,000 LOC)||+1|
|Very Large  (more than 100,000 LOC)||+2|
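If you want to apply the size table to a real checkout, a rough line count is usually enough. The following is a minimal sketch, not a proper LOC tool: `count_loc` and `size_adjustment` are hypothetical helper names, the counter only skips blank lines (comments still count), and the handling of the exact boundaries (1,000 goes to Medium, 100,000 stays Large) is an assumption, since the table does not say which side they fall on.

```python
from pathlib import Path


def count_loc(root, suffixes=(".py",)):
    """Count non-blank lines in files under root whose suffix matches.

    This is a crude measure: comment lines are counted as code.
    """
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            with path.open(errors="ignore") as handle:
                total += sum(1 for line in handle if line.strip())
    return total


def size_adjustment(loc):
    """Map a line count to the difficulty adjustment from the size table."""
    if loc > 100_000:   # Very Large
        return 2
    if loc >= 1_000:    # Medium and Large both add one point
        return 1
    return 0            # Very Small and Small


if __name__ == "__main__":
    loc = count_loc(".")
    print(f"{loc} non-blank lines, size adjustment +{size_adjustment(loc)}")
```

Pass a different `suffixes` tuple (for example `(".c", ".h")`) to match the language of the project you are sizing up.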
Language choice matters...somewhat. But I won't be settling any long-standing debates today, since the big break is between high-level languages (C[++], Java, Python) and low-level ones (x86, ARM, or MIPS assembly). Another way to cut it is between standardized languages and esoteric Turing tarpits. If the target language is high-level, used by thousands of people, and controlled by an industry working group, then no points are added to the project difficulty. If your project has multiple languages, consider the language that adds the most to the project difficulty.
|Language Specifics||Difficulty Adjustment|
|Any standardized, high-level language||+0|
|Machine/Assembly or Esoteric High-Levels||+1|
How does your project execute on the target system? The easiest projects are strictly userland applications, such as games. These programs have unrestricted library access, and the operating system provides both protections and managed access to system resources. At the other end of the spectrum lie kernel programs, with limited tools and real risks for going off the rails. There is a middle ground -- some user programs interact directly with the system using unwrapped system calls (software libraries, utilities, emulators) or DMA (old MS-DOS apps).
|User space applications||+0|
|User space syscall interfaces or DMA||+1|
Does your project need multiple threads running on more than one CPU core? Or worse: Is execution distributed across many systems? If you need to break out locks and memory barriers to support SMP, then the complexity of this project just went up a notch.
|Concurrency Level||Difficulty Adjustment|
|None - Single thread/single processor||+0|
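To give a feel for what "breaking out locks" means in practice, here is a minimal sketch of several threads updating one shared counter. The function name `count_with_lock` and the thread and iteration counts are made up for illustration. Without the lock, the read-modify-write on `counter` can interleave between threads and lose updates; with it, the total is always the same.

```python
import threading


def count_with_lock(num_threads=4, increments=10_000):
    """Increment a shared counter from several threads, guarded by a lock."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            # Only one thread at a time may run the read-modify-write below.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter


if __name__ == "__main__":
    print(count_with_lock())  # 4 threads x 10,000 increments -> 40000
```

Every shared piece of state in a concurrent project needs this kind of care, which is where the extra difficulty comes from.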
If your project includes distinct modules that force you to shift your mindset depending on the milestone of the minute, then chances are you're working across different paradigms. Full-stack web developers live here: One morning the manager caves to customer demands for a feature and you're simulating heavy database transaction volume (SQL) by mid-afternoon. Tomorrow, the toy plugs into the front-end and you're neck deep in PHP. Even games can be considered multiparadigm if they implement scripting languages on top of a core engine. (Think C++ and Lua). Academically speaking, when a program mixes OO designs with procedural, functional, or declarative subsystems, then you should check the multiparadigm box.
Code grows old and dies without the love and affection of programmers. Quantifying how this really works is difficult to do, so I'll go with my gut. If the first commit in the code base happened before you started programming, then you should bump up the difficulty. One step further: If the LAST commit happened before you started programming, add another dose of difficulty. You'll probably be working in a VM at this point and should seriously consider a total port.
|Code Age||Difficulty Adjustment|
|First commit older than your years of programming experience||+1|
|Last commit older than your years of programming experience||+2|
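The code age rule is simple enough to write down directly. This is a small sketch under one assumption: comparing calendar years is precise enough for a gut-feel rule. `age_adjustment` and its parameter names are hypothetical.

```python
def age_adjustment(first_commit_year, last_commit_year, started_programming_year):
    """Return the code age adjustment relative to your own experience."""
    if last_commit_year < started_programming_year:
        return 2  # the project went quiet before you wrote your first line
    if first_commit_year < started_programming_year:
        return 1  # older than your career, but still maintained
    return 0      # younger than your programming experience


# Someone who started programming in 2010:
print(age_adjustment(1991, 2024, 2010))  # 1
print(age_adjustment(1995, 2003, 2010))  # 2
print(age_adjustment(2020, 2024, 2010))  # 0
```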
Putting It All Together
Armed with ten points to throw at your next project, let's push it a little bit more. First, I consider a project that scores a 2 or below as beginner territory. If you think you're a beginner, aim here and try to incorporate something you haven't done before. That utility you made last year? Try to multithread it. Or redo it in another language. Projects that hit 5 have a lot of moving parts and are probably for advanced developers or knowledgeable specialists in the particular area that bumped the project into that zone.
Trick example -- All of my 'Let's Make' projects to date fit into this category: First pass code quality (+1), under 10,000 lines of code (+0), imperative (+0), userland (+0), single thread (+0) and made almost yesterday (+0).
Many open source projects fall here. Consider this interpretation of the classic game Master of Orion. With 200,000 lines of code (+2), few comments (+1), and a first commit over 15 years ago (+1), this project lands at a solid intermediate (+4) for those reading this article.
Let's just go for broke and cite the most recent data from the Linux kernel. Code quality (+0), since it is well-documented. The code base is very large (+2). It is procedural only, but it is multi-architecture (+1). It runs in kernel space (+2), supports multiprocessing (+1), and the earliest code is quite old (+1). Yes, the Linux kernel is usually considered an advanced project (+7), although if you only work on a small part, such as a loadable kernel module (LKM), it could be intermediate.
Other examples of advanced projects would be software libraries and emulators.
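The tallies from the worked examples can be checked with a few lines of Python. This is only a sketch of the arithmetic: `classify` and `score` are hypothetical helper names, and treating scores of 3 and 4 as intermediate is an assumption drawn from the Master of Orion example landing at +4, between the beginner cutoff (2 or below) and the advanced mark (5 and up).

```python
def classify(total):
    """Turn a difficulty total into a tier name."""
    if total <= 2:
        return "beginner"
    if total >= 5:
        return "advanced"
    return "intermediate"  # assumed: everything between the two cutoffs


def score(adjustments):
    """Sum the per-category adjustments and classify the result."""
    total = sum(adjustments.values())
    return total, classify(total)


# Adjustments copied from the three examples in the text.
lets_make = {"quality": 1, "size": 0, "paradigm": 0,
             "execution": 0, "concurrency": 0, "age": 0}
master_of_orion = {"size": 2, "quality": 1, "age": 1}
linux_kernel = {"quality": 0, "size": 2, "language": 1,
                "execution": 2, "concurrency": 1, "age": 1}

print(score(lets_make))        # (1, 'beginner')
print(score(master_of_orion))  # (4, 'intermediate')
print(score(linux_kernel))     # (7, 'advanced')
```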