Sanitize user input

It would be helpful if the task author would be a little more specific about what he is after. How user inputs must be "sanitized" entirely depends on how the data is going to be used.

For internal usage, in Raku, where you are simply storing data into an internal data structure, it is pretty much a non issue. Variables in Raku aren't executed without specific instructions to do so. Full stop.

Your name is a string of 2.6 million null bytes? Ok. Good luck typing that in.

You're called 'rm -rf /'? Wow. sucks to be you.

Now, it may be a good idea to check for a maximum permitted length; (2.6e6 null bytes) but Raku would handle it with no problem.

The problem mostly comes in when you need to interchange data with some 3rd party library / system; but every different system is going to have it's own quirks and caveats. There is no "one size fits all" solution.

In general, when it comes to sanitizing user input, the best way to go about it is: don't. It's a losing game.

Instead either validate input to make sure it follows a certain format, whitelist input so only a know few commands are permitted, or if those aren't possible, use 3rd party tools the 3rd party system provides to make arbitrary input "safe" to run. Which one of these is used depends on what system you need to interact with.

For the case given, (Bobby Tables), where you are presumably putting names into some 3rd party data storage (nominally a database of some kind), you would use bound parameters to automatically "make safe" any user input. See the Raku entry under the Parametrized SQL statement task.

Validating is making sure the the input matches some predetermined format, usually with some sort of regular expression. For names, you probably want to allow a fixed maximum (and minimum!) number of: any word or digit character, space and period characters and possibly some small selection of non-word characters. It is a careful balance between too restrictive and too permissive. You need to avoid falling into pre-conceived assumptions about: names, time, gender, addresses, phone numbers... the list goes on.

When passing a user command to the operating system, you probably want to use whitelisting. Only a very few commands from a predetermined list are allowed to be used.

   if $command ∈ <ls time cd df> then { execute $command }

or some such. What the whitelist contains and how to determine if the input matches is a whole article in itself.

Unfortunately, this is very vague and hand-wavey due to the vagueness of the task description. Really, any language could copy/paste 95% or better of the above, change the language name, and be done with it. But until the task description is made a little more focused, it will have to do.

Last updated