If it's XSS you're worried about, you only need to stop malicious characters, and even then you need to be doing more than that. If it's to stop people from entering names you don't like, then you're SOL: you'll never get a regex that handles every name in every culture. However, if you're ever going to dynamically execute code from the db (such as with the exec() statement), then you have to be careful.
I think that the assumption that every website must accommodate every possible name is fallacious.
Although this seems like a trivial question, I am quite sure it is not :) I need to validate names and surnames of people from all over the world. If it were only English ones, I think a simple regex would cut it.
I doubt that this is feasible. There are just too many Unicode symbols to exclude all the unwanted ones (and who will tell you which Chinese symbols to exclude?), and there are surely too many valid symbols to include them all (and you will have the Chinese symbols problem again).
I'll try to give a proper answer myself: the only punctuation marks that should be allowed in a name are the full stop, apostrophe and hyphen. That rule can be summed up in a single regex.
Sorry, you're still going to leave valid names out in the cold.
Hi John, the regex does support diacritics (Arabic is also in the test cases) with the \p. In your example those would be "John W." (or "John" and "W.") and "Saunders". I haven't seen any other case in the list of corner cases.
I strongly suggest you read up on diacritics in Arabic, especially those that are separate Unicode characters but combine with letters to change them.
In case (1), you can allow all characters because you're checking against a paper document. In case (2), you may as well allow all characters because "123 456" is really no worse a pseudonym than "Abc Def".
Rather than trying to get every umlaut, accented e, hyphen, etc., just exclude digits (but then what about a guy named "George Forman the 4th"?) and symbols you know you don't want, like @#$%^ or what have you.
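To make the "letters, combining marks, and three punctuation marks" idea concrete, here is a minimal sketch in Python. Python's built-in re module does not support \p{...} classes, so this version checks Unicode general categories with unicodedata instead. The function name looks_like_name and the exact set of allowed punctuation are assumptions for illustration, not a recommended standard.

```python
import unicodedata

# Punctuation proposed in the discussion above, plus the space (an assumption).
ALLOWED_PUNCTUATION = {".", "'", "-", " "}


def looks_like_name(text: str) -> bool:
    """Return True if every character is a letter, a combining mark,
    or one of the allowed punctuation characters."""
    if not text.strip():
        return False
    for ch in text:
        category = unicodedata.category(ch)  # e.g. "Lu", "Lo", "Mn", "Nd"
        if category.startswith("L") or category.startswith("M"):
            continue  # letters and combining marks (e.g. Arabic diacritics)
        if ch in ALLOWED_PUNCTUATION:
            continue
        return False
    return True
```

Accepting the M categories is what lets combining diacritics through; a letters-only check would reject them. As the thread points out, this only filters characters: it still rejects "George Forman the 4th" and says nothing about whether the input is a real name.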
My last name is too long to fit on many credit cards and government forms, so I just truncate it. Some people might legitimately have an "@" in their names, but this number is minuscule compared with the number who simply are making an error. It's just a much wider world out there than you seem to realize, and your simplistic, Western-oriented rules will simply not work in general.
Regarding numbers, there's only one case with an 8. But even then, using a regex will only guarantee that the input matches the regex; it will not tell you that it is a valid name.
EDIT, after the clarification that this is trying to prevent XSS: a regex on a name field is obviously not going to stop XSS on its own.
However, this article has a section on filtering that is a starting point if you want to go that route.
However, sometimes it's nice to head dear little Bobby Tables off at the pass and send little Robert to the headmaster's office along with his semicolons and SQL comment lines (--). .NET includes regular alphabetic characters and various circumflexed European characters.
However, poor old James Mc'Tristan-Smythe the 3rd will have to input his pedigree as Jim the Third.
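For the Bobby Tables case specifically, the usual alternative to rejecting semicolons and comment markers is a parameterized query, which also lets apostrophes and hyphens through unharmed. A small sketch using Python's sqlite3 module follows; the people table and the save_name helper are hypothetical names used only for this example.

```python
import sqlite3


def save_name(conn: sqlite3.Connection, name: str) -> None:
    # The "?" placeholder passes the value separately from the SQL text,
    # so quotes, semicolons, and "--" in the name are stored as plain data.
    conn.execute("INSERT INTO people (name) VALUES (?)", (name,))
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")

save_name(conn, "Robert'); DROP TABLE people;--")
save_name(conn, "James Mc'Tristan-Smythe the 3rd")

# Both names come back exactly as entered, and the table still exists.
stored = [row[0] for row in conn.execute("SELECT name FROM people ORDER BY rowid")]
```

With this approach the name field does not need to exclude any characters for the database's sake, so any remaining validation can focus on catching typing errors.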
If stopping XSS were as simple as finding a magic regex, a lot of us would be out of jobs.
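What tends to help more than input filtering is escaping the name at the point where it is written into a page. A minimal sketch with Python's standard html.escape follows; render_greeting is a hypothetical helper, and this only covers the HTML body context (attributes, URLs, and JavaScript contexts each need their own encoding).

```python
import html


def render_greeting(name: str) -> str:
    # Escape at output time: <, >, &, and both quote characters become entities,
    # so the browser displays the name instead of interpreting it as markup.
    return "<p>Hello, " + html.escape(name, quote=True) + "!</p>"


safe = render_greeting("<script>alert(1)</script>")
# safe == "<p>Hello, &lt;script&gt;alert(1)&lt;/script&gt;!</p>"
```

A name like O'Brien passes through intact for the reader (the apostrophe is emitted as an entity), so nobody's name has to be rejected to keep the page safe.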