Just because I say something doesn’t mean I can always complete the tale. For example, asking about a service outage or pending repair might never be answered, particularly around a brief event like a live webcast or meetup. The detective in me wanted to know why an entry for personal gender identity was a requirement at all, much less having binary choices.
Before diving into a specific case, I want to review the thought process a database developer would go through in building up a record structure related to a person, say, for admission to a concert. They might think recording gender would make sense. Or they were told to do so. If they just go ahead unthinking what effect collecting and storing this information could be, we’ll look at presently.
Here’s a snippet from a table definition:
create table x ( -- gender char(1), -- );
In this example, of course, there is no requirement of an entry, much less the often-used and misused “M/F”, which I’ll leave to the junior detectives to research for their own particular SQL flavor/flavour. Here’s a tip (from a MySQL schema for image storage):
-- color char(1) not null default 'Y' check ( active in ('Y', 'N') ) , --
Now suppose we are sent an export (like a PDF from a freedom-of-information-act request obfuscation) that has full words, such as “MALE”, “FEMALE”, or, possibly “UNKNOWN”. We’ll assume a possibility no value is entered there, for kicks. So we might need a temporary table to keep this evidence chain before analyzing it further. You know, check the logs, I mean, locks.
create table y ( -- gender char(10), -- );
Either of these data definitions would allow SQL to count people, whether they wanted to be or not. As an old school DBA, I found a few code snippets online (links below) to help illustrate the thought process further.
I am trying to get the total of males, females and unknowns in each agerange.
Here is one of the suggested query logic:
10 count( case when gender = 'M' then 1 end ) male, 11 count( case when gender = 'F' then 1 end ) female 12 from t
Again, I will omit analysis of this, other than to show subsequent responses that should trigger a clue.
count( case when gender = 'M' then 0 when gender = 'F' then 0 else 1 end) unknown
This implies counting entries other than M or F. It also allows a NULL value to be considered. The database could have a rule requiring an entry in the column (NOT NULL), or it could allow a blank (again, assuming a single character width, and a LATIN-1 alphabet, and English words).
Suppose this isn’t an English-reading user? Will we need to decode the local words for gender and store them in a table somewhere? Or will we put the rules at the data dictionary level and have the frontend application developers work out the rules per our published specs.
Does the application layer coincide with the underlying database?
To encode “no entry”, a colleague used this conversion (Python):
@six = isnull(gender, ''),
The intent is to have the database record a null rather than a zero length string, if that computes. What is unintended is the mixing of six and gender, from a declaration level. But that’s beside the main point of this investigation. Please note I altered the Python variable name string per SAP community blog post rules, which I am endeavoring to support in the fairest way I can given the sensitivity of personally identifying information and characteristics.
What happens if the database designer doesn’t require an entry, instead removing the column entirely, as opposed to allowing nulls, blanks, unknown, or free form entry? That’s fine if the decision is early enough that user-facing components don’t include that column. Otherwise, not just the data dictionary needs adjustment, possibly a bunch of code and/or configuration needs to be as well.
work=> create table demo ( person_id int, gender char(1), primary key(person_id) ); CREATE TABLE work=> alter table demo drop column gender; ALTER TABLE
After the column is dropped, you could get hit with this error (postgreSQL specific, but any database will fail in some way):
pq_driver: [PGRES_FATAL_ERROR]ERROR: column "gender" of relation "demo" does not exist LINE 1: INSERT INTO "work"."demo" ( "gender","person_id") VALUES ( '... ^ (caused by statement 'INSERT INTO "work"."demo" ( "gender","person_id") VALUES ( 'M','3')')
I registered for @SAPTechEd 2021. The *required* binary gender/salutation prefix choice was a little cringe-worthy. I’d prefer an option to leave this out.
Hey, Jim. Thank you for bringing this to our attention. I shared your feedback with the team, and am happy to report the field has been removed completely from the SAP TechEd registration form. Thanks! – Dianna, SAP Social | ‘cc @JelenaAtLarge @se38 @MatthSteffen
Hi Jim, hi @JelenaAtLarge, hi all, The team have now updated go.support.sap.com/minisap/ too. The Project Manager needs to keep the field mandatory for tech reasons. But we recognize that this is wrong, and can cause distress, eg to GNC / NB users. So: The radio buttons are now…
Here, amidst much else going on, we have another name for the Gender column: Salutation. Which implies that the entity capturing this data wants to later say, “Hi, Mr. Spath” (as the case may be). Fine, I guess, if that’s important in your data design universe.
And I cannot credit Jelena enough for being ahead of this issue. We often hit the same topics at different frequencies, which is pretty cool I think. As Jelena said, “… when I see a salutation field on a form my reaction is: none of your business.?”
The short time between my post (October 13, 2021), and the SAP team response (October 14) is phenomenal in my experience with enterprise application software event planning. I expect the conversations may have already occurred, given Jelena posting the same complaint over 2 years earlier, so it could be as easy as pushing a draft fix out.
Issues and Caveats
First, when you’re in the design chain pipeline, keep your wits about you, and if there is a decision that doesn’t seem right, ask questions. I had a volunteer role copying data from an organizations main repository to a downstream reporting system. The upstream include birth date, but I questioned whether we would need to use that for anything. The only response was “maybe we want to see what young/old people want” which was too slim for me to commit, so we left that out of the clone. I felt better than no one could grab that level of personal information from a repository I was responsible for.
Second, as a user, you should speak up when a design issue like the above presents itself. There are definitely legitimate reasons for personal data capture such as gender, perhaps for health service delivery (I’m being vague here), and equity (“are females still earning much less than males in this organization”).
Third, as the customer service link in the system, you might empathize with the application users, as Julie Plummer found in the tweets above. “Hi everyone, I agree it’s a good point” is what a caller wants to hear.
Deeply hard-coded values will bite you later. I took this “free sample” (SOS: “Son Of Service”; see links below) and used it for a little project. The more elegant design approach would be to put these enumerated/allowed values into another table as a key constraint. Or a trigger at the application level as long as you can ensure that’s the only access method.
ALTER TABLE [dbo].[Strings] WITH CHECK ADD CHECK ( ( [Type_]='phone_number' OR [Type_]='email_address' OR [Type_]='service' OR [Type_]='work' OR [Type_]='role' OR [Type_]='skill' OR [Type_]='relationship' OR [Type_]='extended' ) ) GO
Or (in MySQL):
-- type enum('extended', 'relationship', 'skill', 'work', 'phone_number'), --
(Slight aside: I could not name the column defined above as “type” as that’s a reserved word, so used “type_“.
If the database design is well-baked and difficult to unroll, you’ll hear sentiments such as:
The Project Manager needs to keep the field mandatory for tech reasons.
No. This is _not_ a technical decision, in my opinion. It’s financial (time to repair, down time estimates, testing costs) and a social engineering path one should recognize.
“Yes / No / No choice” No Choice is the *default*.
The workaround, a good compromise between doing nothing, and doing way too much work, is to hard-code a NULL value as the text value “No choice”.
fin: I have seen 2 productions of Dickens’ A Christmas Carol this month, and some of the seasonal hopefulness shines through, possibly in my words, if not in spirit.