[E-trademarks] has USPTO documented whether and how it maps Unicode to ASCII? (was The ZWSP mistake at the Trademark Office)

Carl Oppedahl carl at oppedahl.com
Mon Oct 13 11:42:59 EDT 2025


On 10/13/2025 7:53 AM, Pamela Chestek wrote:
> That's the assumption i'm not willing to make because I'm not sure 
> what happens when they are added to the pseudomark field. Do the 
> lookalike Cyrillic letters get coded as the regular character 
> lookalike? if you're searching the CM field for the mark, you will be 
> searching the pseudomark database too.

Thank you Pamela for posting this.  Thanks to you I now know to wonder 
about this.  For example I wrote this:

    And after that is [Unicode] U+0415 : CYRILLIC CAPITAL LETTER IE.  It
    looks like an "E" but you won't get a match if you search for an "E".
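
To make the non-match concrete, here is a minimal Python sketch.  The 
mark "NEXUS" is a made-up example of mine, not taken from any real 
application or registration, and the comparison shown is ordinary 
string comparison, not whatever the Trademark Office search system 
actually does.

```python
import unicodedata

cyrillic_ie = "\u0415"  # CYRILLIC CAPITAL LETTER IE
latin_e = "E"           # LATIN CAPITAL LETTER E

# The two characters look alike but have different names and code points.
print(unicodedata.name(cyrillic_ie), hex(ord(cyrillic_ie)))
print(unicodedata.name(latin_e), ord(latin_e))

# Hypothetical mark in which the second letter is the Cyrillic lookalike.
mark_as_filed = "N\u0415XUS"
search_term = "NEXUS"

# A plain comparison treats the two strings as different.
print(mark_as_filed == search_term)
```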

The two search examples that you provided suggest that maybe some 
human-initiated process at the Trademark Office sometimes leads to a 
mapping of a U+0415 into an ASCII 69 (a Latin "E") in a PM field.  Or 
maybe this is actually /*an automated process*/ that reacts to any 
U+0415 character by (a) creating a PM field and (b) mapping U+0415 to 
ASCII 69 in that newly created PM field.  So now we get to start asking 
questions.

If a mark in an application or registration contains U+0415, then is it 
/*ensured*/ by Trademark Office software that (a) that application or 
registration automatically gets a pseudomark entry, and (b) that 
Unicode character gets coded there as an ASCII decimal 69?  (Call this 
"/*automated software mapping of Unicode to ASCII into the PM field*/" 
or "/*ASM*/".)

Or does the Trademark Office rely upon /*human-being processing*/ of 
each application for some human being in the Trademark Office to 
actively decide to create a pseudomark entry for that application, 
meaning that there might be applications and registrations where no PM 
entry got created despite presence of (say) a U+0415 in the mark?  In 
other words human error might lead to a failure to create a PM entry in 
an application where it is actually needed?

If the latter (if the Trademark Office relies upon human-being 
processing), then when the human being decides to create a pseudomark 
entry for that application, is it automatic (in software) that 
the U+0415 gets mapped to ASCII 69?  Or does it rely upon alertness and 
accuracy for the human being to map the U+0415 to ASCII 69?  In 
other words, might human error lead to a failure to map the U+0415 to 
ASCII 69?  Maybe the human being might inadvertently fail to map it 
at all?  Maybe the human being might inadvertently map it to some other 
ASCII value?

Let's suppose that it was only recently that the Trademark Office 
commenced /*ASM*/ for newly filed applications.  If so, then it 
would be safe to assume that there are countless 
earlier-filed applications and registrations in which the ASM had not 
taken place and in which U+0415 characters represent guaranteed search 
failures.  Maybe the Trademark Office needs to carry out a "clean-up" 
ASM on the entire corpus of earlier-filed applications and 
registrations?  Has the Trademark Office done so?

One could imagine a number of ways that an ASM process directed to the 
entire corpus of earlier-filed applications and registrations could lead 
to unintended results.  Suppose that a particular registration contains 
an ASCII 36 (a dollar sign) that got mapped (years ago) into a PM entry 
with an ASCII 83 (a Latin S).  And suppose that particular registration 
also contains a U+0415 that until now had not gotten mapped into a PM 
entry.  Now let's say the Trademark Office decides to do a "clean-up" 
ASM on the entire corpus of earlier-filed applications and 
registrations.  The ASM will presumably encounter the U+0415 in that 
particular registration and will decide "we had better create a PM 
entry". Does that lead to a discard of the previous PM entry that mapped 
the ASCII 36 to the ASCII 83?
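
One way to avoid that kind of loss is for the clean-up to add PM 
entries rather than replace them.  Here is a rough Python sketch under 
assumptions of my own: the record layout, the clean_up name, the 
one-entry lookalike table and the sample mark are all hypothetical and 
say nothing about how USPTO systems actually store PM entries.

```python
# Hypothetical one-entry lookalike table, for illustration only.
LOOKALIKES = str.maketrans({"\u0415": "E"})  # CYRILLIC CAPITAL LETTER IE


def clean_up(record):
    """Return a copy of the record with extra pseudomarks added.

    Existing pseudomark entries are kept; folded variants are appended.
    """
    existing = list(record.get("pseudomarks", []))
    merged = list(existing)
    # Fold the mark itself and every pseudomark that is already present.
    for text in [record["mark"], *existing]:
        folded = text.translate(LOOKALIKES)
        if folded != text and folded not in merged:
            merged.append(folded)
    return {**record, "pseudomarks": merged}


# Hypothetical record: the "$" was mapped to "S" years ago, but the
# Cyrillic IE (U+0415) was never mapped.
record = {"mark": "CA$H T\u0415AM", "pseudomarks": ["CASH T\u0415AM"]}
cleaned = clean_up(record)
print(cleaned["pseudomarks"])
```

The point of the sketch is only that the earlier entry survives the 
clean-up and the new entries sit alongside it.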

But also recall the Invitation to the Public to Submit Suggestions 
Regarding Database Design Codes and Pseudo-Marks 
<https://www.uspto.gov/web/offices/com/sol/og/2009/week52/TOCCN/item-381.htm#cli381> 
(OG notice December 29, 2009).  Some PM entries exist not because of some 
automatic mapping but instead because a human being decided the PM entry 
needed to exist.  It might be that a Trademark Office person decided 
it.  Or the applicant or registrant decided it.  Or a third party 
decided it.  In any case suppose some PM entry in a particular 
application or registration exists because a human being decided it 
should be so.  And now suppose that particular application or 
registration also contains a U+0415 that until now had not gotten mapped 
into a PM entry.  Will the "clean-up" ASM lead to a discard of the 
previous PM that happened due to human intervention?

I would be astonished if it were to turn out that ASM already exists in 
USPTO software.   There are at least two reasons why this would astonish 
me.  First, you can go to the search box at the USPTO web site and plug 
in "Unicode".  Yes you will get a few hits, but they are limited to two 
areas -- patent application DOCX, and ST26 genetic sequence listings.  
Not one of the Unicode hits leads to anything about trademarks or 
pseudomarks.

Now we all know the search box at the USPTO web site is very limited in 
what it can find.  We all know that Google (for example) finds lots of 
stuff on the USPTO web site that the local search box won't find.  So 
you can also use Google to try to find places where the USPTO somehow 
discussed or discusses Unicode as it relates to trademarks or 
pseudomarks.  But that also comes up empty.

It seems to me that if the Trademark Office had gone to the trouble to 
create ASM, surely this would have somehow leaked out into some place 
that would reveal itself in a web search.

But let's suppose that the Trademark Office has, against all odds, 
actually created and implemented ASM despite no hint or suggestion of it 
in the public record.  If so, then one would hope that in the spirit of 
transparency, the Trademark Office would /*publish the mapping*/.  Some 
published document would say "we always map U+0415 to ASCII 69".  This 
way everyone would know what to expect.  And if the mapping were flawed 
(suppose the Trademark Office failed to realize that U+0415 needs to get 
mapped to ASCII 69?) then there would be an opportunity for the 
trademark community to suggest remedies for such failures.

It should come as no surprise that in various open-source software 
communities, this mapping of Unicode to ASCII is well-plowed ground.  
Each open-source software community is filled with very smart people who 
think about such things.  Each open-source programming language (for 
example PHP and Python) has one or more functions for trying to do 
such mappings.
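
As a small illustration in Python, here is a sketch of two approaches.  
The first uses the standard library's unicodedata.normalize(), which 
handles accented Latin letters but knows nothing about lookalikes from 
other scripts, so a Cyrillic IE simply disappears.  The second adds a 
lookalike table.  The table below is a tiny hypothetical one that I made 
up for this example, and the function names are mine; a real mapping 
would need a far larger table (the Unicode Consortium publishes 
"confusables" data that could serve as a starting point).

```python
import unicodedata

# Hypothetical, deliberately tiny lookalike table, for illustration only.
LOOKALIKES = str.maketrans({
    "\u0415": "E",  # CYRILLIC CAPITAL LETTER IE
    "\u0410": "A",  # CYRILLIC CAPITAL LETTER A
    "\u041E": "O",  # CYRILLIC CAPITAL LETTER O
    "$": "S",       # the dollar-sign example from earlier in this post
})


def strip_to_ascii(text):
    """Decompose accented letters, then drop anything that is not ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def fold_lookalikes(text):
    """Apply the lookalike table first, then strip what remains to ASCII."""
    return strip_to_ascii(text.translate(LOOKALIKES))


print(strip_to_ascii("CAF\u00c9"))      # accented Latin E is handled
print(strip_to_ascii("N\u0415XUS"))     # Cyrillic IE is silently dropped
print(fold_lookalikes("N\u0415XUS"))    # Cyrillic IE becomes Latin E
```

The difference between the last two lines is the whole problem in 
miniature: without an explicit lookalike table, the character is lost 
rather than mapped.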

Maybe the Trademark Office has already reached out to one or more of the 
open-source software communities and has learned ways to be smart about 
Unicode.  Or maybe not.

