[E-trademarks] has USPTO documented whether and how it maps Unicode to ASCII? (was The ZWSP mistake at the Trademark Office)
Carl Oppedahl
carl at oppedahl.com
Mon Oct 13 11:42:59 EDT 2025
On 10/13/2025 7:53 AM, Pamela Chestek wrote:
> That's the assumption i'm not willing to make because I'm not sure
> what happens when they are added to the pseudomark field. Do the
> lookalike Cyrillic letters get coded as the regular character
> lookalike? if you're searching the CM field for the mark, you will be
> searching the pseudomark database too.
Thank you Pamela for posting this. Thanks to you I now know to wonder
about this. For example I wrote this:
And after that is [Unicode] U+0415 : CYRILLIC CAPITAL LETTER IE. It
looks like an "E" but you won't get a match if you search for an "E".
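The mismatch is easy to reproduce in, for example, Python. This is only a small illustration of how the two characters differ at the code-point level; it says nothing about how Trademark Office search software is actually written.

```python
import unicodedata

cyrillic_ie = "\u0415"   # CYRILLIC CAPITAL LETTER IE
latin_e = "E"            # LATIN CAPITAL LETTER E (ASCII decimal 69)

# Same appearance on screen, different names and code points.
print(unicodedata.name(cyrillic_ie), ord(cyrillic_ie))
print(unicodedata.name(latin_e), ord(latin_e))

# A plain string comparison treats them as different characters,
# so a search for Latin "E" does not find the Cyrillic lookalike.
print(cyrillic_ie == latin_e)     # False
print("E" in "ACM\u0415")         # False
```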
The two search examples that you provided suggest that maybe some
human-initiated process at the Trademark Office sometimes leads to a
mapping of a U+0415 into an ASCII 69 (a Latin "E") in a PM field. Or
maybe this is actually /*an automated process*/ that reacts to any
U+0415 character by (a) creating a PM field and (b) mapping U+0415 to
ASCII 69 in that newly created PM field. So now we get to start asking
questions.
If a mark in an application or registration contains U+0415, then is it
/*ensured*/ by Trademark Office software that (a) that application or
registration automatically gets a pseudomark entry, and (b) that
Unicode character gets coded as an ASCII decimal 69? (Call this
"/*automated software mapping of Unicode to ASCII into the PM field*/"
or "/*ASM*/".)
Or does the Trademark Office rely upon /*human-being processing*/ of
each application for some human being in the Trademark Office to
actively decide to create a pseudomark entry for that application,
meaning that there might be applications and registrations where no PM
entry got created despite presence of (say) a U+0415 in the mark? In
other words human error might lead to a failure to create a PM entry in
an application where it is actually needed?
If the latter (if the Trademark Office relies upon human-being
processing), then when the human being decides to create a pseudomark
entry for that application, is it automatic (in software) that
the U+0415 gets mapped to ASCII 69? Or does it rely upon alertness and
accuracy for the human being to map the U+0415 to ASCII 69? In
other words human error might lead to a failure to map the U+0415 to
ASCII 69? Maybe the human being might inadvertently fail to map it
at all? Maybe the human being might inadvertently map it to some other
ASCII value?
Let's suppose that it was only recently that the Trademark Office
commenced /*ASM*/ for newly filed applications. If so, then it
would be safe to assume that there are countless numbers of
earlier-filed applications and registrations in which the ASM had not
taken place and in which U+0415 characters represent guaranteed search
failures. Maybe the Trademark Office needs to carry out a "clean-up"
ASM on the entire corpus of earlier-filed applications and
registrations? Has the Trademark Office done so?
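To make the worry concrete, here is a rough Python sketch of the kind of audit that would find such records. Everything in it is hypothetical: the record layout, the field names "serial", "mark" and "pseudomark", and the sample data are invented for illustration and are not taken from any USPTO system.

```python
# Hypothetical records; the field names and values are made up.
records = [
    {"serial": "A", "mark": "ACME", "pseudomark": None},
    {"serial": "B", "mark": "ACM\u0415", "pseudomark": None},    # Cyrillic IE, no PM
    {"serial": "C", "mark": "ACM\u0415", "pseudomark": "ACME"},  # Cyrillic IE, has PM
]

def lacks_pseudomark(records):
    """Return records whose mark has a non-ASCII character and no PM entry."""
    return [
        r for r in records
        if not r["mark"].isascii() and not r["pseudomark"]
    ]

flagged = lacks_pseudomark(records)
for r in flagged:
    print(r["serial"], ascii(r["mark"]))
```

Only record "B" is flagged here: it contains a U+0415 and has no PM entry, which is exactly the case that would be a guaranteed search failure.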
One could imagine a number of ways that an ASM process directed to the
entire corpus of earlier-filed applications and registrations could lead
to unintended results. Suppose that a particular registration contains
an ASCII 36 (a dollar sign) that got mapped (years ago) into a PM entry
with an ASCII 83 (a Latin S). And suppose that particular registration
also contains a U+0415 that until now had not gotten mapped into a PM
entry. Now let's say the Trademark Office decides to do a "clean-up"
ASM on the entire corpus of earlier-filed applications and
registrations. The ASM will presumably encounter the U+0415 in that
particular registration and will decide "we had better create a PM
entry". Does that lead to a discard of the previous PM entry that mapped
the ASCII 36 to the ASCII 83?
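The difference between a clean-up that discards and a clean-up that merges can be sketched in Python. This is a toy under stated assumptions: the mapping table ASM_TABLE, the two clean-up functions, and the sample mark are all hypothetical, and the table deliberately has no rule for the dollar sign so as to mimic the scenario above.

```python
# Hypothetical mapping table; note that it has no "$" -> "S" rule.
ASM_TABLE = {"\u0415": "E"}

def asm(text):
    """Apply the hypothetical Unicode-to-ASCII mapping character by character."""
    return "".join(ASM_TABLE.get(ch, ch) for ch in text)

def cleanup_overwrite(mark, existing_pm):
    """Naive clean-up: regenerate the PM from the mark, discarding earlier entries."""
    return [asm(mark)]

def cleanup_merge(mark, existing_pm):
    """Safer clean-up: keep earlier entries and add newly mapped variants."""
    merged = list(existing_pm)
    for candidate in [asm(mark)] + [asm(pm) for pm in existing_pm]:
        if candidate not in merged:
            merged.append(candidate)
    return merged

mark = "$AL\u0415"           # dollar sign plus a Cyrillic IE
existing_pm = ["SAL\u0415"]  # earlier PM entry: "$" mapped to "S", IE untouched

overwritten = cleanup_overwrite(mark, existing_pm)
merged = cleanup_merge(mark, existing_pm)
print(overwritten)
print(merged)
```

In the overwrite version the earlier mapping of the dollar sign is lost, so a search for "SALE" would still fail. In the merge version the earlier entry survives and an all-ASCII "SALE" variant is added alongside it.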
But also recall the Invitation to the Public to Submit Suggestions
Regarding Database Design Codes and Pseudo-Marks
<https://www.uspto.gov/web/offices/com/sol/og/2009/week52/TOCCN/item-381.htm#cli381> (OG
notice December 29, 2009). Some PM entries exist not because of some
automatic mapping but instead because a human being decided the PM entry
needed to exist. It might be that a Trademark Office person decided
it. Or the applicant or registrant decided it. Or a third party
decided it. In any case suppose some PM entry in a particular
application or registration exists because a human being decided it
should be so. And now suppose that particular application or
registration also contains a U+0415 that until now had not gotten mapped
into a PM entry. Will the "clean-up" ASM lead to a discard of the
previous PM that happened due to human intervention?
I would be astonished if it were to turn out that ASM already exists in
USPTO software. There are at least two reasons why this would astonish
me. First, you can go to the search box at the USPTO web site and plug
in "Unicode". Yes you will get a few hits, but they are limited to two
areas -- patent application DOCX, and ST26 genetic sequence listings.
Not one of the Unicode hits leads to anything about trademarks or
pseudomarks.
Now we all know the search box at the USPTO web site is very limited in
what it can find. We all know that Google (for example) finds lots of
stuff on the USPTO web site that the local search box won't find. So
you can also use Google to try to find places where the USPTO somehow
discussed or discusses Unicode as it relates to trademarks or
pseudomarks. But that also comes up empty.
It seems to me that if the Trademark Office had gone to the trouble to
create ASM, surely this would have somehow leaked out into some place
that would reveal itself in a web search.
But let's suppose that the Trademark Office has, against all odds,
actually created and implemented ASM despite no hint or suggestion of it
in the public record. If so, then one would hope that in the spirit of
transparency, the Trademark Office would /*publish the mapping*/. Some
published document would say "we always map U+0415 to ASCII 69". This
way everyone would know what to expect. And if the mapping were flawed
(suppose the Trademark Office failed to realize that U+0415 needs to get
mapped to ASCII 69?) then there would be an opportunity for the
trademark community to suggest remedies for such failures.
It should come as no surprise that in various open-source software
communities, this mapping of Unicode to ASCII is well-plowed ground.
Each open-source software community is filled with very smart people who
think about such things. Each open-source programming language (for
example PHP and Python) has one or more functions for attempting such
mappings.
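As one illustration, here is a Python sketch using the standard-library unicodedata module. It shows that Unicode normalization by itself strips accents but does not turn a Cyrillic lookalike into a Latin letter, because U+0415 has no decomposition into Latin "E". The LOOKALIKES table and the function names are hypothetical, and the three-entry table is hand-written for this example only, not a complete or official mapping.

```python
import unicodedata

def strip_accents(text):
    """Decompose characters (NFKD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Hypothetical, hand-written lookalike table (far from complete).
LOOKALIKES = {
    "\u0415": "E",  # CYRILLIC CAPITAL LETTER IE
    "\u0410": "A",  # CYRILLIC CAPITAL LETTER A
    "\u041e": "O",  # CYRILLIC CAPITAL LETTER O
}

def fold_to_ascii(text):
    """Strip accents, then replace known lookalikes with ASCII letters."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in strip_accents(text))

print(strip_accents("CAF\u00c9"))           # CAFE (accent removed)
print(ascii(strip_accents("CAF\u0415")))    # 'CAF\u0415' (Cyrillic IE survives)
print(fold_to_ascii("CAF\u0415"))           # CAFE (lookalike table applied)
```

The point of the sketch is that a separate, explicit lookalike table is needed on top of normalization, which is one more reason a published mapping would matter.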
Maybe the Trademark Office has already reached out to one or more of the
open-source software communities and has learned ways to be smart about
Unicode. Or maybe not.