Developer Blog

March 5th, 2023

An Actually Helpful Character Type

As we have seen last time, having char8_t is not terribly useful. But we discovered that another character type is actually quite useful.

In the last post, I recommended not committing to a single string encoding for a given program but instead using the encoding native to the domain the string is used in. This avoids conversions, which may cause problems in special cases, like Windows allowing file names containing invalid UTF-16.

Now, there are situations where you want to use the same string in different domains, for example a file name in both UI and the OS file API. We found that quite often, these strings contain solely ASCII characters, in particular when dealing with string constants: file names, query parameters, etc. In these cases, conversion is trivial, cheap and, very importantly, works character by character, no need to supply a complete string.

We introduced a character type for this purpose, tc::char_ascii, which documents and asserts that it contains only ASCII, and which converts implicitly to all character types without triggering a narrowing compiler warning. It's part of the think-cell library, so try it out!
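To make the idea concrete, here is a minimal sketch of what such a type could look like. It is not the actual tc::char_ascii implementation: the name ascii_char and the helper append_separator are made up for illustration, and char8_t is left out to keep the sketch independent of C++20.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch of an ASCII-only character type. This is NOT the
// actual tc::char_ascii implementation; the name ascii_char is made up.
class ascii_char {
    char m_ch;
public:
    // Asserts the invariant: only code points 0..127 are allowed.
    explicit ascii_char(char ch) noexcept : m_ch(ch) {
        assert(static_cast<unsigned char>(ch) < 0x80);
    }
    // An ASCII character has the same value in UTF-8, UTF-16 and UTF-32,
    // so each conversion is a plain widening of a single value.
    operator char() const noexcept { return m_ch; }
    operator wchar_t() const noexcept { return static_cast<wchar_t>(m_ch); }
    operator char16_t() const noexcept { return static_cast<char16_t>(m_ch); }
    operator char32_t() const noexcept { return static_cast<char32_t>(m_ch); }
};

// Appends the same ASCII constant to a string of any character type.
// Overload resolution picks the conversion operator that matches exactly.
template<typename String>
void append_separator(String& str) {
    str.push_back(ascii_char('/'));
}
```

Because the conversion works character by character, append_separator never needs to see the complete string or know its encoding.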

— by Arno Schödl

Do you have feedback? Send us a message at devblog@think-cell.com !


February 20th, 2023

char8_t Was a Bad Idea

We recently pondered whether to change all our chars to C++20 char8_t and decided against it, at least for now.

On the face of it, adding char8_t seems straight-forward and appealing. There is already a char16_t and char32_t for UTF-16 and UTF-32 strings, so why not have a char8_t for UTF-8 strings?

And I am all for strong types, that is, aliases of basic (or other) types that share the same representation but do not implicitly convert to other types to avoid mistakes. If we are counting apples, we may as well have a type for it, so we do not accidentally assign the count of apples to a variable counting oranges. If we want to express, for whatever reason, buying the same number of apples as oranges, we can still convert explicitly between counts of apples and oranges.

Unfortunately, C++ support for strong types is not great and requires a lot of boilerplate. Nothing helpful made it into the standard library so far. Rust, being a more modern language, is much better in this regard.
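To give an impression of the boilerplate involved, here is a minimal sketch of a strong type for counts. All names (count, apple_tag, orange_tag, match_apples) are hypothetical, and a real implementation would need many more operators.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical strong-type wrapper: same representation as int, but
// counts with different tags do not convert into each other implicitly.
template<typename Tag>
class count {
    int m_n;
public:
    constexpr explicit count(int n) noexcept : m_n(n) {}
    // Converting between differently tagged counts must be spelled out.
    template<typename OtherTag>
    constexpr explicit count(count<OtherTag> const& other) noexcept : m_n(other.get()) {}
    constexpr int get() const noexcept { return m_n; }
    friend constexpr count operator+(count lhs, count rhs) noexcept {
        return count(lhs.m_n + rhs.m_n);
    }
};

struct apple_tag {};
struct orange_tag {};
using apples = count<apple_tag>;
using oranges = count<orange_tag>;

// Assigning apples to oranges by accident does not compile.
static_assert(!std::is_convertible<apples, oranges>::value,
    "counts of different things do not convert implicitly");

// Buying as many oranges as we have apples requires an explicit conversion.
constexpr oranges match_apples(apples a) noexcept {
    return oranges(a);
}
```

Even this toy version needs a class template, tag types and hand-written operators for what is conceptually a one-line alias.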

So why not applaud the introduction of char*_t so we have at least strong types for Unicode encoded characters?

First of all, nowadays most strings are Unicode-encoded anyway. So we go through the effort of introducing a new keyword to distinguish the 1% non-Unicode encoded characters from the 99% Unicode characters.

And we do it while the real problem is elsewhere: strings do not only differ in their encoding but, actually much more frequently, in other invariants. File paths may not contain certain characters. Strings may be escaped, for example according to JSON or XML or SQL rules, which are all different. The UI may only support printing a subset of control characters. Do you want to ring a bell when encountering BEL?

It gets worse. For file paths, Windows is happy with any sequence of 16-bit values; it does not even have to be valid UTF-16. This is actually a good reason not to settle on a single character encoding (UTF-8 being the most attractive) for the whole program: some strings we encounter in the wild may not be proper Unicode, but as long as your program is aware of these idiosyncrasies and uses these strings only in their specific contexts, everything works just fine.

Ideally, all these strings with different invariants should really be different types. But keep in mind that we are talking about string types, not single character types. Many of the invariants, including encoding and escaping, are invariants of a sequence of characters, not of a single character. Having a type for a single character does not help to maintain the invariant.

Making the strings different types is not so easy either. We already use different string types because we want to store strings in different ways. String constants are C arrays, dynamic strings are std::basic_string, and they may be passed by reference as std::span or std::basic_string_view. At think-cell, in our cross-process shared heap, we store strings in std::vectors because they work with allocators with custom (in this case, relocatable) pointer types.

The invariants of a string are orthogonal to its storage. We would need to take any of these storage types and restrict their conversion rules and other operations, for example by some sort of tagging. We have not done this at think-cell. Instead, we rely on conventions. We use character type aliases such as tc::filechar and prefix each variable, for example those containing OS file or path names with "path" and those containing HTML with "html".
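For illustration only, this is roughly what such tagging could look like. As said, we have not done this at think-cell, so everything here (tagged_string, path_tag, html_tag, html_from_path) is a hypothetical sketch, and the escaping covers just three characters.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical tagging wrapper: Storage decides how the characters are
// stored, Tag names the invariant the string is supposed to satisfy.
template<typename Tag, typename Storage>
class tagged_string {
    Storage m_str;
public:
    explicit tagged_string(Storage str) : m_str(std::move(str)) {}
    Storage const& raw() const noexcept { return m_str; }
};

struct path_tag {}; // OS file or path name
struct html_tag {}; // HTML-escaped text

using path_string = tagged_string<path_tag, std::string>;
using html_string = tagged_string<html_tag, std::string>;

static_assert(!std::is_convertible<path_string, html_string>::value,
    "differently tagged strings do not convert implicitly");

// The only way from a path to HTML is an explicit escaping step.
html_string html_from_path(path_string const& path) {
    std::string strEscaped;
    for (char const ch : path.raw()) {
        switch (ch) {
        case '&': strEscaped += "&amp;"; break;
        case '<': strEscaped += "&lt;"; break;
        case '>': strEscaped += "&gt;"; break;
        default: strEscaped += ch; break;
        }
    }
    return html_string(std::move(strEscaped));
}
```

The same wrapper would have to be repeated for every storage type and every operation we want to keep, which is the effort that made us stay with conventions.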

Of course, this is really weak. But as you can see, introducing char8_t does not help a bit with these problems, and just creates more conversions. This is why we decided against it.

— by Arno Schödl



February 6th, 2023

Evil Reentrance

Everyone knows parallel programming is hard, and we talk about it a lot. We talk much less about what I think of as the little brother of parallel programming, which is also a popular source of bugs in complex systems: reentrance.

Just recently, we discovered a bug in Microsoft Office where, mysteriously, one of the events that Office add-ins expect, OnStartupComplete, did not arrive. We could never reproduce the error, and only after one of us disassembled the Office binary around the OnStartupComplete call site did we understand what was going on. Office iterates quite harmlessly over the collection of add-ins, which is stored in a vector-like data structure accessed by index (not iterator). Now it so happens that another Excel add-in dynamically removes items from the add-in list in OnStartupComplete. Removing items from the vector shifts the item indices, resulting in items being skipped. If a skipped item happens to be our add-in, we never receive OnStartupComplete.
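The effect is easy to reproduce in a few lines. The following is a hypothetical model, not Office's actual code: the add-in list is a vector of names, and notify_all, demo_skipped_addin and the add-in names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical model of the add-in list; all names are made up.
// on_startup_complete is called for each add-in and may modify the list.
std::vector<std::string> notify_all(
    std::vector<std::string>& vecstrAddIn,
    std::function<void(std::vector<std::string>&, std::string const&)> const& on_startup_complete
) {
    std::vector<std::string> vecstrNotified;
    for (std::size_t i = 0; i < vecstrAddIn.size(); ++i) { // index-based loop
        std::string const strName = vecstrAddIn[i]; // copy, the list may change
        vecstrNotified.push_back(strName);
        on_startup_complete(vecstrAddIn, strName); // may reenter and erase items
    }
    return vecstrNotified;
}

// Returns the names of the add-ins that actually got notified.
std::vector<std::string> demo_skipped_addin() {
    std::vector<std::string> vecstrAddIn{"A", "Evil", "think-cell", "D"};
    return notify_all(
        vecstrAddIn,
        [](std::vector<std::string>& vecstr, std::string const& strName) {
            if (strName == "Evil") {
                vecstr.erase(vecstr.begin()); // removes "A", shifting later indices
            }
        }
    );
}
```

When "Evil" at index 1 removes the first item, "think-cell" moves from index 2 to index 1, the loop continues at index 2, and "think-cell" is never notified.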

Now Microsoft is in the process of deciding what to do. Rewriting the loop to be completely reentrance-proof makes the simple loop much more complex and less efficient, so there is little chance that they will go that route.

Sometimes, though, reentrance can be handled quite gracefully. We talked about it a bit when covering tc::change. Another case is optional. When emplacing into a std::optional<T>, the standard does not specify if the optional is reporting to be engaged or not during the time the constructor of T runs. I think this is an unnecessary complication.

In our code, we have constructors which are also performing tasks which must follow the actual construction but are not really part of it. For example, the representation of a PowerPoint window stores its currently visible slide. After construction, a new window always updates its visible slide for the very first time. So the function to update the visible slide is not only called regularly during the lifetime of a window but also at the end of the constructor. For simplicity, this function should not need to do anything special if called from the constructor, so the state of the window object should already be "constructed".

To handle such situations, our tc::optional reports "engaged" as soon as the constructor starts, "disengaged" as soon as the destructor starts, and has asserts if the constructor or destructor is reentered while another constructor or destructor is still running.
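A minimal sketch of this behavior, assuming a much-reduced interface: eager_optional, probe and constructor_sees_engaged are hypothetical names, and this is not the actual tc::optional implementation.

```cpp
#include <cassert>
#include <memory>
#include <new>
#include <utility>

// Hypothetical sketch, not the actual tc::optional implementation.
template<typename T>
class eager_optional {
    union { T m_value; }; // raw storage, constructed and destroyed manually
    bool m_bEngaged = false;
    bool m_bBusy = false; // a constructor or destructor of T is running
public:
    eager_optional() noexcept {}
    eager_optional(eager_optional const&) = delete;
    eager_optional& operator=(eager_optional const&) = delete;
    ~eager_optional() { reset(); }

    template<typename... Args>
    T& emplace(Args&&... args) {
        reset(); // asserts that no constructor or destructor is running
        m_bBusy = true;
        m_bEngaged = true; // engaged as soon as the constructor starts
        try {
            ::new (static_cast<void*>(std::addressof(m_value)))
                T(std::forward<Args>(args)...);
        } catch (...) {
            m_bEngaged = false;
            m_bBusy = false;
            throw;
        }
        m_bBusy = false;
        return m_value;
    }

    void reset() noexcept {
        assert(!m_bBusy); // no reentrance while constructing or destroying
        if (m_bEngaged) {
            m_bBusy = true;
            m_bEngaged = false; // disengaged as soon as the destructor starts
            m_value.~T();
            m_bBusy = false;
        }
    }

    explicit operator bool() const noexcept { return m_bEngaged; }
    T& operator*() noexcept { assert(m_bEngaged); return m_value; }
};

// A type whose constructor observes the optional it is being emplaced into.
struct probe {
    bool m_bSawEngaged;
    explicit probe(eager_optional<probe> const& o) noexcept
        : m_bSawEngaged(static_cast<bool>(o)) {}
};

bool constructor_sees_engaged() {
    eager_optional<probe> o;
    return o.emplace(o).m_bSawEngaged;
}
```

In this sketch, code called from the end of T's constructor sees the optional as engaged and therefore needs no special case for being called during construction.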

The same logic applies to smart pointers such as unique_ptr or shared_ptr, but we did not need them yet. Feel free to contribute them to our library :-)

— by Arno Schödl



January 23rd, 2023

The Merits of optional<T const&>

Passing function parameters by const& is very common:

void foo(T const& t);

Now, what do we do if such a parameter is optional?

std::optional<T> does not allow T to be a reference. We could use

void foo(std::optional<T> const& ot);

but this defeats the purpose of passing by const&: if we want to pass a T, we would first need to copy it into a std::optional<T>, but we pass by const& precisely to avoid copying.

C++ implements references as pointers, and pointers, unlike references, can be null. They also have the same syntax as std::optional: contextual conversion to bool checks for null, and operator* accesses the value. Of course, pointers were there first, so really, std::optional got its syntax from pointers.

So why not use pointers?

void foo(T const* pt);

This compiles to the right code, but is somewhat inconvenient to use. Instead of

foo(t);

one must write

foo(&t);

or, if afraid of operator& being overloaded, even

foo(std::addressof(t));

If t is an rvalue, it gets even worse because C++ does not allow taking the address of an rvalue:

foo(std::addressof(static_cast<T const&>(make_t())));

It does the right thing, but is not how we want to program.

Also, once inside foo, you can do operations on pointers that you should not be able to do:

void foo(T const* pt) {
… pt[1] … // this compiles :-(
}

So why doesn't the standard allow optional references?

void foo(std::optional<T const&> ot);

The reason is that the C++ committee could not agree on what would happen when assigning the contained type:

std::optional<T const&> ot(…);
T t=…;
ot = t; // what does this do?

Does this rebind ot to now point to t, or is this equivalent to *ot=t, changing the referenced value, under the precondition that ot is not std::nullopt?

However, for many use cases, rebinding isn't needed. Actual references also cannot be rebound. And assigning to the contained value can be achieved by *ot=t, which may be the clearer syntax anyway.

So we can postpone the discussion to another day by not supporting assignment to optional<T&> at all. optional<T&> is still very useful to pass and return optional reference parameters. The think-cell library allows references for its tc::optional. tc::optional differs from std::optional in another way, which we will talk about next time.
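As a rough illustration of how little is needed, here is a minimal sketch of an optional reference without assignment. The names optional_ref and length_or_zero are hypothetical, and this is not the actual tc::optional implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <optional> // only for std::nullopt
#include <string>

// Hypothetical sketch, not the actual tc::optional implementation.
template<typename T>
class optional_ref {
    T* m_pt = nullptr;
public:
    optional_ref() noexcept = default;
    optional_ref(std::nullopt_t) noexcept {}
    // For T = U const, T& also binds to temporaries, just like U const&.
    optional_ref(T& t) noexcept : m_pt(std::addressof(t)) {}

    optional_ref(optional_ref const&) noexcept = default;
    // No assignment at all: this sidesteps the question of whether
    // assignment should rebind or assign through.
    optional_ref& operator=(optional_ref const&) = delete;

    explicit operator bool() const noexcept { return nullptr != m_pt; }
    T& operator*() const noexcept { assert(m_pt); return *m_pt; }
    T* operator->() const noexcept { assert(m_pt); return m_pt; }
};

// The parameter is passed like T const&, but may also be absent.
std::size_t length_or_zero(optional_ref<std::string const> ostr) noexcept {
    return ostr ? ostr->size() : 0;
}
```

Callers can write length_or_zero(str), length_or_zero(make_str()) or length_or_zero(std::nullopt) without taking addresses, and inside the function pointer arithmetic is not available. As with a plain T const& parameter, a temporary only lives until the end of the call expression.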



January 10th, 2023

When Order Matters

Last time, I talked about inlining of single-call functions. In fact, there are situations where even for functions called multiple times, inlining functions into a larger one is beneficial.

Say you are processing a change on a think-cell chart. If the user changed the color, you need to apply the color and may have to invert text to keep it legible on a potentially changed background:

ApplyColor();
InvertText();

If the user changed data, you also need to place the labels at a new position. Inversion of text depends on its background, which depends on the text's position. So PlaceLabels must be followed by InvertText:

ApplyDataChange();
PlaceLabels();
InvertText();

In reality, the sequence of such operations can be much longer. The order of them has been carefully crafted so that all dependencies are respected. In some situations, you can skip some of them, but the order should always stay the same.

If we split up the operations into separate functions, we must be careful that this order is adhered to at all call sites. It violates the DRY (Don't Repeat Yourself) principle that we have to ensure this order in many places in the program.

The solution is to put all operations into a single function in the correct order, and pass one or more parameters that control which subset is actually run. If the operations do not require parameters, the parameters may be just a bitmask. If they do, each operation can be controlled by an optional parameter pack, which if it is null, means skipping the operation.
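A minimal sketch of the bitmask variant, assuming the four operations from the examples above take no parameters. The enum, the function UpdateChart and the log parameter are hypothetical; the log merely stands in for the real operations so that the order can be observed.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical flags, one per operation; the names follow the examples above.
enum EUpdate : unsigned {
    updateAPPLYCOLOR = 1u << 0,
    updateAPPLYDATACHANGE = 1u << 1,
    updatePLACELABELS = 1u << 2,
    updateINVERTTEXT = 1u << 3
};

// The one place in the program that knows the correct order.
// Each line stands in for calling the real operation.
void UpdateChart(unsigned grfupdate, std::vector<std::string>& vecstrLog) {
    if (grfupdate & updateAPPLYCOLOR) vecstrLog.emplace_back("ApplyColor");
    if (grfupdate & updateAPPLYDATACHANGE) vecstrLog.emplace_back("ApplyDataChange");
    if (grfupdate & updatePLACELABELS) vecstrLog.emplace_back("PlaceLabels");
    if (grfupdate & updateINVERTTEXT) vecstrLog.emplace_back("InvertText");
}
```

No matter in which order a caller combines the flags, the operations run in the order fixed inside UpdateChart.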

At think-cell, of course we learned this approach the hard way, after getting bitten by calling functions in the wrong order a few times…

P.S. I learned that other people have similar ideas regarding inlining: http://number-none.com/blow/blog/programming/2014/09/26/carmack-on-inlined-code.html

— by Arno Schödl



December 30th, 2022

Always Inline Single-Call Functions

Conventional coding rule wisdom says that functions can be too long and then must be broken into shorter ones. If the only reason to break an outer function is its length, then the inner functions will be declared in the same class as the outer function, are private and only called once:

struct A {
private:
    R inner1(A a_, B b_) {
        …
    }
    S inner2(C c_, D d_) {
        …
    }

public: // or whatever outer wants to be
    void outer() {
        …
        auto r = inner1(a, b);
        auto s = inner2(c, d);
        …
    }
};

What do we gain?

  • inner1/2 get names, which are hopefully helpful to the reader of the code.

  • The parameters can be renamed (in this case from a/b/c/d to a_/b_/c_/d_). This is a two-edged sword: the renaming may be inadvertent, and using the same names in outer and inner may actually improve readability of the code.

  • The code of outer is more concise. The reader can get an overview of what outer does without delving into details.

Besides these advantages, in our work at think-cell, we also discovered distinct disadvantages when it comes to reading and refactoring which I have not seen discussed elsewhere:

When refactoring, single-call functions are special because they do not need to accommodate the needs of another caller. But finding out that inner is single-call takes effort, in particular because the code of inner is located away from outer.

Also, when deciding how to repackage code into multiple functions, successful code reuse is empirical evidence for having picked good packages. This evidence is missing for functions that are only called once. So there is a good chance that the chosen packaging turns out to be the wrong one for the future code reuse that we did not know about when deciding on the packaging.

To summarize, single-call functions make it easy to miss possible refactoring opportunities, but at the same time are likely to require refactoring later. And indeed, we have seen that this is not a good combination and resulted in overly complicated and/or duplicated code in our codebase.

So we made the decision that functions which are only called once from the same class they are declared in (otherwise data hiding will provide some evidence for a good packaging), have to be inlined.

But what about the good things that separation into functions brings? We can still have them:

  • We can name functional blocks by putting them into a code block with a comment:

struct A {
    void outer() {
        …
        R r;
        { /*inner*/
            …
            r=…;
        }
        …
    }
};

  • To hide the details of these inner code blocks when wanting to get an overview of the outer function, both editors we use, Visual Studio and Xcode, support collapsing of blocks, with only the comment remaining visible.

  • If we want to hide the local variables of the surrounding scope from the inner code block, or if we want to initialize a result variable, we can put the code block into an immediately executed lambda:

struct A {
    void outer() {
        …
        auto r = /*inner*/[this, &a_=a, &b_=b]() -> R {
            …
        }();
        …
    }
};

inner only captures this and an explicit list of variables. Other local variables are hidden. The variables can still be renamed, but doing so requires extra effort and thus is unlikely to be inadvertent.

To some, our inline single-call functions rule is heresy. But I believe it actually makes for better code.

— by Arno Schödl



December 19th, 2022

Poor Man's Exception

Let's say we are performing a sequence of operations:

OpA();
OpB();
OpC();

Now, let's assume each operation can fail, and returns false if it does (in reality, the condition may be more complicated). And if an operation fails, we want to abort and skip all subsequent operations.

The most straight-forward way to implement this is by nesting if statements:

if(OpA()) {
    if(OpB()) {
        if(OpC()) {
            …
        }
    }
}

If the control flow of the operations is more complex, nested ifs do not work so well anymore:

OpA();
if( cond() ) OpB();
OpC();
…

would turn into

if(OpA()) {
    if(!cond() || OpB()) {
        if(OpC()) {
            …
        }
    }
}

The condition in the second if handles the failure as well as the regular control flow of the operations. Mixing these two concerns is less than ideal. If we throw a loop into the mix, we cannot make the transformation at all:

OpA();
while( cond() ) OpB();
OpC();
…

has no straightforward aborting equivalent with nested ifs.

The classic solution to this problem is to use exceptions:

try {
    if(!OpA()) throw abort();
    while( cond() ) {
        if(!OpB()) throw abort();
    }
    if(!OpC()) throw abort();
    …
} catch(abort const&) {}

Is this the best we can do?

It does have some disadvantages. One is performance: exceptions as they are implemented in today's C++ compilers are slow when thrown. Another is that for the reader of the code, there is no guarantee that the throw abort()s are the only source of abort exceptions. They could come from inside OpA, OpB or OpC. To assure our reader that this is not the case, we must declare abort locally:

struct abort{};
try {
    if(!OpA()) throw abort();
    while( cond() ) {
        if(!OpB()) throw abort();
    }
    if(!OpC()) throw abort();
    …
} catch(abort const&) {}

Now the reader still has to ensure that we did not slip in a different exception somewhere that would not be caught by the catch:

struct abort{};
try {
    if(!OpA()) throw abort();
    while( cond() ) {
        if(!OpB()) throw abort2();
    }
    if(!OpC()) throw abort();
    …
} catch(abort const&) {}

If the body of the catch is empty (and only then), there is an alternative that has none of these problems: If the operations are the only thing happening in a function, we can use returns:

void operations() noexcept {
    if(!OpA()) return;
    while( cond() ) {
        if(!OpB()) return;
    }
    if(!OpC()) return;
    …
}

operations is declared noexcept so it is evident that returns are the only codepaths exiting the function. And returns all return to the same place right outside the function, unlike exceptions, which can be caught in different places.

If the operations are not already isolated in a function, we can wrap them into an immediately executing lambda:

[&]() noexcept {
    if(!OpA()) return;
    while( cond() ) {
        if(!OpB()) return;
    }
    if(!OpC()) return;
    …
}();

I call this the poor man's exception. For certain situations, it is actually a good solution!

— by Arno Schödl



December 7th, 2022

The Value of Canonical Code

As part of our internal bug reporting infrastructure, we had a piece of code to draw a bar chart which in essence was

DrawRect(x, x+nBarSize*n/nMax, yMin, yMax);

to draw a bar for value n relative to some maximum value nMax. I omitted some parameters which are irrelevant for this post. We also have a datatype for rectangles that is passed instead of passing the coordinates one by one, but that is not today's topic.

The code worked most of the time, but sometimes nMax was 0 and we got a division by zero exception.

The first fix was this:

DrawRect(x, x+nBarSize*n/std::max(1,nMax), yMin, yMax);

In the review, we had a discussion how this should be written best. If nMax is 0, we do not want to draw any bar, so we may as well not call DrawRect at all:

if(0<nMax) {
    DrawRect(x, x+nBarSize*n/nMax, yMin, yMax);
}

But nMax is actually always greater than or equal to n, and if n is 0, we are also not drawing anything. So we can also write

if(0<n) {
    DrawRect(x, x+nBarSize*n/nMax, yMin, yMax);
}

Which should we pick? In terms of performance, they are indistinguishable, because the case of n being 0 is pretty rare and drawing an empty rectangle is not a big penalty. In terms of code complexity, they are all comparable as well. The last two both need one branch more than the original code. The first one needs a std::max, which is a branch and in this case a constant, but it is 0, so again, no big difference.

I would still argue that in a given codebase, it is a good idea to have agreement on which code to pick. At think-cell, we call this the canonical solution. If you do not have agreement and everyone does what they please, the reader of the code (and we all know we write code primarily for human readers and only secondarily for the compiler) may ask herself if there is a deeper reason why the first variant was picked over the second or vice versa:

  • The first variant allows n to be greater than nMax. Is it ever and that is why it is not written as 0<n?

  • The second variant does not draw anything for negative n. Does that mean that negative n occur?

My point here is that different variants may raise different questions, and only if there is a canonical variant, found through some agreed rules, will the reader be more inclined to think that none of these complexities play a role here: n is always positive and nMax is greater than or equal to n, as the name suggests.

At think-cell, if there is no other factor, we prefer simpler code over more complex code, which is probably something we can all agree on. But in this given example, the complexities of the three variants are quite similar, in particular of the last two. So we need another tie breaker. At think-cell, we pick "less work done", which favors the last variant. We do not do this to make our program faster. Only profiling can tell if it really does. We only do it to have a sensible tie breaker that lets us agree on what is canonical.

— by Arno Schödl



November 30th, 2022

Properties (2)
The Hidden State of Hidden Lines

Last time I introduced you to SFont, a simple yet useful representation of partially defined ("mixed") fonts:

struct SFont {
    std::optional<std::string> m_ostrName;
    std::optional<int> m_onSize;
    std::optional<bool> m_obBold;
};
Listing (1)

In our terminology, SFont is called a "property", and its members m_ostrName, m_onSize and m_obBold are called "aspects" of the property. We discussed how SFont can be seen as representing a set of fonts, or a subspace in the space of all possible fonts, and leveraged those insights to define some useful methods and functions.

The whole point of that last blog article was to provide you with some concepts and notions that would help you design your own properties, not just fonts. In our software, we have a lot of properties that follow the same pattern: fonts, colors, fills, lines, bullets, markers (used in scatter charts and line charts to mark data points), and even precisions (formatting of decimal numbers). Today we will find out how line formats are different from fonts, and how we can amend our simple data structure to make it fit for line formats. Let's start with a straight-forward definition of SLineFormat, based on PowerPoint's LineFormat API:

enum MsoLineDashStyle {
    msoLineSolid,
    msoLineDash,
    ...
    // Refer to Microsoft Office VBA documentation for the full
    // definition of MsoLineDashStyle supported in PowerPoint.
}; 

enum MsoLineStyle {
    msoLineSingle,
    msoLineThickBetweenThin,
    ...
    // Refer to Microsoft Office VBA documentation for the full
    // definition of MsoLineStyle supported in PowerPoint.
};

struct SLineFormat {
    std::optional<bool> m_obVisible;
    std::optional<int> m_onWeight; // line thickness
    std::optional<MsoLineDashStyle> m_omsolinedashstyle;
    std::optional<MsoLineStyle> m_omsolinestyle;
    // add MsoLineCapStyle, MsoLineJoinStyle, ... and some
    // representation of fill or color as needed
};
Listing (2)

Seems simple enough, but what's the meaning of the dash style of an invisible line? How is a "visible" line with weight 0 (zero) different from an "invisible" line? Meet the concept of hidden state: We define hidden state as state (information) that is represented and maintained in a data structure, but is not discernible by the user. Everybody who designs data structures frequently meets hidden state, whether they like it or not. In many cases it is tempting to ignore hidden state in favor of some superficial simplicity and apparent symmetrical beauty, or simply because the developer failed to recognize what later turns out to be hidden state.

Hidden state is not necessarily a bad thing: Consider typing text into a PowerPoint textbox, in some deliberately chosen font. When for some reason you decide to re-write your text, a likely first step is to delete the existing text and return to an empty textbox. An empty textbox does not have any visual representation of font (leaving aside the size of the blinking cursor) and yet you expect that when you start typing, your text will be using the same font that was used before. That behavior is useful, and implementing it requires a deliberate usage of hidden state.

In most scenarios though, hidden state is more confusing than helpful. Typically, hidden state surfaces in the form of remnants of a data model's history. In our product domain, if we are not careful about hidden state, two charts may look identical "pixel-wise", but may exhibit very different behavior when the user, e.g., changes the underlying data. This is particularly irritating if the user received those charts from a third party and has no way of knowing how they were created.

Instances of "useless hidden state" are often referred to as "redundancy", and a general rule of thumb for the design of data structures is that redundancy is bad and should be avoided. Did we overlook any hidden state in SFont? No we didn't: All aspects of our definition of the SFont property are independent from each other. Each aspect is meaningful in its own right, regardless of the values of the other aspects. Borrowing a notion from vector spaces, we say that name, size and boldness of a font are orthogonal to each other.

Getting back to SLineFormat, we find that there are multiple representations of an invisible line. Given that all invisible lines look the same to the user, we can safely say that this is an instance of hidden state. At this point we need to decide if this particular hidden state is useful or harmful. I can immediately name some common use cases where an ambiguous representation of an invisible line is harmful:

  • When displaying a list of available line formats in the user interface, we want to show at most one instance of an invisible line format in the list. An invisible line format must compare "equal" with other invisible line formats for this purpose.

  • When displaying the current line format of a multi-selection of lines, none of which are visible, the user interface should reflect "invisible" and not "mixed", see method SFont::Union(SFont).

  • When determining whether the application of one line format on top of another line format would have any effect, we should not come to the conclusion that applying an invisible line format to another invisible line format results in a meaningful change, see method SFont::IsSupersetOf(SFont).

At the same time I find it hard to come up with any use case that would justify the maintenance of hidden state in an invisible line format. Thus, for the sake of this article, let's say that SLineFormat should not contain any hidden state.

How can we achieve that? For a start, there is a redundancy in members m_obVisible and m_onWeight. Either of them can indicate an invisible line format, so ideally only one of them should be needed. Obviously, m_onWeight is indispensable to represent a visible line format, but m_obVisible does not contribute any information that isn't also available from m_onWeight, thus m_obVisible should be dropped:

struct SLineFormat {
    std::optional<int> m_onWeight; // line thickness, 0 means invisible
    std::optional<MsoLineDashStyle> m_omsolinedashstyle;
    std::optional<MsoLineStyle> m_omsolinestyle;
};
Listing (3)

Microsoft's MsoLineDashStyle and MsoLineStyle enums don't have values for "invisible" or "no line", but there are still redundant representations of an invisible line. How is that? Well, if a line format has m_onWeight==0, i.e., it represents an invisible line, m_omsolinedashstyle and m_omsolinestyle may still have values. Thus there are a lot of different combinations of values in the members of SLineFormat, that can all represent "no line".

How can we get those meaningless combinations of values under control? There are at least two possible ways:

  1. We can ignore whatever values there are in m_omsolinedashstyle and m_omsolinestyle whenever we encounter m_onWeight==0 (see listing 4),

  2. or we can ensure that whenever m_onWeight==0, the other members always have one and the same unique value (see listing 5).

#define IS_EQUAL_ASPECT(member) ( \
    static_cast<bool>(lhs.member)==static_cast<bool>(rhs.member) \
    && (!lhs.member || *lhs.member==*rhs.member) \
) // same as for SFont

bool SLineFormat::IsInvisible() const& noexcept {
    // Note that !IsInvisible() does not necessarily mean "visible":
    // If !m_onWeight, then visibility is undefined/mixed.
    return m_onWeight && 0==*m_onWeight;
}

bool operator==(SLineFormat const& lhs, SLineFormat const& rhs) noexcept {
    return IS_EQUAL_ASPECT(m_onWeight)
        && (
            lhs.IsInvisible()
            || (
                IS_EQUAL_ASPECT(m_omsolinedashstyle)
                && IS_EQUAL_ASPECT(m_omsolinestyle)
            )
        );
}
Listing (4)

void SLineFormat::AssertInvariant() const& noexcept {
    if( m_onWeight ) {
        if( 0==*m_onWeight ) {
            assert( !m_omsolinedashstyle );
            assert( !m_omsolinestyle );
        } else {
            assert( 0 <= *m_onWeight );
        }
    }
    // For more details about how we deal with unexpected conditions in our
    // code, watch Arno's talk "A Practical Approach to Error Handling".
}

bool operator==(SLineFormat const& lhs, SLineFormat const& rhs) noexcept {
    lhs.AssertInvariant();
    rhs.AssertInvariant();
    return IS_EQUAL_ASPECT(m_onWeight)
        && IS_EQUAL_ASPECT(m_omsolinedashstyle)
        && IS_EQUAL_ASPECT(m_omsolinestyle);
}
Listing (5)

While ignoring the values of irrelevant aspects sounds simple and most canonical in theory, in practice it turns out that ensuring specific values for meaningless aspects allows for cleaner, simpler, more canonical code. Clean, simple, canonical code means fewer bugs to begin with, and better maintainability, therefore we mostly use the latter approach in our software. In particular, due to the way our properties are designed, each aspect conveniently has an "undefined" (std::nullopt) state, anyway. Using the "undefined" state for this purpose saves us from dealing with arbitrary magic values (which would also work in principle, of course).

With this basic understanding of dependent (non-orthogonal) property aspects, let's look at the implementations of some other key methods and functions for SLineFormat:

#define SET_ASPECT(member) if(rhs.member) member = rhs.member // same as for SFont

void SLineFormat::operator<<=(SLineFormat const& rhs) & noexcept {
    SET_ASPECT(m_onWeight);
    if( IsInvisible() ) {
        m_omsolinedashstyle = std::nullopt;
        m_omsolinestyle = std::nullopt;
    } else {
        SET_ASPECT(m_omsolinedashstyle);
        SET_ASPECT(m_omsolinestyle);
    }
    AssertInvariant();
}

SLineFormat operator<<(SLineFormat lhs, SLineFormat const& rhs) noexcept {
    lhs <<= rhs;
    return lhs;
} // analogous to SFont
Listing (6)

With (6) we can now use operator<<(...) to write chained expressions, just as with SFont, which are simple, elegant and, as we will see below, wrong (quote attributed to H. L. Mencken). For the sake of an example, let's look at our product domain again: We have a global default line format for visible lines, and we have a hierarchy of partially defined default line formats for specific purposes. How do we compose the default for the outline of a highlighted segment in a bar chart?

auto const lineHighlight = lineDefaultGlobal
    << lineDefaultSegment << lineDefaultHighlight; // WRONG!
Listing (7)

This expression worked great for SFont, so what's wrong with using it for SLineFormat? Let's assume we start out with some sensible global default line format. Some modern styles for charts use colored areas without outlines, so lineDefaultSegment may be set to the unambiguous invisible outline. Highlighting a segment may justify a particularly thick outline (maybe even a red one, but we don't support color in our toy example), and it shall use whatever MsoLineDashStyle and MsoLineStyle are defined in the global default. Thus, our defaults may look like this:

SLineFormat const lineDefaultGlobal{
    /*m_onWeight*/ 6,
    /*m_omsolinedashstyle*/ msoLineSolid,
    /*m_omsolinestyle*/ msoLineSingle
};

SLineFormat const lineDefaultSegment{
    /*m_onWeight*/ 0,
    /*m_omsolinedashstyle*/ std::nullopt,
    /*m_omsolinestyle*/ std::nullopt
};

SLineFormat const lineDefaultHighlight{
    /*m_onWeight*/ 12,
    /*m_omsolinedashstyle*/ std::nullopt,
    /*m_omsolinestyle*/ std::nullopt
};
Listing (8)

If we feed the values from listing (8) into expression (7), the result is a partially defined line format: {/*m_onWeight*/ 12, /*m_omsolinedashstyle*/ std::nullopt, /*m_omsolinestyle*/ std::nullopt}. Obviously, this is not useful: In order to display a line around a highlighted segment, knowing the thickness is not enough. We need some well-defined MsoLineDashStyle and MsoLineStyle, too. Where exactly did msoLineSolid and msoLineSingle of lineDefaultGlobal get lost? The answer is: Associativity.

The associativity rules of C++ state that << groups from left to right, so a chain a << b << c is evaluated as (a << b) << c. If we hit one invisible line format in our chain of << operations, then according to listing (6) we clear all hidden state. When we then hit another line format that is visible but only partially defined, we cannot recover the missing aspects from the left-most operand. This can easily be fixed: We need to evaluate the chain from right to left:

auto const lineHighlight = lineDefaultGlobal
    << (lineDefaultSegment << lineDefaultHighlight); // correct
Listing (9)
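
To see the difference between expressions (7) and (9) in action, here is a minimal, self-contained sketch. It makes several assumptions that are not taken from our production code: the two enums are stand-ins for the Office types, IsInvisible() is assumed to mean "weight is defined and equals 0", and ComposeLeftToRight() and ComposeRightToLeft() are hypothetical helper names.

```cpp
#include <optional>

// Stand-ins for the Office enums (assumption); only the values from listing (8).
enum class MsoLineDashStyle { msoLineSolid };
enum class MsoLineStyle { msoLineSingle };

struct SLineFormat {
    std::optional<int> m_onWeight;
    std::optional<MsoLineDashStyle> m_omsolinedashstyle;
    std::optional<MsoLineStyle> m_omsolinestyle;

    // Assumption: a defined weight of 0 means "invisible".
    bool IsInvisible() const noexcept { return m_onWeight && 0 == *m_onWeight; }

    void operator<<=(SLineFormat const& rhs) & noexcept {
        if (rhs.m_onWeight) m_onWeight = rhs.m_onWeight;
        if (IsInvisible()) {
            // Dependent aspects are meaningless for an invisible line.
            m_omsolinedashstyle = std::nullopt;
            m_omsolinestyle = std::nullopt;
        } else {
            if (rhs.m_omsolinedashstyle) m_omsolinedashstyle = rhs.m_omsolinedashstyle;
            if (rhs.m_omsolinestyle) m_omsolinestyle = rhs.m_omsolinestyle;
        }
    }
};

SLineFormat operator<<(SLineFormat lhs, SLineFormat const& rhs) noexcept {
    lhs <<= rhs;
    return lhs;
}

// The defaults from listing (8).
SLineFormat const lineDefaultGlobal{
    6, MsoLineDashStyle::msoLineSolid, MsoLineStyle::msoLineSingle};
SLineFormat const lineDefaultSegment{0, std::nullopt, std::nullopt};
SLineFormat const lineDefaultHighlight{12, std::nullopt, std::nullopt};

// Expression (7): the invisible segment default wipes the global dash and line style.
SLineFormat ComposeLeftToRight() {
    return lineDefaultGlobal << lineDefaultSegment << lineDefaultHighlight;
}

// Expression (9): the partial formats are merged first, then applied to the global default.
SLineFormat ComposeRightToLeft() {
    return lineDefaultGlobal << (lineDefaultSegment << lineDefaultHighlight);
}
```

With these definitions, ComposeLeftToRight() yields weight 12 with both styles undefined, while ComposeRightToLeft() yields weight 12 together with msoLineSolid and msoLineSingle.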

That's it for today! If there are only two things that you take away from this article, they should be these: Whenever you design a data structure, be aware of hidden state. And whenever you encounter hidden state, make a conscious decision about how you want to deal with it, because that decision will affect the complexity of your code and the pitfalls you'll encounter when using your data structure. As always, if you have any feedback, don't hesitate to get in touch!

— by Volker Schöch

Do you have feedback? Send us a message at devblog@think-cell.com !


November 23rd, 2022

Properties (1)
Modeling the Universe of Fonts

Let's talk about a useful data structure for fonts. For the sake of this article, let's assume that a font consists of just a font face, a size and a flag for boldness. Turns out that even this simplified view of fonts is very useful in many use cases we encounter in our software:

// don't want to go into the details of possible string representations
struct MyString;

struct SFont {
    MyString m_strName;
    int m_nSize;
    bool m_bBold;
    // add italic, underline, strike-through, baseline offset,
    // spacing, allcaps, ... as needed
};
Listing (1)

In our PowerPoint add-in, we use a font representation like this for many different purposes, e.g.,

  • displaying the font settings of selected text,

  • applying font settings to selected text, in response to some user interaction,

  • applying default font settings to initialize some text that was inserted by our add-in.

This sounds easy enough, but there are some practically relevant limitations:

  • How can we represent a mixed font selection, e.g., where only some of the selected text is bold?

  • How can we represent a font for application when, e.g., the user wants to change only the size while leaving the boldness alone?

For display, we could use std::optional<SFont>, with std::nullopt representing a mixed font. This would work in principle, but even if only boldness is mixed in the selected text, and size and name are consistent, we could only show "Mixed font" without being able to display the size or the name. Not the greatest user interface on earth.

For application, we could use separate functions for the name, and the size, and the boldness setting. Given that in most practically relevant software, the user request for a font change would probably be processed by a hierarchy of nested function calls, the complexity of our SFont struct would proliferate throughout the entire call hierarchy. Want to add support for italic? Add a call tree of SetItalic(...) functions. Not the greatest software architecture on earth.

To solve both problems (and more, as we will see below), let's try to use std::optional inside of SFont:

struct SFont {
    std::optional<MyString> m_ostrName;
    std::optional<int> m_onSize;
    std::optional<bool> m_obBold;
};
Listing (2)

Let's call SFont a "property", and let's call its members m_ostrName, m_onSize and m_obBold "aspects" of the property. Aspects may be "mixed" or "undefined" (std::nullopt).

How does this solve our problem for display? When picking up the font aspects from text selection, we can fill SFont with well-defined information for aspects that are consistent throughout the selection. Aspects that have varying values within the selection can be represented by std::nullopt. When displaying the resulting font, we can display the values of consistent aspects, while for inconsistent aspects, we can show some representation of "mixed": "Arial ... pt bold", "... 12 pt" (implicitly non-bold) or "Arial 12 pt (bold)" are much more meaningful to a user than just "Mixed font".
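
As an illustration of the display case, here is a minimal sketch that produces the strings mentioned above. DescribeFont is a hypothetical helper, not part of our code base, and std::string merely stands in for MyString.

```cpp
#include <optional>
#include <string>

struct SFont {
    std::optional<std::string> m_ostrName; // std::string stands in for MyString
    std::optional<int> m_onSize;
    std::optional<bool> m_obBold;
};

// Hypothetical helper: a mixed name or size is shown as "...",
// mixed boldness as "(bold)", and non-bold is left implicit.
std::string DescribeFont(SFont const& font) {
    std::string str = font.m_ostrName ? *font.m_ostrName : std::string("...");
    str += ' ';
    str += font.m_onSize ? std::to_string(*font.m_onSize) : std::string("...");
    str += " pt";
    if (!font.m_obBold) {
        str += " (bold)";
    } else if (*font.m_obBold) {
        str += " bold";
    }
    return str;
}
```

For example, DescribeFont(SFont{"Arial", std::nullopt, true}) gives "Arial ... pt bold", and DescribeFont(SFont{"Arial", 12, std::nullopt}) gives "Arial 12 pt (bold)".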

How does this solve our problem with application? Whether we want to set size, boldness or name, or any combination thereof, we now pass an SFont object (or a reference to it) through the entire call hierarchy. We call this a partially defined font. Functions that are just passing it on do not have to care about which aspects of SFont actually carry values and which are std::nullopt.

With partially defined fonts, we can do some interesting things. For instance, in our software, we have hierarchical default settings: When determining the default font for a sum label in a stacked chart, we start with a fully defined global default font, which we infer from the PowerPoint Master Slide. We then apply, e.g., a general default font for chart labels, and finally a specific default font for sum labels. Except for the global default font, all fonts can be partially defined, so the chart default font may set the font size to 10 pt while leaving name and boldness alone, and the sum label default font may set the font to bold without affecting size and name. We write this neatly as:

auto const fontSum = fontDefaultGlobal << fontDefaultChart << fontDefaultSum;
Listing (3)

To facilitate this expression, we define the following operator overloads:

#define SET_ASPECT(member) \
    if(rhs.member) member = rhs.member

void SFont::operator<<=(SFont const& rhs) & noexcept {
    SET_ASPECT(m_ostrName);
    SET_ASPECT(m_onSize);
    SET_ASPECT(m_obBold);
}
SFont operator<<(SFont lhs, SFont const& rhs) noexcept {
    lhs <<= rhs;
    return lhs;
}
Listing (4)
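
To make expression (3) concrete, here is a small self-contained sketch. The concrete default values ("Arial", 12 pt, non-bold for the global default) and the helper name ComposeSumFont are assumptions for illustration only; std::string stands in for MyString, and the macro is written out by hand.

```cpp
#include <optional>
#include <string>

struct SFont {
    std::optional<std::string> m_ostrName; // std::string stands in for MyString
    std::optional<int> m_onSize;
    std::optional<bool> m_obBold;

    // Aspects defined in rhs overwrite ours; undefined aspects leave ours alone.
    void operator<<=(SFont const& rhs) & {
        if (rhs.m_ostrName) m_ostrName = rhs.m_ostrName;
        if (rhs.m_onSize) m_onSize = rhs.m_onSize;
        if (rhs.m_obBold) m_obBold = rhs.m_obBold;
    }
};

SFont operator<<(SFont lhs, SFont const& rhs) {
    lhs <<= rhs;
    return lhs;
}

// Hypothetical defaults: only the global default is fully defined.
SFont ComposeSumFont() {
    SFont const fontDefaultGlobal{"Arial", 12, false};
    SFont const fontDefaultChart{std::nullopt, 10, std::nullopt};
    SFont const fontDefaultSum{std::nullopt, std::nullopt, true};
    return fontDefaultGlobal << fontDefaultChart << fontDefaultSum;
}
```

Under these assumptions, ComposeSumFont() returns a fully defined font: the name comes from the global default, the size from the chart default, and the boldness from the sum label default.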

You may argue that you don't want to overload operator<< with functionality that is semantically unrelated to bit shifting. You are free to rename SFont::operator<<=(...) to, e.g., SFont::Set(...), but you give up the conciseness of expression (3) in return. Also, when using operator<< for this purpose, there is a potential issue with operator associativity, and we'll get to that in an upcoming blog post.

For now, while we are at it, let's look into some other useful methods for SFont. How can we pick up mixed font from selected text? If only we had a method to create something like a "union of fonts"...

#define UNION_ASPECT(member) \
    if(member && rhs.member && *member!=*rhs.member) member = std::nullopt

void SFont::Union(SFont const& rhs) & noexcept {
    UNION_ASPECT(m_ostrName);
    UNION_ASPECT(m_onSize);
    UNION_ASPECT(m_obBold);
}

void Union(std::optional<SFont>& ofont, SFont const& font) noexcept {
    if( ofont ) {
        ofont->Union(font);
    } else {
        ofont = font;
    }
}

template<typename RngFont>
std::optional<SFont> CollectFont(RngFont const& rngfont) noexcept {
    // see our public domain range library at https://github.com/think-cell/range
    return tc::accumulate(
        rngfont,
        std::optional<SFont>(),
        [&](auto& ofontAccu, auto const& font) noexcept { Union(ofontAccu, font); }
    );
}
Listing (5)

We are using std::optional<SFont> again, but this time it serves a different purpose. As the function name Union(...) suggests, we like to think of properties and aspects in terms of set theory. Our universe is the set of all possible fonts that could be represented by SFont. An SFont object represents a subset in this universe: If all members are well-defined, it represents a singleton. If no members are defined, the SFont object represents the universe. If a font has name "Arial" and size "12 pt" with boldness undefined, it represents the set with the two members "Arial 12 pt non-bold" and "Arial 12 pt bold".

Now that we are thinking in terms of set theory, we can say that by making some aspects undefined, method SFont::Union(...) enlarges the subset represented by *this. Specifically, it removes all aspects that *this and rhs do not agree upon. If there is no agreement at all, the resulting subset is the entire universe. But wait, if we want to iteratively calculate the union of n singleton sets, as is the case in our text selection example, how do we start? An accumulating iteration needs to start with an identity element of the respective operation. We conveniently use std::nullopt as an identity element for our Union(...) function. The identity element for union is the empty set and thus you can think of std::nullopt as the empty set in this example.

With std::nullopt serving as a convenient, generic identity element for any accumulating algorithms, and with the base type X for std::optional<X> being implicitly provided as the range value type, we can wrap this approach into a generic algorithm. We call it tc::accumulate_with_front(...), because the iteration starts with the first element of the range, rather than with an explicit start element. If the range is empty, std::nullopt is returned. Note that we do not need the helper function Union(std::optional<SFont>&, SFont) anymore, because it is implicit in the definition of tc::accumulate_with_front(...):

template<typename RngFont>
std::optional<SFont> CollectFont(RngFont const& rngfont) noexcept {
    // see our public domain range library at
    // https://github.com/think-cell/range
    return tc::accumulate_with_front(rngfont, TC_MEM_FN(.Union));
}
Listing (6)
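
If you want to experiment without our library, the idea can be sketched with the standard library alone. AccumulateWithFront below is a hypothetical, simplified stand-in and not the implementation of tc::accumulate_with_front; it assumes a container with a value_type member, and std::string stands in for MyString.

```cpp
#include <optional>
#include <string>
#include <vector>

struct SFont {
    std::optional<std::string> m_ostrName; // std::string stands in for MyString
    std::optional<int> m_onSize;
    std::optional<bool> m_obBold;

    // Same logic as UNION_ASPECT in listing (5), written out by hand.
    void Union(SFont const& rhs) & {
        if (m_ostrName && rhs.m_ostrName && *m_ostrName != *rhs.m_ostrName) m_ostrName = std::nullopt;
        if (m_onSize && rhs.m_onSize && *m_onSize != *rhs.m_onSize) m_onSize = std::nullopt;
        if (m_obBold && rhs.m_obBold && *m_obBold != *rhs.m_obBold) m_obBold = std::nullopt;
    }
};

// Hypothetical stand-in: the first element seeds the accumulator,
// every further element is folded in with func.
// An empty range yields std::nullopt.
template<typename Rng, typename Func>
std::optional<typename Rng::value_type> AccumulateWithFront(Rng const& rng, Func func) {
    std::optional<typename Rng::value_type> oAccu;
    for (auto const& elem : rng) {
        if (oAccu) {
            func(*oAccu, elem);
        } else {
            oAccu = elem;
        }
    }
    return oAccu;
}

std::optional<SFont> CollectFont(std::vector<SFont> const& vecfont) {
    return AccumulateWithFront(
        vecfont,
        [](SFont& fontAccu, SFont const& font) { fontAccu.Union(font); }
    );
}
```

Collecting "Arial 12 pt bold", "Arial 12 pt non-bold" and "Arial 14 pt non-bold" this way keeps the name "Arial" and leaves size and boldness mixed.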

The notion of SFont being a representation of a subset of fonts can also be useful when we want to know whether applying one font on top of another would make any difference: We can now phrase this question as an IsSupersetOf(...) predicate. Here are two implementations that are equivalent with regard to their results, although one uses more memory and more operations than the other:

#define IS_SUPERSET_OF_ASPECT(member) \
    (!member || rhs.member && *member==*rhs.member)

bool SFont::IsSupersetOf(SFont const& rhs) const& noexcept {
    return IS_SUPERSET_OF_ASPECT(m_ostrName)
        && IS_SUPERSET_OF_ASPECT(m_onSize)
        && IS_SUPERSET_OF_ASPECT(m_obBold);
}
Listing (7)

#define IS_EQUAL_ASPECT(member) ( \
    static_cast<bool>(lhs.member)==static_cast<bool>(rhs.member) \
    && (!lhs.member || *lhs.member==*rhs.member) \
)

bool operator==(SFont const& lhs, SFont const& rhs) noexcept {
    return IS_EQUAL_ASPECT(m_ostrName)
        && IS_EQUAL_ASPECT(m_onSize)
        && IS_EQUAL_ASPECT(m_obBold);
}

bool SFont::IsSupersetOf(SFont const& rhs) const& noexcept {
    return rhs==rhs << *this;
}
Listing (8)

While IsSupersetOf(...) in itself may already be useful in some situations, in our software we had one problem to solve that was closely related but a bit more tricky: We needed to extract the relevant information from one font relative to another, in order to store it, e.g., as a default or for quick access. A typical example would be a user applying an arbitrary partially defined font to some label that already has a font. We wanted to avoid storing any redundant, unnecessary information, because that would then inhibit our hierarchical font composition, see expression (3). Similarly, you may want to eliminate any unnecessary aspects from the font before application to (as it happens to be the case in our software) PowerPoint, because calls to PowerPoint can be expensive. Enter Minimize:

#define MINIMIZE_ASPECT(member) \
    if( IS_SUPERSET_OF_ASPECT(member) ) member = std::nullopt

void SFont::Minimize(SFont const& rhs) & noexcept {
    MINIMIZE_ASPECT(m_ostrName);
    MINIMIZE_ASPECT(m_onSize);
    MINIMIZE_ASPECT(m_obBold);
}
Listing (9)
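
Here is a small worked example of Minimize, as a sketch under a few assumptions: std::string stands in for MyString, the macros are written out as the hypothetical helper MinimizeAspect, and Minimized is a hypothetical convenience wrapper that is not part of our code.

```cpp
#include <optional>
#include <string>

// Same condition as MINIMIZE_ASPECT: drop the aspect if it is undefined
// or if the base font already has the same value.
template<typename T>
void MinimizeAspect(std::optional<T>& member, std::optional<T> const& memberBase) {
    if (!member || (memberBase && *member == *memberBase)) member = std::nullopt;
}

struct SFont {
    std::optional<std::string> m_ostrName; // std::string stands in for MyString
    std::optional<int> m_onSize;
    std::optional<bool> m_obBold;

    void Minimize(SFont const& rhs) & {
        MinimizeAspect(m_ostrName, rhs.m_ostrName);
        MinimizeAspect(m_onSize, rhs.m_onSize);
        MinimizeAspect(m_obBold, rhs.m_obBold);
    }
};

// Hypothetical wrapper: returns the part of font that differs from fontBase.
SFont Minimized(SFont font, SFont const& fontBase) {
    font.Minimize(fontBase);
    return font;
}
```

Suppose a label already uses "Arial 12 pt non-bold" and the user applies the partially defined font {"Arial", 14, std::nullopt}. Minimizing the applied font against the label's font leaves only the size 14; the name is redundant and the boldness was never defined.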

Note: Minimize is not set difference!

You probably noticed that the result of the SFont::Union(...) method is not actually the set-union of the two subsets that were passed as parameters. Rather, it is the smallest set that contains both subsets and can be represented by our definition of SFont. That's the beauty of the data structure as presented in this article: It is very simple yet very useful. We achieve the simplicity by dropping the actual values of anything "mixed". For our practical purposes, that is a reasonable trade-off.

When trying to grasp this slightly peculiar behavior of SFont::Union(...), I find it helpful and illustrative to look at SFont in terms of an n-dimensional space (n==3 in our toy example), that is spanned by its aspects "name", "size" and "boldness". A fully defined font is equivalent to a point. If one aspect of the font is "mixed" or "undefined", the partially defined SFont object can be seen as a line in the space of all possible fonts. Remove another aspect, and you're left with a plane. When no aspects are left, the resulting SFont object represents the entire space. In this space, the operator<<(lhs, rhs) as introduced above (4) calculates the projection of lhs onto rhs.

If you have any feedback, I'd be glad to hear from you. Don't hesitate to let me know what you think about partially defined properties!

Next up: Applying the same ideas to the design of SLineFormat turns out to be less straight-forward than you may expect.

— by Volker Schöch



November 16th, 2022

tc::change

Hello,

welcome to this blog of our development work at think-cell. It will be mainly about programming in general and more specifically about C++. Our platforms are mainly Windows and macOS, with a bit of web development sprinkled in. We will write about anything that comes to our mind, architectural decisions, little nuggets of wisdom we found or rants about bugs in 3rd party software that leave us frustrated.

At think-cell, we are writing mainly in C++, with some in-house Python scripts mixed in. We built our own in-house library which builds on top of Boost and the C++ Standard Library, and strives to follow the C++ Standard Library in conventions such as names, so that new users find it easy to get started. It is on GitHub and free to use under the Boost license.

I want to get started with a little utility that proved surprisingly useful for us and that has a bit more thinking behind it than is apparent at first sight.

We all know the concept of a dirty flag, which is set somewhere indicating some work to be done, and then queried and reset elsewhere where the work actually happens:

…
dirty=true;
…
if(dirty) {
    dirty=false;
    … do work …
}

Easy enough. There is a degree of freedom here though: do you reset the flag before or after the work is done? You may favor resetting it afterwards:

…
dirty=true;
…
if(dirty) {
    … do work …
    dirty=false;
}

This seems more expressive: you only say you are done with the work when you actually are. But is it practical? Clearly, if dirty is not checked while doing work, it does not matter. What if it is checked? In particular, it could be that during the work, the part of the code doing work is being reentered. If you reset dirty early, the work will be skipped. If you reset it late, it won’t. What’s better?

…
dirty=true;
…
if(dirty) {
    … work part 1 …
    reenter_myself();
    … work part 2 …
    dirty=false;
}

At the time of reentrance, only part 1 of the work will be done, no matter how often you reenter the code doing work. Repeating part 1 on reentrance is likely at best not going to help, and at worst leads to an infinite recursion. To be correct, in any case, you must structure the work such that part 1 is sufficient for the code running inside the reentering code path.

If we accept that reasoning, resetting dirty early is always better. If part 1 is not sufficient, the code is incorrect anyway. If it is, we avoid redundant work and possibly an infinite recursion.
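
The following sketch illustrates the early reset with reentrance. SWorker and its counters are hypothetical names invented for this example; the counters only exist to make the behavior observable.

```cpp
struct SWorker {
    bool m_bDirty = false;
    int m_nPart1Runs = 0;
    int m_nPart2Runs = 0;

    void Process() {
        if (m_bDirty) {
            m_bDirty = false; // reset early, before any work is done
            ++m_nPart1Runs;   // work part 1
            Process();        // reentrance: the flag is already clean, so the work is skipped
            ++m_nPart2Runs;   // work part 2
        }
    }
};
```

After setting m_bDirty and calling Process() once, both parts have run exactly once. Had the flag been reset only after part 2, the reentrant call would find the flag still set and repeat part 1, which in this sketch would recurse without end.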

In our library, available on GitHub, we packaged this insight into a little utility, tc::change, which is an assignment that additionally returns whether anything has changed:

…
dirty=true;
…
if(tc::change(dirty, false)) {
    … do work …
}

Besides boolean flags, it is also useful for other values where changing them entails some dependent work:

if(tc::change(size, screen_size())) {
    … consequences of screen size change …
}
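
To show the idea without pulling in the library, here is a deliberately simplified stand-in. change_sketch is a hypothetical name, and the code is not the implementation of tc::change; it only assumes that the variable can be compared with and assigned from the new value.

```cpp
#include <utility>

// Hypothetical, simplified stand-in for tc::change:
// assigns val to var and reports whether var actually changed.
template<typename Var, typename Val>
bool change_sketch(Var& var, Val&& val) {
    if (var == val) {
        return false; // nothing to do, the value is already there
    }
    var = std::forward<Val>(val);
    return true;
}
```

Because the assignment happens before the function returns, any work guarded by the result automatically follows the "reset early" rule from above.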

We use tc::change sooo much. Try it!

— by Arno Schödl

