James Santucci's (mostly) software development blog

This post covers fancier plotting than the two previous posts. The Hello, plots post covered simple scatter plots. The configuration post in part two covered how to change a bunch of plot display attributes like scales and titles. This post will cover interactivity, small multiples, and labeling specific interesting points on your plots.

These examples will be more complicated than the examples in either of the previous posts, so I’ll explain more of the plotting code than I did there. Because this post would be really long if I included more than one library’s worth of examples, I’ll only focus on hvega here.

Both examples below can be found in my goofing-off repo.

Interactivity and annotation

This plot shows an estimate of pi based on the proportion of random points that fall inside a quarter of the unit circle. It’s based on this example from Vega’s tutorials.

hvegaPiMonteCarlo :: IO ()
hvegaPiMonteCarlo =
  let nRows = 10000
      points = samplePointsDf nRows
      rows =
        -- map the columns from the dataframe into objects hvega expects
        V.dataFromColumns []
          . V.dataColumn "idx" (V.Numbers (fromIntegral @Int <$> DT.columnAsList @"idx" points))
          . V.dataColumn "x" (V.Numbers (DT.columnAsList @"x" points))
          . V.dataColumn "y" (V.Numbers (DT.columnAsList @"y" points))

      -- add Vega data to each row for whether the point is inside
      -- or outside the unit circle, a rolling count of how many
      -- points are inside the unit circle, and an estimate of pi
      -- based on that count.
      randomPointsTransform =
        V.transform $
          ( V.filter (V.FCompose (V.Expr "num_points_idx >= datum.idx"))
              . V.calculateAs "datum.x * datum.x + datum.y * datum.y < 1 ? 1 : 0" "inside"
              . V.window [([V.WAggregateOp V.Sum, V.WField "inside"], "insideCount")] []
              . V.calculateAs "datum.insideCount * 4 / datum.idx" "piEstimate"
          )
            []
      -- color the points based on whether they're inside or outside the unit circle
      enc =
        V.encoding
          . V.position V.X [V.PName "x", V.PmType V.Quantitative]
          . V.position V.Y [V.PName "y", V.PmType V.Quantitative]
          . V.color [V.MName "inside", V.MmType V.Nominal]
          . V.opacity [V.MNumber 0.15]
      -- slider to choose a value bound to "num_points_idx" (combination of the
      -- selection field name and slider field name, for reasons I don't really
      -- understand); used to filter for rows with idx <= slider value, i.e. to choose
      -- how many points are used in the Monte Carlo estimation of pi
      slider = V.IRange "idx" [V.InMin 100, V.InMax (fromIntegral nRows), V.InStep 10]
      selection =
        V.selection
          . V.select
            "num_points"
            V.Single
            [V.Fields ["idx"], V.SInit [("idx", V.Number 1000)], V.Bind [slider]]
      pi_ = V.transform . V.filter (V.FCompose (V.Expr "num_points_idx == datum.idx"))
   in V.toHtmlFile "plots/vegaCalculatePi.html" $
        V.toVegaLite
          [ rows [],
            randomPointsTransform,
            V.layer
              [ V.asSpec
                  [ V.mark V.Point [V.MFilled True],
                    enc [],
                    selection []
                  ],
                V.asSpec
                  [ pi_ [],
                    V.mark V.Text [V.MFontSize 18, V.MFontWeight V.Bold],
                    V.encoding
                      . V.position V.X [V.PmType V.Quantitative, V.PDatum (V.Number 1)]
                      . V.position V.Y [V.PmType V.Quantitative, V.PDatum (V.Number 0.5)]
                      . V.text
                        [ V.TName "piEstimate",
                          V.TFormatAsNum,
                          V.TFormat ".3f"
                        ]
                      $ []
                  ]
              ]
          ]

The biggest difference between this plot and previous plots is that this one responds to user input: a slider chooses how many points are used to calculate the value of pi. That happens via the IRange input binding and the selection and select functions. Additionally, I deviated from the example in Vega’s tutorials by writing the current estimate of pi onto the main plot instead of adding it to a separate plot. I made this choice to show how to add text at a particular location in the plot and to keep the example from taking on all the complexity of the version in the tutorial.
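The name num_points_idx in the filter expressions looks like the selection name ("num_points") joined to the projected field ("idx") with an underscore. A minimal sketch of that naming, assuming Vega-Lite follows this pattern for single selections bound to an input; signalName and sliderFilterExpr are hypothetical helpers that only build strings and are not part of hvega:

```haskell
-- Hypothetical helpers, not part of hvega: they only build strings.

-- | Assumed name of the signal for one projected field of a single
-- selection, e.g. "num_points" and "idx" give "num_points_idx".
signalName :: String -> String -> String
signalName selectionName field = selectionName ++ "_" ++ field

-- | Filter expression that keeps rows whose field value is at most
-- the current slider value.
sliderFilterExpr :: String -> String -> String
sliderFilterExpr selectionName field =
  signalName selectionName field ++ " >= datum." ++ field
```

Calling sliderFilterExpr "num_points" "idx" rebuilds the expression string passed to V.Expr in randomPointsTransform.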

Putting the annotation on the same plot works with layers (the two asSpec values passed to V.layer). The first layer shows only the points, colored by whether they’re inside or outside the unit circle. The second layer shows only the current estimate of pi. Both layers read the same top-level data and transform; pi_ in the second layer filters that data down to the single row whose idx equals the number of points you chose, which is the last row plotted in the first layer.

If you pick a low value for the number of points then toggle back and forth, you can see that the data used to estimate pi are fixed – each increment or decrement adds or removes the same 10 points. The reason for that is that I started with a DataFrame instead of letting Vega do all of the work of generating the data, then calculated the data to estimate pi from the random points in Vega.¹
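The transform chain (inside flag, running count, estimate) can be sketched in plain Haskell to show why the estimate for n points depends only on the first n rows. This is an illustration rather than the code Vega runs; it assumes idx starts at 1, and EstimateRow, insideFlag, and rollingEstimates are hypothetical names:

```haskell
-- | One output row: (idx, inside flag, running inside count, pi estimate).
type EstimateRow = (Int, Int, Int, Double)

-- | 1 if the point is strictly inside the unit circle, otherwise 0.
insideFlag :: (Double, Double) -> Int
insideFlag (x, y) = if x * x + y * y < 1 then 1 else 0

-- | Running estimates of pi in row order; assumes idx starts at 1.
rollingEstimates :: [(Double, Double)] -> [EstimateRow]
rollingEstimates points = zipWith3 row [1 ..] flags counts
  where
    flags = map insideFlag points
    -- cumulative sum of the flags, like the window transform's running Sum
    counts = scanl1 (+) flags
    row idx flag count =
      (idx, flag, count, fromIntegral count * 4 / fromIntegral idx)
```

Because each row only looks at the rows before it, take n (rollingEstimates points) equals rollingEstimates (take n points), which matches the fixed-prefix behaviour of the slider.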

Faceting and freeform drawing

This plot shows another scatter plot, but with two modifications. First, instead of drawing the points for every tag in the same plot, I’ve drawn a separate plot for each tag. Second, I’ve drawn a weird little chevron-ish shape at the mean position for each tag.

hvegaFacetAnnotatedMeans :: DT.TypedDataFrame LabeledDfSchema -> IO ()
hvegaFacetAnnotatedMeans df =
  let x = DT.columnAsList @"x" df
      y = DT.columnAsList @"y" df
      tag = DT.columnAsList @"tag" df
      vegaData =
        V.dataFromColumns []
          . V.dataColumn "x" (V.Numbers x)
          . V.dataColumn "y" (V.Numbers y)
          . V.dataColumn "tag" (V.Strings . (Text.pack . pure <$>) $ tag)
      means =
        V.transform
          . V.aggregate
            [ V.opAs V.Mean "x" "xBar",
              V.opAs V.Mean "y" "yBar"
            ]
            ["tag"]
      colorMarkProperties =
        [ V.MName "tag",
          V.MmType V.Nominal,
          V.MScale [V.SScheme "blues" [5]]
        ]
      enc =
        V.encoding
          . V.position V.X [V.PName "x", V.PmType V.Quantitative]
          . V.position V.Y [V.PName "y", V.PmType V.Quantitative]
          . V.color colorMarkProperties

      meansEnc =
        V.encoding
          . V.position V.X [V.PName "xBar", V.PmType V.Quantitative]
          . V.position V.Y [V.PName "yBar", V.PmType V.Quantitative]
          -- use the svgText shape defined above to mark the means
          . V.shape [V.MSymbol $ V.SymPath svgText]
   in V.toHtmlFile "plots/hvegaSimpleFacet.html" $
        V.toVegaLite
          [ -- facet by tag values
            V.facetFlow [V.FName "tag", V.FmType V.Nominal],
            -- define the facet layout; 4 columns, as many rows as necessary
            V.columns 4,
            vegaData [],
            V.specification . V.asSpec $
              [ V.layer
                  [ V.asSpec
                      [ V.mark V.Point [V.MFilled True],
                        enc []
                      ],
                    V.asSpec
                      [ means [],
                        meansEnc [],
                        V.mark V.Point [V.MColor "firebrick"]
                      ]
                  ]
              ]
          ]

To create a custom marker shape, I used SymPath. SymPath takes a set of SVG path instructions to use as a marker.² In principle I think this means you can draw anything you want for a marker, but I didn’t try anything other than the little chevron.³ A custom marker shape isn’t quite drawing whatever you want wherever you want, but it’s something.
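As an illustration of what a path string for SymPath might look like, here is a small sketch that builds one from a list of points. The chevron below is hypothetical and is not the svgText used in this post; pathFromPoints and chevronPath are made-up names. It assumes the Vega-Lite convention that custom shapes are defined inside a box running from -1 to 1 on both axes:

```haskell
-- | Render points as an SVG path: move to the first point, then draw
-- straight lines to each remaining point.
pathFromPoints :: [(Double, Double)] -> String
pathFromPoints [] = ""
pathFromPoints (p : ps) =
  unwords (("M" ++ coords p) : map (\q -> "L" ++ coords q) ps)
  where
    coords (x, y) = show x ++ "," ++ show y

-- | Hypothetical caret-shaped chevron (SVG y grows downward), drawn
-- within the -1 to 1 box. Not the post's svgText.
chevronPath :: String
chevronPath = pathFromPoints [(-1, 1), (0, -1), (1, 1)]
```

A string like this would need converting with Text.pack before being handed to V.SymPath, assuming SymPath takes a Text value as it appears to in the code above.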

Faceting turns out to be pretty simple.

[
  V.facetFlow [V.FName "tag", V.FmType V.Nominal], -- 1
  V.columns 4, -- 2
  V.specification . V.asSpec $ [ ... ] -- 3
]

1. I declared I wanted to facet by the tag value, which holds nominal data.
2. I laid out the facets with four columns, letting Vega figure out how many rows that requires.
3. I provided additional plotting instructions that declare how each faceted plot should be drawn.

At 3, I didn’t have to provide data to each of the subplots. I don’t know exactly how it works in Vega’s JavaScript, but as an approximation, the faceted plots get their data from the root of the specification, filtered down to the part of the data that’s relevant for each subplot.
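That approximation can be modelled in plain Haskell: split the root data into one group per tag, then compute the per-tag means that the second layer draws. This is a sketch of the idea and not how Vega implements faceting; Row, facetByTag, and facetMeans are hypothetical names, and the tag is assumed to be a single Char:

```haskell
import Data.Function (on)
import Data.List (groupBy, sortOn)

-- | Hypothetical row shape: (tag, x, y).
type Row = (Char, Double, Double)

-- | Split the root data into one group of rows per tag.
facetByTag :: [Row] -> [(Char, [Row])]
facetByTag rows =
  [ (tag, grp)
  | grp@((tag, _, _) : _) <- groupBy ((==) `on` tagOf) (sortOn tagOf rows)
  ]
  where
    tagOf (t, _, _) = t

-- | Arithmetic mean; only called on non-empty groups here.
mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

-- | Per-tag means, analogous to aggregating xBar and yBar by "tag".
facetMeans :: [Row] -> [(Char, (Double, Double))]
facetMeans rows =
  [ (tag, (mean [x | (_, x, _) <- grp], mean [y | (_, _, y) <- grp]))
  | (tag, grp) <- facetByTag rows
  ]
```

Each element of facetByTag corresponds to the data one subplot sees, and facetMeans gives the position of the marker drawn in that subplot.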

Haskell data visualization part 3: Bonjour, plots (fancy plotting)
