Papertrail

The revolution will be verbosely {,b}logged

Main Menu Changes (or, Where'd My Profile Go?)

Posted by @rpheath on

We’ve made a few small changes to the main navigation in the header. Here’s what it used to look like:

Old Navigation Menu

It now looks like this:

Updated Navigation

The old “Account” tab is now a more accurate “Settings” tab, and we got rid of the “Me” tab altogether, replacing it with a logout link.

And finally, your Profile can now be found in the Settings area:

New Profile Location

Temporarily Mute Log Senders

Posted by @lmarburger on

Papertrail now has a way to stop processing logs from a sender. A sender can be muted for an hour during maintenance or to run a load test.

This augments the filtering improvements released last week.

Background

In the middle of an incident, planned or unplanned, it’s important to have quality logs. A down database, a misconfigured logger, or a load test can each produce a flood of useless log messages. If it’s a database that’s down for maintenance, being overwhelmed with connection errors from web servers is unhelpful. Not only are these errors pure noise, they consume log data transfer without adding any value. Nobody benefits.

Mute the log sender before it pollutes your log repository with useless messages. Let Papertrail enable the sender after the chosen amount of time or come back and enable it manually.
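To make the mute-then-re-enable behavior concrete, here is a minimal sketch that tracks an expiry time per sender. The `MuteList` class and its method names are hypothetical and are not Papertrail's API; times are passed in explicitly so the behavior is easy to follow.

```python
from datetime import datetime, timedelta


class MuteList:
    """Hypothetical in-memory record of muted senders (not Papertrail's API)."""

    def __init__(self):
        self._muted_until = {}  # sender name -> datetime when the mute expires

    def mute(self, sender, duration, now):
        """Mute `sender` for `duration`, starting at `now`."""
        self._muted_until[sender] = now + duration

    def unmute(self, sender):
        """Re-enable a sender manually before its mute expires."""
        self._muted_until.pop(sender, None)

    def is_muted(self, sender, now):
        """True while the sender's mute has not yet expired."""
        expires = self._muted_until.get(sender)
        return expires is not None and now < expires


start = datetime(2015, 1, 1, 12, 0)
mutes = MuteList()
mutes.mute("db1", timedelta(hours=1), now=start)
```

Thirty minutes in, `mutes.is_muted("db1", ...)` is still true; once the hour is up the sender is processed again without anyone having to remember to turn it back on.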

This change is part of our broader effort to provide the logging flexibility you need to handle unexpected circumstances.

When you encounter ways in which centralized control or filtering would make your logs more powerful, please let us know.

Flexible Log Filtering

Posted by @lmarburger on

The set of interesting log messages changes depending on the context. Log messages which are useful while in the middle of an outage, or to debug a kernel error, can be noise during day-to-day operations. As of today, it’s easier to choose the messages which are currently useful to you.

Flexible, Centralized Filters

It’s common to use systems and services with logging behavior that can’t be modified, like systems with strict change control, managed services, and closed-source apps. Even when senders can be changed, Papertrail offers a central place to apply flexible filters to many log streams. Log filters are a way to drop unnecessary messages and pay only for the messages you find useful.

Over time, the filters (regular expressions) became long and hard to read. Log filters can now be broken into their component parts and each can have a note describing its purpose.

Disable or Enable Filters

A new feature in this release is a way to toggle an individual filter. This is especially useful for logs that are noise during normal operation but are critical during an incident or to debug a system. It effectively becomes a way to change log verbosity without touching the systems themselves.

For instance, JVM garbage collection logs can be very noisy but are very helpful when tuning the GC. Filtering these logs in Papertrail means GC logging can stay enabled on the JVM, with the messages retained only at the times they’re needed. No restart of the JVM process required.
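As a rough illustration of filters that each carry a note and can be toggled, consider the sketch below. The `LogFilter` structure, the patterns, and the sample messages are all assumptions made for the example; they don't reflect how Papertrail stores or applies filters internally.

```python
import re
from dataclasses import dataclass


@dataclass
class LogFilter:
    pattern: str          # regular expression matched against each message
    note: str             # human-readable reason the filter exists
    enabled: bool = True  # a disabled filter lets matching messages through


def keep(message, filters):
    """Return True unless an enabled filter matches the message."""
    return not any(f.enabled and re.search(f.pattern, message) for f in filters)


# Hypothetical filters; the patterns are examples, not recommended settings.
filters = [
    LogFilter(r"\[GC ", "JVM garbage collection chatter"),
    LogFilter(r"healthcheck", "load balancer pings"),
]

messages = [
    "[GC (Allocation Failure) 512K->128K(2048K), 0.0012 secs]",
    "GET /healthcheck 200",
    "ERROR could not connect to database",
]

kept_normally = [m for m in messages if keep(m, filters)]

# While tuning the GC, turn off that one filter so its messages are retained.
filters[0].enabled = False
kept_while_tuning = [m for m in messages if keep(m, filters)]
```

With both filters on, only the database error survives. Disabling the GC filter brings the GC line back without touching the sender.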

Head to your Papertrail account and click Filter logs to get rid of those useless logs today.

Making Group Info Pages Faster For Very Large Groups

Posted by @lmarburger on

Some customers have thousands or even tens of thousands of log senders. Viewing the details of these large groups of senders can be quite slow. To speed up page loads, we’ve added pagination and changed how filtering log senders works.

The syntax for filtering a group’s log senders now matches the syntax for adding dynamic log senders to a group.

  • ? matches a single character
  • * matches any number of characters
  • Combine multiple searches with a comma
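The three rules above can be sketched as a translation into a regular expression. This is only an illustration: the function names are made up, and matching the whole sender name (rather than a substring) is an assumption, not documented Papertrail behavior.

```python
import re


def wildcard_to_regex(search):
    """Translate a comma-separated wildcard search into one compiled regex."""
    alternatives = []
    for term in search.split(","):
        term = term.strip()
        if not term:
            continue
        # Escape everything, then restore the two supported wildcards.
        escaped = re.escape(term)
        escaped = escaped.replace(r"\?", ".").replace(r"\*", ".*")
        alternatives.append(escaped)
    return re.compile("^(?:" + "|".join(alternatives) + ")$")


def matching_senders(search, senders):
    """Return the senders whose full name matches the wildcard search."""
    pattern = wildcard_to_regex(search)
    return [s for s in senders if pattern.match(s)]


# Hypothetical sender names.
senders = ["web1", "web2", "web10", "db-primary", "db-replica", "cache1"]
```

Here `matching_senders("web?", senders)` picks up `web1` and `web2` but not `web10`, while `"db-*, cache?"` combines two searches in one.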

Welcoming Ryan Heath to Papertrail

Posted by @troyd on

It started with an email through Flickr years ago. Papertrail’s co-founder, Eric Lindvall, saw a portfolio which caught his eye. Eric sent an unsolicited message to the designer, Ryan Heath, using the only method that Eric could find: Flickr. After a few interactions, Ryan began collaborating with the team, and a step that’s felt inevitable has finally happened: we’re thrilled to officially welcome Ryan Heath as Papertrail’s UI caretaker.

Ryan has had a hand in Papertrail’s design since day one, but was always limited by his day job, so we offered him a new one :-)

Until recently, Ryan provided design and Web development consulting, contributing to dozens of sites, Web services, and mobile apps. While client work intrigues him, solving UI/UX problems through product design is his true passion.

As Eric’s reaction to his design work shows, he’s an incredibly talented designer. Ryan also has a few (computer, electrical, software) engineering degrees, and that engineering background helps him understand the problems that Papertrail’s customers use Papertrail to solve. If “Design engineer” were a title, Ryan would have it.

Ryan resides in Morgantown, WV with his wife and two kids. His home office houses, among other things, a plethora of design books, a poster of Don Draper, a printmaker’s desk lamp from the 1950s, a Herman Miller chair, and the infamous contour bottle from Coca-Cola. He’s a collector of objects, to say the least. He enjoys thoughtful design, photography, and golf.

We’ve been fortunate to work with Ryan for years now, and we’re even more excited that he’s joining the team on a full-time basis.

Retiring SSL 3.0

Posted by @leonsodhi on

Summary

On Friday, June 12, 2015, Papertrail will remove support for the outdated security protocol SSL 3.0, which was released in 1996 and has since been superseded by TLS.

TLS is automatically used by nearly all modern loggers that can send encrypted log messages, so for the vast majority of customers this change will have no impact. However, there are exceptions to this which are explained in the next section.

In addition to this blog post, next week, we’ll directly email all customers who we believe will or are likely to be affected.

Update: On May 29, 2015, the SSL 3.0 retirement date was changed from June 5 to June 12.

What action do I need to take?

For clear text logging (which includes all UDP logging and some TCP logging), no changes are necessary.

For those sending log messages in an encrypted form using nxlog, an upgrade to 2.9.1347 will be needed. Other loggers may also be affected, but at this time we aren’t aware of any. Next week, we’ll directly email all customers who we believe will or may be affected, and will work with anyone who needs help upgrading or switching to an alternative logger that supports at least TLS 1.0.
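For anyone who wants to check what a client negotiates, here is a small sketch using Python's standard `ssl` module. It is not an official Papertrail tool. The requirement above is a floor of TLS 1.0; this sketch asks for TLS 1.2 because current Python and OpenSSL builds commonly refuse older versions by default. The host and port you pass in are your own log destination.

```python
import socket
import ssl


def make_tls_context():
    """Client context that verifies certificates and refuses pre-TLS 1.2 protocols."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def negotiated_protocol(host, port, timeout=5.0):
    """Connect and return the negotiated protocol version string, e.g. 'TLSv1.2'."""
    context = make_tls_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version()
```

Calling `negotiated_protocol("logs.example.com", 6514)` (a placeholder destination) against an endpoint that only offers SSL 3.0 would fail the handshake rather than silently fall back.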

If you’re concerned that you may be impacted, won’t be able to upgrade affected senders by June 12th, or have other questions, please email us.

Why is this happening now?

On October 14, 2014, the POODLE vulnerability was publicly disclosed. It described how a man-in-the-middle attack could be performed that would reveal plain text data from an encrypted log packet transmitted via SSL 3.0. This attack illustrates a fundamental flaw in the protocol which cannot be properly patched. As a result, most vendors released updates which disabled it.

In accordance with best security practices, Papertrail applied this patch to all web servers and log ingestion points on the same day that POODLE was announced. However, due to a misconfiguration, SSL 3.0 remained enabled on the latter and was deactivated in the last few weeks as part of an unrelated patch.

This second update was applied to each ingestion point over several days, which meant that some syslog endpoints were patched while others weren’t. Due to DNS round robin, some nxlog clients would successfully connect to unpatched endpoints while others would fail to connect to patched ones.

After every ingestion point had been updated, it was discovered that recent versions of nxlog only support encrypted logging via SSL 3.0 and thus could not establish a secure connection to patched endpoints.

After speaking with customers, we decided to re-enable SSL 3.0 to provide a reasonable amount of time for loggers to be upgraded. We will be disabling SSL 3.0 again on June 12th.

If you have any questions, please email us.

Papertrail joins SolarWinds and accelerates growth

Posted by @troyd on

On behalf of all of us at Papertrail, I’m thrilled to announce that Papertrail is now part of SolarWinds. Read more here.

As I said in the announcement, joining the SolarWinds family gives us more resources to make Papertrail better and to do so faster. As a first example, SolarWinds helped Papertrail acquire the domain name papertrail.com, which the team has wanted to obtain for years.

Details: SolarWinds Adds Cloud-based Log Management Capabilities with Acquisition of Papertrail

Why did Papertrail join SolarWinds? How will this benefit customers?

Eric and I started Papertrail to solve a problem we had: create the log management service that we wanted to use as developers and operations engineers ourselves, then make it as easy and as powerful as we can.

We’ve done that: Papertrail is a thriving, profitable business trusted by tens of thousands of customers. That’s been fulfilling enough that we’re ready to “double down” and deliver amazing log management and infrastructure visibility at an even larger scale.

Papertrail chose to work with SolarWinds because we share an appreciation for simple, practical products that shine in everyday use. As you’d expect, you’ll continue to work with the same Papertrail team (and we’ll be growing). SolarWinds sees tremendous potential in Papertrail and this combination enables Papertrail to get better, faster.

New search alerts dashboard

Posted by @lmarburger on

Search alerts are my favorite feature of Papertrail. Almost every organization considers at least one subset of logs worthy of human attention, whether as a daily summary or an instant notification. That’s precisely why alerts exist. We’ve released a number of improvements to how alerts are managed. Below are a few of our favorites.

Alerts Dashboard

Search alerts and their details are listed on the new alerts dashboard and when editing a saved search. This helps answer questions like “Who’s this emailing?” or “What’s the metric name in Librato?” Creating a new alert lists all the available services so it’s clear what you can do with alerts and how.

Multiple Alerts

Add multiple alerts to a saved search using the same alert service. For example, a saved search could ping several webhooks or email individual members of your team. Thresholds and frequencies are configured independently of other alerts.

Get Started

Pick a saved search that you or someone on your team might want to skim once a day and choose to receive it in an email. You’ll soon find yourself with a small “robot army” of alerts, from a daily email of failed CSRF requests to a graph of how frequently a Ruby VM segfaults. Enjoy!

DNS Outage on Monday, December 1, 2014

Posted by @troyd on

Summary

Most DNS requests for papertrailapp.com timed out for 6 hours, affecting many Web visitors and a small number of log senders. The outage was absolutely unacceptable. Personally and on behalf of Papertrail, I’m sorry.

Although the outage was caused by a DDoS against our DNS service provider, it’s our problem. We’re implementing complete DNS provider redundancy so that similar problems do not affect Papertrail.

What happened?

For about 6 hours on Monday, December 1, most DNS requests for papertrailapp.com timed out. Between approximately 19:15 UTC and 00:45 UTC the following day (11:15 - 16:45 Pacific), nearly all non-cached DNS requests timed out. As the attack pattern changed, a much smaller percentage of requests continued to time out until about 06:00 UTC (22:00 Pacific).

This DNS outage mostly impacted users trying to reach Papertrail’s site. Comparing Papertrail’s log volume during this outage to typical volume on a Monday, we did not see a noticeable change. The outage impacted few log senders because the resolved log destination was cached by most sending daemons as well as local and remote DNS servers. If your site was impacted, please contact us so we can understand and mitigate the impact of future problems. If you already reported this, we’ll be in touch.

The outage was caused by a very large distributed denial of service (DDoS) attack against a company which Papertrail purchases DNS service from. While the attack was destined for one of our service providers, we’re responsible for delivering Papertrail to you and this outage is our responsibility.

We posted frequent updates to Papertrail’s status site and @papertrailops, though for reasons discussed below, the updates were not as accessible as they should have been.

What we’re doing

While a DDoS against a service provider was the cause, Papertrail’s responsibility is to ensure reliable resolution of our own zone. In doing that, we made 2 mistakes:

  • Underestimating the duration of the longest foreseeable DNS outage
  • Based on that underestimation, depending on a single DNS infrastructure

Our estimate was based on many past large attacks that this service provider has handled in the 2 years we have used it. This attack was too large and too distributed for their anycast network, and filtering the attack took much longer than we had anticipated.

While our service provider will make their network more resilient to attacks, making any single DNS infrastructure more resilient is not the solution. An attacker will always be able to generate a larger DDoS (and service problems other than DoSes will still impact it). For Papertrail and other maintainers of mission-critical DNS zones, the solution is to not depend on any single DNS infrastructure for functioning authoritative DNS.

Authoritative DNS has one very thoughtful design property: it’s inherently distributed. Client resolvers (like typical Web browsers and log senders) will try to query multiple servers. Nothing about the service we purchase from our current provider prevents using additional providers or our own DNS infrastructure. Using multiple DNS infrastructures takes more effort to design and maintain – effort that, because we underestimated the impact of a severe outage, we invested elsewhere. Monday showed us that the extra effort is worthwhile and necessary.
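To show why a second provider helps, here is a simplified sketch of a resolver trying several authoritative servers in turn. Real resolvers choose servers in more sophisticated ways, and the nameserver names, the `fake_query` function, and the address below are invented for the simulation.

```python
def resolve_with_failover(name, nameservers, query):
    """Try each authoritative server in turn and return the first answer.

    `query(server, name)` is a caller-supplied function that returns an
    address or raises TimeoutError. It stands in for a real DNS lookup.
    """
    for server in nameservers:
        try:
            return query(server, name)
        except TimeoutError:
            continue  # this server is unreachable; try the next one
    raise TimeoutError(f"all {len(nameservers)} nameservers timed out for {name}")


# Simulated outage: every server at provider A is under attack.
def fake_query(server, name):
    if server.endswith("provider-a.example"):
        raise TimeoutError(server)
    return "192.0.2.10"  # documentation-only address (RFC 5737)


single_provider = ["ns1.provider-a.example", "ns2.provider-a.example"]
two_providers = single_provider + ["ns1.provider-b.example"]
```

With `single_provider`, every lookup times out no matter how many servers that provider runs. With `two_providers`, the same lookup falls through to provider B and succeeds.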

Relying on one DNS infrastructure, no matter how large or distributed, is an unnecessary risk that we unintentionally took. On behalf of Papertrail, I apologize for our mistake and the impact it had on your operations.

Also, during the outage we realized that our status updates were not as accessible as they should be. Our status site used a hostname under the papertrailapp.com domain name, so it was also inaccessible during the outage.

We’ve registered papertrailstatus.com for Papertrail’s status site. The status site does not depend on any of the same DNS infrastructures that Papertrail itself does (and was already hosted independently). We added a reference to @papertrailops on the Twitter bio for @papertrailapp and started to retweet significant updates from the more visible @papertrailapp account.

We’ve started planning how Papertrail can use multiple, independent anycast DNS infrastructures so that a DDoS or outage of one DNS service does not affect Papertrail. If we can go into more detail about any of this, please contact us.

Update (2014-12): This change is complete and Papertrail does not depend on any single DNS infrastructure. Papertrail wrote and released dnsync to synchronize a zone between two providers, one of whom did not support AXFR.

Hide Searches from Dashboard

Posted by @troyd on

Two small but meaningful tweaks make Papertrail’s Dashboard slightly more flexible.

Hide alert-oriented searches

Have lots of searches, some of which are mostly to generate alerts? It’s now possible to omit those searches from the Dashboard.

To hide a search, scroll to it on the Dashboard. Click its edit icon and un-check “Show search on Dashboard”:

Saved search settings - Show search on dashboard

The search will only appear on the associated group’s page (such as “All Systems”).

Navigating from the Dashboard to create or update alerts is easier. Click the alert icon:

Dashboard - Edit Settings and Manage Alerts icons

Enjoy!

Announcing remote_syslog2: aggregate text log files with one dependency-free binary

Posted by @troyd on

We’re pleased to announce a new daemon for transmitting log messages from text log files to Papertrail: remote_syslog2. remote_syslog2 replaces its predecessor, remote_syslog, and provides 2 major improvements:

  • no dependencies, so setup is painless. We rewrote the daemon in Go so it doesn’t require Ruby or any other VM. While we love the Ruby language, we love dependency-free apps a lot more.

    The tarball contains a single binary. “Installation” is simply copying that file and an optional configuration file. remote_syslog2 runs on almost any modern Unix with a working C library.

  • less code (read: potential idiosyncrasies) between the daemon and the OS. Most of this daemon’s work is file or network I/O. Because Go’s libraries are closer to direct mappings to the corresponding libc calls, they’re less likely to introduce bugs or implementation idiosyncrasies.

    Those which do occur are generally easier to reproduce and diagnose, since any two systems are much closer to identical. In contrast, the older remote_syslog has a known TLS bug that we believe was introduced by a dependency (rather than in remote_syslog itself), but which was effectively impossible to reproduce consistently enough to diagnose.
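For a sense of the work such a daemon does, here is a deliberately tiny Python sketch that reads a text file and sends each line as a UDP syslog datagram. It is not how remote_syslog2 is implemented (that daemon is written in Go and follows files as they grow); the function names, the fixed facility and severity, and the RFC 3164-style format are assumptions for illustration.

```python
import socket
from datetime import datetime

FACILITY_USER = 1
SEVERITY_INFO = 6


def format_syslog(hostname, program, line, when):
    """Build an RFC 3164-style message: <PRI>TIMESTAMP HOST TAG: MSG."""
    priority = FACILITY_USER * 8 + SEVERITY_INFO
    # RFC 3164 pads single-digit days with a space rather than a zero.
    timestamp = f"{when:%b} {when.day:2d} {when:%H:%M:%S}"
    return f"<{priority}>{timestamp} {hostname} {program}: {line.rstrip()}"


def forward_file(path, destination, hostname, program):
    """Send each existing line of a text file as one UDP syslog datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock, open(path) as log:
        for line in log:
            message = format_syslog(hostname, program, line, datetime.now())
            sock.sendto(message.encode("utf-8"), destination)
```

`destination` is a `(host, port)` tuple for your own log destination. Almost everything here is file or network I/O, which is the point made above about staying close to the OS.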

We’ll continue to support the older remote_syslog, so there’s no pressure to upgrade. For new environments, we strongly recommend remote_syslog2. To learn more or download a release tarball, visit the README.

10 Minute Alert Frequency

Posted by @troyd on

Papertrail now offers a fourth time frequency for search alerts: every 10 minutes.

10 minutes is short enough for many operational problems, yet long enough to capture multiple occurrences of many problems. 10 minutes fits 2 common situations particularly well:

  • Time-sensitive problems which check for a certain number of occurrences (using Papertrail’s minimum alert threshold). For example, 3 segfaults in 10 minutes.

  • Urgent problems which may continue for some time once they start. Waiting an hour is too long, yet there’s no need for a notification every minute while a problem is occurring.
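The "3 segfaults in 10 minutes" case can be sketched as a count over a time window. This is an illustration of the idea only; the function name and the sample timestamps are made up, and Papertrail's own evaluation of thresholds may differ.

```python
from datetime import datetime, timedelta


def should_alert(event_times, now, threshold, window=timedelta(minutes=10)):
    """True if at least `threshold` events occurred in the `window` ending at `now`."""
    recent = [t for t in event_times if now - window < t <= now]
    return len(recent) >= threshold


now = datetime(2014, 6, 1, 12, 0)
segfaults = [
    now - timedelta(minutes=25),  # outside the 10-minute window
    now - timedelta(minutes=8),
    now - timedelta(minutes=3),
    now - timedelta(minutes=1),
]
```

Three of the four events fall inside the window, so a threshold of 3 triggers an alert while a threshold of 4 stays quiet.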

To change the frequency of a search alert, visit Papertrail’s Dashboard, hover over the related search, and click “Edit”:

Search alert frequencies

Enjoy!

OpenSSL "Heartbleed" vulnerability summary

Posted by @troyd on

A vulnerability in OpenSSL called CVE-2014-0160 (nicknamed “Heartbleed”) was publicly announced on Monday, April 7. Papertrail:

  • Patched the HTTPS endpoint serving papertrailapp.com on Monday at 3:30 PM UTC-7 (see status blog).

  • Verified that our TLS-encrypted log endpoint is not vulnerable to the exploit.

  • Changed https://papertrailapp.com/ to use a new TLS certificate at 5:00 PM UTC-7. This certificate was generated from a different private key. Related internal passphrases were also changed.

  • Deployed forward secrecy as part of patching OpenSSL.

This vulnerability affects many, probably most, SSL-enabled Internet services in some form. We echo Tumblr’s recommendation, as reported in the LA Times: “take some time to change your passwords everywhere.” Be safe.

New Account Usage Overview

Posted by @lmarburger on

The new account usage page gives an overview of total log transfer over the past month and the senders with the highest volume. Find it linked from your account’s plan usage section.

Account Usage

Use these new graphs to see details about your usage that were previously unavailable. For example:

  • Top senders: Spot the senders logging the most data. Jump to a high-volume sender’s logs to look for unused messages which can be filtered.

  • Changes in log rate: Did a recent deploy affect log output? Maybe a debug message slipped through to production or an entire sender has been silent today.

  • Unexpected senders: Perhaps a daily job logs a good deal of unnecessary data. Unless someone is watching the events while the job runs, it’s possible no one would ever notice.

Along with log rate notifications, we hope this helps you catch unnecessary logs more quickly, trim noise, and save money.

More consistent service intervals and 16/25 GB pricing

Posted by @troyd on

We’ve made 3 minor changes to make Papertrail’s service easier to understand and more predictable:

  1. The free plan now measures usage the same way as all other plans, which is from one day of a month (such as the 13th) to the same day in the following calendar month. Details are below.
  2. Existing plans with 16 GB/month of log data transfer are now less expensive. The new prices are more logical when compared to 8 and 25 GB/month plans. All existing customers on the 16 GB/month plans will automatically receive the lower pricing as well as individual emails about the change.
  3. New plans have been created with 25 GB of log data transfer per month. They fill an unintentional gap between 16 GB and 50 GB which previously required pay-as-you-go additional usage. As with all Papertrail plans, they are available with 1, 3, 7, 14, and 30 days of searchable retention and 1 year of archives.

In all cases, the changes are either less expensive or have essentially no impact. The old and new prices for Papertrail’s 16 GB/month plan are below. To change to the 25 GB/month plan or any other, see popular plans or build your own.

Free plan service interval

Until today, usage on Papertrail’s free service was measured in a slightly different way than usage on all other plans. Instead of using a fixed day of the month (such as the day that service began), the free plan used a sliding 30-day window.

This was intended to be more flexible, but it’s not. There’s no situation where a 30-day window is more accommodating than a fixed 1-month period. Also, after one has explained a sliding window 2 or 30 times, the novelty value wears off too.

As of today, all of Papertrail’s monthly plans work the same: usage is measured for a 1-month period beginning on a specific day of the month. Usually this is the day that you signed up or that the current service began.

This change has almost no impact. The log data transfer in a fixed 1-month measurement period is equally likely to be slightly higher as it is slightly lower (compared to a 30-day sliding window). All measurement information is shown on the Account page.
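The difference between the two measurement styles is easy to see in code. The sketch below is an illustration rather than Papertrail's billing logic: the function names are invented, and clamping to the last day of a short month (for example, a period that starts on the 31st) is an assumption.

```python
import calendar
from datetime import date, timedelta


def clamp_day(year, month, day):
    """Return the given day of the month, or the month's last day if it is shorter."""
    last = calendar.monthrange(year, month)[1]
    return date(year, month, min(day, last))


def fixed_period(today, anchor_day):
    """Return (start, end) of the monthly period containing `today`; end is exclusive."""
    start = clamp_day(today.year, today.month, anchor_day)
    if start > today:
        # The current period began on the anchor day of the previous month.
        if today.month == 1:
            start = clamp_day(today.year - 1, 12, anchor_day)
        else:
            start = clamp_day(today.year, today.month - 1, anchor_day)
    if start.month == 12:
        end = clamp_day(start.year + 1, 1, anchor_day)
    else:
        end = clamp_day(start.year, start.month + 1, anchor_day)
    return start, end


def sliding_window(today, days=30):
    """Return (start, end) of the window covering the `days` before `today`."""
    return today - timedelta(days=days), today
```

For service that began on the 13th, `fixed_period(date(2014, 3, 20), 13)` runs from March 13 to April 13 and stays put all month, whereas `sliding_window(date(2014, 3, 20))` starts on February 18 and moves forward every day.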

16 GB/month price change

Searchable retention    Old     New
1 day                   $90     $82
3 days                  $145    $103
7 days                  $175    $125
14 days                 $225    $150
30 days                 $260    $195

Questions

Please let us know if we can answer questions or clarify any of this. Here’s to simplicity.