Papertrail

The revolution will be verbosely {,b}logged

Making Group Info Pages Faster For Very Large Groups

Posted by @lmarburger on

Some customers have thousands or even tens of thousands of log senders. Viewing the details of these large groups of senders can be quite slow. To speed up page loads, we’ve added pagination and changed how filtering log senders works.

The syntax for filtering a group’s log senders now matches the syntax for adding dynamic log senders to a group.

  • ? matches a single character
  • * matches any number of characters
  • Combine multiple searches with a comma
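As a rough illustration of how these three rules combine, here is a minimal Python sketch. It is not Papertrail's implementation: the function names are hypothetical, and it assumes that a pattern must match the whole sender name, that matching is case-sensitive, and that comma-separated patterns are OR'd together.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a filter pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")            # exactly one character
        elif ch == "*":
            parts.append(".*")           # any number of characters, including none
        else:
            parts.append(re.escape(ch))  # everything else is taken literally
    return re.compile("".join(parts))


def filter_senders(senders, query):
    """Return the senders matching any comma-separated pattern in query."""
    regexes = [pattern_to_regex(p.strip()) for p in query.split(",") if p.strip()]
    return [s for s in senders if any(r.fullmatch(s) for r in regexes)]


senders = ["web-1", "web-22", "db-1", "cache-1"]
print(filter_senders(senders, "web-?, db-*"))  # ['web-1', 'db-1']
```

Here web-? matches web-1 but not web-22, because ? stands for exactly one character.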

Welcoming Ryan Heath to Papertrail

Posted by @troyd on

It started with an email through Flickr way back when. Papertrail’s co-founder, Eric Lindvall, saw a portfolio which caught his eye. Eric sent an unsolicited message to the designer, Ryan Heath, using the only method that Eric could find: Flickr. After a few interactions, Ryan began collaborating with the team, and a step that’s felt inevitable has finally happened: we’re thrilled to officially welcome Ryan Heath as Papertrail’s UI caretaker.

Ryan has had a hand in Papertrail’s design since day one, but was always limited by his day job, so we offered him a new one :-)

Until recently, Ryan provided design and Web development consulting, contributing to dozens of sites, Web services, and mobile apps. While client work intrigues him, solving UI/UX problems through product design is his true passion.

As Eric’s reaction to his design work shows, Ryan is an incredibly talented designer. He also has a few (computer, electrical, software) engineering degrees, and that engineering background helps him understand the problems that Papertrail’s customers use Papertrail to solve. If “Design engineer” were a title, Ryan would have it.

Ryan resides in Morgantown, WV with his wife and two kids. His home office houses, among other things, a plethora of design books, a poster of Don Draper, a printmaker’s desk lamp from the 1950s, a Herman Miller chair, and the infamous contour bottle from Coca-Cola. He’s a collector of objects, to say the least. He enjoys thoughtful design, photography, and golf.

We’ve been fortunate to work with Ryan for years now, and we’re even more excited that he’s joining the team on a full-time basis.

Retiring SSL 3.0

Posted by @leonsodhi on

Summary

On Friday, June 12, 2015, Papertrail will remove support for the outdated security protocol SSL 3.0, which was released in 1996 and has since been superseded by TLS.

TLS is automatically used by nearly all modern loggers that can send encrypted log messages, so for the vast majority of customers this change will have no impact. However, there are exceptions to this which are explained in the next section.

In addition to this blog post, next week, we’ll directly email all customers who we believe will or are likely to be affected.

Update: On May 29, 2015, the SSL 3.0 retirement date was changed from June 5 to June 12.

What action do I need to take?

For clear text logging (which includes all UDP logging and some TCP logging), no changes are necessary.

For those sending log messages in an encrypted form using nxlog, an upgrade to 2.9.1347 will be needed. Other loggers may also be affected, but at this time we aren’t aware of any. Next week, we’ll directly email all customers who we believe will or may be affected, and will work with anyone that needs help upgrading or switching to an alternative logger that supports at least TLS 1.0.

If you’re concerned that you may be impacted, won’t be able to upgrade affected senders by June 12th, or have other questions, please email us.
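For senders you control in code, one way to be sure a client never falls back to SSL 3.0 is to set an explicit minimum protocol version. The Python sketch below is only an illustration: the function names are hypothetical, the host and port are whatever your own log destination is, and TLS 1.2 is used as the floor by assumption (this post only requires TLS 1.0 or later).

```python
import socket
import ssl


def make_tls_context() -> ssl.SSLContext:
    """Build a client context that never negotiates SSL 3.0."""
    context = ssl.create_default_context()  # also verifies the server certificate
    # Assumption: TLS 1.2 as the floor; the post only requires TLS 1.0 or later.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def send_line(host: str, port: int, line: str) -> str:
    """Send one newline-terminated message and return the negotiated protocol."""
    context = make_tls_context()
    with socket.create_connection((host, port), timeout=10) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(line.encode("utf-8") + b"\n")
            return tls.version()  # for example "TLSv1.2" or "TLSv1.3"
```

If the handshake succeeds with this context, the sender will keep working once SSL 3.0 is removed.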

Why is this happening now?

On October 14, 2014, the POODLE vulnerability was publicly disclosed. It described how a man-in-the-middle attack could be performed that would reveal plain text data from an encrypted log packet transmitted via SSL 3.0. This attack illustrates a fundamental flaw in the protocol which cannot be properly patched. As a result, most vendors released updates which disabled it.

In accordance with best security practices, Papertrail applied this patch to all web servers and log ingestion points on the same day that POODLE was announced. However, due to a misconfiguration, SSL 3.0 remained enabled on the latter and was deactivated in the last few weeks as part of an unrelated patch.

This second update was applied to each ingestion point over several days, which meant that some syslog endpoints were patched while others weren’t. Due to DNS round robin, some nxlog clients would successfully connect to unpatched endpoints while others would fail to connect to patched ones.

After every ingestion point had been updated, it was discovered that recent versions of nxlog only support encrypted logging via SSL 3.0 and thus could not establish a secure connection to patched endpoints.

After speaking with customers, we decided to re-enable SSL 3.0 to provide a reasonable amount of time for loggers to be upgraded. We will be disabling SSL 3.0 again on June 12th.

If you have any questions, please email us.

Papertrail joins SolarWinds and accelerates growth

Posted by @troyd on

On behalf of all of us at Papertrail, I’m thrilled to announce that Papertrail is now part of SolarWinds. Read more here.

As I said in the announcement, joining the SolarWinds family gives us more resources to make Papertrail better and to do so faster. As a first example, SolarWinds helped Papertrail acquire the domain name papertrail.com, which the team has wanted to obtain for years.

Details: SolarWinds Adds Cloud-based Log Management Capabilities with Acquisition of Papertrail

Why did Papertrail join SolarWinds? How will this benefit customers?

Eric and I started Papertrail to solve a problem we had: create the log management service that we wanted to use as developers and operations engineers ourselves, then make it as easy and as powerful as we can.

We’ve done that: Papertrail is a thriving, profitable business trusted by tens of thousands of customers. That’s been fulfilling enough that we’re ready to “double down” and deliver amazing log management and infrastructure visibility at an even larger scale.

Papertrail chose to work with SolarWinds because we share an appreciation for simple, practical products that shine in everyday use. As you’d expect, you’ll continue to work with the same Papertrail team (and we’ll be growing). SolarWinds sees tremendous potential in Papertrail and this combination enables Papertrail to get better, faster.

New search alerts dashboard

Posted by @lmarburger on

Search alerts are my favorite feature of Papertrail. Almost every organization considers at least one subset of logs worthy of human attention, whether as a daily summary or an instant notification. That’s precisely why alerts exist. We’ve released a number of improvements to how alerts are managed. Below are a few of our favorites.

Alerts Dashboard

Search alerts and their details are listed on the new alerts dashboard and when editing a saved search. This helps answer questions like “Who’s this emailing?” or “What’s the metric name in Librato?” Creating a new alert lists all the available services so it’s clear what you can do with alerts and how.

Multiple Alerts

Add multiple alerts to a saved search using the same alert service. For example, a saved search could ping several webhooks or email individual members of your team. Thresholds and frequencies are configured independently of other alerts.

Get Started

Pick a saved search that you or someone on your team might want to skim once a day and choose to receive it in an email. You’ll soon find yourself with a small “robot army” of alerts, from a daily email of failed CSRF requests to a graph of how frequently a Ruby VM segfaults. Enjoy!

DNS Outage on Monday, December 1, 2014

Posted by @troyd on

Summary

Most DNS requests for papertrailapp.com timed out for 6 hours, affecting many Web visitors and a small number of log senders. The outage was absolutely unacceptable. Personally and on behalf of Papertrail, I’m sorry.

Although the outage was caused by a DDoS against our DNS service provider, it’s our problem. We’re implementing complete DNS provider redundancy so that similar problems do not affect Papertrail.

What happened?

For about 6 hours on Monday, December 1, most DNS requests for papertrailapp.com timed out. Between approximately 19:15 UTC and 00:45 UTC the following day (11:15 - 16:45 Pacific), nearly all non-cached DNS requests timed out. As the attack pattern changed, a much smaller percentage of requests continued to time out until about 06:00 UTC (22:00 Pacific).

This DNS outage mostly impacted users trying to reach Papertrail’s site. Comparing Papertrail’s log volume during this outage to typical volume on a Monday, we did not see a noticeable change. The outage impacted few log senders because the resolved log destination was cached by most sending daemons as well as local and remote DNS servers. If your site was impacted, please contact us so we can understand and mitigate the impact of future problems. If you already reported this, we’ll be in touch.

The outage was caused by a very large distributed denial of service (DDoS) attack against a company which Papertrail purchases DNS service from. While the attack was destined for one of our service providers, we’re responsible for delivering Papertrail to you and this outage is our responsibility.

We posted frequent updates to Papertrail’s status site and @papertrailops, though for reasons discussed below, the updates were not as accessible as they should have been.

What we’re doing

While a DDoS against a service provider was the cause, Papertrail’s responsibility is to ensure reliable resolution of our own zone. In doing that, we made 2 mistakes:

  • Underestimating the duration of the longest foreseeable DNS outage
  • Based on that underestimation, depending on a single DNS infrastructure

Our estimate was based on many past large attacks that this service provider has handled in the 2 years we have used it. This attack was too large and too distributed for their anycast network and filtering the attack took much longer than we had considered.

While our service provider will make their network more resilient to attacks, making any single DNS infrastructure more resilient is not the solution. An attacker will always be able to generate a larger DDoS (and service problems other than DoSes will still impact it). For Papertrail and other maintainers of mission-critical DNS zones, the solution is to not depend on any single DNS infrastructure for functioning authoritative DNS.

Authoritative DNS has one very thoughtful design property: it’s inherently distributed. Client resolvers (like typical Web browsers and log senders) will try to query multiple servers. Nothing about the service we purchase from our current provider prevents using additional providers or our own DNS infrastructure. Using multiple DNS infrastructures takes more effort to design and maintain – effort that, because we underestimated the impact of a severe outage, we invested elsewhere. Monday showed us that the extra effort is worthwhile and necessary.
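A simple way to spot this kind of single-provider dependence is to group a zone's NS hostnames by provider. The Python sketch below is a hypothetical check, not something Papertrail published: it takes the list of nameserver hostnames as input (fetching them is left out) and naively treats the last two DNS labels as the provider, which is wrong for suffixes like co.uk.

```python
from collections import defaultdict


def group_by_provider(nameservers):
    """Group NS hostnames by a naive provider key (the last two DNS labels)."""
    groups = defaultdict(list)
    for ns in nameservers:
        labels = ns.rstrip(".").lower().split(".")
        groups[".".join(labels[-2:])].append(ns)
    return dict(groups)


def depends_on_single_provider(nameservers):
    """True if every nameserver appears to belong to the same provider."""
    return len(group_by_provider(nameservers)) < 2


# Hypothetical nameserver names, for illustration only.
before = ["ns1.provider-a.example.", "ns2.provider-a.example."]
after = before + ["ns1.provider-b.example.", "ns2.provider-b.example."]
print(depends_on_single_provider(before))  # True
print(depends_on_single_provider(after))   # False
```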

Relying on one DNS infrastructure, no matter how large or distributed, is an unnecessary risk that we unintentionally took. On behalf of Papertrail, I apologize for our mistake and the impact it had on your operations.

Also, during the outage we realized that our status updates were not as accessible as they should be. Our status site used a hostname under the papertrailapp.com domain name, so it was also inaccessible during the outage.

We’ve registered papertrailstatus.com for Papertrail’s status site. The status site does not depend on any of the same DNS infrastructures that Papertrail itself does (and was already hosted independently). We added a reference to @papertrailops on the Twitter bio for @papertrailapp and started to retweet significant updates from the more visible @papertrailapp account.

We’ve started planning how Papertrail can use multiple, independent anycast DNS infrastructures so that a DDoS or outage of one DNS service does not affect Papertrail. If we can go into more detail about any of this, please contact us.

Update (2014-12): This change is complete and Papertrail does not depend on any single DNS infrastructure. Papertrail wrote and released dnsync to synchronize a zone between two providers, one of which did not support AXFR.

Hide Searches from Dashboard

Posted by @troyd on

Two small but meaningful tweaks make Papertrail’s Dashboard slightly more flexible.

Hide alert-oriented searches

Have lots of searches, some of which are mostly to generate alerts? It’s now possible to omit those searches from the Dashboard.

To hide a search, scroll to it on the Dashboard. Click its edit icon and un-check “Show search on Dashboard”:

[Image: Saved search settings - Show search on dashboard]

The search will only appear on the associated group’s page (such as “All Systems”).

Navigating from the Dashboard to create or update alerts is easier. Click the alert icon:

[Image: Dashboard - Edit Settings and Manage Alerts icons]

Enjoy!

Announcing remote_syslog2: aggregate text log files with one dependency-free binary

Posted by @troyd on

We’re pleased to announce a new daemon for transmitting log messages from text log files to Papertrail: remote_syslog2. remote_syslog2 replaces its predecessor, remote_syslog, and provides 2 major improvements:

  • no dependencies, so setup is painless. We rewrote the daemon in Go so it doesn’t require Ruby or any other VM. While we love the Ruby language, we love dependency-free apps a lot more.

    The tarball contains a single binary. “Installation” is simply copying that file and an optional configuration file. remote_syslog2 runs on almost any modern Unix with a working C library.

  • less code (read: potential idiosyncrasies) between the daemon and the OS. Most of this daemon’s work is file or network I/O. Because Go’s libraries are closer to direct mappings to the corresponding libc calls, they’re less likely to introduce bugs or implementation idiosyncrasies.

    Those which do occur are generally easier to reproduce and diagnose, since any two systems are much closer to identical. In contrast, the older remote_syslog has a known TLS bug that we believe was introduced by a dependency (rather than in remote_syslog itself), but which was effectively impossible to reproduce consistently enough to diagnose.

We’ll continue to support the older remote_syslog, so there’s no pressure to upgrade. For new environments, we strongly recommend remote_syslog2. To learn more or download a release tarball, visit the README.
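For readers curious what "transmitting log messages from text log files" involves at its simplest, here is a small Python sketch that wraps one line of a log file in a BSD syslog (RFC 3164) header and sends it over UDP. It is an illustration only, not remote_syslog2's code (which is written in Go) and not necessarily the wire format it uses; the function names, default facility (local0), default severity (info), and destination are all assumptions.

```python
import socket
from datetime import datetime

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def format_syslog(line, hostname, tag, when, facility=16, severity=6):
    """Wrap one log line in an RFC 3164 header: <PRI>timestamp host tag: msg."""
    priority = facility * 8 + severity  # local0 (16) + info (6) = 134
    # RFC 3164 pads single-digit days with a space rather than a zero.
    timestamp = f"{MONTHS[when.month - 1]} {when.day:2d} {when:%H:%M:%S}"
    return f"<{priority}>{timestamp} {hostname} {tag}: {line.rstrip()}"


def send_udp(message, host, port):
    """Send one formatted message as a single UDP datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("utf-8"), (host, port))


print(format_syslog("hello\n", "web-1", "myapp", datetime(2014, 1, 5, 3, 4, 5)))
# <134>Jan  5 03:04:05 web-1 myapp: hello
```

A real daemon also has to follow files as they grow, survive log rotation, and reconnect on failure, which is where most of the work is.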

10 Minute Alert Frequency

Posted by @troyd on

Papertrail now offers a fourth time frequency for search alerts: every 10 minutes.

10 minutes is short enough for many operational problems, yet long enough to capture multiple occurrences of many problems. 10 minutes fits 2 common situations particularly well:

  • Time-sensitive problems which check for a certain number of occurrences (using Papertrail’s minimum alert threshold). For example, 3 segfaults in 10 minutes.

  • Urgent problems which may continue for some time once they start. Waiting an hour is too long, yet there’s no need for a notification every minute while a problem is occurring.
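The "3 segfaults in 10 minutes" case boils down to counting matches inside a time window and comparing the count to a threshold. Here is a minimal Python sketch of that idea; the function and parameter names are hypothetical and this is not how Papertrail evaluates alerts internally.

```python
from datetime import datetime, timedelta


def should_alert(event_times, now, window=timedelta(minutes=10), min_count=3):
    """True if at least min_count events fall within the window ending at now."""
    cutoff = now - window
    recent = [t for t in event_times if cutoff < t <= now]
    return len(recent) >= min_count


now = datetime(2014, 6, 1, 12, 0)
# Matches seen 1, 4, 9, and 25 minutes ago: three fall inside the window.
segfaults = [now - timedelta(minutes=m) for m in (1, 4, 9, 25)]
print(should_alert(segfaults, now))  # True
```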

To change the frequency of a search alert, visit Papertrail’s Dashboard, hover over the related search, and click “Edit”:

[Image: Search alert frequencies]

Enjoy!

OpenSSL "Heartbleed" vulnerability summary

Posted by @troyd on

A vulnerability in OpenSSL called CVE-2014-0160 (nicknamed “Heartbleed”) was publicly announced on Monday, April 7. Papertrail:

  • Patched the HTTPS endpoint serving papertrailapp.com on Monday at 3:30 PM UTC-7 (see status blog).

  • Verified that our TLS-encrypted log endpoint is not vulnerable to the exploit.

  • Changed https://papertrailapp.com/ to use a new TLS certificate at 5:00 PM UTC-7. This certificate was generated by a different private key. Related internal passphrases were also changed.

  • Deployed forward secrecy as part of patching OpenSSL.

This vulnerability affects many, probably most, SSL-enabled Internet services in some form. We echo Tumblr’s recommendation, as reported in the LA Times: “take some time to change your passwords everywhere.” Be safe.

New Account Usage Overview

Posted by @lmarburger on

The new account usage page gives an overview of total log transfer over the past month and the senders with the highest volume. Find it linked from your account’s plan usage section.

[Image: Account Usage]

Use these new graphs to see information about your usage that was previously unavailable. For example:

  • Top senders: Spot the senders logging the most data. Jump to a high-volume sender’s logs to look for unused messages which can be filtered.

  • Changes in log rate: Did a recent deploy affect log output? Maybe a debug message slipped through to production or an entire sender has been silent today.

  • Unexpected senders: Perhaps a daily job logs a good deal of unnecessary data. Unless someone is watching the events while the job runs, it’s possible no one would ever notice.

Along with log rate notifications, we hope these graphs help you catch unnecessary logs more quickly, trim noise, and save money.

More consistent service intervals and 16/25 GB pricing

Posted by @troyd on

We’ve made 3 minor changes to make Papertrail’s service easier to understand and more predictable:

  1. The free plan now measures usage the same way as all other plans, which is from one day of a month (such as the 13th) to the same day in the following calendar month. Details are below.
  2. Existing plans with 16 GB/month of log data transfer are now less expensive. The new prices are more logical when compared to 8 and 25 GB/month plans. All existing customers on the 16 GB/month plans will automatically receive the lower pricing as well as individual emails about the change.
  3. New plans have been created with 25 GB of log data transfer per month. They fill an unintentional gap between 16 GB and 50 GB which previously required pay-as-you-go additional usage. As with all Papertrail plans, they are available with 1, 3, 7, 14, and 30 days of searchable retention and 1 year of archives.

In all cases, the changes are either less expensive or have essentially no impact. The old and new prices for Papertrail’s 16 GB/month plan are below. To change to the 25 GB/month plan or any other, see popular plans or build your own.

Free plan service interval

Until today, usage on Papertrail’s free service was measured in a slightly different way than usage on all other plans. Instead of using a fixed day of the month (such as the day that service began), the free plan used a sliding 30-day window.

This was intended to be more flexible, but it’s not. There’s no situation where a 30-day window is more accommodating than a fixed 1-month period. Also, after one has explained a sliding window 2 or 30 times, the novelty value wears off too.

As of today, all of Papertrail’s monthly plans work the same: usage is measured for a 1-month period beginning on a specific day of the month. Usually this is the day that you signed up or that the current service began.

This change has almost no impact. The log data transfer in a fixed 1-month measurement period is equally likely to be slightly higher as it is slightly lower (compared to a 30-day sliding window). All measurement information is shown on the Account page.
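To make the fixed 1-month period concrete, here is a small Python sketch that finds the period containing a given date for a plan anchored on a given day of the month. The function names are hypothetical, and the handling of short months (clamping an anchor day such as the 31st to the last day of the month) is an assumption, not documented Papertrail behavior.

```python
import calendar
from datetime import date


def _anchored(year, month, anchor_day):
    """The anchor day in a given month, clamped to that month's last day."""
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(anchor_day, last_day))


def service_period(today, anchor_day):
    """Return (start, end) of the period containing today; end is exclusive."""
    start = _anchored(today.year, today.month, anchor_day)
    if today < start:
        # The current period began on the anchor day of the previous month.
        if today.month == 1:
            start = _anchored(today.year - 1, 12, anchor_day)
        else:
            start = _anchored(today.year, today.month - 1, anchor_day)
    if start.month == 12:
        end = _anchored(start.year + 1, 1, anchor_day)
    else:
        end = _anchored(start.year, start.month + 1, anchor_day)
    return start, end


print(service_period(date(2014, 3, 20), 13))
# (datetime.date(2014, 3, 13), datetime.date(2014, 4, 13))
```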

16 GB/month price change

Searchable retention   Old    New
1 day                  $90    $82
3 days                 $145   $103
7 days                 $175   $125
14 days                $225   $150
30 days                $260   $195

Questions

Please let us know if we can answer questions or clarify any of this. Here’s to simplicity.

Customize receipts with VAT ID, address or other text

Posted by @troyd on

You may have already seen Papertrail’s PDF receipts, which are shown on the Purchases tab within Account.

You’ll see a new section on that page: “Customize Receipts.” Perhaps a tax authority requires a VAT identification number be on every receipt, or an accounting department requires a mailing address or vendor code.

Enter anything you want in this form, one line or multiple, and Papertrail will include it in the header of PDF receipts. While Papertrail can’t protect you from tax authorities or accounting departments, we can sure make it easier to satisfy them! Enjoy.

Constrain searches with attributes (like severity)

Posted by @troyd on

As of today, Papertrail has a new search operator: an attribute. Use an attribute-specific keyword to examine only a specific part of a log message.

Here’s a live example returning messages with the severities emergency, critical, or error.

Example

This query searches only the program/tag element of log messages, rather than the whole message.

program:sshd

These attributes are available: severity, facility, sender, program, and message.

You’re probably already familiar with AND (Papertrail’s default), OR, negation (-), and parentheses. Attribute constraints can be used in the same searches as other constraints, like this:

"something bad" program:ssh -noise

Why are these useful?

The program, sender, and message attributes permit more granular searches against data which Papertrail already examined. The severity and facility attributes allow new portions of log messages to be examined.

We added these to:

  • Build more powerful searches. program:(apache OR sshd) severity:crit is useful on its face.

  • Incorporate fields which weren’t previously searchable. Until today, the facility and severity were only included in archive exports and API responses. They’re now searchable.

  • Permit faster searches. Papertrail is subject to the laws of physics, as much as we try to optimize around them. When you know what you want and can codify it, Papertrail can examine less data and return results faster.

  • Eliminate false positives. A search for sshd finds all logs containing the string sshd anywhere, whether the message was sent from the sshd program or the body simply contained sshd. Use attributes to search for either specific case.

Play it out

In designing the grammar for attributes, we tried to maintain our starting point that “it just works.” A user would rightly expect this search to work, and it does:

("something bad" program:ssh -noise)
OR severity:error

We took that further: values can be nested to reduce typing, and without reducing the readability of the search. For example:

("something bad" program:ssh -noise)
OR program:(raid5tools ethtool)

Negation (-) works uniformly:

-("something bad" program:ssh)
OR -program:(raid5tools ethtool)
OR -noise

Grammar police

We debated how to handle multiple constraints within an attribute which can have only one. In Papertrail, the search a b means a AND b. That’s logical because AND is possible.

However, that doesn’t make sense for attributes which a log message can only have one of. program:(abc AND def) is almost never true. (The “almost” is if abc and def are different portions of a single program value, a search which would not intentionally be performed.)

4 of the 5 attributes - all except message - can only have a single value per message, so AND is never relevant. Because of this, we chose to default all attributes to OR. program:(a b) means program:(a OR b).
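The difference between the two defaults can be shown with a tiny Python sketch. This is a hypothetical model of the semantics described above, not Papertrail's search engine; it assumes substring matching and ignores case.

```python
def matches_terms(message, terms):
    """Bare terms default to AND: every term must appear in the message."""
    text = message.lower()
    return all(term.lower() in text for term in terms)


def matches_attribute(value, terms):
    """Attribute values default to OR: any one term may match the value."""
    text = value.lower()
    return any(term.lower() in text for term in terms)


# program:(raid5tools ethtool) behaves like program:(raid5tools OR ethtool)
print(matches_attribute("ethtool", ["raid5tools", "ethtool"]))    # True
# a b behaves like a AND b
print(matches_terms("disk failure on sda", ["disk", "timeout"]))  # False
```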

Saving typing

Portions of severity or facility names work fine too. For example, this works:

severity:(crit emerg)

No need to type all of critical and emergency. One argument can match multiple values, too. For example, this will match logs with any facility containing local (like local0 or local7):

facility:local
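Partial names can be thought of as expanding to every full name that contains them. The Python sketch below models that; the function name is hypothetical, and the severity and facility lists are assumptions based on the conventional syslog names rather than Papertrail's exact vocabulary.

```python
# Assumed name lists, based on conventional syslog severities and facilities.
SEVERITIES = ["emergency", "alert", "critical", "error",
              "warning", "notice", "info", "debug"]
FACILITIES = ["kern", "user", "mail", "daemon", "auth", "syslog", "lpr",
              "news", "uucp", "cron", "authpriv", "ftp"] + [
              f"local{n}" for n in range(8)]


def expand(terms, names):
    """Return every full name that contains at least one of the terms."""
    return [name for name in names if any(term in name for term in terms)]


print(expand(["crit", "emerg"], SEVERITIES))  # ['emergency', 'critical']
print(len(expand(["local"], FACILITIES)))     # 8
```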

All operators, and all searches, work with all Papertrail Search methods (Web, command-line, API, and inbound links). Finally, a simple query like sshd still works just like it did: Papertrail searches the message itself, sender, and program.

Also, since senders often represent different things, the attributes host, system, and source all work as aliases for sender. Use the one(s) which make sense to you.

Suggestions?

We hope these operators allow more granular searches. As a fringe benefit, because your search is more tightly constrained, Papertrail is able to search less data and can often return results faster.

If you try these and have ideas, please send them over. Enjoy!

Log Rate Notifications

Posted by @lmarburger on

Log rate spikes are common and often go unnoticed. They could be an indication that something went terribly wrong or a high-traffic system was unintentionally configured with verbose logging. As of today, Papertrail can notify you if your log rate is higher than expected.

How is this useful?

Here are a few real situations Papertrail customers have experienced which we realized could be solved by this single, simple notification.

  • Drastic changes in usage patterns: A database went offline and all connected clients generated error messages.
  • Middle-of-the-night jobs: An unattended background job unintentionally logged dozens of messages. Thousands of these jobs ran every night, and it went unnoticed for months because no one happened to look at the logs while it was running.
  • Forgotten verbose logging: One customer was troubleshooting a firewall problem and enabled iptables logging of every packet. Great idea, but it didn’t get disabled until they found it a few hours later.

In any of these situations, it would have been helpful to know at the time the change occurred instead of being surprised much later.

Putting it into practice

Set a log rate on your account and Papertrail will notify everyone interested in usage notifications if it’s been exceeded for at least 10 minutes. System reboots or other momentary spikes won’t trigger this notification. We’ll send an email at most every 6 hours linking directly to the logs at that point in time so you can see what was happening.
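The two timing rules here (the rate must be exceeded for at least 10 minutes, and emails go out at most every 6 hours) can be sketched as a small state machine. The Python below is a hypothetical model fed with periodic rate samples, not Papertrail's implementation; the class and parameter names are made up for illustration.

```python
from datetime import datetime, timedelta


class LogRateNotifier:
    """Notify when a rate stays above a limit, with a cooldown between emails."""

    def __init__(self, limit, sustain=timedelta(minutes=10),
                 cooldown=timedelta(hours=6)):
        self.limit = limit
        self.sustain = sustain
        self.cooldown = cooldown
        self.exceeded_since = None  # when the current excess began
        self.last_notified = None   # when the last notification was sent

    def observe(self, now, rate):
        """Record one rate sample; return True if a notification is due."""
        if rate <= self.limit:
            self.exceeded_since = None  # a dip resets the sustain timer
            return False
        if self.exceeded_since is None:
            self.exceeded_since = now
        if now - self.exceeded_since < self.sustain:
            return False  # momentary spike, such as a reboot
        if (self.last_notified is not None
                and now - self.last_notified < self.cooldown):
            return False  # already notified within the cooldown
        self.last_notified = now
        return True


notifier = LogRateNotifier(limit=100)
start = datetime(2014, 1, 1, 0, 0)
print(notifier.observe(start, 200))                          # False
print(notifier.observe(start + timedelta(minutes=10), 200))  # True
```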

Papertrail is in a unique position to call attention to things that are clearly unusual. It’s our goal to give you better visibility into your logs and usage.