What is the relationship between $Y$ and $X$ in the following plot? In my view there is a negative linear relationship, but because we have a lot of outliers, the relationship is very weak. Am I right? I want to learn how we can explain scatterplots.

The question deals with several concepts: how to evaluate data given only in the form of a scatterplot, how to summarize a scatterplot, and whether (and to what degree) a relationship looks linear. Let's take them in order.
Use principles of exploratory data analysis (EDA). These (at least originally, when they were developed for pencil-and-paper use) emphasize simple, easy-to-compute, robust summaries of data. One of the very simplest kinds of summaries is based on positions within a set of numbers, such as the middle value, which describes a "typical" value. Middles are easy to estimate reliably from graphics.
Scatterplots exhibit pairs of numbers. The first of each pair (as plotted on the horizontal axis) gives a set of single numbers, which we could summarize separately.
In this particular scatterplot, the y-values appear to lie within two almost completely separate groups: the values above $60$ at the top and those equal to or less than $60$ at the bottom. (This impression is confirmed by drawing a histogram of the y-values, which is sharply bimodal, but that would be a lot of work at this stage.) I invite sceptics to squint at the scatterplot. When I do--using a large-radius, gamma-corrected Gaussian blur (that is, a standard, rapid image-processing operation) of the dots in the scatterplot--I see this:

The two groups--upper and lower--are pretty apparent. (The upper group is much lighter than the lower because it contains many fewer dots.)
Accordingly, let's summarize the groups of y-values separately. I will do that by drawing horizontal lines at the medians of the two groups. In order to emphasize the impression of the data and to show we're not doing any kind of computation, I have (a) removed all decorations like axes and gridlines and (b) blurred the points. Little information about the patterns in the data is lost by thus "squinting" at the graphic:

Similarly, I have attempted to mark the medians of the x-values with vertical line segments. In the upper group (red lines) you can check--by counting the blobs--that these lines do actually separate the group into two equal halves, both horizontally and vertically. In the lower group (blue lines) I have only visually estimated the positions without actually doing any counting.
The points of intersection are the centers of the two groups. One excellent summary of the relationship among the x and y values would be to report these central positions. One would then want to supplement this summary by a description of how much the data are spread in each group--to the left and right, above and below--around their centers. For brevity, I won't do that here, but note that (roughly) the lengths of the line segments I have drawn reflect the overall spreads of each group.
Finally, I drew a (dashed) line connecting the two centers. This is a reasonable regression line. Is it a good description of the data? Certainly not: look how spread out the data are around this line. Is it even evidence of linearity? That's scarcely relevant because the linear description is so poor. Nevertheless, because that is the question before us, let's address it.
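For readers who want to reproduce this construction numerically, here is a minimal sketch in R. It assumes the data scraped in the answers below (with columns named x and y) and the visual split of the y-values at 60; it illustrates the idea rather than reproducing the graphics above.

# Median-center construction: split at y = 60, find each group's center
# by coordinate-wise medians, and connect the centers with a dashed line.
dat <- read.csv("http://doyenne.com/personal/files/data.csv")
hi <- subset(dat, y > 60)
lo <- subset(dat, y <= 60)
center_hi <- c(median(hi$x), median(hi$y))
center_lo <- c(median(lo$x), median(lo$y))
plot(dat$x, dat$y, col = "grey", xlab = "x", ylab = "y")
points(c(center_hi[1], center_lo[1]), c(center_hi[2], center_lo[2]),
       pch = 19, col = c("red", "blue"))
segments(center_hi[1], center_hi[2], center_lo[1], center_lo[2], lty = 2)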
A relationship is linear in a statistical sense when either the y values vary in a balanced random fashion around a line or the x values are seen to vary in a balanced random fashion around a line (or both).
The former does not appear to be the case here: because the y values seem to fall into two groups, their variation is never going to look balanced in the sense of being roughly symmetrically distributed above or below the line. (That immediately rules out the possibility of dumping the data into a linear regression package and performing a least squares fit of y against x: the answers would not be relevant.)
What about variation in x? That is more plausible: at each height on the plot, the horizontal scatter of points around the dotted line is pretty balanced. The spread in this scatter seems to be a little bit greater at lower heights (low y values), but maybe that's because there are many more points there. (The more random data you have, the wider apart their extreme values will tend to be.)
Moreover, as we scan from top to bottom, there are no places where the horizontal scatter around the regression line is strongly unbalanced: that would be evidence of non-linearity. (Well, maybe around y=50 or so there may be too many large x values. This subtle effect could be taken as further evidence for breaking the data into two groups around the y=60 value.)
We have seen that:

1. It makes sense to view x as a linear function of y plus some "nice" random variation.
2. It does not make sense to view y as a linear function of x plus random variation.
3. A regression line can be estimated by separating the data into a group of high y values and a group of low y values, finding the centers of both groups by using medians, and connecting those centers.
4. The resulting line has a downward slope, indicating a negative linear relationship.
5. There are no strong departures from linearity.
6. Nevertheless, because the spreads of the x-values around the line are still large (compared to the overall spread of the x-values to begin with), we would have to characterize this negative linear relationship as "very weak."
It might be more useful to describe the data as forming two oval-shaped clouds (one for y above 60 and another for lower values of y). Within each cloud there is little detectable relationship between x and y. The centers of the clouds are near (0.29, 90) and (0.38, 30). The clouds have comparable spreads, but the upper cloud has far fewer data than the lower one (maybe 20% as many).
Two of these conclusions confirm those made in the question itself: that there is a weak, negative relationship. The others supplement and support those conclusions.
One conclusion drawn in the question that does not seem to hold up is the assertion that there are "outliers." A more careful examination (as sketched below) will fail to turn up any individual points, or even small groups of points, that validly could be considered outlying. After sufficiently long analysis, one's attention might be drawn to the two points near the middle right or the one point at the lower left corner, but even these are not going to change one's assessment of the data very much, whether or not they are considered outlying.
Much more could be said. The next steps would be to assess the spreads of those clouds. The relationships between x and y within each of the two clouds could be evaluated separately, using the same techniques shown here. The slight asymmetry of the lower cloud (more data seem to appear at the smallest y values) could be evaluated and even adjusted by re-expressing the y values (a square root might work well). At this stage it would make sense to look for outlying data, because at this point the description would include information about typical data values as well as their spreads; outliers (by definition) would be too far from the middle to be explained in terms of the observed amount of spreading.
None of this work--which is quite quantitative--requires much more than finding middles of groups of data and doing some simple computations with them, and therefore can be done quickly and accurately even when the data are available only in graphical form. Every result reported here--including the quantitative values--could easily be found within a few seconds using a display system (such as hardcopy and a pencil :-)) which permits one to make light marks on top of the graphic.
Let's have some fun!
First of all, I scraped the data off your graph.
Then I used a running-line smoother to produce the black regression line below, with dashed 95% CI bands in gray. The graph uses a smoothing span of one half of the data, although tighter spans revealed more or less precisely the same relationship. The slight change in slope around $X=0.4$ suggested a relationship that could be approximated using a linear model plus a linear hinge function of $X$, fit with nonlinear least squares regression (red line):
$$Y = \beta_{0} + \beta_{X}X + \beta_{\text{c}}\max\left(X-\theta,0\right) + \varepsilon$$
The coefficient estimates were:
$$Y = 50.9 - 37.7X - 26.7\max\left(X-0.46,\,0\right)$$
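A minimal sketch of such a fit in R, assuming the scraped data linked in gung's answer below (columns x and y); the starting values are my guesses based on the estimates just reported:

# Fit Y = b0 + b1*X + b2*max(X - theta, 0) by nonlinear least squares.
dat <- read.csv("http://doyenne.com/personal/files/data.csv")
fit <- nls(y ~ b0 + b1 * x + b2 * pmax(x - theta, 0),
           data = dat,
           start = list(b0 = 50, b1 = -40, b2 = -25, theta = 0.45))
summary(fit)  # coefficient estimates and standard errors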
I would note that while the redoubtable whuber asserts that there are no strong departures from linearity, the deviation from the line $Y = 50.9 - 37.7X$ implied by the hinge term is on the same order as the slope of $X$ (i.e., 37.7). So I would respectfully disagree: yes, there are no strong relationships, but the nonlinear term is about as strong as the linear one.

Interpretation
(I have proceeded assuming that you are only interested in $Y$ as the dependent variable.) Values of $Y$ are very weakly predicted by $X$ (with an adjusted $R^{2}$ of 0.03). The association is approximately linear, with a slight decrease in slope at about 0.46. The residuals are somewhat skewed to the right, probably because there is a sharp lower bound on the values of $Y$. Given the sample size $N=170$, I am inclined to tolerate the violations of normality. More observations for values of $X>0.5$ would help nail down whether the change in slope is real, or is an artifact of decreased variance of $Y$ in that range.
Updating with the $\ln(Y)$ graph:
(The red line is simply a linear regression of ln(Y) on X.)

In comments Russ Lenth wrote: "I just wonder if this holds up if you smooth $\log Y$ vs. $X$. The distribution of $Y$ is skewed right." This is quite a good suggestion: the $\log Y$ transform versus $X$ also gives a slightly better fit than a line between $Y$ and $X$, with residuals that are more symmetrically distributed. However, both his suggested $\log(Y)$ transform and my linear hinge of $X$ share a preference for a relationship between the (untransformed) $Y$ and $X$ that is not described by a straight line.
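A minimal sketch of that residual comparison in R, again assuming the scraped data with columns x and y:

# Compare the residual distributions of the plain and log-scale linear fits.
dat <- read.csv("http://doyenne.com/personal/files/data.csv")
fit_lin <- lm(y ~ x, data = dat)
fit_log <- lm(log(y) ~ x, data = dat)
op <- par(mfrow = c(1, 2))
hist(resid(fit_lin), main = "Residuals: Y ~ X")
hist(resid(fit_log), main = "Residuals: log(Y) ~ X")
par(op)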
Here's my 2¢ (make that 1.5¢). To me the most prominent feature is that the data abruptly stop and 'bunch up' at the bottom of the range of Y. I do see the two (potential) 'clusters' and the general negative association, but the most salient features are the (potential) floor effect and the fact that the top, low-density cluster extends across only part of the range of X.
Because the 'clusters' are vaguely bivariate normal, a parametric normal mixture model may be interesting to try. Using @Alexis' data, I find that three clusters optimize the BIC. The high-density 'floor effect' is picked out as a third cluster. The code follows:
library(mclust)   # Gaussian finite mixture models fit by EM
# Read the scraped data (columns x and y)
dframe = read.table(url("http://doyenne.com/personal/files/data.csv"),
                    header = TRUE, sep = ",")
mc = Mclust(dframe)   # fits a range of models/components; selects by BIC
summary(mc)
# ----------------------------------------------------
# Gaussian finite mixture model fitted by EM algorithm
# ----------------------------------------------------
#
# Mclust VVI (diagonal, varying volume and shape) model with 3 components:
#
# log.likelihood n df BIC ICL
# -614.4713 170 14 -1300.844 -1338.715
#
# Clustering table:
# 1 2 3
# 72 72 26

Now, what shall we infer from this? I do not think that Mclust is merely human pattern recognition gone awry. (Whereas my reading of the scatterplot may well be.) On the other hand, there is no question that this is post hoc: I saw what I thought might be an interesting pattern and decided to check it. The algorithm does find something, but then I only checked for what I thought might be there, so my thumb is definitely on the scale. Sometimes it is possible to devise a strategy to mitigate this (see @whuber's excellent answer here), but I have no idea how to go about such a process in cases like this. As a result, I take these results with a lot of salt (I've done this sort of thing sufficiently often that someone is missing a whole shaker).

It does give me some material to think about and discuss with my client when next we meet. What are these data? Does it make any sense that there could be a floor effect? Would it make sense that there could be different groups? How meaningful / surprising / interesting / important would it be if these were real? Do independent data exist, or could we get them conveniently, to perform an honest test of these possibilities? Etc.
Let me describe what I see as soon as I look at it:
If we're interested in the conditional distribution of $y$ (which is often where interest focuses if we see $x$ as the IV and $y$ as the DV), then for $x\leq 0.5$ the conditional distribution of $Y|x$ appears bimodal, with an upper group (between about 70 and 125, with mean a bit below 100) and a lower group (between 0 and about 70, with mean around 30 or so). Within each modal group, the relationship with $x$ is nearly flat. (See the red and blue lines below, drawn roughly where I judge the locations of the two groups to be.)
Then by looking at where those two groups are more or less dense in $X$, we can go on to say more:
For $x>0.5$ the upper group disappears completely, which makes the overall mean of $y$ fall there; and below about $x=0.2$, the lower group is much less dense than it is above, making the overall average of $y$ higher.
Between them, these two effects induce an apparent negative (but nonlinear) relationship between the two variables: $E(Y|X=x)$ seems to decrease with $x$, but with a broad, mostly flat region in the center. (See the purple dashed line.)

No doubt it would be important to know what $Y$ and $X$ were, because then it might be clearer why the conditional distribution for $Y$ might be bimodal over much of its range (indeed, it might even become clear that there are indeed two groups, whose distributions in $X$ induce the apparent decreasing relationship in $Y|x$).
This is what I saw based on purely "by-eye" inspection. With a bit of playing around in something like a basic image-manipulation program (like the one I drew the lines with), we could start to figure out some more accurate numbers. If we digitize the data (which is pretty simple with decent tools, if sometimes a little tedious to get right), then we can undertake more sophisticated analyses of that sort of impression.
This kind of exploratory analysis can lead to some important questions (sometimes ones that surprise the person who has the data but has only shown a plot), but we must take some care over the extent to which our models are chosen by such inspections: if we choose models on the basis of the appearance of a plot and then estimate those models on the same data, we'll tend to encounter the same problems we get when we use more formal model selection and estimation on the same data. [This is not to deny the importance of exploratory analysis at all; it's just that we must be careful of the consequences of doing it without regard to how we go about it.]
Response to Russ' comments:
[later edit: To clarify -- I broadly agree with Russ' criticisms taken as a general precaution, and there's certainly some possibility I've seen more than is really there. I plan to come back and edit these into a more extensive commentary on spurious patterns we commonly identify by eye and ways we might start to avoid the worst of that. I believe I'll also be able to add some justification about why I think it's probably not just spurious in this specific case (e.g. via a regressogram or 0-order kernel smooth, though of course, absent more data to test against, there's only so far that can go; for example, if our sample is unrepresentative, even resampling only gets us so far.]
I completely agree we have a tendency to see spurious patterns; it's a point I make frequently both here and elsewhere.
One thing I suggest, for example, when looking at residual plots or Q-Q plots is to generate many plots where the situation is known (both as things should be and where assumptions don't hold) to get a clear idea how much pattern should be ignored.
Here's an example where a Q-Q plot is placed among 24 others (which satisfy the assumptions), in order for us to see how unusual the plot is. This kind of exercise is important because it helps us avoid fooling ourselves by interpreting every little wiggle, most of which will be simple noise.
I often point out that if you can change an impression by covering a few points, we may be relying on an impression generated by nothing more than noise.
[However, when it's apparent from many points rather than few, it's harder to maintain that it's not there.]
The displays in whuber's answer support my impression; the Gaussian-blur plot seems to pick up the same tendency to bimodality in $Y$.
When we don't have more data to check, we can at least look at whether the impression tends to survive resampling (bootstrap the bivariate distribution and see if it's nearly always still present), or other manipulations where the impression shouldn't be apparent if it's simple noise.
1) Here's one way to see if the apparent bimodality is more than just skewness plus noise - does it show up in a kernel density estimate? Is it still visible if we plot kernel density estimates under a variety of transformations? Here I transform it toward greater symmetry, at 85% of default bandwidth (since we're trying to identify a relatively small mode, and the default bandwidth is not optimized for that task):

The plots are of $Y$, $\sqrt{Y}$ and $\log(Y)$. The vertical lines are at $68$, $\sqrt{68}$ and $\log(68)$. The bimodality is diminished but still quite visible. Since it is very clear in the original KDE, this seems to confirm that it is there, and the second and third plots suggest it is at least somewhat robust to transformation.
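A minimal sketch of these density estimates in R (the data URL and column names are assumptions carried over from the other answers):

# Kernel density estimates of Y under three transformations,
# at 85% of the default bandwidth (adjust = 0.85).
dat <- read.csv("http://doyenne.com/personal/files/data.csv")
op <- par(mfrow = c(1, 3))
plot(density(dat$y, adjust = 0.85), main = "Y")
abline(v = 68, lty = 2)
plot(density(sqrt(dat$y), adjust = 0.85), main = "sqrt(Y)")
abline(v = sqrt(68), lty = 2)
plot(density(log(dat$y), adjust = 0.85), main = "log(Y)")
abline(v = log(68), lty = 2)
par(op)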
2) Here's another basic way to see if it's more than just "noise":
Step 1: perform clustering on Y

Step 2: Split the data into two groups on $X$, cluster the two groups separately, and see whether the results are similar. If nothing real is going on, the two halves shouldn't be expected to split in much the same way.

The points marked with dots were clustered differently from the "all in one set" clustering in the previous plot. I'll do some more later, but it seems there really might be a horizontal "split" near that position.
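Here is a minimal sketch of that split-and-recluster check in R. It uses k-means on the y values as a stand-in for whatever clustering was used above (the answer does not specify), with the usual assumed data URL and column names:

# Cluster y into two groups, overall and within each half of x,
# and compare the implied group means.
dat <- read.csv("http://doyenne.com/personal/files/data.csv")
set.seed(1)  # k-means uses random starts
grp_means <- function(y) sort(kmeans(y, centers = 2)$centers)
grp_means(dat$y)           # all data
left <- dat$x <= median(dat$x)
grp_means(dat$y[left])     # low-x half
grp_means(dat$y[!left])    # high-x half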
I'm going to try a regressogram or Nadaraya-Watson estimator (both being local estimates of the regression function, $E(Y|x)$). I haven't generated either yet, but we'll see how they go. I'd probably exclude the very ends where there's little data.
3) Edit: Here's the regressogram, for bins of width 0.1 (excluding the very ends, as I suggested earlier):

This is entirely consistent with the original impression I had of the plot; it doesn't prove my reasoning was correct, but my conclusions agree with what the regressogram shows.
If what I saw in the plot - and the resulting reasoning - was spurious, I probably should not have succeeded at discerning $E(Y|x)$ like this.
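A minimal regressogram sketch in R (the bin edges are my assumption about the plotted range of x; the data URL and column names are carried over from the other answers):

# Regressogram: mean of y within bins of x of width 0.1.
dat <- read.csv("http://doyenne.com/personal/files/data.csv")
brks <- seq(0, 0.7, by = 0.1)
bins <- cut(dat$x, breaks = brks)
means <- tapply(dat$y, bins, mean)
mids <- brks[-length(brks)] + 0.05
ok <- !is.na(means)   # drop empty bins at the sparse ends
plot(dat$x, dat$y, col = "grey", xlab = "x", ylab = "y")
lines(mids[ok], means[ok], type = "b", pch = 19)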
(The next thing to try would be a Nadaraya-Watson estimator. Then I might see how it goes under resampling, if I have time.)
4) Later edit:
Nadaraya-Watson, Gaussian kernel, bandwidth 0.15:

Again, this is surprisingly consistent with my initial impression. Here are the NW estimators based on ten bootstrap resamples:

The broad pattern is there, though a couple of the resamples don't follow the whole-data description as clearly. We see that the level on the left is less certain than on the right: the noise (partly from having few observations, partly from the wide spread) is such that it's harder to claim the mean is really higher at the left than at the center.
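A minimal version of this in R uses ksmooth() as the Nadaraya-Watson estimator. Note that ksmooth() scales its bandwidth argument so the kernel quartiles fall at ±0.25 × bandwidth, so "bandwidth = 0.15" here is not exactly the 0.15 used above; treat this as a sketch:

# Nadaraya-Watson (Gaussian kernel) fit plus ten bootstrap resamples.
dat <- read.csv("http://doyenne.com/personal/files/data.csv")
plot(dat$x, dat$y, col = "grey", xlab = "x", ylab = "y")
set.seed(1)
for (i in 1:10) {
  idx <- sample(nrow(dat), replace = TRUE)  # resample (x, y) pairs
  lines(ksmooth(dat$x[idx], dat$y[idx], kernel = "normal", bandwidth = 0.15),
        col = adjustcolor("blue", 0.3))
}
lines(ksmooth(dat$x, dat$y, kernel = "normal", bandwidth = 0.15), lwd = 2)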
My overall impression is that I probably wasn't simply fooling myself, because the various aspects stand up moderately well to a variety of challenges (smoothing, transformation, splitting into subgroups, resampling) that would tend to obscure them if they were simply noise. On the other hand, the indications are that the effects, while broadly consistent with my initial impression, are relatively weak, and it may be too much to claim any real change in expectation moving from the left side to the center.
OK folks, I followed Alexis's lead and captured the data. Here is a plot of $\log y$ versus $x$. 
And the correlations:
> cor.test(~ x + y, data = data)
Pearson's product-moment correlation
data: x and y
t = -2.6311, df = 169, p-value = 0.009298
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.33836844 -0.04977867
sample estimates:
cor
-0.1983692
> cor.test(~ x + log(y), data = data)
Pearson's product-moment correlation
data: x and log(y)
t = -2.8901, df = 169, p-value = 0.004356
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.35551268 -0.06920015
sample estimates:
cor
-0.2170188
The correlation test does indicate a likely negative dependence. I remain unconvinced of any bimodality (but also unconvinced that it's absent).
[I removed a residual plot I had in an earlier version because I overlooked the point that @whuber was trying to predict $X|Y$.]
Russ Lenth wondered how the graph would look if the Y axis were logarithmic. Alexis scraped the data, so it is easy to plot it with a log axis:

On a log scale, there is no hint of bimodality or trend. Whether a log scale makes sense or not depends, of course, on the details of what the data represent. Similarly, whether it makes sense to think that the data represent sampling from two populations as whuber suggests depends on the details.
Addendum: Based on the comments below, here is a revised version:

Well, you are right: the relationship is weak, but not zero. However, don't guess; run a simple linear regression (OLS) and find out. You will get a slope of xxx, which tells you what the relationship is. And yes, you do have outliers that might bias the results. That can be dealt with: you could use Cook's distance or create a leverage plot to estimate the outliers' effect on the relationship.
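A minimal sketch of those checks in R (data URL and column names assumed, as in the other answers):

# OLS fit, Cook's distances, and a residuals-vs-leverage plot.
dat <- read.csv("http://doyenne.com/personal/files/data.csv")
fit <- lm(y ~ x, data = dat)
coef(fit)                          # the slope estimates the relationship
plot(cooks.distance(fit), type = "h",
     ylab = "Cook's distance")     # influence of each observation
plot(fit, which = 5)               # residuals vs. leverage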
Good luck
You already provided some intuition in your question by looking at the orientation of the X/Y data points and their dispersion. In short, you're correct.
In formal terms, the orientation can be referred to as the sign of the correlation, and the dispersion as the variance. These two links will give you more information on how to interpret the linear relationship between two variables.
This is homework, so the answer to your question is simple. Run a linear regression of Y on X, and you'll get something like this:
      Coefficient    Std Err        t Stat
C     53.14404163     6.522516463    8.147781908
X    -44.8798926     16.80565866    -2.670522684
So, the t-statistic on the X variable is significant at the 99% confidence level. Hence, you can declare that the variables have some kind of relationship.
Is it linear? Add a variable X2 = (X-mean(X))^2, and regress again.
      Coefficient    Std Err        t Stat
C     53.46173893     6.58938281     8.11331508
X    -43.9503443     17.01532569    -2.582985779
X2   -44.601130     114.1461801    -0.390736951
The coefficient on X is still significant, but X2 is not. X2 represents nonlinearity. So, you declare that the relationship appears to be linear.
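A minimal sketch of both regressions in R, using the assumed data URL and column names from elsewhere in this thread (so the exact numbers may differ from the output above):

# Step 1: linear fit; Step 2: add the de-meaned, squared term X2.
dat <- read.csv("http://doyenne.com/personal/files/data.csv")
fit1 <- lm(y ~ x, data = dat)
fit2 <- lm(y ~ x + I((x - mean(x))^2), data = dat)
summary(fit1)   # check the t statistic on x
summary(fit2)   # check the t statistic on the quadratic (nonlinearity) term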
The above was for the homework.
In real life, things are more complicated. Imagine that this was data on a class of students: Y is bench press in pounds, and X is time in minutes holding one's breath before the bench press. I'd ask for the gender of the students. Just for the fun of it, let's add another variable Z, and say that Z=1 (girls) for all Y<60 and Z=0 (boys) when Y>=60. Run the regression with three variables:
      Coefficient    Std Err        t Stat
C     92.93031357     3.877092841   23.969071
X     -6.55246715     8.977138488   -0.72990599
X2   -43.6291362     59.06955097    -0.738606194
Z    -63.3231270      2.960160265  -21.39179009
What happened?! The "relationship" between X and Y has disappeared! Oh, it seems that the relationship was spurious, due to the confounding variable: gender.
What is the moral of the story? You need to know what the data are in order to "explain" the "relationship", or even to establish it in the first place. In this case, the moment I'm told that the data are on students' physical activity, I'll immediately ask for their gender, and I won't even bother analyzing the data without the gender variable.
On the other hand, if you're asked to "describe" the scatterplot, then anything goes: correlations, linear fits, etc. For your homework, the first two steps above should be sufficient: look at the coefficient on X (relationship), then on X2 (linearity). Make sure you de-mean the X variable (subtract its mean).
What is $Y$?
What process do you think produced the outliers? What makes you think that they are not real measurements? What is the theory?
– abaumann Sep 07 '14 at 15:47